CN115145253A - End-to-end automatic driving method and system and training method of automatic driving model - Google Patents

End-to-end automatic driving method and system and training method of automatic driving model Download PDF

Info

Publication number
CN115145253A
Authority
CN
China
Prior art keywords
data
laser radar
automatic driving
point cloud
grid map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110279719.3A
Other languages
Chinese (zh)
Inventor
陈林昱
闫春香
王玉龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Automobile Group Co Ltd
Original Assignee
Guangzhou Automobile Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Automobile Group Co Ltd filed Critical Guangzhou Automobile Group Co Ltd
Priority to CN202110279719.3A priority Critical patent/CN115145253A/en
Publication of CN115145253A publication Critical patent/CN115145253A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0214 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory in accordance with safety or protection criteria, e.g. avoiding hazardous areas
    • G05D1/0221 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
    • G05D1/0223 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving speed control of the vehicle
    • G05D1/0231 Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means
    • G05D1/0246 Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using a video camera in combination with image processing means
    • G05D1/0251 Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using a video camera in combination with image processing means extracting 3D information from a plurality of images taken from different locations, e.g. stereo vision
    • G05D1/0276 Control of position or course in two dimensions specially adapted to land vehicles using signals provided by a source external to the vehicle

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Electromagnetism (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a training method for an end-to-end automatic driving model, which comprises the following steps: step S10, collecting laser radar data through a laser radar arranged on a vehicle, and obtaining real-time driving data and timestamps of the vehicle; step S11, aligning the collected laser radar data and the real-time driving data according to the timestamps, performing coordinate conversion on the laser radar data, and adjusting it into an overhead-projection grid map of a predetermined size; step S12, combining the grid map data, the real-time driving data and the map navigation data to form a training set, inputting the training set into a selected deep neural network for training, and obtaining a trained end-to-end automatic driving model; and step S13, testing and adjusting the end-to-end automatic driving model on a test set and in a real vehicle. The invention also discloses an end-to-end automatic driving method and system. The invention can realize end-to-end automatic driving with high safety and universality.

Description

End-to-end automatic driving method and system and training method of automatic driving model
Technical Field
The invention relates to the technical field of automatic driving, in particular to an end-to-end automatic driving method and system and an automatic driving model training method.
Background
Deep learning methods are increasingly accepted and valued by the academic community and are being applied more and more in the field of automatic driving. The end-to-end deep learning method has been proved effective; its advantage is that better automatic driving behavior can be obtained by learning from a large amount of data.
However, existing end-to-end training methods generally take the picture captured by a front-view camera as input. This has several drawbacks: an original picture contains too many elements, so feature extraction that relies on the network alone is difficult; the end-to-end performance is strongly affected by the mounting position of the camera, which differs between vehicles; and a neural network has to be designed and trained separately for each vehicle type, so universality is poor.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide an end-to-end automatic driving method and system and a training method for an automatic driving model, which can realize end-to-end automatic driving with high safety and universality.
To solve the above technical problem, as one aspect of the present invention, a method for training an end-to-end automatic driving model is provided, which includes the following steps:
step S10, collecting laser radar data through a laser radar arranged on a vehicle, and obtaining real-time driving data and timestamps of the vehicle; the real-time driving data at least comprises: steering wheel angle, vehicle speed, accelerator pedal depth, and brake pedal depth data;
step S11, synchronizing the collected laser radar data and the real-time driving data according to the timestamps; performing coordinate conversion on the laser radar data, adjusting it into an overhead-projection grid map of a predetermined size, and filling at least the height information of each point into each grid cell of the grid map;
step S12, combining the grid map data, the real-time driving data and the map navigation data to form a training set, inputting the training set into a selected deep neural network for training, and obtaining a trained end-to-end automatic driving model;
and step S13, testing and adjusting the end-to-end automatic driving model on a test set and in a real vehicle.
Wherein the step S11 further includes:
step S110, down-sampling the point cloud data acquired by the laser radar: dividing it into a plurality of cubes of a preset size, calculating the centroid of all points in each cube, and replacing all points in each cube with the centroid;
step S111, projecting the down-sampled point cloud data onto the XY plane, comparing the distance in the X direction between front and rear rays with a preset threshold to determine whether each point is a road surface point, and removing the road surface point information;
step S112, performing coordinate conversion on the point cloud data with the road surface points removed to form a bird's-eye view image; and screening out a region of interest in the bird's-eye view image, filling in height value data, and forming an overhead-projection grid map of a predetermined size.
Wherein, in said step S112, the height value data is filled in the following manner:
and acquiring height value data in the laser radar data, normalizing the height value data to be between 0 and 255, rounding the normalized result, and filling the normalized result into each grid.
Wherein the step S12 further includes:
inputting the grid map into a deep neural network, wherein the deep neural network is a ResNet50 CNN network;
cutting and extracting a navigation interface image to obtain a navigation path map image of a preset size, inputting the navigation path map image into the deep neural network, processing it with three convolutional layers of 3 × 3 kernels, and flattening the convolution result to obtain a one-dimensional high-dimensional feature;
inputting the accelerator pedal depth data and the brake pedal depth data into the deep neural network, converting them by one-hot encoding, merging them with the one-dimensional high-dimensional feature corresponding to the navigation path map, and feeding the merged feature into the network at the fully connected layer of ResNet50; the labels for training the deep neural network are the steering wheel angle and vehicle speed information;
and repeating the training process for the selected training set to obtain a trained end-to-end automatic driving model.
Accordingly, as another aspect of the present invention, there is also provided an end-to-end automatic driving method, comprising the following steps:
step S20, collecting real-time laser radar data through a laser radar arranged on a vehicle;
step S21, performing coordinate conversion on the laser radar data, adjusting it into an overhead-projection grid map of a preset size, and filling at least the height information of each point into each grid cell of the grid map;
step S22, inputting the grid map data and the navigation data into a pre-trained end-to-end automatic driving model, and outputting the predicted current steering wheel angle and vehicle speed information;
and S23, controlling the automatic driving of the vehicle according to the predicted current steering wheel angle and the vehicle speed information.
Wherein the step S21 further comprises:
step S210, down-sampling the point cloud data acquired by the laser radar: dividing it into a plurality of cubes of a preset size, calculating the centroid of all points in each cube, and replacing all points in each cube with the centroid;
step S211, projecting the down-sampled point cloud data onto the XY plane, comparing the distance in the X direction between front and rear rays with a preset threshold to determine whether each point is a road surface point, and removing the road surface point information;
step S212, performing coordinate conversion on the point cloud data with the road surface points removed to form a bird's-eye view image; and screening out a region of interest in the bird's-eye view image, filling in height value data, and forming an overhead-projection grid map of a predetermined size.
Wherein, in said step S212, the height value data is filled in the following way:
and acquiring height value data in the laser radar data, normalizing the height value data to be between 0 and 255, rounding the normalized result, and filling the normalized result into each grid.
Accordingly, as a further aspect of the present invention, there is also provided an end-to-end automatic driving system, comprising:
the radar data acquisition module, which is used for acquiring real-time laser radar data through a laser radar arranged on a vehicle;
the grid map obtaining module, which is used for performing coordinate conversion on the laser radar data, adjusting it into an overhead-projection grid map of a preset size, and filling at least the height information of each point into each grid cell of the grid map;
the learning processing module is used for inputting the grid map data and the navigation data into a pre-trained end-to-end automatic driving model and outputting the predicted current steering wheel angle and vehicle speed information;
and the automatic driving control unit is used for controlling the automatic driving of the vehicle according to the predicted current steering wheel angle and the vehicle speed information.
Wherein the grid map obtaining module further comprises:
the sampling processing unit, which is used for down-sampling the point cloud data acquired by the laser radar: dividing it into a plurality of cubes of a preset size, calculating the centroid of all points in each cube, and replacing all points in each cube with the centroid;
the road surface point removing unit, which is used for projecting the down-sampled point cloud data onto the XY plane, comparing the distance in the X direction between front and rear rays with a preset threshold to determine whether each point is a road surface point, and removing the road surface point information;
the conversion processing unit, which is used for performing coordinate conversion on the point cloud data with the road surface points removed to form a bird's-eye view image, screening out a region of interest in the bird's-eye view image, filling in height value data, and forming an overhead-projection grid map of a predetermined size.
Wherein the conversion processing unit further comprises:
and the filling unit is used for obtaining height value data in the laser radar data, normalizing the height value data to be between 0 and 255, rounding the normalization result and filling the normalization result into each grid.
The embodiment of the invention has the following beneficial effects:
The invention provides an end-to-end automatic driving method and system and a training method for an automatic driving model. The environment is scanned by a laser radar and projected into a top-view two-dimensional grid map; on the basis of the grid map, an end-to-end deep learning neural network combines the other inputs (accelerator pedal, brake pedal, and the like) to output the predicted steering wheel angle and speed and complete the automatic driving task. This offers good safety, improves the classification and prediction accuracy of the model, and reduces the training difficulty;
In the embodiment of the invention, the grid map is used as input, which is equivalent to extracting the environmental features around the vehicle in advance and passing them to the neural network directly in a higher-dimensional form, so the model can be trained better to achieve the driving effect; and since the grid map is a top view, differences between vehicle setups are eliminated, and data from different vehicles can easily be used as training data, thereby improving universality.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art from these drawings without inventive effort.
FIG. 1 is a schematic main flow chart illustrating an embodiment of a method for training an end-to-end autopilot model according to the present invention;
FIG. 2 is a schematic diagram of the present invention in plan view with ground points removed;
FIG. 3 is a schematic flow chart illustrating an embodiment of an end-to-end autopilot method provided by the present invention;
FIG. 4 is a schematic structural diagram of an embodiment of an end-to-end autopilot system provided by the present invention;
FIG. 5 is a schematic diagram of a grid map obtaining module shown in FIG. 4;
fig. 6 is a schematic structural diagram of the conversion processing unit in fig. 5.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings.
Fig. 1 is a schematic main flow chart illustrating an embodiment of a method for training an end-to-end autopilot model according to the present invention. Referring to fig. 2 together, in this embodiment, the method for training an end-to-end autopilot model includes the following steps:
Step S10, collecting laser radar data through a laser radar (such as a 64-line laser radar) arranged on a vehicle, and obtaining real-time driving data and timestamps of the vehicle; the real-time driving data at least comprises: steering wheel angle, vehicle speed, accelerator pedal depth, and brake pedal depth data. It is understood that the laser radar data is point cloud data, in which each point includes three-dimensional coordinate information (X, Y and Z coordinates) and, in some cases, color information, reflection intensity information, echo count information, and the like.
Step S11, aligning the collected laser radar data and the real-time driving data according to the timestamps; performing coordinate conversion on the laser radar data, adjusting it into an overhead-projection grid map of a predetermined size, and filling at least the height information of each point into each grid cell of the grid map.
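For illustration only, the timestamp alignment in step S11 can be sketched as a nearest-neighbour match between lidar frame timestamps and vehicle-bus samples; the function name, array layout and 50 ms tolerance below are assumptions, not details specified by the method:

```python
import numpy as np

def align_by_timestamp(lidar_stamps, bus_stamps, max_gap=0.05):
    """Match each lidar frame to the closest-in-time vehicle-bus sample.

    lidar_stamps, bus_stamps: 1-D arrays of timestamps in seconds.
    max_gap: assumed tolerance; pairs further apart than this are dropped.
    Returns two index arrays: matched lidar indices and bus indices.
    """
    lidar_stamps = np.asarray(lidar_stamps)
    bus_stamps = np.asarray(bus_stamps)
    pairs = []
    for i, t in enumerate(lidar_stamps):
        j = int(np.argmin(np.abs(bus_stamps - t)))  # nearest bus sample
        if abs(bus_stamps[j] - t) <= max_gap:
            pairs.append((i, j))
    idx = np.array(pairs, dtype=int).reshape(-1, 2)
    return idx[:, 0], idx[:, 1]
```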
In a specific example, the step S11 further includes:
Step S110, down-sampling the point cloud data acquired by the laser radar: dividing it into a plurality of cubes of a preset size (such as 20 cm per edge), calculating the centroid of all points in each cube, and replacing all points in each cube with the centroid.
It can be understood that the amount of point cloud data transmitted by a 64-line lidar in a single frame is huge; if the point cloud were processed directly, about 60000 points would need to be handled, consuming considerable computing resources. The down-sampling method adopted in the embodiment of the invention is as follows: a 20 cm × 20 cm × 20 cm cube (the parameter can be adjusted according to the situation) is defined in space, the centroid of all points within the cube is calculated, and the centroid then replaces all the points in the cube. With this down-sampling method, the number of lidar points can be reduced to 1/5 of the original without affecting the usage effect.
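A minimal sketch of this centroid-based voxel down-sampling follows; the 0.2 m edge corresponds to the 20 cm cube above, and everything else is an assumption:

```python
import numpy as np

def voxel_downsample(points, voxel=0.2):
    """Replace all points inside each cube of edge `voxel` by their centroid.

    points: (N, 3) array of lidar XYZ coordinates in metres.
    """
    keys = np.floor(points[:, :3] / voxel).astype(np.int64)
    # Group the points by voxel index and average each group.
    _, inverse, counts = np.unique(keys, axis=0,
                                   return_inverse=True, return_counts=True)
    centroids = np.zeros((counts.size, 3))
    np.add.at(centroids, inverse, points[:, :3])   # sum the points per voxel
    return centroids / counts[:, None]             # divide by the voxel count
```

Point-cloud libraries offer the same operation directly; for example, Open3D's voxel_down_sample(voxel_size=0.2) also computes per-voxel centroids.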
Step S111, projecting the down-sampled point cloud data onto the XY plane, comparing the distance in the X direction between front and rear rays with a preset threshold to determine whether each point is a road surface point, and removing the road surface point information.
It can be understood that ground data cleaning is required because the point cloud data contains a large amount of ground point data, and the ground information is an interference item for subsequent neural network training. As shown in fig. 2, the ground segmentation method is as follows: the point cloud is projected onto the XY plane, and whether a point is a road surface point is judged by comparing the distances in the X direction between front and rear rays. As can be seen from fig. 2, when rays strike an object, the X-direction spacing between two adjacent rays becomes small (e.g., the top two rays in the figure, which are projected onto an object). By setting a comparison threshold, ground segmentation of the point cloud data can be completed and the ground point data removed, thereby reducing the data volume.
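A rough sketch of this ray-spacing ground filter follows; the sector count and the 0.5 m gap threshold are illustrative assumptions, since the text only specifies comparing the X-direction spacing of successive rays against a threshold:

```python
import numpy as np

def remove_ground(points, sectors=360, gap_thresh=0.5):
    """Drop road-surface points using the ray-spacing idea of Fig. 2.

    Within one azimuth sector, points are sorted near-to-far; successive
    returns land far apart on flat ground but bunch together where the
    rays strike a vertical object, so a small radial gap marks an
    obstacle point to keep, while widely spaced returns count as ground.
    """
    r = np.hypot(points[:, 0], points[:, 1])                 # planar range
    az = ((np.arctan2(points[:, 1], points[:, 0]) + np.pi)
          / (2 * np.pi) * sectors).astype(int) % sectors
    keep = np.zeros(len(points), dtype=bool)
    for s in np.unique(az):
        idx = np.where(az == s)[0]
        order = idx[np.argsort(r[idx])]                      # near to far
        gaps = np.diff(r[order])
        keep[order[1:]] = gaps < gap_thresh                  # small gap: object
    return points[keep]
```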
Step S112, performing coordinate conversion on the point cloud data with the road surface points removed to form a bird's-eye view image; and screening out a region of interest in the bird's-eye view image, filling in height value data, and forming an overhead-projection grid map of a predetermined size.
It can be understood that the point cloud coordinates are based on the vehicle coordinate system, whereas the coordinate origin of a typical bird's-eye view image is at the upper left corner, so coordinate transformation is required. Meanwhile, the sensing distance of the laser radar is long, with a scanning range of up to hundreds of meters in open areas, so a region of interest needs to be screened out according to project requirements;
also, in some examples, the height value data may be populated in the following manner:
and acquiring height value data in the laser radar data, normalizing the height value data to be between 0 and 255, rounding the normalized result, and filling the normalized result into each grid.
In addition, the converted images can be visualized to check the accuracy of the top-view projection, and are uniformly resized to 512x512 to facilitate subsequent neural network training.
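A sketch of the projection and height filling of step S112 is given below; the region of interest and the 0.1 m cell size are assumptions, while the 0-255 normalization, rounding and 512x512 output follow the text:

```python
import numpy as np
import cv2

def to_grid_map(points, x_range=(0.0, 51.2), y_range=(-25.6, 25.6),
                cell=0.1, out_size=512):
    """Project the cleaned cloud to an overhead grid; encode height as 0-255."""
    m = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
         (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    pts = points[m]
    if pts.size == 0:
        return np.zeros((out_size, out_size), np.uint8)
    # Vehicle frame -> image frame: the origin moves to the top-left corner.
    rows = ((x_range[1] - pts[:, 0]) / cell).astype(int)
    cols = ((pts[:, 1] - y_range[0]) / cell).astype(int)
    h = int(round((x_range[1] - x_range[0]) / cell))
    w = int(round((y_range[1] - y_range[0]) / cell))
    grid = np.zeros((h, w), dtype=np.uint8)
    z = pts[:, 2]
    z_norm = np.rint((z - z.min()) / max(np.ptp(z), 1e-6) * 255).astype(np.uint8)
    # Keep the highest normalized return that falls into each grid cell.
    np.maximum.at(grid, (np.clip(rows, 0, h - 1), np.clip(cols, 0, w - 1)), z_norm)
    return cv2.resize(grid, (out_size, out_size), interpolation=cv2.INTER_NEAREST)
```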
Step S12, combining the grid map data, the real-time driving data and the map navigation data to form a training set, inputting the training set into a selected deep neural network for training, and obtaining a trained end-to-end automatic driving model;
in a specific example, the step S12 further includes:
inputting the grid map into a deep neural network, wherein the deep neural network is a ResNet50 CNN network;
end-to-end automatic driving, because the output is directly the angle of the steering wheel and the speed of the vehicle, in order to solve the problem of where the vehicle is driving, the navigation information needs to be simultaneously accessed into the neural network, and a better navigation guidance effect can be carried out. Cutting and extracting a navigation interface image to obtain a navigation path map image with a preset size (such as 64x 64), inputting the navigation path map image into the deep neural network, performing convolution processing on three layers of 3 x 3 CNN convolution kernels in the deep neural network, and unfolding a convolution result through flatten to obtain a one-dimensional high-dimensional feature;
inputting the depth data of the accelerator pedal and the depth data of the brake pedal into the deep neural network, performing one-hot code conversion, merging (such as concat merging) with the one-dimensional high-dimensional characteristics corresponding to the navigation path diagram, and commonly accessing the networks in a full connection layer of ResNet 50; wherein, the Label (Label) trained by the deep neural network is the information of the steering wheel angle and the vehicle speed;
and repeating the training process for the selected training set to obtain a trained end-to-end automatic driving model.
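The described architecture can be sketched in PyTorch as follows; the navigation-branch channel counts, the pedal bin count and the head width are assumptions, while the ResNet50 backbone, the three 3 x 3 convolution layers on a 64x64 navigation crop, the flatten step, the one-hot pedal inputs merged at the fully connected stage and the steering/speed output follow the text:

```python
import torch
import torch.nn as nn
import torchvision

class EndToEndDriver(nn.Module):
    """Sketch of the described network; layer sizes are assumptions."""

    def __init__(self, pedal_bins=10):
        super().__init__()
        self.backbone = torchvision.models.resnet50(weights=None)
        # Adapt the first conv to the single-channel grid map input.
        self.backbone.conv1 = nn.Conv2d(1, 64, 7, stride=2, padding=3,
                                        bias=False)
        self.backbone.fc = nn.Identity()          # yields a 2048-d feature
        self.nav = nn.Sequential(                 # three 3x3 conv layers
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten())                         # 64x64 in -> 32*16*16 = 8192
        self.head = nn.Sequential(
            nn.Linear(2048 + 8192 + 2 * pedal_bins, 512), nn.ReLU(),
            nn.Linear(512, 2))                    # [steering angle, speed]

    def forward(self, grid, nav, accel_onehot, brake_onehot):
        feats = torch.cat([self.backbone(grid), self.nav(nav),
                           accel_onehot, brake_onehot], dim=1)
        return self.head(feats)
```

Trained against the recorded steering wheel angle and vehicle speed labels, such a module mirrors the fusion described above.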
And step S13, testing and adjusting the end-to-end automatic driving model on a test set and in a real vehicle.
The grid map processed in this way has better training advantages, because the scanning result of the laser radar is preserved to the greatest extent during processing without relying on any recognition result; removing the ground points effectively eliminates the interference information, making the input grid information more effective; and the generated top view can completely cover the information around the vehicle, giving a more comprehensive image range than traditional camera image recognition.
In the embodiment of the invention, one-hot encoding is adopted to input the accelerator pedal depth and brake pedal depth data of the vehicle, which better enriches the network's input in a higher-dimensional form, provides more vehicle-related information to the network, and helps the neural network achieve a better learning effect.
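A minimal sketch of this pedal-depth one-hot encoding (the bin count is an assumption; the text states only that the pedal depths are one-hot encoded):

```python
import torch
import torch.nn.functional as F

def pedal_onehot(depth, bins=10):
    """Discretize pedal depths in [0, 1] into `bins` classes, then one-hot.

    depth: tensor of pedal depths; bins: assumed discretization granularity.
    """
    idx = torch.clamp((depth * bins).long(), 0, bins - 1)
    return F.one_hot(idx, num_classes=bins).float()
```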
In the embodiment of the invention, the traditional picture input is replaced by grid map input. The grid map is also a form of picture, and the basic training mode is the same; but compared with picture input, the grid map is converted from laser radar data, carries clearer information with explicit distance information, and, once converted into a top view, eliminates the requirement for vehicle consistency.
As shown in fig. 3, a main flow diagram of an embodiment of the end-to-end automatic driving method provided by the present invention is shown. In this embodiment, the end-to-end automatic driving method includes the following steps:
Step S20, collecting real-time laser radar data through a laser radar arranged on a vehicle;
Step S21, performing coordinate conversion on the laser radar data, adjusting it into an overhead-projection grid map of a preset size, and filling at least the height information of each point into each grid cell of the grid map;
In a specific example, the step S21 further includes:
Step S210, down-sampling the point cloud data acquired by the laser radar: dividing it into a plurality of cubes of a preset size, calculating the centroid of all points in each cube, and replacing all points in each cube with the centroid;
Step S211, projecting the down-sampled point cloud data onto the XY plane, comparing the distance in the X direction between front and rear rays with a preset threshold to determine whether each point is a road surface point, and removing the road surface point information;
Step S212, performing coordinate conversion on the point cloud data with the road surface points removed to form a bird's-eye view image; and screening out a region of interest in the bird's-eye view image, filling in height value data, and forming an overhead-projection grid map of a preset size.
Wherein, in said step S212, the height value data is filled in the following way:
the height value data in the laser radar data is acquired, normalized to between 0 and 255, rounded, and filled into each grid cell.
Step S22, inputting the grid map data and the navigation data into an end-to-end automatic driving model trained in advance using the method described in fig. 1, and outputting the predicted current steering wheel angle and vehicle speed information;
Step S23, controlling the automatic driving of the vehicle according to the predicted current steering wheel angle and the vehicle speed information.
For more details, reference may be made to the foregoing description of fig. 1, which is not repeated herein.
It can be understood that the embodiment of the invention improves on the traditional picture-input end-to-end automatic driving method by replacing the input picture with a grid map whose information is clearer and more distinct. The grid map is also a form of picture; the difference from a conventional picture is that three channels become one channel. Through supervised training, the deep neural network continuously adjusts its free parameters (by back propagation) to finally obtain a trained end-to-end model, which can automatically output the steering wheel angle and vehicle speed from the grid map input, and the automatic driving behavior of the vehicle is controlled through specific actuators.
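Putting the sketches above together, a hypothetical inference cycle might look as follows; the checkpoint name is invented, and feeding the current pedal depths at inference time is an assumption made so that the sketch matches the training inputs, since steps S20 to S23 list only the grid map and navigation data:

```python
import torch

def drive_step(model, raw_points, nav_crop, accel, brake):
    """One control cycle: raw lidar + navigation crop -> steering, speed."""
    pts = voxel_downsample(raw_points)                            # step S210
    pts = remove_ground(pts)                                      # step S211
    grid = to_grid_map(pts)                                       # step S212
    grid_t = torch.from_numpy(grid).float().div(255)[None, None]  # (1,1,512,512)
    nav_t = (torch.from_numpy(nav_crop).float()
             .permute(2, 0, 1)[None].div(255))                    # (1,3,64,64)
    with torch.no_grad():                                         # step S22
        out = model(grid_t, nav_t,
                    pedal_onehot(torch.tensor([accel])),
                    pedal_onehot(torch.tensor([brake])))
    steer, speed = out[0].tolist()
    return steer, speed       # step S23: passed to the vehicle's actuators

# Usage sketch with an assumed checkpoint file:
# model = EndToEndDriver()
# model.load_state_dict(torch.load("e2e_driver.pt"))
# model.eval()
```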
Fig. 4 is a schematic structural diagram illustrating an embodiment of an end-to-end automatic driving system according to the present invention. As shown in figs. 5 and 6, in the present embodiment, the end-to-end automatic driving system 1 includes:
a radar data acquisition module 10, for acquiring real-time lidar data by a lidar disposed on the vehicle;
a grid map obtaining module 11, configured to perform coordinate conversion on the laser radar data, adjust it into an overhead-projection grid map of a predetermined size, and fill at least the height information of each point into each grid cell of the grid map;
the learning processing module 12 is configured to input the grid map data and the navigation data into a pre-trained end-to-end automatic driving model, and output a predicted current steering wheel angle and vehicle speed information;
and the automatic driving control unit 13 is used for controlling the automatic driving of the vehicle according to the predicted current steering wheel angle and the vehicle speed information.
As shown in fig. 5, in a specific example of the present invention, the grid map obtaining module 11 further includes:
a sampling processing unit 110, configured to down-sample the point cloud data acquired by the laser radar: divide it into a plurality of cubes of a predetermined size, calculate the centroid of all points in each cube, and then replace all points in each cube with the centroid;
a road surface point removing unit 111, configured to project the down-sampled point cloud data onto the XY plane, compare the distance in the X direction between front and rear rays with a predetermined threshold to determine whether each point is a road surface point, and remove the road surface point information;
a conversion processing unit 112, configured to perform coordinate conversion on the point cloud data with the road surface points removed to form a bird's-eye view image, screen out a region of interest in the bird's-eye view image, fill in height value data, and form an overhead-projection grid map of a predetermined size.
As shown in fig. 6, in a specific example of the present invention, the conversion processing unit 112 further includes:
and a filling unit 113 for obtaining the height value data in the laser radar data, normalizing the height value data to be between 0 and 255, rounding the normalization result, and filling into each grid.
For more details, reference may be made to the foregoing description of fig. 3, which is not repeated herein.
The embodiment of the invention has the following beneficial effects:
The invention provides an end-to-end automatic driving method and system and a training method for an automatic driving model. The environment is scanned by a laser radar and projected into a top-view two-dimensional grid map; on the basis of the grid map, an end-to-end deep learning neural network combines the other inputs (accelerator pedal, brake pedal, and the like) to output the predicted steering wheel angle and speed and complete the automatic driving task. This offers good safety, improves the classification and prediction accuracy of the model, and reduces the training difficulty;
In the embodiment of the invention, the grid map is used as input, which is equivalent to extracting the environmental features around the vehicle in advance and passing them to the neural network directly in a higher-dimensional form, so the model can be trained better to achieve the driving effect; and since the grid map is a top view, differences between vehicle setups are eliminated, and data from different vehicles can easily be used as training data, thereby improving universality.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the invention has been described in connection with what is presently considered to be the most practical and preferred embodiment, it is to be understood that the invention is not to be limited to the disclosed embodiment, but on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (10)

1. A method for training an end-to-end automatic driving model, characterized by comprising the following steps:
step S10, collecting laser radar data through a laser radar arranged on a vehicle, and obtaining real-time driving data and timestamps of the vehicle; the real-time driving data at least comprises: steering wheel angle, vehicle speed, accelerator pedal depth, and brake pedal depth data;
step S11, aligning the collected laser radar data and the real-time driving data according to the timestamps; performing coordinate conversion on the laser radar data, adjusting it into an overhead-projection grid map of a predetermined size, and filling at least the height information of each point into each grid cell of the grid map;
step S12, combining the grid map data, the real-time driving data and the map navigation data to form a training set, inputting the training set into a selected deep neural network for training, and obtaining a trained end-to-end automatic driving model;
and step S13, testing and adjusting the end-to-end automatic driving model on a test set and in a real vehicle.
2. The method of claim 1, wherein the step S11 further comprises:
step S110, down-sampling the point cloud data acquired by the laser radar: dividing it into a plurality of cubes of a preset size, calculating the centroid of all points in each cube, and replacing all points in each cube with the centroid;
step S111, projecting the down-sampled point cloud data onto the XY plane, comparing the distance in the X direction between front and rear rays with a preset threshold to determine whether each point is a road surface point, and removing the road surface point information;
step S112, performing coordinate conversion on the point cloud data with the road surface points removed to form a bird's-eye view image; and screening out a region of interest in the bird's-eye view image, filling in height value data, and forming an overhead-projection grid map of a predetermined size.
3. The method of claim 2, characterized in that in said step S112, the height value data is filled in the following way:
and acquiring height value data in the laser radar data, normalizing the height value data to be between 0 and 255, rounding the normalized result, and filling the normalized result into each grid.
4. The method according to any of claims 1 to 3, wherein said step S12 further comprises:
step S120, inputting the grid map into a deep neural network, wherein the deep neural network is a ResNet50 CNN network;
step S121, cutting and extracting a navigation interface image to obtain a navigation path map image of a predetermined size, inputting the navigation path map image into the deep neural network, processing it with three convolutional layers of 3 x 3 kernels, and flattening the convolution result to obtain a one-dimensional high-dimensional feature;
step S122, inputting the accelerator pedal depth data and the brake pedal depth data into the deep neural network, converting them by one-hot encoding, merging them with the one-dimensional high-dimensional feature corresponding to the navigation path map, and feeding the merged feature into the network at the fully connected layer of ResNet50; the labels for training the deep neural network are the steering wheel angle and vehicle speed information;
and step S123, repeating the training process for the selected training set to obtain a trained end-to-end automatic driving model.
5. An end-to-end automatic driving method, comprising the following steps:
step S20, collecting real-time laser radar data through a laser radar arranged on a vehicle;
step S21, performing coordinate conversion on the laser radar data, adjusting it into an overhead-projection grid map of a preset size, and filling at least the height information of each point into each grid cell of the grid map;
step S22, inputting the grid map data and the navigation data into an end-to-end automatic driving model trained in advance by the method of any one of claims 1 to 4, and outputting the predicted current steering wheel angle and vehicle speed information;
and step S23, controlling the automatic driving of the vehicle according to the predicted current steering wheel angle and the vehicle speed information.
6. The method of claim 5, wherein the step S21 further comprises:
step S210, down-sampling the point cloud data acquired by the laser radar: dividing it into a plurality of cubes of a preset size, calculating the centroid of all points in each cube, and replacing all points in each cube with the centroid;
step S211, projecting the down-sampled point cloud data onto the XY plane, comparing the distance in the X direction between front and rear rays with a preset threshold to determine whether each point is a road surface point, and removing the road surface point information;
step S212, performing coordinate conversion on the point cloud data with the road surface points removed to form a bird's-eye view image; and screening out a region of interest in the bird's-eye view image, filling in height value data, and forming an overhead-projection grid map of a preset size.
7. The method according to claim 6, characterized in that in said step S212, the height value data is filled in the following way:
and acquiring height value data in the laser radar data, normalizing the height value data to be between 0 and 255, rounding the normalized result, and filling the normalized result into each grid.
8. An end-to-end automatic driving system, comprising:
the radar data acquisition module, which is used for acquiring real-time laser radar data through a laser radar arranged on a vehicle;
the grid map obtaining module, which is used for performing coordinate conversion on the laser radar data, adjusting it into an overhead-projection grid map of a preset size, and filling at least the height information of each point into each grid cell of the grid map;
the learning processing module, which is used for inputting the grid map data and the navigation data into a pre-trained end-to-end automatic driving model and outputting the predicted current steering wheel angle and vehicle speed information;
and the automatic driving control unit, which is used for controlling the automatic driving of the vehicle according to the predicted current steering wheel angle and the vehicle speed information.
9. The system of claim 8, wherein the grid map obtaining module further comprises:
the sampling processing unit, which is used for down-sampling the point cloud data acquired by the laser radar: dividing it into a plurality of cubes of a preset size, calculating the centroid of all points in each cube, and replacing all points in each cube with the centroid;
the road surface point removing unit, which is used for projecting the down-sampled point cloud data onto the XY plane, comparing the distance in the X direction between front and rear rays with a preset threshold to determine whether each point is a road surface point, and removing the road surface point information;
the conversion processing unit, which is used for performing coordinate conversion on the point cloud data with the road surface points removed to form a bird's-eye view image, screening out a region of interest in the bird's-eye view image, filling in height value data, and forming an overhead-projection grid map of a preset size.
10. The system of claim 9, wherein the conversion processing unit further comprises:
and the filling unit is used for obtaining the height value data in the laser radar data, normalizing the height value data to be between 0 and 255, rounding the normalization result and filling the normalization result into each grid.
CN202110279719.3A 2021-03-16 2021-03-16 End-to-end automatic driving method and system and training method of automatic driving model Pending CN115145253A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110279719.3A CN115145253A (en) 2021-03-16 2021-03-16 End-to-end automatic driving method and system and training method of automatic driving model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110279719.3A CN115145253A (en) 2021-03-16 2021-03-16 End-to-end automatic driving method and system and training method of automatic driving model

Publications (1)

Publication Number Publication Date
CN115145253A true CN115145253A (en) 2022-10-04

Family

ID=83403620

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110279719.3A Pending CN115145253A (en) 2021-03-16 2021-03-16 End-to-end automatic driving method and system and training method of automatic driving model

Country Status (1)

Country Link
CN (1) CN115145253A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116403174A (en) * 2022-12-12 2023-07-07 深圳市大数据研究院 End-to-end automatic driving method, system, simulation system and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination