CN111582049A - ROS-based self-built unmanned vehicle end-to-end automatic driving method

Info

Publication number: CN111582049A
Authority: CN (China)
Prior art keywords: convolution, model, automatic driving, neural network, deep
Legal status: Pending
Application number: CN202010299263.2A
Other languages: Chinese (zh)
Inventors: 罗韬, 徐桐, 于瑞国, 徐天一, 赵满坤, 王建荣, 冯定文
Current Assignee: Tianjin University
Original Assignee: Tianjin University
Application filed by Tianjin University
Priority: CN202010299263.2A
Publication of CN111582049A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/50 - Context or environment of the image
    • G06V20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 - Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/582 - Recognition of traffic signs
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 - Static body considered as a whole, e.g. static pedestrian or occupant recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an end-to-end automatic driving method for an ROS-based self-built unmanned vehicle, comprising the following steps: acquiring data through deployed cameras, and expanding the training data set through data augmentation; training an end-to-end automatic driving model based on a deep neural network; introducing depthwise separable convolution, and comparing the improved model against the original SSD model without it; discarding the largest feature map output by the multi-level feature detection layer group of the deep neural network, and comparing the result against the original SSD; adjusting the number of prior boxes pre-configured for each convolution layer in the multi-level feature detection layer group, and adjusting the scale parameter of the smallest prior box; and deploying the trained network to an unmanned vehicle experiment platform for testing. While meeting the energy-consumption requirements of a low-compute platform, the invention improves the model's overall perception of the indoor environment by enlarging the receptive field of the shallow network.

Description

ROS-based self-built unmanned vehicle end-to-end automatic driving method
Technical Field
The invention relates to the fields of target detection and end-to-end automatic driving, and in particular to an end-to-end automatic driving method for a self-built unmanned vehicle based on ROS (Robot Operating System).
Background
Kaiming He et al. proposed the deep residual network for image recognition, introducing the "residual module" into deep convolutional neural networks for the first time. Each residual module applies convolution operations to its input and then adds the original input onto the resulting output, producing the module's final output. Experiments show that the residual module effectively suppresses the vanishing-gradient problem caused by deepening the network. Compared with earlier deep convolutional neural networks, the deep residual network increases network depth with fewer parameters, achieving excellent performance and higher training efficiency.
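As an illustration, the residual connection can be sketched in PyTorch as follows; this is a minimal sketch, with layer sizes chosen for illustration rather than taken from the original paper:

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual module: the original input is added back onto the
    convolved output, so the block learns a residual F(x) and emits F(x) + x."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))
        return self.relu(out + x)  # superimpose the original input onto the output
```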
To achieve the final goal of autonomous driving, an autonomous vehicle must first be able to perceive its surroundings like a human and respond quickly. The sensors and hardware commonly used by an autonomous driving system include cameras, lidar, radar, odometers, accelerometers, global positioning systems, and inertial navigation units. These sensors and hardware constitute the entire hardware system of autonomous driving and the foundation on which every functional module is built.
Currently, an autonomous driving system is generally divided by function into two subsystems: a perception subsystem and a decision subsystem. The perception subsystem fuses the data acquired by the various sensors and hardware, combining all currently known information into an internal model of the surrounding environment; it comprises 3 functional modules: a positioning module, a map module, and a target detection and tracking module. Once the perception subsystem has established this internal model, the decision subsystem responds correctly and rapidly according to the driver's demands, the current road and vehicle conditions, and the relevant laws and regulations, ensuring that the vehicle runs safely and stably; it comprises 5 modules: a path planning module, a behavior selection module, a motion planning module, an obstacle avoidance module, and a control module.
The data volume and computational load generated by an autonomous driving system are huge; a common way to work around this is to run partial simulation tests on a small-scale model car. However, volume and size limits force performance compromises, such as reduced overall computing power. Deep-neural-network methods are the most common choice for target detection and tracking, but they demand substantial hardware compute, and the accuracy, speed, and latency of detection and recognition all depend on it; with compromised compute, the real-time detection frame rate of these algorithms drops further. At the same time, autonomous driving places very high demands on the overall architecture and module design of the software system: research typically decomposes the driving task into small sub-problems such as target detection, path planning, and dynamic obstacle avoidance. These sub-problems add technical difficulty and coordination overhead, and while current techniques can achieve optimal results on a single sub-problem, combining those results does not necessarily yield a globally optimal system.
Disclosure of Invention
The invention provides an ROS-based end-to-end automatic driving method for a self-built unmanned vehicle, together with a lightweight deep neural network model with an expanded receptive field. While meeting the energy-consumption requirements of a low-compute platform, it improves the model's overall perception of the indoor environment by enlarging the receptive field of the shallow network, as detailed below:
An end-to-end automatic driving method for an ROS-based self-built unmanned vehicle, the method comprising:
acquiring data through deployed cameras, and expanding the training data set through data augmentation;
training an end-to-end automatic driving model based on a deep neural network;
introducing depthwise separable convolution, and comparing the improved model against the original SSD model without it;
discarding the largest feature map output by the multi-level feature detection layer group of the deep neural network, and comparing the result against the original SSD;
adjusting the number of prior boxes pre-configured for each convolution layer in the multi-level feature detection layer group, and adjusting the scale parameter of the smallest prior box; and deploying the trained network to an unmanned vehicle experiment platform for testing.
The training of the end-to-end automatic driving model of the deep neural network specifically comprises:
the deep neural network reads a color image from the training data set and outputs a predicted steering value; an error is computed from the true steering value and the predicted steering value; and the error is propagated back through the deep neural network via the backpropagation algorithm to update every weight in the network, so that the network's next prediction is closer to the true value.
Further, the introduction of depthwise separable convolution specifically comprises:
replacing the standard convolution layer with two separate layers, a depthwise convolution layer and a pointwise convolution layer, and modifying the convolution layers of the feature extraction part and of the multi-level feature detection layer group accordingly, thereby reducing the parameter count.
The modification of the convolution layers of the feature extraction part and the multi-level feature detection layer group is specifically:
convolving each of the three channels of the input image with its own 3×3 convolution kernel, yielding one feature map per channel;
then applying 64 pointwise 1×1 convolution kernels to these three feature maps, yielding 64 output feature maps;
Conv2 is likewise replaced with a depthwise convolution layer Conv2_dw and a pointwise convolution layer Conv2_pw, and finally a pooling operation with a 2×2 pooling kernel and a stride of 2 is performed.
The technical scheme provided by the invention has the following beneficial effects:
1. the invention explores using a deep convolutional neural network as the core target detection and recognition algorithm when overall performance, and computing power in particular, is limited; it improves the network structure of the existing SSD (Single Shot MultiBox Detector) algorithm, restricts the object categories to be recognized to the several kinds commonly encountered while an autonomous car is driving, and tests the trained, improved model on a self-built small unmanned vehicle;
2. the invention also tunes the model's ability to detect extremely small objects, giving priority to the recognition and detection of the common large objects, such as pedestrians and traffic signs, encountered while driving;
3. the invention explores the possibility of end-to-end autonomous driving with a single deep convolutional neural network in a specific scenario. Deep convolutional neural networks are typically used for image classification; the invention modifies one, ports it to a self-built small unmanned vehicle, and tests whether it can complete a real-time end-to-end automatic driving task in a specific scenario under limited computing power.
Drawings
FIG. 1 is a schematic diagram of an end-to-end autopilot system architecture;
FIG. 2 is a schematic diagram of the terrain of the experimental site.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below.
Example 1
An end-to-end automatic driving method for an ROS-based self-built unmanned vehicle, referring to FIG. 1, comprises:
step 101: data acquisition is carried out through the deployed cameras; during acquisition the vehicle is manually steered off-course with a gamepad and then corrected;
step 102: expanding the training data set through data augmentation;
step 103: training an end-to-end automatic driving model of the deep neural network;
step 104: introducing depthwise separable convolution, and comparing the improved model against the original SSD (Single Shot MultiBox Detector) model without it;
step 105: discarding the largest feature map output by the multi-level feature detection layer group of the deep neural network, and comparing the result against the original SSD;
step 106: adjusting the number of prior boxes pre-configured for each convolution layer in the multi-level feature detection layer group, adjusting the scale parameter of the smallest prior box at the same time, and comparing the result against the original model;
step 107: deploying the trained network to the unmanned vehicle experiment platform for testing.
In one embodiment, step 101 acquires data through the deployed cameras, with the vehicle manually steered off-course by gamepad and then corrected during acquisition. The specific steps are as follows:
Data acquisition uses a color-image recording rate of 10 frames per second. To keep the gamepad data and the image data synchronized in time during recording, this embodiment synchronizes the gamepad and camera threads with a critical section. Following the design of the DAVE-2 unmanned driving system, 3 cameras are installed at the front of the vehicle, the left and right ones being offset cameras. During data collection the vehicle is manually steered off-course with the gamepad and then corrected; the collected data are processed, and the steering value recorded during the deviation-generation stage is negated.
In one embodiment, step 102 expands the training data set by data augmentation, which includes the following steps:
the training data set is expanded by carrying out data amplification through four modes of horizontal turning, color dithering, gray level processing and fuzzy processing, so that overfitting is avoided, and the accuracy of the algorithm is improved.
In one embodiment, step 103 performs the end-to-end automatic driving model training of the deep neural network. The specific steps are as follows:
the deep neural network reads a color image from the training data set and outputs a predicted steering value; an error is computed from the true steering value and the predicted steering value; and the error is propagated back through the deep neural network via the backpropagation algorithm to update every weight in the network, so that the network's next prediction is closer to the true value.
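A minimal PyTorch sketch of this training loop follows; the mean-squared-error loss is an assumption, since the patent only specifies that an error between the true and predicted steering values is backpropagated:

```python
import torch.nn as nn

def train_epoch(model, loader, optimizer):
    """One pass over the data: color image in, predicted steering out,
    error backpropagated to update every weight in the network."""
    criterion = nn.MSELoss()  # assumed error measure
    for images, true_steering in loader:
        optimizer.zero_grad()
        pred_steering = model(images)                   # forward pass
        loss = criterion(pred_steering, true_steering)  # error vs. true value
        loss.backward()                                 # backpropagate the error
        optimizer.step()                                # update the weights
```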
In one embodiment, step 104 introduces depthwise separable convolution, as follows:
Separable convolution is introduced: the standard convolution layer is replaced by two separate layers, a depthwise convolution layer and a pointwise convolution layer, and the convolution layers of the feature extraction part and of the multi-level feature detection layer group are modified to reduce their parameter counts. Specifically, each of the three channels of the input image is convolved with its own 3×3 kernel (Conv1_dw), yielding one feature map per channel; then 64 pointwise 1×1 kernels are applied to these three feature maps (Conv1_pw), yielding 64 output feature maps. Similarly, Conv2 is replaced by a depthwise convolution layer Conv2_dw and a pointwise convolution layer Conv2_pw. Finally, a pooling operation with a 2×2 pooling kernel and a stride of 2 is performed. In all embodiments every standard convolution layer is converted to a depthwise separable convolution layer in this way, and the improved model is compared with the original model on the relevant performance indicators. A sketch of this replacement follows.
In one embodiment, step 105 discards the larger feature maps in the multi-level feature detection layer group of the deep neural network. The specific steps are as follows:
the larger-sized feature maps in the multi-level feature detection layer group of the deep neural network are discarded, and the resulting model is compared against the original model.
In one embodiment, step 106 adjusts the number of pre-configured prior boxes and the scale parameter of the smallest prior box. The specific steps are as follows:
the number of prior boxes pre-configured for each convolution layer in the multi-level feature detection layer group is adjusted, and the size of the smallest prior box is adjusted at the same time; the resulting model is compared against the original model.
In one embodiment, step 107 deploys the trained network model to the unmanned vehicle platform constructed by the invention and encapsulates it as an ROS node (a sketch follows). To avoid experiment failures caused by extreme illumination, this part of the experiment is carried out in an indoor environment with controllable lighting. The test is a continuous indoor cruise, judged primarily by the controlled-time ratio, the evaluation index proposed by the Nvidia research team.
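A minimal rospy sketch of such a node; the topic names, cruise speed, and the predict_steering() stub are illustrative assumptions, not taken from the patent:

```python
#!/usr/bin/env python
import rospy
from sensor_msgs.msg import Image
from geometry_msgs.msg import Twist
from cv_bridge import CvBridge

bridge = CvBridge()

def predict_steering(frame):
    """Stub for the trained end-to-end network's forward pass (assumed)."""
    return 0.0

def image_callback(msg, pub):
    frame = bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
    cmd = Twist()
    cmd.linear.x = 0.3                       # constant cruise speed (assumed)
    cmd.angular.z = predict_steering(frame)  # steering predicted by the model
    pub.publish(cmd)

if __name__ == "__main__":
    rospy.init_node("end_to_end_driver")
    pub = rospy.Publisher("/cmd_vel", Twist, queue_size=1)
    rospy.Subscriber("/camera/image_raw", Image, image_callback, callback_args=pub)
    rospy.spin()
```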
Example 2
The invention provides an end-to-end automatic driving method for an ROS-based self-built unmanned vehicle. As shown in FIG. 1 and FIG. 2, the end-to-end automatic driving framework comprises the following steps:
step 201: the deployed cameras are used for data acquisition, and the cameras adopted by the invention can acquire full high-definition image data of 30 frames per second. Because the platform is limited and the driving speed of the unmanned vehicle is low, in order to reduce redundant data, the recording frame rate of the color image is reduced to 10 frames per second, and the handle and the camera thread are synchronized by adopting a critical zone mode. The invention installs 3 cameras in front of the vehicle to collect the image data of the front, left and right, the latter two are offset cameras. The image shot by the offset camera is similar to the image shot by the camera right in front when the vehicle deviates from the center line of the road, so that the method can be used for training the offset correction capability of the neural network model. When data are collected, the handle is used for manually controlling the vehicle to generate deviation and correcting the deviation, the collected data are processed, and a steering value generated by the deviation is set as an opposite number of the steering value.
Step 202: the data set is expanded through data augmentation to avoid overfitting and improve the accuracy of the algorithm. The augmentation modes are as follows (a sketch follows the list):
Horizontal flipping: the picture is mirrored and its steering value is negated;
Color jittering: the color saturation, brightness, and contrast of the image are randomly adjusted to simulate various illumination conditions;
Grayscale conversion: the color image is converted to a grayscale image, enhancing the learning of boundary information;
Blurring: random blur is applied to the image to simulate camera shake and similar conditions.
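A Pillow-based sketch of the four augmentation modes; the library choice and parameter ranges are assumptions for illustration:

```python
import random
from PIL import ImageEnhance, ImageFilter, ImageOps

def augment(image, steering):
    """Return one randomly augmented copy of (image, steering label)."""
    mode = random.choice(["flip", "jitter", "gray", "blur"])
    if mode == "flip":
        image = ImageOps.mirror(image)   # horizontal flip
        steering = -steering             # negate the steering label
    elif mode == "jitter":
        for enhancer in (ImageEnhance.Color, ImageEnhance.Brightness,
                         ImageEnhance.Contrast):
            image = enhancer(image).enhance(random.uniform(0.6, 1.4))
    elif mode == "gray":
        image = ImageOps.grayscale(image).convert("RGB")  # keep 3 channels
    else:
        image = image.filter(ImageFilter.GaussianBlur(radius=random.uniform(0.5, 2.0)))
    return image, steering
```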
Step 203: the end-to-end automatic driving model of the deep neural network is trained: the network reads a color image from the data set and outputs a predicted steering value; an error is computed from the true steering value and the predicted steering value; and the weights in the neural network are updated with the backpropagation algorithm.
Step 204: depthwise separable convolution is introduced to compress the overall parameter count and computation of the neural network. Training is carried out with all other parameters unchanged, the test is repeated 10 times on the test data set, and the average detection accuracy over the object classes is obtained.
The mean average precision (mAP) is calculated as shown in formulas (1), (2), and (3).
$$P_i = \frac{TP_i}{N_i} \tag{1}$$

$$AP_i = \frac{1}{M_i}\sum_{j=1}^{M_i} P_i^{(j)} \tag{2}$$

$$mAP = \frac{1}{C}\sum_{i=1}^{C} AP_i \tag{3}$$
Formula (1) gives the precision for class i on a given image: the number of correct predictions (true positives) TP_i divided by the total number N_i of class-i targets in that image. Formula (2) gives the average precision AP_i of class i: the per-image precision values P_i^(j), summed over the M_i validation-set images that contain class-i targets, divided by M_i. Formula (3) gives the mAP: the sum of the average precisions of all C classes divided by C.
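The three formulas can be mirrored directly in code; a sketch under the reconstruction above, with a data layout of our own choosing:

```python
def mean_average_precision(per_image_precisions):
    """per_image_precisions[c]: per-image precision values P_i (formula (1))
    for class c over the validation images containing that class."""
    ap = {c: sum(vals) / len(vals) for c, vals in per_image_precisions.items()}  # (2)
    return ap, sum(ap.values()) / len(ap)                                        # (3)

# e.g. two classes evaluated on a few validation images:
ap, m_ap = mean_average_precision({"person": [1.0, 0.5], "sign": [0.8, 0.6, 1.0]})
print(ap, m_ap)  # {'person': 0.75, 'sign': 0.8} 0.775
```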
Step 205: the larger-sized feature maps in the multi-level feature detection layer group of the deep neural network are discarded, and the resulting model is compared against the original model.
Step 206: the number of prior boxes pre-configured for each convolution layer in the multi-level feature detection layer group is adjusted: too many prior boxes produce an excessive number of predictions and waste computing resources, while too few leave a large gap between the predicted and actual values, or even cause errors. The scale parameter of the prior boxes is adjusted at the same time; the specific scale of each prior box is calculated as shown in formula (4):
$$s_k = s_{min} + \frac{s_{max} - s_{min}}{m - 1}(k - 1), \quad k \in [1, m] \tag{4}$$
where m is the number of detection feature maps (m = 5 in this embodiment), s_k is the scale of the prior box on the k-th feature map, and s_min and s_max are the minimum and maximum scales, here 0.1 and 0.9 respectively. On this basis, the relevant performance indicators are compared against the original model, i.e. the original SSD model.
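With m = 5, s_min = 0.1, and s_max = 0.9 as above, formula (4) yields evenly spaced scales; a quick check under the reconstruction above:

```python
def prior_box_scales(m=5, s_min=0.1, s_max=0.9):
    """Formula (4): scale of the prior box on each of the m detection feature maps."""
    return [round(s_min + (s_max - s_min) * (k - 1) / (m - 1), 2)
            for k in range(1, m + 1)]

print(prior_box_scales())  # [0.1, 0.3, 0.5, 0.7, 0.9]
```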
Step 207: the trained network model is deployed to the unmanned vehicle platform constructed by the invention, encapsulated as an ROS node, and tested. The effect of the invention is evaluated and verified by computing the control rate over a continuous cruise test. Formula (5) gives the control rate:
$$C_R = \left(1 - \frac{N \cdot t}{T}\right) \times 100\% \tag{5}$$
where C_R is the overall vehicle control rate; N is the number of human interventions; t is the time a human intervention needs to return the vehicle to a normal state, taken as 6 seconds per the Nvidia study; and T is the total driving time.
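A quick check of formula (5) against the cruise test reported in Example 3 (834 seconds of driving, 3 interventions of 6 seconds each):

```python
def control_rate(n_interventions, total_seconds, t_per_intervention=6.0):
    """Formula (5): percentage of time the vehicle is under autonomous control."""
    return (1.0 - n_interventions * t_per_intervention / total_seconds) * 100.0

print(round(control_rate(3, 834), 1))  # 97.8
```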
The invention builds a complete low-power small ROS-based self-built unmanned vehicle platform. The platform uses a four-wheel differential chassis as the basic motion platform and Jetson TX2 as the upper-level computing and control platform, combined with the relevant sensor equipment, to form a small autonomous driving experiment platform. Given the performance requirements of the actual platform, the SSD algorithm is improved into a lightweight SSD model: the model's prediction latency drops from 486.6 milliseconds to 162 milliseconds, and the real-time detection frame rate rises to an average of 9.6 frames per second. The invention further proposes the lightweight deep neural network model with an expanded receptive field and builds the ROS end-to-end automatic driving system on it; the system recognizes the indoor environment accurately and quickly and achieves end-to-end automatic driving, with a controlled-time ratio of 97.8%, an improvement of 8.7% over Nvidia's model, and faster convergence.
Example 3
The following experiments demonstrate the feasibility of the schemes in Examples 1 and 2, as detailed below:
In the experiments on the ROS-based self-built unmanned vehicle end-to-end automatic driving system, steps 104-106 correspond to Experiment I, Experiment II, and Experiment III, whose results are analyzed below. All experiments were trained and tested on a mixture of the Pascal VOC 2007 and Pascal VOC 2012 data sets.
Experiment I introduces depthwise separable convolution to compress the overall parameter count and computation of the SSD neural network. With all other parameters unchanged, the model is trained on the mixed data sets and the test is repeated ten times on the test data set; the resulting average detection accuracy per object class is shown in Table 1. The data show that after the depthwise separable convolution is introduced, detection performance drops somewhat across target classes: the mean average precision (mAP) falls from 78.1% to 71.2%. However, the model's average single-image detection latency also falls, from 486.6 ms to 194.2 ms.
The results of Experiment II are shown in Table 2. Experiment II discards the largest feature map output by the feature extraction network, removes it from the candidate feature map set, and retrains the model. As the table shows, removing this feature map slightly reduces the model's overall detection performance across all categories, with sharp drops in four categories: bottles, cats, people, and potted plants. Analysis of the data set shows that targets of these classes are small in the Pascal VOC data set and occupy a low proportion of each test picture. Discarding the large feature map reduces the model's detection precision on small objects, which matches this result. Meanwhile, discarding the feature map reduces the number of bounding boxes the model must regress, so the network's prediction latency drops further, to 162 milliseconds.
Comparing Experiments I and II shows that the modified model loses average detection accuracy on small objects. Analysis shows that the smallest prior-box scale of the discarded feature map was 0.1. Therefore, a prior box with the minimum scale of 0.1 is added to the largest of the multi-level feature maps and the model is retrained; the results are shown in Table 3. The data show that adding this prior box with a scale of 0.1 raises the model's average accuracy across all classes by about 1.2%, while the average single-image prediction latency stays at about 162 milliseconds. This confirms that adding a single small prior box has no material effect on the model's overall prediction load while slightly improving its average detection accuracy.
Table 4 compares the relevant key performance indicators before and after the model improvement. Introducing depthwise separable convolution and discarding the large feature map reduce the model's computation and relieve GPU pressure, letting the model better perform real-time target detection. As Table 4 shows, the model's prediction latency falls from more than 500 ms to 162 ms, and the real-time target detection frame rate rises to about 9.6 frames per second. The experiments show that introducing depthwise separable convolution into the SSD algorithm and optimizing the multi-level feature maps reduce the model's prediction latency and meet the requirements of real-time target detection on a low-compute platform.
Table 1 (Experiment I): average accuracy after introducing the depthwise separable convolution (table data are provided as an image in the original document)
Table 2 (Experiment II): average accuracy before and after discarding the feature map output by convolution layer Conv4_3 (table data are provided as an image)
Table 3 (Experiment III): average accuracy before and after adding a prior box with scale 0.1 to Conv7 (table data are provided as an image)
Table 4: parameter list of the lightweight deep neural network model with expanded receptive field (table data are provided as an image)
Table 5: comparison of the main performance indicators before and after model improvement (table data are provided as an image)
Table 4 lists the parameters of the lightweight expanded-receptive-field neural network model used, and Table 5 gives the performance indicator comparison after the modifications of Experiments I, II, and III. The conclusion is that introducing the depthwise separable convolution and discarding the large feature map reduce the model's computation, relieve GPU pressure, and let the model better perform real-time target detection on the Nvidia Jetson TX2 platform.
The experimental environment for the end-to-end experiment is shown in FIG. 2, with the specific route drawn as a thick black solid line: starting from the lower-left corner, the vehicle drives up the aisle between the desks and chairs to the other end of the room, then returns to the start around the lower gate passage between the desks and the wall. The test uses ten continuous cruise laps, which take 834 seconds in total with 3 human interventions along the way, giving a computed control rate of 97.8% for the experimental vehicle. This reflects the adaptability of the adopted deep neural network model in this environment.
In the embodiment of the present invention, except for the specific description of the model of each device, the model of other devices is not limited, as long as the device can perform the above functions.
Those skilled in the art will appreciate that the drawings are only schematic illustrations of preferred embodiments, and the above-described embodiments of the present invention are merely provided for description and do not represent the merits of the embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (4)

1. An end-to-end automatic driving method for an ROS-based self-built unmanned vehicle, characterized by comprising the following steps:
acquiring data through deployed cameras, and expanding the training data set through data augmentation;
training an end-to-end automatic driving model based on a deep neural network;
introducing depthwise separable convolution, and comparing the improved model against the original SSD model without it;
discarding the largest feature map output by the multi-level feature detection layer group of the deep neural network, and comparing the result against the original SSD;
adjusting the number of prior boxes pre-configured for each convolution layer in the multi-level feature detection layer group, and adjusting the scale parameter of the smallest prior box; and deploying the trained network to an unmanned vehicle experiment platform for testing.
2. The ROS-based self-built unmanned vehicle end-to-end automatic driving method according to claim 1, wherein the training of the deep neural network end-to-end automatic driving model specifically comprises:
the deep neural network reads a color image from the training data set and outputs a predicted steering value; an error is computed from the true steering value and the predicted steering value; and the error is propagated back through the deep neural network via the backpropagation algorithm to update every weight in the network, so that the network's next prediction is closer to the true value.
3. The ROS-based self-built unmanned vehicle end-to-end automatic driving method according to claim 1, wherein introducing the depthwise separable convolution specifically comprises:
replacing the standard convolution layer with two separate layers, a depthwise convolution layer and a pointwise convolution layer, and modifying the convolution layers of the feature extraction part and of the multi-level feature detection layer group accordingly, thereby reducing the parameter count.
4. The ROS-based self-built unmanned vehicle end-to-end automatic driving method according to claim 3, wherein the modification of the convolution layers of the feature extraction part and the multi-level feature detection layer group is specifically:
convolving each of the three channels of the input image with its own 3×3 convolution kernel, yielding one feature map per channel;
then applying 64 pointwise 1×1 convolution kernels to these three feature maps, yielding 64 output feature maps;
Conv2 is likewise replaced with a depthwise convolution layer Conv2_dw and a pointwise convolution layer Conv2_pw, and finally a pooling operation with a 2×2 pooling kernel and a stride of 2 is performed.
CN202010299263.2A 2020-04-16 2020-04-16 ROS-based self-built unmanned vehicle end-to-end automatic driving method Pending CN111582049A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010299263.2A CN111582049A (en) 2020-04-16 2020-04-16 ROS-based self-built unmanned vehicle end-to-end automatic driving method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010299263.2A CN111582049A (en) 2020-04-16 2020-04-16 ROS-based self-built unmanned vehicle end-to-end automatic driving method

Publications (1)

Publication Number Publication Date
CN111582049A (en) 2020-08-25

Family

ID=72117487

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010299263.2A Pending CN111582049A (en) 2020-04-16 2020-04-16 ROS-based self-built unmanned vehicle end-to-end automatic driving method

Country Status (1)

Country Link
CN (1) CN111582049A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112149533A (en) * 2020-09-10 2020-12-29 上海电力大学 Target detection method based on improved SSD model
CN113076815A (en) * 2021-03-16 2021-07-06 西南交通大学 Automatic driving direction prediction method based on lightweight neural network
CN113723377A (en) * 2021-11-02 2021-11-30 南京信息工程大学 Traffic sign detection method based on LD-SSD network
CN116048096A (en) * 2023-02-23 2023-05-02 南京理工大学 Unmanned vehicle movement planning method based on hierarchical depth perception

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180137406A1 (en) * 2016-11-15 2018-05-17 Google Inc. Efficient Convolutional Neural Networks and Techniques to Reduce Associated Computational Costs
CN109886079A (en) * 2018-12-29 2019-06-14 杭州电子科技大学 A kind of moving vehicles detection and tracking method
CN109886359A (en) * 2019-03-25 2019-06-14 西安电子科技大学 Small target detecting method and detection model based on convolutional neural networks
CN110544251A (en) * 2019-09-08 2019-12-06 刘凡 Dam crack detection method based on multi-migration learning model fusion

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180137406A1 (en) * 2016-11-15 2018-05-17 Google Inc. Efficient Convolutional Neural Networks and Techniques to Reduce Associated Computational Costs
CN109886079A (en) * 2018-12-29 2019-06-14 杭州电子科技大学 A kind of moving vehicles detection and tracking method
CN109886359A (en) * 2019-03-25 2019-06-14 西安电子科技大学 Small target detecting method and detection model based on convolutional neural networks
CN110544251A (en) * 2019-09-08 2019-12-06 刘凡 Dam crack detection method based on multi-migration learning model fusion

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
MARIUSZ BOJARSKI: "End to End Learning for Self-Driving Cars", arXiv:1604.07316v1 *
余良凯: "Airport surface target detection based on deep learning" (基于深度学习的机场场面目标检测), China Excellent Master's Theses Full-text Database, Information Science and Technology series *
朱玉刚: "Multi-target detection algorithm for driverless vehicles incorporating deep learning" (融合深度学习的无人驾驶多目标检测算法), Software Guide (软件导刊) *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112149533A (en) * 2020-09-10 2020-12-29 上海电力大学 Target detection method based on improved SSD model
CN113076815A (en) * 2021-03-16 2021-07-06 西南交通大学 Automatic driving direction prediction method based on lightweight neural network
CN113076815B (en) * 2021-03-16 2022-09-27 西南交通大学 Automatic driving direction prediction method based on lightweight neural network
CN113723377A (en) * 2021-11-02 2021-11-30 南京信息工程大学 Traffic sign detection method based on LD-SSD network
CN116048096A (en) * 2023-02-23 2023-05-02 南京理工大学 Unmanned vehicle movement planning method based on hierarchical depth perception
CN116048096B (en) * 2023-02-23 2024-04-30 南京理工大学 Unmanned vehicle movement planning method based on hierarchical depth perception


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200825