CN111208818B - Intelligent vehicle prediction control method based on visual space-time characteristics - Google Patents

Intelligent vehicle prediction control method based on visual space-time characteristics

Info

Publication number
CN111208818B
CN111208818B CN202010012552.XA
Authority
CN
China
Prior art keywords
time
steering wheel
feature
network
space
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010012552.XA
Other languages
Chinese (zh)
Other versions
CN111208818A (en)
Inventor
吴天昊
程洪
黄瑞
詹惠琴
周润发
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202010012552.XA priority Critical patent/CN111208818B/en
Publication of CN111208818A publication Critical patent/CN111208818A/en
Application granted granted Critical
Publication of CN111208818B publication Critical patent/CN111208818B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0231 Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means
    • G05D1/0246 Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using a video camera in combination with image processing means
    • G05D1/0253 Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using a video camera in combination with image processing means extracting relative motion information from a plurality of images taken successively, e.g. visual odometry, optical flow
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0221 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0276 Control of position or course in two dimensions specially adapted to land vehicles using signals provided by a source external to the vehicle

Abstract

The invention discloses an intelligent vehicle prediction control method based on visual space-time characteristics. A steering wheel angle prediction network is first constructed, comprising a spatial feature extraction network, N spatio-temporal feature extraction modules and a spatio-temporal feature map fusion prediction module. The spatial feature extraction network produces feature maps at different scales and different time steps, the spatio-temporal feature extraction modules extract spatio-temporal features from the feature maps at each scale, and the spatio-temporal feature map fusion prediction module fuses the spatio-temporal features of different scales to predict the steering wheel angle. After the steering wheel angle prediction network is trained, a prediction is made for the time to be predicted, and the predicted steering wheel angle is combined with the historical predicted values by exponential weighted averaging to obtain the final predicted steering wheel angle. The invention effectively extracts the spatio-temporal information in continuous image frames and fuses spatio-temporal information of different scales, thereby greatly improving the prediction control accuracy of the intelligent vehicle.

Description

Intelligent vehicle prediction control method based on visual space-time characteristics
Technical Field
The invention belongs to the technical field of intelligent vehicle control, and particularly relates to an intelligent vehicle prediction control method based on visual spatiotemporal characteristics.
Background
An intelligent vehicle end-to-end decision method automatically corrects the deviation of the vehicle according to the situation it faces while driving within a lane. A traditional intelligent vehicle decision method generally requires the following steps: a sensor module composed of cameras acquires images of the road ahead, the images are sent to a perception module that detects the lane lines in them, and the steering wheel angle required to keep the lane at the current moment is then calculated from the relationship between the lane lines, the vehicle state, the vehicle pose and the driving direction. A deep-learning-based end-to-end decision method instead treats these separate steps as a single model that directly receives information such as images from the sensors and computes the steering wheel angle required at the current moment; owing to the strong fitting capability of deep networks, the relationship between road image features and the steering wheel angle can be learned directly.
Thanks to their strong fitting and generalization capabilities, convolutional neural networks perform excellently in tasks such as image classification, image segmentation, object detection and behavior prediction. Lane keeping is in essence a mapping from the relative relationship between the vehicle pose, the driving direction and the lane lines to the corresponding steering wheel angle, and the essence of a deep-learning-based end-to-end decision algorithm is that, through training, a deep network fits this mapping in a high-dimensional space and thereby acquires the ability to compute the steering wheel angle from the image.
Patent publication CN108227707A introduces an automatic driving method based on lidar and end-to-end deep learning, which includes the following steps: converting the driving environment information acquired by the lidar into a depth map in real time; generating corresponding data-label pairs according to a matching rule and using them as training data; and feeding the training data into a deep convolutional neural network model for training, with the vehicle control quantity obtained from the trained model. The method can make end-to-end decisions from the lidar depth map, but vehicle control is a continuous process in which longer temporal relationships should be considered, and a pure deep convolutional neural network lacks the ability to extract the temporal dependence between consecutive frames.
Patent publication CN109581928A introduces an intelligent vehicle end-to-end decision method and system for highway scenes. The method mainly proposes using transfer learning to enlarge the database: for a convolutional neural network, more data means stronger robustness, and training the model on different databases via transfer learning strengthens the robustness of the algorithm across scenes and its resistance to interference. Because the network uses more data during training, overfitting is avoided and the phenomenon of low bias and high variance on the test set is alleviated. While this method improves performance, it still does not take the continuous nature of the vehicle control process into account.
Patent publication CN109656134A introduces an intelligent vehicle end-to-end decision method based on a spatio-temporal joint recurrent neural network, in which a long short-term memory (LSTM) network is used to extract the temporal dependency between consecutive data frames. However, the fusion of temporal information and spatial feature information in that method is not well-founded: the temporal dependency information extracted by the LSTM carries a large amount of redundancy when jointly computed with the image frames, and simply applying an LSTM destroys the two-dimensional features of the image, so some information is lost in that step.
Patent publication CN109615064A introduces an intelligent vehicle end-to-end decision method based on a recurrent neural network with spatio-temporal feature fusion. A convolutional neural network and an LSTM network extract the spatial features and the temporal dependency information respectively, and four different fusion schemes are tried, namely feature addition, feature subtraction, feature multiplication and feature concatenation, with feature concatenation giving the best result. Although feature concatenation improves the accuracy of the end-to-end decision network to some extent, none of these fusion schemes is well-founded, and the extraction of spatial features and of temporal features remains two separate processes.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide an intelligent vehicle prediction control method based on visual space-time characteristics.
To achieve the above object, the intelligent vehicle prediction control method based on visual space-time characteristics of the present invention comprises the following steps:
S1: constructing a steering wheel angle prediction network, wherein the steering wheel angle prediction network comprises a spatial feature extraction network, N spatio-temporal feature extraction modules and a spatio-temporal feature map fusion prediction module, wherein:
the input of the spatial feature extraction network is the front road image detected by the intelligent vehicle; the front road image detected by the intelligent vehicle at the current time t and the previous K frames of front road images are input into the spatial feature extraction network in time order, the feature maps of the last N layers of the spatial feature extraction network are output to the corresponding n-th spatio-temporal feature extraction modules respectively, and the feature map of the last n-th layer corresponding to time t-k is denoted F_{t-k,n}, where k = 0, 1, …, K and n = 1, 2, …, N;
each spatio-temporal feature extraction module comprises a first convolutional layer, a convolutional long short-term memory (ConvLSTM) network, a second convolutional layer and a third convolutional layer, wherein:
the convolution kernel size of the first convolutional layer is 1 × 1; it reduces the dimension of the input feature map F_{t-k,n} and outputs the result to the ConvLSTM network, and the feature map output by the first convolutional layer is denoted F'_{t-k,n}, with size W × H × L;
the input of the ConvLSTM network is a combined feature map obtained by concatenating the feature map F'_{t-k,n} with the feature map F''_{t-k-1,n} output by the ConvLSTM network for the previous frame of the road image, giving a size of W × H × 2L; when t-k-1 < 0, every pixel value of the feature map F''_{t-k-1,n} is 0; the ConvLSTM network extracts the spatio-temporal features in the combined feature map and outputs a feature map F''_{t-k,n}; the K+1 feature maps F'_{t-k,n} and their corresponding combined feature maps are input into the ConvLSTM network in sequence, and the feature map F''_{t,n} corresponding to the current time t is output to the second convolutional layer;
the convolution kernel size of the second convolutional layer is 3 × 3; it convolves the input feature map F''_{t,n} and outputs the result to the third convolutional layer;
the convolution kernel size of the third convolutional layer is 3 × 3; after convolving the feature map output by the second convolutional layer, the resulting feature map is output to the spatio-temporal feature map fusion prediction module;
the spatio-temporal feature map fusion prediction module fuses the feature maps of different scales output by the N spatio-temporal feature extraction modules and outputs the predicted steering wheel angles V_{t+m} at the current time t and M future times, where m = 0, 1, …, M;
S2: acquiring a plurality of continuous front road images of the intelligent vehicle and the corresponding steering wheel angles, using the front road images as the input of the steering wheel angle prediction network and the steering wheel angles as the expected output, and training the steering wheel angle prediction network;
S3: for the time t′ to be predicted, inputting the front road image detected by the intelligent vehicle at time t′ and the previous K frames of front road images into the steering wheel angle prediction network in time order, and taking the obtained M+1 predicted steering wheel angles as the initial predicted values V_{t′+m} of the corresponding times; arranging the final predicted steering wheel angles V′_{t′-q} of the previous Q times, q = 1, 2, …, Q, and the M+1 initial predicted values V_{t′+m} in time order to obtain a sequence of predicted steering wheel angles, performing exponential weighted averaging on the sequence, and taking the result of the exponential weighted average at time t′+M as the final predicted steering wheel angle V′_{t′} at the time t′ to be predicted.
The intelligent vehicle prediction control method based on visual space-time characteristics of the present invention first constructs a steering wheel angle prediction network comprising a spatial feature extraction network, N spatio-temporal feature extraction modules and a spatio-temporal feature map fusion prediction module. The spatial feature extraction network produces feature maps at different scales and different time steps, the spatio-temporal feature extraction modules extract spatio-temporal features from the feature maps at each scale, and the spatio-temporal feature map fusion prediction module fuses the spatio-temporal features of different scales to predict the steering wheel angle. After the steering wheel angle prediction network is trained, a prediction is made for the time to be predicted, and the predicted steering wheel angle is combined with the historical predicted values by exponential weighted averaging to obtain the final predicted steering wheel angle. The invention effectively extracts the spatio-temporal information in continuous image frames and fuses spatio-temporal information of different scales, thereby greatly improving the prediction control accuracy of the intelligent vehicle.
Drawings
FIG. 1 is a block diagram of an embodiment of the intelligent vehicle predictive control method based on visual spatiotemporal features according to the present invention;
FIG. 2 is a block diagram of a steering wheel angle prediction network in accordance with the present invention;
FIG. 3 is a block diagram of a spatiotemporal feature extraction module according to the present invention;
FIG. 4 is a schematic structural diagram of a spatiotemporal feature map fusion prediction module in the present embodiment;
FIG. 5 is a comparison curve of the output value of the present invention and the label value on the Udacity Challenge II database;
FIG. 6 is a graph comparing the output value and the driver control value of the present invention in a campus environment.
Detailed Description
Specific embodiments of the present invention are described below in conjunction with the accompanying drawings so that those skilled in the art can better understand the present invention. It is to be expressly noted that in the following description, a detailed description of known functions and designs will be omitted when it may obscure the subject matter of the present invention.
Examples
FIG. 1 is a block diagram of an embodiment of the intelligent vehicle predictive control method based on visual spatiotemporal characteristics according to the present invention. As shown in FIG. 1, the intelligent vehicle predictive control method based on visual space-time characteristics of the present invention specifically includes the following steps:
S101: constructing a steering wheel angle prediction network:
A steering wheel angle prediction network is constructed. FIG. 2 is a block diagram of the steering wheel angle prediction network of the present invention. As shown in FIG. 2, the network includes a spatial feature extraction network, N spatio-temporal feature extraction modules and a spatio-temporal feature map fusion prediction module; each module is described in detail below.
The input of the spatial feature extraction network is the front road image detected by the intelligent vehicle. The front road image detected at the current time t and the previous K frames of front road images (K+1 frames in total) are input into the spatial feature extraction network in time order, the feature maps of the last N layers of the spatial feature extraction network are output to the corresponding n-th spatio-temporal feature extraction modules respectively, and the feature map of the last n-th layer corresponding to time t-k is denoted F_{t-k,n}, where k = 0, 1, …, K and n = 1, 2, …, N. In this embodiment, the spatial feature extraction network adopts the convolutional part of the Nvidia-Pilot network, a total of 15 front road images are input, and the feature maps of the last 4 layers are output, so the spatial feature extraction network yields feature maps at 15 time steps on 4 different scales.
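For illustration, the following is a minimal PyTorch sketch of a PilotNet-style backbone that returns the feature maps of its last four convolutional layers for a single frame; the channel counts and strides follow the publicly described Nvidia PilotNet convolutional stack and should be read as assumptions, since the patent only names the network.

```python
import torch
import torch.nn as nn

class SpatialFeatureNet(nn.Module):
    """Spatial feature extraction backbone (sketch). For each input frame it
    returns the feature maps of its last N = 4 conv layers, i.e. F_{t-k,1..4}."""
    def __init__(self):
        super().__init__()
        self.c1 = nn.Conv2d(3, 24, kernel_size=5, stride=2)
        self.c2 = nn.Conv2d(24, 36, kernel_size=5, stride=2)
        self.c3 = nn.Conv2d(36, 48, kernel_size=5, stride=2)
        self.c4 = nn.Conv2d(48, 64, kernel_size=3)
        self.c5 = nn.Conv2d(64, 64, kernel_size=3)

    def forward(self, img):                      # img: (B, 3, H, W)
        x1 = torch.relu(self.c1(img))
        x2 = torch.relu(self.c2(x1))
        x3 = torch.relu(self.c3(x2))
        x4 = torch.relu(self.c4(x3))
        x5 = torch.relu(self.c5(x4))
        # feature maps of the last four layers, at four different scales
        return [x2, x3, x4, x5]
```

Running this backbone on each of the K+1 frames (15 in this embodiment) yields feature maps at 15 time steps and 4 scales, which feed the spatio-temporal feature extraction modules.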
The spatio-temporal feature extraction modules extract the corresponding spatio-temporal features from the feature maps output by the spatial feature extraction network. FIG. 3 is a block diagram of the spatio-temporal feature extraction module of the present invention. As shown in FIG. 3, each spatio-temporal feature extraction module comprises a first convolutional layer, a convolutional long short-term memory (ConvLSTM) network, a second convolutional layer and a third convolutional layer, wherein:
The convolution kernel size of the first convolutional layer is 1 × 1; it reduces the dimension of the input feature map F_{t-k,n} and passes the result to the ConvLSTM network, and the feature map output by the first convolutional layer is denoted F'_{t-k,n}, with size W × H × L. The role of the first convolutional layer is to reduce the number of channels of the feature map and thereby the computation of the spatio-temporal feature extraction module. In this embodiment the ConvLSTM cell contains eight convolutional layers and several fully connected layers; unrolled over fifteen time steps, its computation is at least equivalent to that of one hundred and twenty convolutional layers, so the invention places a 1 × 1 convolutional layer in front of it to reduce the parameter count of the feature map and hence the computation of the spatio-temporal feature extraction module.
The input of the ConvLSTM network is a combined feature map obtained by concatenating the feature map F'_{t-k,n} with the feature map F''_{t-k-1,n} output by the ConvLSTM network for the previous frame of the front road image, giving a size of W × H × 2L; when t-k-1 < 0, every pixel value of the feature map F''_{t-k-1,n} is 0. The ConvLSTM network extracts the spatio-temporal features in the combined feature map and outputs a feature map F''_{t-k,n}. The K+1 feature maps F'_{t-k,n} and their corresponding combined feature maps are input into the ConvLSTM network in sequence, and the feature map F''_{t,n} corresponding to the current time t is output to the second convolutional layer. The ConvLSTM differs from an ordinary LSTM in that it replaces the multiplications of one-dimensional feature vectors by weights in the forget gate, input gate, output gate and cell state with convolutions of the two-dimensional feature map with multi-channel convolution kernels; because of this, the ConvLSTM can extract spatial information and the temporal dependency of continuous images simultaneously, and does not destroy the original spatial structure of the image the way an ordinary LSTM does.
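To make this gate-by-gate replacement concrete, here is a minimal ConvLSTM cell sketch in PyTorch; using a single shared convolution for all four gates, and the kernel size, are implementation assumptions rather than details taken from the patent.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal ConvLSTM cell: the gate computations of an LSTM, with the
    matrix multiplications replaced by 2-D convolutions so the spatial
    structure of the feature map is preserved."""
    def __init__(self, in_channels, hidden_channels, kernel_size=3):
        super().__init__()
        padding = kernel_size // 2
        # one convolution produces all four gates (input, forget, cell, output)
        self.gates = nn.Conv2d(in_channels + hidden_channels,
                               4 * hidden_channels, kernel_size, padding=padding)

    def forward(self, x, state):
        h, c = state                                   # hidden and cell states are W x H feature maps
        z = self.gates(torch.cat([x, h], dim=1))       # convolve the combined feature map
        i, f, g, o = torch.chunk(z, 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        g = torch.tanh(g)
        c = f * c + i * g                              # element-wise update per spatial location
        h = o * torch.tanh(c)
        return h, (h, c)
```

Concatenating the input with the previous hidden state inside the cell plays the role of the combined feature map described above, and initializing the hidden and cell states to zero covers the case t-k-1 < 0.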
The convolution kernel size of the second convolutional layer is 3 × 3; it convolves the input feature map F''_{t,n} and outputs the result to the third convolutional layer.
The convolution kernel size of the third convolutional layer is 3 × 3; after convolving the feature map output by the second convolutional layer, the resulting feature map is output to the spatio-temporal feature map fusion prediction module.
Two convolutional layers are placed at the end of the spatio-temporal feature extraction module because, after processing by the ConvLSTM network, the temporal features inferred over the fifteen time steps have been fused into a single time step in the resulting feature map, and the two convolutional layers further screen and extract features on that basis. Compared with the prior art, the spatio-temporal feature extraction module designed by the invention extracts the spatial information and the temporal dependency information in continuous images in a more reasonable way, and can learn the temporal dependence between consecutive frames while preserving the spatial characteristics of the two-dimensional feature map.
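Putting the four parts together, one spatio-temporal feature extraction module might be sketched as follows, reusing the ConvLSTMCell above; the channel counts, padding and ReLU activation are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class SpatioTemporalModule(nn.Module):
    """One spatio-temporal feature extraction module (sketch):
    1x1 reduction -> ConvLSTM over K+1 steps -> two 3x3 convolutions."""
    def __init__(self, in_channels, reduced_channels):
        super().__init__()
        self.reduce = nn.Conv2d(in_channels, reduced_channels, kernel_size=1)   # produces F'_{t-k,n}
        self.convlstm = ConvLSTMCell(reduced_channels, reduced_channels, kernel_size=3)
        self.conv2 = nn.Conv2d(reduced_channels, reduced_channels, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(reduced_channels, reduced_channels, kernel_size=3, padding=1)

    def forward(self, frame_features):
        # frame_features: list of K+1 maps F_{t-K,n} ... F_{t,n}, each (B, C, H, W)
        h = c = None
        for fmap in frame_features:
            x = self.reduce(fmap)                       # dimension reduction to W x H x L
            if h is None:                               # t-k-1 < 0: previous output is all zeros
                h, c = torch.zeros_like(x), torch.zeros_like(x)
            _, (h, c) = self.convlstm(x, (h, c))        # F''_{t-k,n}
        out = torch.relu(self.conv2(h))                 # refine F''_{t,n} for the current time t
        return self.conv3(out)                          # map passed to the fusion prediction module
```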
The spatio-temporal feature map fusion prediction module fuses the feature maps of different scales output by the N spatio-temporal feature extraction modules and outputs the predicted steering wheel angles V_{t+m} at the current time t and M future times, where m = 0, 1, …, M. In this embodiment, the spatio-temporal feature map fusion prediction module is implemented with fully connected layers. FIG. 4 is a schematic structural diagram of the spatio-temporal feature map fusion prediction module in this embodiment. As shown in FIG. 4, the module comprises N fully connected layers. The feature maps of the N scales obtained from the N spatio-temporal feature extraction modules are arranged from small to large by scale, and the length, width and channel number of the n-th feature map are denoted A_n, B_n and C_n respectively. The module processes the N feature maps as follows (a code sketch of this flow is given after the numbered steps):
1) The feature map of the N-th scale is flattened into a feature vector F_N of dimension A_N × B_N × C_N, which is fed into the 1st fully connected layer to output a feature vector f_N of dimension A_{N-1} × B_{N-1} × C_{N-1}.
2) Let the fully connected layer index n′ = 1.
3) The feature vector f_{N-n′+1} is added to the feature vector F_{N-n′} obtained by flattening the feature map of the (N-n′)-th scale, giving a feature vector F′_{N-n′} of dimension A_{N-n′} × B_{N-n′} × C_{N-n′}.
4) If n′ < N-1, go to step 5); otherwise go to step 7).
5) The feature vector F′_{N-n′} is fed into the (n′+1)-th fully connected layer to output a feature vector f_{N-n′} of dimension A_{N-n′-1} × B_{N-n′-1} × C_{N-n′-1}.
6) Let n′ = n′ + 1 and return to step 3).
7) The current feature vector F′_{N-n′} is fed into the N-th fully connected layer, which outputs the predicted steering wheel angles V_{t+m} at the current time t and M future times, forming the predicted steering wheel angle vector.
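A minimal sketch of this cascaded fusion head follows; the names `flat_dims` (the products A_n·B_n·C_n ordered from the smallest to the largest scale) and `num_outputs` (M+1) are introduced here for illustration and are not from the patent.

```python
import torch
import torch.nn as nn

class FusionPredictionHead(nn.Module):
    """Spatio-temporal feature map fusion prediction module (sketch):
    N fully connected layers fuse N multi-scale vectors by cascaded addition."""
    def __init__(self, flat_dims, num_outputs):
        super().__init__()
        n = len(flat_dims)
        layers = [nn.Linear(flat_dims[n - 1 - i], flat_dims[n - 2 - i]) for i in range(n - 1)]
        layers.append(nn.Linear(flat_dims[0], num_outputs))   # N-th layer -> V_{t+m}, m = 0..M
        self.fcs = nn.ModuleList(layers)

    def forward(self, feature_maps):
        # feature_maps: N tensors ordered from the smallest to the largest scale
        flat = [fm.flatten(start_dim=1) for fm in feature_maps]   # F_1 ... F_N
        v = self.fcs[0](flat[-1])                                 # f_N
        for i in range(1, len(flat)):
            v = self.fcs[i](v + flat[-1 - i])                     # F'_{N-i} = f_{N-i+1} + F_{N-i}, then next FC
        return v                                                  # predicted steering wheel angle vector
```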
S102: training the steering wheel angle prediction network:
Several continuous front road images of the intelligent vehicle and the corresponding steering wheel angles are acquired; the front road images are used as the input of the steering wheel angle prediction network and the steering wheel angles as the expected output, and the steering wheel angle prediction network is trained. The training data can come from a public database or be collected independently. The loss function employed during training is a weighted sum of the per-horizon errors:

Loss = Σ_{m=0}^{M} λ_{t+m} · L_{t+m}

where L_{t+m} is the root mean square error between the predicted and true steering wheel angles at time t+m, and λ_{t+m} is the weight assigned to time t+m, set according to actual needs, with

Σ_{m=0}^{M} λ_{t+m} = 1.
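A short sketch of this objective, with illustrative names (`weights` is assumed to hold the λ_{t+m} values):

```python
import torch

def multi_horizon_loss(pred, target, weights):
    """Weighted sum of per-horizon RMSE terms over the M+1 predicted angles."""
    # pred, target: (batch, M+1) steering wheel angles for times t ... t+M
    # weights: tensor of shape (M+1,) holding lambda_{t+m}
    rmse_per_horizon = torch.sqrt(torch.mean((pred - target) ** 2, dim=0))  # L_{t+m}
    return torch.sum(weights * rmse_per_horizon)                            # sum_m lambda_{t+m} * L_{t+m}
```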
s103: predicting the steering wheel angle:
For the time t′ to be predicted, the front road image detected by the intelligent vehicle at time t′ and the previous K frames of front road images are input into the steering wheel angle prediction network in time order, and the resulting M+1 predicted steering wheel angles are taken as the initial predicted values V_{t′+m} of the corresponding times. In order to control the steering wheel of the intelligent vehicle more smoothly, the final predicted steering wheel angles V′_{t′-q} of the previous Q times, q = 1, 2, …, Q, and the M+1 initial predicted values V_{t′+m} are arranged in time order to obtain a sequence of predicted steering wheel angles; exponential weighted averaging is performed on this sequence, and the result of the exponential weighted average at time t′+M is taken as the final predicted steering wheel angle V′_{t′} at the time t′ to be predicted. Exponential weighted averaging is a common method for processing sequence data, and its details are not repeated here.
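A minimal sketch of this smoothing step; the smoothing factor `beta` and its value are assumptions, since the patent does not specify the weighting of the exponential average.

```python
def smooth_steering_prediction(final_history, initial_preds, beta=0.9):
    """Exponentially weighted average over the Q previous final predictions
    followed by the M+1 new initial predictions; the value reached after the
    last element (time t'+M) is used as the final prediction V'_{t'}."""
    avg = None
    for v in list(final_history) + list(initial_preds):   # sequence ordered by time
        avg = v if avg is None else beta * avg + (1.0 - beta) * v
    return avg
```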
To better illustrate the technical effects of the invention, the invention was experimentally verified with a specific example. The experimental environment was as follows: the central processing unit is an Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40 GHz, the graphics processing unit is an NVIDIA GTX Titan X with 11 GB of video memory, the operating system is Ubuntu 16.04 LTS, the deep learning framework is PyTorch 1.0, the Python environment is Python 3.7, the ROS version is Kinetic, and the experimental vehicle is an FAW A70E.
The method was tested on a public end-to-end decision database for intelligent vehicles (the Udacity Challenge II database) and on a real vehicle. The evaluation index adopted is the root mean square error, i.e. the square root of the mean of the squared errors between the predicted data and the original data at corresponding points. Since the Udacity Challenge II database stores wheel angle values, the steering wheel angle predictions obtained by the method were converted into wheel angle values for comparison. FIG. 5 shows the comparison between the output values of the invention and the label values on the Udacity Challenge II database. FIG. 6 shows the comparison between the output values of the invention and the driver's control values in a campus environment. The root mean square error (RMSE) between the steering wheel angle values obtained by the method and the actual/reference values is 0.0491, showing that the method obtains accurate steering wheel angle values.
Although illustrative embodiments of the present invention have been described above to help those skilled in the art understand the present invention, it should be understood that the present invention is not limited to the scope of these embodiments. Various changes will be apparent to those skilled in the art as long as they remain within the spirit and scope of the present invention as defined by the appended claims, and all inventions utilizing the inventive concept are protected.

Claims (2)

1. An intelligent vehicle prediction control method based on visual space-time characteristics, characterized by comprising the following steps:
S1: constructing a steering wheel angle prediction network, wherein the steering wheel angle prediction network comprises a spatial feature extraction network, N spatio-temporal feature extraction modules and a spatio-temporal feature map fusion prediction module, wherein:
the input of the spatial feature extraction network is the front road image detected by an intelligent vehicle; the front road image detected by the intelligent vehicle at the current time t and the previous K frames of front road images are input into the spatial feature extraction network in time order, the feature maps of the last N layers of the spatial feature extraction network are output to the corresponding n-th spatio-temporal feature extraction modules respectively, and the feature map of the last n-th layer corresponding to time t-k is denoted F_{t-k,n}, where k = 0, 1, …, K and n = 1, 2, …, N;
each spatio-temporal feature extraction module comprises a first convolutional layer, a convolutional long short-term memory (ConvLSTM) network, a second convolutional layer and a third convolutional layer, wherein:
the convolution kernel size of the first convolutional layer is 1 × 1; it reduces the dimension of the input feature map F_{t-k,n} and outputs the result to the ConvLSTM network, and the feature map output by the first convolutional layer is denoted F'_{t-k,n}, with size W × H × L;
the input of the ConvLSTM network is a combined feature map obtained by concatenating the feature map F'_{t-k,n} with the feature map F''_{t-k-1,n} output by the ConvLSTM network for the previous frame of the road image, giving a size of W × H × 2L; when t-k-1 < 0, every pixel value of the feature map F''_{t-k-1,n} is 0; the ConvLSTM network extracts the spatio-temporal features in the combined feature map and outputs a feature map F''_{t-k,n}; the K+1 feature maps F'_{t-k,n} and their corresponding combined feature maps are input into the ConvLSTM network in sequence, and the feature map F''_{t,n} corresponding to the current time t is output to the second convolutional layer;
the convolution kernel size of the second convolutional layer is 3 × 3; it convolves the input feature map F''_{t,n} and outputs the result to the third convolutional layer;
the convolution kernel size of the third convolutional layer is 3 × 3; after convolving the feature map output by the second convolutional layer, the resulting feature map is output to the spatio-temporal feature map fusion prediction module;
the spatio-temporal feature map fusion prediction module fuses the feature maps of different scales output by the N spatio-temporal feature extraction modules and outputs the predicted steering wheel angles V_{t+m} at the current time t and M future times, where m = 0, 1, …, M; the spatio-temporal feature map fusion prediction module comprises N fully connected layers, the feature maps of the N scales obtained from the N spatio-temporal feature extraction modules are arranged from small to large by scale, the length, width and channel number of the n-th feature map are denoted A_n, B_n and C_n respectively, and the module processes the N feature maps as follows:
1) the feature map of the N-th scale is flattened into a feature vector F_N of dimension A_N × B_N × C_N, which is fed into the 1st fully connected layer to output a feature vector f_N of dimension A_{N-1} × B_{N-1} × C_{N-1};
2) let the fully connected layer index n′ = 1;
3) the feature vector f_{N-n′+1} is added to the feature vector F_{N-n′} obtained by flattening the feature map of the (N-n′)-th scale, giving a feature vector F′_{N-n′} of dimension A_{N-n′} × B_{N-n′} × C_{N-n′};
4) if n′ < N-1, go to step 5); otherwise go to step 7);
5) the feature vector F′_{N-n′} is fed into the (n′+1)-th fully connected layer to output a feature vector f_{N-n′} of dimension A_{N-n′-1} × B_{N-n′-1} × C_{N-n′-1};
6) let n′ = n′ + 1 and return to step 3);
7) the current feature vector F′_{N-n′} is fed into the N-th fully connected layer, which outputs the predicted steering wheel angles V_{t+m} at the current time t and M future times, forming the predicted steering wheel angle vector;
S2: acquiring a plurality of continuous front road images of the intelligent vehicle and the corresponding steering wheel angles, using the front road images as the input of the steering wheel angle prediction network and the steering wheel angles as the expected output, and training the steering wheel angle prediction network;
S3: for the time t′ to be predicted, inputting the front road image detected by the intelligent vehicle at time t′ and the previous K frames of front road images into the steering wheel angle prediction network in time order, and taking the obtained M+1 predicted steering wheel angles as the initial predicted values V_{t′+m} of the corresponding times; arranging the final predicted steering wheel angles V′_{t′-q} of the previous Q times, q = 1, 2, …, Q, and the M+1 initial predicted values V_{t′+m} in time order to obtain a sequence of predicted steering wheel angles, performing exponential weighted averaging on the sequence, and taking the result of the exponential weighted average at time t′+M as the final predicted steering wheel angle at the time t′ to be predicted.
2. The intelligent vehicle predictive control method of claim 1, wherein the spatial feature extraction network employs a convolutional neural network portion of an Nvidia-Pilot network.
CN202010012552.XA 2020-01-07 2020-01-07 Intelligent vehicle prediction control method based on visual space-time characteristics Active CN111208818B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010012552.XA CN111208818B (en) 2020-01-07 2020-01-07 Intelligent vehicle prediction control method based on visual space-time characteristics

Publications (2)

Publication Number Publication Date
CN111208818A CN111208818A (en) 2020-05-29
CN111208818B true CN111208818B (en) 2023-03-07

Family

ID=70785554

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010012552.XA Active CN111208818B (en) 2020-01-07 2020-01-07 Intelligent vehicle prediction control method based on visual space-time characteristics

Country Status (1)

Country Link
CN (1) CN111208818B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111862158B (en) * 2020-07-21 2023-08-29 湖南师范大学 Staged target tracking method, device, terminal and readable storage medium
CN112212872B (en) * 2020-10-19 2022-03-11 合肥工业大学 End-to-end automatic driving method and system based on laser radar and navigation map
CN113156958A (en) * 2021-04-27 2021-07-23 东莞理工学院 Self-supervision learning and navigation method of autonomous mobile robot based on convolution long-short term memory network
CN115797708B (en) * 2023-02-06 2023-04-28 南京博纳威电子科技有限公司 Power transmission and distribution synchronous data acquisition method

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103019092A (en) * 2012-12-31 2013-04-03 上海师范大学 Prediction control method for positioning platform of mechanical transmission system
CN105292243A (en) * 2015-09-29 2016-02-03 桂林电子科技大学 Prediction control method of electric power steering system of automobile
WO2017211395A1 (en) * 2016-06-07 2017-12-14 Toyota Motor Europe Control device, system and method for determining the perceptual load of a visual and dynamic driving scene
WO2019147396A1 (en) * 2018-01-23 2019-08-01 Gopro, Inc. Relative image capture device orientation calibration
CN109344701A (en) * 2018-08-23 2019-02-15 武汉嫦娥医学抗衰机器人股份有限公司 A kind of dynamic gesture identification method based on Kinect
CN109168003A (en) * 2018-09-04 2019-01-08 中国科学院计算技术研究所 A method of generating the neural network model for being used for video estimation
CN109410575A (en) * 2018-10-29 2019-03-01 北京航空航天大学 A kind of road network trend prediction method based on capsule network and the long Memory Neural Networks in short-term of nested type
CN109508375A (en) * 2018-11-19 2019-03-22 重庆邮电大学 A kind of social affective classification method based on multi-modal fusion
CN109581928A (en) * 2018-12-07 2019-04-05 电子科技大学 A kind of end-to-end decision-making technique of intelligent vehicle towards highway scene and system
CN109656134A (en) * 2018-12-07 2019-04-19 电子科技大学 A kind of end-to-end decision-making technique of intelligent vehicle based on space-time joint recurrent neural network
CN110188683A (en) * 2019-05-30 2019-08-30 北京理工大学 A kind of automatic Pilot control method based on CNN-LSTM

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
End-to-End Driving Model for Steering Control of Autonomous Vehicles with Future Spatiotemporal Features; Tianhao Wu et al.; 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems; 2019-12-31; full text *
MST-ResNet: A Multiscale Spatial Temporal ResNet for Steering Prediction; Long Wen et al.; 2019 2nd China Symposium on Cognitive Computing and Hybrid Intelligence; 2019-12-31; full text *
Research on End-to-End Decision-Making of Intelligent Vehicles Based on Spatio-Temporal Recurrent Neural Networks (基于时空递归神经网络的智能车端到端决策研究); Jin Fan (金凡); China Masters' Theses Full-text Database, Engineering Science and Technology II; 2018-10-15 (No. 10); full text *
Spatio-Temporal Deep Learning Algorithms for Dynamic Scene Understanding (面向动态场景理解的时空深度学习算法); Meng Binghao (蒙冰皓); China Masters' Theses Full-text Database, Information Science and Technology; 2018-02-15 (No. 02); full text *

Also Published As

Publication number Publication date
CN111208818A (en) 2020-05-29

Similar Documents

Publication Publication Date Title
CN111208818B (en) Intelligent vehicle prediction control method based on visual space-time characteristics
CN110298262B (en) Object identification method and device
CN111582201B (en) Lane line detection system based on geometric attention perception
CN108985269B (en) Convergence network driving environment perception model based on convolution and cavity convolution structure
CN112818903B (en) Small sample remote sensing image target detection method based on meta-learning and cooperative attention
US9286524B1 (en) Multi-task deep convolutional neural networks for efficient and robust traffic lane detection
EP3278317B1 (en) Method and electronic device
CN109726627B (en) Neural network model training and universal ground wire detection method
CN111738037B (en) Automatic driving method, system and vehicle thereof
US11940803B2 (en) Method, apparatus and computer storage medium for training trajectory planning model
Wulff et al. Early fusion of camera and lidar for robust road detection based on U-Net FCN
CN113409361B (en) Multi-target tracking method and device, computer and storage medium
CN111696110B (en) Scene segmentation method and system
CN116188999B (en) Small target detection method based on visible light and infrared image data fusion
CN112861619A (en) Model training method, lane line detection method, equipment and device
CN112686207A (en) Urban street scene target detection method based on regional information enhancement
Dinh et al. Transfer learning for vehicle detection using two cameras with different focal lengths
CN112348116A (en) Target detection method and device using spatial context and computer equipment
CN115880658A (en) Automobile lane departure early warning method and system under night scene
Zheng et al. A novel vehicle lateral positioning methodology based on the integrated deep neural network
CN112597996A (en) Task-driven natural scene-based traffic sign significance detection method
CN111860411A (en) Road scene semantic segmentation method based on attention residual error learning
CN117037119A (en) Road target detection method and system based on improved YOLOv8
CN116863241A (en) End-to-end semantic aerial view generation method, model and equipment based on computer vision under road scene
CN116630702A (en) Pavement adhesion coefficient prediction method based on semantic segmentation network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant