CN113076815B - Automatic driving direction prediction method based on lightweight neural network - Google Patents
Automatic driving direction prediction method based on a lightweight neural network
- Publication number: CN113076815B (application CN202110280774.4A)
- Authority: CN (China)
- Prior art keywords: network, image, neural network, angle, effnet
- Legal status (an assumption, not a legal conclusion): Expired - Fee Related
Classifications
- G06N3/045 — Combinations of networks
- G06F18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06N3/084 — Backpropagation, e.g. using gradient descent
- G06T7/90 — Determination of colour characteristics
- G06V20/59 — Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
Abstract
The invention discloses an automatic driving direction prediction method based on a lightweight neural network, comprising the following steps: step 1: training a neural network model; step 2: testing the neural network model. During training, the acquired images are preprocessed: the data are horizontally flipped, brightness-adjusted, angle-adjusted, and screened, which enriches the data set, adds training samples, and yields a better-trained network model. The EffNet network is combined with the BP neural network back-propagation algorithm to adjust the error between the predicted and actual steering-wheel rotation angles, reducing the network's computational requirements; the method has practical reference value and broad market prospects.
Description
Technical Field
The invention belongs to the technical field of automatic driving direction prediction, and particularly relates to an automatic driving direction prediction method based on a lightweight neural network.
Background
Autonomous driving technology relies mainly on artificial intelligence, assisted by various other technologies such as millimeter-wave radar ranging, lidar ranging, and GPS. Effective environment perception and object detection are prerequisites for safe driving. The document "Policy-Gradient and Actor-Critic Based State Representation Learning for Safe Driving of Autonomous Vehicles" proposes an environment perception framework for autonomous driving using state representation learning (SRL). The document "Automatic driving vehicle obstacle avoidance path planning research based on a gravitational search algorithm" proposes a gravitational search algorithm for obstacle-avoidance path planning in automatic driving, which increases response speed and can reduce accidents. Another work provides online driving-style identification based on a two-stage multi-dimensional Gaussian hidden Markov process (MGHMP), realizing personalized automatic driving on traffic roads shared by pedestrians and vehicles. A further document proposes a human-in-the-loop DRL algorithm with a progressively optimized reward function; this learning method has online learning capability and environmental adaptability, giving a good driving experience. Light has a great influence on automatic driving, and the document "a deep learning based image enhancement for automatic driving at night" proposes a generation-pipeline method for converting daytime images into low-light images.
With the development and wide application of deep learning, Mariusz Bojarski et al. proposed an end-to-end deep-learning automatic driving technique in "End to End Learning for Self-Driving Cars", which accelerates computation but places high demands on hardware configuration. When a conventional convolutional neural network is used to predict the steering-wheel rotation angle on low-end hardware, the operation speed is low and the vehicle may drift out of its lane. In real life an automatically driven vehicle needs to react on the order of milliseconds or even microseconds, especially at high speed or in critical situations.
Disclosure of Invention
In view of the above problems, the present invention aims to provide an automatic driving direction prediction method based on a lightweight neural network, which effectively reduces computation time, latency, memory, and computational requirements, significantly lowering computation cost and greatly improving operation speed.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
an automatic driving direction prediction method based on a lightweight neural network comprises the following steps:
step 1: training a neural network model;
101: acquiring image data;
102: preprocessing an image;
103: the preprocessed image is transmitted into an EffNet network, and the EffNet network can generate an expected rotation angle according to the input image;
104: recording the steering-wheel rotation angle in manual mode, and preprocessing the image captured at that moment, so as to obtain the actual steering-wheel rotation angle;
105: calculating the difference value between the expected rotation angle and the actual rotation angle of the steering wheel;
106: transmitting the difference value to an EffNet network through a BP neural network propagation algorithm, continuously updating the weight so as to enable the difference value between the expected rotation angle and the actual rotation angle of the steering wheel to be minimum, and storing an optimal training neural network model at the moment;
step 2: testing neural network models
201: capturing a current picture by using a central camera of an automobile in the Unity simulator, and transmitting the current picture to an EffNet network through a network socket to be used as input of the EffNet network;
202: the EffNet network predicts the rotation direction and angle of the steering wheel from the current picture; the predicted angle is transmitted to the simulator, the simulator controls the car according to the returned angle, the car continues forward, and the picture ahead is transmitted to the EffNet network in real time, repeating the cycle.
Preferably, in step 101, image data are obtained through the Unity simulator, which provides a left, a middle, and a right camera, a manual mode, and a training data set; when the training data set is collected, the angle, throttle, speed, and brake data of the car at the current moment are obtained, and 24108 pictures of 320 × 160 pixels each are generated, saved under an IMG folder together with a generated csv file.
Preferably, in step 102, the image preprocessing comprises the following steps:
(1) image cropping;
(2) image brightness adjustment;
(3) image angle adjustment.
Preferably, in step (2), the image is first converted from the RGB color space to the HSV color space, where V represents brightness and H and S represent hue and saturation; then, keeping H and S unchanged, the V value is multiplied by a coefficient α whose value range is [0.1, 1]; finally the HSV image is converted back to an RGB image.
Preferably, in the step (3) above, the image is horizontally flipped.
Preferably, in steps 1 and 2, the EffNet network uses the Leaky ReLU activation function and the adaptive moment estimation (Adam) optimizer, with epoch set to 25; the batch image number batch_size is set to 32, and the loss function is the mean squared error loss function.
Preferably, in the step 106, the weight is changed by using a BP back propagation algorithm, and the weight between layers is adjusted by using a δ learning algorithm;
the learning algorithm of the connection weight of the output layer and the hidden layer is as follows:
wherein, delta is the learning efficiency, and delta belongs to [0, 1 ];
the weight of the network at the moment k +1 is:
w jo (k+1)=w jo (k)+Δw jo (2)
the hidden layer is connected with the input layer by a weight w ij Learning algorithm
The weight of the network at the moment k +1 is
w ij (k+1)=w ij (k)+Δw ij (5)
In order to avoid oscillation and slow convergence speed in the learning process of the weight, that is, adding the momentum factor α makes the momentum factor α belong to [0, 1 ]:
w jo (k+1)=w jo (k)+Δw jo +α*(w jo (k)-w jo *(k-1) (6)
w ij (k+1)=w ij (k)+Δw ij +α*(w ij (k)-w ij *(k-1)) (7)。
the invention has the beneficial effects that: according to the automatic driving direction prediction method based on the lightweight neural network, the acquired image is preprocessed, and the data is subjected to horizontal overturning, brightness adjustment, angle adjustment and data screening operation, so that a data set is enriched, training samples are added, and a training network model is better. And the EffNet network is combined with a BP neural network propagation algorithm to adjust the error between the predicted steering wheel rotation angle and the actual steering wheel rotation angle, so that the network budget requirement is reduced, and the method has an actual reference value and a great market prospect.
Drawings
FIG. 1 is a flow chart of a training neural network;
FIG. 2 is a flow chart of an autopilot test;
FIG. 3 shows an image sample; (a) capturing a picture by the left camera; (b) the middle camera captures a picture; (c) capturing a picture by the right camera;
FIG. 4 shows the distribution of the output angle y;
FIG. 5 shows a left camera capturing a picture;
FIG. 6 is a comparison diagram of an ordinary convolutional neural network and a depthwise separable convolutional network; (a) ordinary convolutional network, (b) depthwise separable network;
FIG. 7 shows the Leaky Relu activation function;
FIG. 8 is a graph of the EffNet network model loss function;
FIG. 9 is a graph of the loss function of a convolutional neural network.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present invention, the following description will be made with reference to the accompanying drawings and embodiments.
System frame
In traditional road detection, image features are extracted manually, the position and angle of the road boundary are recorded, and the angle and position of the steering wheel are then determined by a geometric method. The present invention instead uses a lightweight convolutional neural network: the input is an image and the output is an angle. The design is divided into two parts, the first being training the neural network and the second testing the network model. The invention provides an automatic driving direction prediction method based on a lightweight neural network; as shown in figures 1 and 2, the specific steps are as follows:
step 1: training a neural network model;
101: acquiring image data;
102: preprocessing an image;
103: the preprocessed image is transmitted into an EffNet network, and the EffNet network can generate an expected rotation angle according to the input image;
104: recording the steering-wheel rotation angle in manual mode, and preprocessing the image captured at that moment, so as to obtain the actual steering-wheel rotation angle;
105: calculating a difference value between the expected rotation angle and the actual rotation angle of the steering wheel;
106: transmitting the difference value to an EffNet network through a BP neural network propagation algorithm, and continuously updating the weight so as to minimize the difference value between the expected rotation angle and the actual rotation angle of the steering wheel, and at the moment, storing an optimal training neural network model;
step 2: testing neural network models
201: capturing a current picture by using a central camera of an automobile in the Unity simulator, and transmitting the current picture to an EffNet network through a network socket to be used as input of the EffNet network;
202: the EffNet network predicts the rotation direction and angle of the steering wheel from the current picture; the predicted angle is transmitted to the simulator, the simulator controls the car according to the returned angle, the car continues forward, and the picture ahead is transmitted to the EffNet network in real time, repeating the cycle.
In step 101, image data are obtained through the Unity simulator, which has left, middle, and right cameras for capturing images. Pressing the manual mode collects the training data set, from which the angle, throttle, speed, and brake data of the car at the current moment are obtained; 24108 pictures of 320 × 160 pixels each are generated and saved under an IMG folder, together with a generated csv file. Fig. 3 shows images captured by the left, middle, and right cameras; the corresponding steering-wheel rotation (in radians) is 0.06176. In automatic driving mode, python interacts with the Unity simulator through a socket, calling the simulator's port 4567 to test the trained model and realize automatic driving.
In step 102, image pre-processing comprises the steps of:
(1) Image cropping
Not all of each image is useful, so a region of interest (ROI) must be extracted. Twenty pixels are therefore cropped from the bottom, removing part of the car hood, and twenty pixels are cropped from the top, removing part of the sky; the cropped image is (20:140, 0:320).
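The crop described above can be sketched as a small helper (the function name is illustrative; a 160 × 320 RGB array, as produced by the simulator, is assumed):

```python
import numpy as np

def crop_roi(img):
    """Keep rows 20..139 of a 160 x 320 frame: drop 20 rows of sky at the
    top and 20 rows of car hood at the bottom, i.e. the slice (20:140, 0:320)."""
    return img[20:140, 0:320]

frame = np.zeros((160, 320, 3), dtype=np.uint8)  # placeholder simulator frame
print(crop_roi(frame).shape)  # (120, 320, 3)
```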
(2) Image brightness adjustment
The goal is for the neural network to learn the steering-wheel angle and drive the car normally whether the light is strong or weak. Brightness changes can seriously affect the network's judgement of direction, and adjusting image brightness improves the network's robustness to different environments. Since the collected data set does not contain images at many different brightness levels, brightness adjustment is required. This design adjusts brightness based on the HSV color space as follows: first, the image is converted from the RGB color space to the HSV color space, where V represents brightness and H and S represent hue and saturation; then, keeping H and S unchanged, the V value is multiplied by a coefficient α whose value range is [0.1, 1]; finally the HSV image is converted back to an RGB image.
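A minimal sketch of this HSV brightness adjustment, using the standard library's colorsys on individual pixels (the function name is illustrative; a real pipeline would operate on whole image arrays):

```python
import colorsys

def adjust_brightness(rgb_pixels, alpha):
    """Convert RGB -> HSV, scale V by alpha in [0.1, 1] while keeping
    H and S unchanged, then convert back to RGB."""
    out = []
    for r, g, b in rgb_pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        r2, g2, b2 = colorsys.hsv_to_rgb(h, s, v * alpha)
        out.append((round(r2 * 255), round(g2 * 255), round(b2 * 255)))
    return out

# Darkening a pure red pixel to half brightness keeps its hue
print(adjust_brightness([(255, 0, 0)], 0.5))  # [(128, 0, 0)]
```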
(3) Image angle adjustment
Statistically, the 0 angle occurs most frequently, about 20 times as often as left and right turns, and there are more positive angles (1900) than negative angles (1775). Fig. 4 shows the distribution of the output angle y. To balance the data, 95% of the samples with angle 0 are randomly removed. Similarly, since right turns outnumber left turns, images are horizontally flipped to balance the left-turn and right-turn counts.
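The balancing steps above can be sketched as follows (names are illustrative; each sample is assumed to be a dict holding an "image" as a list of rows and a steering "angle"):

```python
import random

def balance_zero_angles(samples, keep_frac=0.05, seed=0):
    """Randomly keep only about 5% of the samples whose steering angle is 0."""
    rng = random.Random(seed)
    return [s for s in samples
            if s["angle"] != 0 or rng.random() < keep_frac]

def flip_sample(sample):
    """Horizontal flip: mirror each image row and negate the steering angle."""
    return {"image": [row[::-1] for row in sample["image"]],
            "angle": -sample["angle"]}

print(flip_sample({"image": [[1, 2, 3]], "angle": 0.25}))
# {'image': [[3, 2, 1]], 'angle': -0.25}
```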
In the automatic driving test using the simulator, it is the image captured by the central camera that is transmitted to the python file through the socket, and the output angle is then transmitted back to the simulator, so the picture captured by the central camera is the one that matters. However, at every instant the left and right cameras also capture pictures, and these are used to enrich the data set. Fig. 5 shows the scene of a picture captured by the left camera.
The black lines are the road boundary lines and the black rectangle is the car. Point O is a point in the center of the road, point C is the position of the central camera, and point L is the position of the left camera. A line parallel to the car is drawn from point O to point H. Connecting point C to point H, segment CH is the path the car is expected to travel; connecting point L to point H, segment LH is the path the car is expected to travel according to the picture captured by the left camera.
The picture captured by the right camera is handled in the same way.
Introduction to network architecture
1. BP neural network algorithm
The BP neural network consists of an input layer, an output layer, and several hidden layers; each layer has several nodes, and the connection state of the nodes between layers is embodied in the weights. Its basic idea is the gradient descent method. Each node in the BP neural network acts as a perceptron, consisting of input values, weights, a bias, an activation function, and an output.
In the design process, the weight is changed by using a BP back propagation algorithm, the back propagation algorithm is adopted, and the weight between layers is adjusted by using a delta learning algorithm.
According to the gradient descent method, the learning algorithm for the weights is as follows.

The connection weight w_jo between the output layer and the hidden layer is learned by:

Δw_jo = -δ · ∂E/∂w_jo (1)

where δ is the learning rate, δ ∈ [0, 1];

the weight of the network at time k+1 is:

w_jo(k+1) = w_jo(k) + Δw_jo (2)

The weight w_ij connecting the hidden layer and the input layer is learned in the same way:

Δw_ij = -δ · ∂E/∂w_ij (3)

and the weight of the network at time k+1 is

w_ij(k+1) = w_ij(k) + Δw_ij (5)

To avoid oscillation and slow convergence during weight learning, a momentum factor α ∈ [0, 1] is added:

w_jo(k+1) = w_jo(k) + Δw_jo + α·(w_jo(k) - w_jo(k-1)) (6)

w_ij(k+1) = w_ij(k) + Δw_ij + α·(w_ij(k) - w_ij(k-1)) (7)
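Equations (6) and (7) amount to the following update rule (a sketch; w_prev denotes the weight at time k-1):

```python
def momentum_update(w, w_prev, delta_w, alpha=0.9):
    """w(k+1) = w(k) + delta_w + alpha * (w(k) - w(k-1)):
    the delta-rule update augmented with a momentum term, alpha in [0, 1]."""
    return w + delta_w + alpha * (w - w_prev)

# With alpha = 0.5 the previous step's direction contributes half its size
print(momentum_update(1.0, 0.8, 0.05, alpha=0.5))  # about 1.15
```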
in this item a sigmoid activation function is used. The BP neural network can realize arbitrary approximation to the function and has strong nonlinear mapping capability.
2. Lightweight neural network EffNet
Two technical points are applied in the EffNet network model: 1) depthwise separable convolution; 2) spatial separable convolution.
1) Depthwise Separable Convolution
Fig. 6 compares an ordinary convolutional neural network with a depthwise separable convolutional network, where (a) is the ordinary network and (b) is the depthwise separable one. Suppose a 10 × 10 × 3 picture is input to the ordinary convolutional network and convolved with a 5 × 5 × 3 kernel, finally outputting a 6 × 6 × 1 feature map; the required computation is 10 × 10 × 3 × 5 × 5 × 3 = 22500. In the depthwise separable network, the input is still 10 × 10 × 3; it is convolved with three 5 × 5 × 1 kernels to produce a 6 × 6 × 3 output, which is then convolved with a 1 × 1 × 3 kernel to obtain a 6 × 6 × 1 result. The required computation is 10 × 10 × 5 × 5 × 1 × 3 + 6 × 6 × 3 = 7608, i.e. the computation is reduced to 33.8% of the original, a 66.2% reduction.
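The operation counts in the paragraph above can be reproduced directly (using the patent's own accounting of multiplications, not a standard FLOP count):

```python
H, W, C = 10, 10, 3   # input: a 10 x 10 x 3 picture
K = 5                 # 5 x 5 kernel, giving a 6 x 6 feature map

ordinary = H * W * C * K * K * C      # one 5x5x3 kernel: 22500
depthwise = H * W * K * K * 1 * C     # three 5x5x1 kernels: 7500
pointwise = 6 * 6 * C                 # one 1x1x3 kernel on the 6x6x3 map: 108
separable = depthwise + pointwise     # 7608

print(ordinary, separable, round(separable / ordinary, 3))  # 22500 7608 0.338
```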
With the conventional ReLU activation function, y = 0 whenever x < 0, so the gradient there is 0; no matter how the network is trained, the weights of the convolution kernels and of the connected nodes of the deep network stop changing. Leaky ReLU instead has a small slope for x < 0, so the value of y still changes with x. For the Leaky ReLU activation function, even if the value fed into a stage is less than 0, the gradient is nonzero thanks to that slope, so the network can still obtain gradients and be optimized slowly under continued negative inputs. Fig. 7 shows the Leaky ReLU activation function.
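A one-line Leaky ReLU matching the description above (the 0.01 slope is a common default, assumed here since the patent does not state it):

```python
def leaky_relu(x, slope=0.01):
    """Leaky ReLU: identity for x > 0, a small non-zero slope otherwise,
    so the gradient never vanishes entirely on negative inputs."""
    return x if x > 0 else slope * x

print(leaky_relu(3.0), leaky_relu(-2.0))  # 3.0 -0.02
```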
2) Spatial Separable Convolution (Spatial Separable Convolation)
The most common case is to split one 3 × 3 convolution kernel into a 3 × 1 kernel and a 1 × 3 kernel.
The ordinary convolution computation is 3² · M² · C, where M is the side length of the input image and C is its number of channels. Using spatial separable convolution, convolving with a 3 × 1 kernel costs 3 · M² · C, and convolving the result with a 1 × 3 kernel costs another 3 · M² · C, for a total of 6 · M² · C. It follows that the computation of the decomposed convolution is 2/3 that of ordinary convolution.
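The 2/3 ratio holds for any image size M and channel count C under this accounting; a quick check with arbitrary example values:

```python
M, C = 128, 3  # example: a 128 x 128 input with 3 channels

ordinary = 3 ** 2 * M ** 2 * C   # one 3x3 kernel: 3^2 * M^2 * C
separable = 2 * (3 * M ** 2 * C) # a 3x1 kernel followed by a 1x3 kernel: 6 * M^2 * C

print(separable / ordinary)  # 0.6666666666666666
```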
Comparison with end-to-end convolutional neural network
TABLE 1 network architecture
The end-to-end convolutional neural network uses the ReLU activation function and a stochastic gradient descent (SGD) optimizer, with epoch set to 50. The EffNet network uses the Leaky ReLU activation function and an adaptive moment estimation (Adam) optimizer, with epoch set to 25. The batch image number batch_size is set to 32, and the loss function is the mean squared error loss.
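The mean squared error loss named above is simply the following (a plain-Python sketch for scalar steering angles):

```python
def mse(y_true, y_pred):
    """Mean squared error between actual and predicted steering angles."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

print(mse([0.1, -0.2], [0.0, -0.1]))  # about 0.01
```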
To prevent the network from overfitting, a Dropout layer is used during network design to randomly deactivate neurons, improving the neural network's generalization ability. Regularization, batch normalization, and early stopping together handle the overfitting phenomenon well.
While training the network model, in order to read data efficiently and keep the images from exhausting memory, a Python batch generator is used in addition to the data-augmentation operations. The generator loads only the configured number of images at a time as needed, instead of reading every image into memory at once. The batch generator does not need to generate all augmented images in advance and does not occupy much hard-disk space, at the cost of extra time spent reading files from the hard disk.
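A minimal sketch of such a batch generator (the paths and angles are illustrative placeholders; the real generator would also load and augment each image inside the loop):

```python
import random

def batch_generator(paths, angles, batch_size=32, seed=0):
    """Yield (paths, angles) batches forever, touching only batch_size
    samples per step instead of holding every image in memory."""
    rng = random.Random(seed)
    idx = list(range(len(paths)))
    while True:
        rng.shuffle(idx)
        for start in range(0, len(idx) - batch_size + 1, batch_size):
            chosen = idx[start:start + batch_size]
            # In the real pipeline each path would be read and augmented here
            yield [paths[i] for i in chosen], [angles[i] for i in chosen]

gen = batch_generator([f"img_{i}.jpg" for i in range(100)], list(range(100)))
xb, yb = next(gen)
print(len(xb), len(yb))  # 32 32
```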
Optimizer comparisons
Stochastic gradient descent algorithm (SGD)
The gradient descent algorithm is a first-order optimization algorithm, also called the steepest descent method, whose aim is to find a local minimum of a function. It is based on the observation that if a real-valued function f(x) is differentiable and defined at a point a, then f decreases fastest from a in the direction opposite to the gradient, -∇f(a).
Let μ be the learning rate, θ the network parameters, and J(θ) the loss function; the update is θ ← θ - μ·∇J(θ). The loss function in this design is the mean squared error loss.
The mini-batch gradient descent adopted in this design combines the advantages of batch processing and stochastic gradient descent: it weakens the oscillation of the objective function, is more stable, and is easy to accelerate in hardware. Its disadvantage is that convergence is not guaranteed: when the learning rate is too small convergence is very slow, when it is too large the updates oscillate or even fail to converge, and a single learning rate is not appropriate for all parameters. The learning rate in this design is 0.01.
Adaptive moment estimation optimizer (Adam)
The Adam optimizer is currently the most widely used optimizer; it keeps exponentially decaying averages of past gradients and past squared gradients.
Update the first-order moment:
m_t = β1 · m_{t-1} + (1 - β1) · g_t
Update the second-order moment:
v_t = β2 · v_{t-1} + (1 - β2) · g_t²
Correct the first-order moment bias:
m̂_t = m_t / (1 - β1^t)
Correct the second-order moment bias:
v̂_t = v_t / (1 - β2^t)
Update the weights:
θ_{t+1} = θ_t - μ · m̂_t / (√v̂_t + ε)
where m_t is the first-order moment estimate at the current time, m_{t-1} that at the previous time, g_t the gradient at the current time, and β1, β2 the exponential decay rates, 0.9 and 0.999 respectively.
Because the Adam optimizer accumulates squared gradients only over a recent period, the step size does not need to be tuned precisely; to achieve this it adopts a momentum-like strategy. The learning rate in this design is 0.001.
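The Adam equations above can be exercised in a few lines (a sketch with the stated defaults β1 = 0.9, β2 = 0.999, μ = 0.001; ε is the usual small stabilizing constant, assumed to be 1e-8):

```python
import math

def adam_step(theta, grad, m, v, t, mu=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: decayed moments, bias correction, then the step."""
    m = b1 * m + (1 - b1) * grad          # first-order moment
    v = b2 * v + (1 - b2) * grad ** 2     # second-order moment
    m_hat = m / (1 - b1 ** t)             # bias-corrected moments
    v_hat = v / (1 - b2 ** t)
    return theta - mu * m_hat / (math.sqrt(v_hat) + eps), m, v

theta, m, v = 1.0, 0.0, 0.0
theta, m, v = adam_step(theta, grad=4.0, m=m, v=v, t=1)
# The very first step moves by about mu regardless of the gradient's scale
print(round(theta, 6))  # 0.999
```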
Performance testing
The design uses Keras as the deep learning framework with Python version 3.7; the EffNet network and a convolutional neural network serve as the training network models, the BP neural network provides the weight-adjustment algorithm, and the image data are generated by the Unity simulator.
Of 24108 pictures, 20% of the images were selected as test data, i.e., 4821 pictures were selected as test data, and the size of the input picture was 128 × 3.
The quality of the neural network structure can be tested in two ways: 1) testing in the simulator; 2) judging by the loss function.
Testing in the simulator shows that when the model trained with the EffNet network and the traditional convolutional neural network model each drive the vehicle automatically, the EffNet network runs faster, drives more smoothly, and responds more sensitively in curves.
And judging through a loss function, wherein fig. 8 is an EffNet network model loss function graph, and fig. 9 is a common convolutional neural network loss function graph.
As can be seen from the loss curves, the EffNet network reduces the loss value to 0.1165 and the test loss to 0.0753 in 25 iterations, while the end-to-end convolutional neural network needs 50 iterations to reduce the loss to 0.1031 and the test loss to 0.0615. Compared with the convolutional neural network, EffNet's loss value is higher by 0.0134 and its test loss by 0.0138.
Comparing the network models, the end-to-end convolutional neural network generates 50563 parameters with a network size of 453 kB, while the EffNet network generates 12777 parameters with a size of 335 kB; compared with the convolutional neural network, EffNet reduces the parameter count by 74.73% and the network size by 26%.
In conclusion, the EffNet network improves computational efficiency and model running speed, has strong generalization capability and robustness, reaches an accuracy approximately equal to that of the end-to-end convolutional neural network with a better training profile, reduces computation cost, and can run efficiently on embedded mobile hardware.
Claims (2)
1. An automatic driving direction prediction method based on a lightweight neural network is characterized by comprising the following steps:
step 1: training a neural network model; the method specifically comprises the following steps:
101: acquiring image data; image data are acquired through a Unity simulator fitted with left, middle and right cameras for capturing pictures; a manual mode is engaged to acquire the training data set, recording the angle, accelerator, speed and brake data of the driven automobile at the current moment; 24108 pictures with a pixel size of 320 × 160 are generated, stored under an IMG folder, and indexed in a generated csv file;
102: preprocessing an image;
103: the preprocessed image is transmitted into an EffNet network, and the EffNet network can generate an expected rotation angle according to the input image;
104: recording the steering-wheel rotation angle in manual mode; the image captured at that angle is preprocessed, and the actual steering-wheel rotation angle is generated;
105: calculating the difference value between the expected rotation angle and the actual rotation angle of the steering wheel;
106: propagating the difference value back to the EffNet network through the BP neural network back-propagation algorithm and continuously updating the weights so as to minimize the difference between the expected and actual steering-wheel rotation angles; at this point the optimal trained neural network model is stored;
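Steps 103 to 106 above can be sketched as a predict / compare / back-propagate / keep-best loop. This is only an illustration with a one-parameter stand-in for EffNet (predicted angle = w × feature); the real network, its inputs, and the synthetic driving data here are all assumptions, not the patented model.

```python
def train(frames, lr=0.1, epochs=25):
    """Sketch of steps 103-106: predict an angle per frame (103), compare it
    with the recorded manual-mode angle (105), feed the error back into the
    weight (106), and keep the best model seen."""
    w, best = 0.0, (float("inf"), 0.0)
    for _ in range(epochs):
        sse = 0.0
        for feature, manual_angle in frames:
            expected = w * feature           # step 103: expected rotation angle
            diff = expected - manual_angle   # step 105: difference value
            w -= lr * diff * feature         # step 106: weight update
            sse += diff ** 2
        if sse < best[0]:                    # store the best-so-far model
            best = (sse, w)
    return best[1]

# Synthetic frames whose true steering law is angle = 0.5 * feature
frames = [(x / 10, 0.05 * x) for x in range(-10, 11)]
w = train(frames)
```

With enough passes the recovered weight converges to the underlying 0.5, mirroring how repeated weight updates shrink the angle difference.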
step 2: testing the neural network model; the method specifically comprises the following steps:
201: capturing a current picture by using a central camera of an automobile in the Unity simulator, and transmitting the current picture to an EffNet network through a network socket to be used as input of the EffNet network;
202: the EffNet network predicts the rotation direction and angle of the steering wheel from the current picture and transmits the predicted angle to the Unity simulator; the Unity simulator controls the automobile to run according to the transmitted angle, the automobile continues forward, the view ahead is transmitted to the EffNet network in real time, and the process repeats;
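The capture / predict / steer cycle of steps 201 and 202 can be sketched with stubs. `ToySimulator` and `ToyModel` are hypothetical stand-ins for the Unity socket connection and the EffNet network; only the shape of the control loop comes from the text.

```python
def drive(simulator, model, steps=100):
    """Steps 201-202 in a loop: the centre camera supplies the current
    frame, the network returns a steering angle, the simulator applies it,
    and the next frame follows."""
    for _ in range(steps):
        frame = simulator.capture()      # step 201: centre-camera picture
        angle = model.predict(frame)     # step 202: predicted steering angle
        simulator.steer(angle)           # car moves; loop repeats

class ToySimulator:
    """Car on a line: the 'frame' is just the offset from the lane centre."""
    def __init__(self):
        self.offset = 1.0
    def capture(self):
        return self.offset
    def steer(self, angle):
        self.offset += angle             # applying the angle moves the car

class ToyModel:
    def predict(self, frame):
        return -0.1 * frame              # steer back toward the centre

sim = ToySimulator()
drive(sim, ToyModel())
```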
the image preprocessing in step 102 comprises the following steps:
(1) image cutting;
(2) adjusting the brightness of the image;
(3) adjusting an image angle;
in the step (2), the image is first converted from the RGB color space into the HSV color space, where V represents brightness and H and S represent chroma and saturation; then, keeping the H and S values unchanged, the V value is multiplied by a coefficient alpha whose value range is [0.1, 1]; finally, the HSV image is converted back into an RGB image;
in the step (3), the image is horizontally flipped;
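The brightness adjustment of step (2) can be sketched per pixel with the standard-library `colorsys` module (an illustrative choice; the patent does not name a library). Because V is the maximum RGB channel and H, S depend only on channel ratios, scaling V by alpha while holding H and S fixed is equivalent to scaling all three RGB channels by alpha:

```python
import colorsys

def dim_pixel(r, g, b, alpha):
    """Step (2): RGB -> HSV, keep H and S, multiply V by alpha
    (alpha in [0.1, 1]), then convert back to RGB."""
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return colorsys.hsv_to_rgb(h, s, alpha * v)

pixel = (0.8, 0.4, 0.2)                       # channels in [0, 1]
dimmed = dim_pixel(*pixel, alpha=0.5)
print(tuple(round(c, 6) for c in dimmed))     # (0.4, 0.2, 0.1)
```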
in the steps 1 and 2, the EffNet network uses a Leaky ReLU activation function and an adaptive moment estimation optimizer, with the epoch count set to 25; the batch image quantity batch_size is set to 32, and the loss function is the mean-square-error loss function.
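The two functions named above are standard and can be written out directly; this NumPy sketch uses an assumed negative slope of 0.01 for the Leaky ReLU, which the claim does not specify.

```python
import numpy as np

def leaky_relu(x, slope=0.01):
    """Leaky ReLU: passes positive inputs through and scales negative
    inputs by a small slope, so negative units keep a nonzero gradient."""
    return np.where(x > 0, x, slope * x)

def mse_loss(predicted, actual):
    """Mean-square-error loss comparing predicted and actual angles."""
    return float(np.mean((predicted - actual) ** 2))

activations = leaky_relu(np.array([-2.0, -0.5, 0.0, 1.0, 3.0]))
loss = mse_loss(np.array([0.1, 0.3]), np.array([0.0, 0.5]))
```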
2. The automatic driving direction prediction method based on the lightweight neural network as claimed in claim 1, wherein in step 106 the weights are changed by using the BP back-propagation algorithm, and the weights between layers are adjusted by the δ learning algorithm;
the learning algorithm of the connection weight between the output layer and the hidden layer is as follows:
where δ is the learning rate, δ ∈ [0, 1];
the weight of the network at moment k+1 is:
w_jo(k+1) = w_jo(k) + Δw_jo    (2)
the learning algorithm of the connection weight w_ij between the hidden layer and the input layer is as follows;
the weight of the network at moment k+1 is:
w_ij(k+1) = w_ij(k) + Δw_ij    (5)
in order to avoid oscillation and slow convergence in the learning process of the weights, a momentum factor α is added, with α ∈ [0, 1]:
w_jo(k+1) = w_jo(k) + Δw_jo + α·(w_jo(k) − w_jo(k−1))    (6)
w_ij(k+1) = w_ij(k) + Δw_ij + α·(w_ij(k) − w_ij(k−1))    (7).
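The momentum-augmented update of equations (6) and (7) is a one-liner; the sample values below are purely illustrative:

```python
def momentum_update(w_curr, w_prev, delta_w, alpha):
    """Equations (6)/(7): w(k+1) = w(k) + Δw + α·(w(k) − w(k−1)).
    The momentum factor α in [0, 1] reuses part of the previous weight
    change to damp oscillation and speed up convergence."""
    return w_curr + delta_w + alpha * (w_curr - w_prev)

# One step: previous weight 0.5, current weight 1.0, fresh correction 0.25
w_next = momentum_update(1.0, 0.5, 0.25, alpha=0.5)
print(w_next)  # 1.5
```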
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110280774.4A CN113076815B (en) | 2021-03-16 | 2021-03-16 | Automatic driving direction prediction method based on lightweight neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113076815A CN113076815A (en) | 2021-07-06 |
CN113076815B true CN113076815B (en) | 2022-09-27 |
Family
ID=76612496
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110280774.4A Expired - Fee Related CN113076815B (en) | 2021-03-16 | 2021-03-16 | Automatic driving direction prediction method based on lightweight neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113076815B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113963151A (en) * | 2021-11-19 | 2022-01-21 | 广州市讯码通讯科技有限公司 | Character recognition method and recognition device based on light weight neural network |
CN116300410B (en) * | 2023-05-25 | 2023-08-22 | 武汉理工大学 | Corner optimization method and system for data-driven feedforward and feedback compensation |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109271967A (en) * | 2018-10-16 | 2019-01-25 | 腾讯科技(深圳)有限公司 | The recognition methods of text and device, electronic equipment, storage medium in image |
CN109946995A (en) * | 2019-03-26 | 2019-06-28 | 湖北亿咖通科技有限公司 | Emulation test method, device and the intelligent terminal of automatic Pilot |
CN110660046A (en) * | 2019-08-30 | 2020-01-07 | 太原科技大学 | Industrial product defect image classification method based on lightweight deep neural network |
CN111007719A (en) * | 2019-11-12 | 2020-04-14 | 杭州电子科技大学 | Automatic driving steering angle prediction method based on domain adaptive neural network |
CN111582049A (en) * | 2020-04-16 | 2020-08-25 | 天津大学 | ROS-based self-built unmanned vehicle end-to-end automatic driving method |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP7228961B2 (en) * | 2018-04-02 | 2023-02-27 | キヤノン株式会社 | Neural network learning device and its control method |
CN108647742B (en) * | 2018-05-19 | 2021-07-13 | 南京理工大学 | Rapid target detection method based on lightweight neural network |
CN111275711B (en) * | 2020-01-08 | 2023-04-07 | 西安电子科技大学 | Real-time image semantic segmentation method based on lightweight convolutional neural network model |
2021-03-16: CN CN202110280774.4A patent/CN113076815B/en not_active Expired - Fee Related
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110210551B (en) | Visual target tracking method based on adaptive subject sensitivity | |
CN110363716B (en) | High-quality reconstruction method for generating confrontation network composite degraded image based on conditions | |
CN113076815B (en) | Automatic driving direction prediction method based on lightweight neural network | |
CN110796009A (en) | Method and system for detecting marine vessel based on multi-scale convolution neural network model | |
CN108520238B (en) | Scene prediction method of night vision image based on depth prediction coding network | |
CN111652081B (en) | Video semantic segmentation method based on optical flow feature fusion | |
CN112686207A (en) | Urban street scene target detection method based on regional information enhancement | |
CN112434723B (en) | Day/night image classification and object detection method based on attention network | |
CN117557922B (en) | Unmanned aerial vehicle aerial photographing target detection method with improved YOLOv8 | |
CN113822153A (en) | Unmanned aerial vehicle tracking method based on improved DeepSORT algorithm | |
CN112395951A (en) | Complex scene-oriented domain-adaptive traffic target detection and identification method | |
Ding | LENet: Lightweight and efficient LiDAR semantic segmentation using multi-scale convolution attention | |
Wang et al. | Improved reinforcement learning through imitation learning pretraining towards image-based autonomous driving | |
CN116596792B (en) | Inland river foggy scene recovery method, system and equipment for intelligent ship | |
CN111832453A (en) | Unmanned scene real-time semantic segmentation method based on double-path deep neural network | |
CN115294550A (en) | Automatic driving automobile road scene understanding method based on multi-task learning | |
WO2024108857A1 (en) | Deep-learning-based method for small target detection in unmanned aerial vehicle scenario | |
CN114266889A (en) | Image recognition method and device, readable medium and electronic equipment | |
CN115063704A (en) | Unmanned aerial vehicle monitoring target classification method based on three-dimensional feature fusion semantic segmentation | |
CN116258940A (en) | Small target detection method for multi-scale features and self-adaptive weights | |
CN117593623A (en) | Lightweight vehicle detection method based on improved YOLOv8n model | |
CN112132207A (en) | Target detection neural network construction method based on multi-branch feature mapping | |
CN116664833A (en) | Method for improving target re-identification model capacity and target re-identification method | |
CN116452472A (en) | Low-illumination image enhancement method based on semantic knowledge guidance | |
CN116385293A (en) | Foggy-day self-adaptive target detection method based on convolutional neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 20220927 |