CN113486568A - Vehicle control dynamic simulation learning algorithm based on surround vision - Google Patents


Info

Publication number: CN113486568A
Application number: CN202110369507.4A
Authority: CN (China)
Prior art keywords: network, model, dynamic, vehicle, speed
Legal status: Withdrawn
Other languages: Chinese (zh)
Inventors: 王燕清, 石朝侠
Current and original assignee: Nanjing Xiaozhuang University
Application filed by Nanjing Xiaozhuang University; priority to CN202110369507.4A; published as CN113486568A.

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00: Computer-aided design [CAD]
    • G06F 30/20: Design optimisation, verification or simulation
    • G06F 30/27: Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G06F 2119/00: Details relating to the type or aim of the analysis or the optimisation
    • G06F 2119/02: Reliability analysis or reliability optimisation; failure analysis, e.g. worst case scenario performance, failure mode and effects analysis [FMEA]


Abstract

To address the poor performance of current conditional imitation learning models for autonomous driving in dynamic environments, a vehicle control dynamic imitation learning algorithm based on surround vision is proposed. The model first extracts image features from 4 consecutive forward frames with a residual network, then fuses these features into a single feature vector with an LSTM network. The fused feature vector is combined with the side-image features extracted by a residual network to obtain the dynamic environment feature vector. For different navigation commands, different decision networks predict the vehicle speed and steering wheel angle, and longitudinal control of the vehicle is finally realized with a proportional-integral control method. The DSCIL model eliminates the spurious mapping between low speed and low acceleration, completely removing the stalling caused by low speed, and extracts dynamic obstacle states from multi-frame forward visual information, thereby improving the performance of the model in dynamic driving environments.

Description

Vehicle control dynamic simulation learning algorithm based on surround vision
Technical Field
The invention relates to the technical field of autonomous driving, and in particular to a vehicle control dynamic imitation learning algorithm based on surround vision.
Background
When humans drive, the traffic situation ahead is judged mainly from visual information and the motion state of the car from the speedometer; imitation learning collects human driving records together with sensor information and, by supervised learning, trains a neural network model to mimic human driving behavior. In 2005, LeCun et al. built the end-to-end model DAVE with a 6-layer convolutional neural network and trained it by supervised learning; their study showed that the model is robust in off-road environments. In 2016, NVIDIA trained a convolutional neural network, named DAVE-2, to predict the steering wheel angle from real driving data; the model produced a steering angle from the front-camera image and drove under various road conditions, demonstrating the feasibility of end-to-end control. In 2017, Dosovitskiy et al. proposed the end-to-end conditional imitation learning model CIL, and the experimental results showed that a network that branches on navigation information can exploit that information effectively. In 2019, Codevilla et al. proposed the CILRS model, which achieves good environmental perception with ResNet34 and, through speed prediction, alleviates to some extent the occasional abnormal stopping of the CIL model.
The above models all take the current image as the network input, from which only the road layout and the obstacle positions at the current moment can be obtained. In real driving, however, humans judge the motion trend and speed of dynamic obstacles from visual information over a past interval to decide the driving strategy, which is indispensable in real scenes. With the above models it is difficult to obtain the motion state of dynamic obstacles, something unthinkable in real driving. End-to-end autonomous driving models that make decisions from single-frame image information are referred to here as static imitation learning (SIL) models; correspondingly, end-to-end models that take multi-frame visual information as network input are dynamic imitation learning (DIL) models, as shown in FIG. 1. This follows the same logic as a human driver's perception of the environment: the trajectory and speed of dynamic obstacles are determined from historical visual information to decide the driving strategy.
Disclosure of Invention
1. Dynamic surround vision network model architecture
Among vision-based imitation learning driving models, most use only single-frame visual information, whereas in actual driving a human driver needs historical visual information to avoid dynamic obstacles and drive safely. The invention proposes a dynamic surround-vision imitation learning network to improve driving performance; the model is named DSCIL (Dynamic Surround-view Conditional Imitation Learning), and the network structure is shown in FIG. 2. Four forward frames pass through a ResNet34 network to obtain four 512-dimensional feature vectors, which a single-layer LSTM network fuses into one 512-dimensional feature vector; the current left and right side images each pass through the same ResNet18 network to obtain 64-dimensional feature vectors; the 512-dimensional vector and the two 64-dimensional vectors are concatenated into a 640-dimensional joint feature vector. The joint feature vector predicts the vehicle speed and steering wheel angle through 3 fully connected layers; the LSTM network has 128 hidden nodes. The branch networks shown in the figure are selectively activated by the navigation information c: each time, only the branch corresponding to the navigation information is activated to predict speed and steering wheel angle, and c has four states: road following, going straight, turning left and turning right. Unlike the CILRS model, the network does not take speed as an input for predicting throttle and brake, so the logically spurious mapping between low speed and low acceleration present in the data is not learned. Learning this mapping, as in CILRS and CIL, can cause the vehicle to stop even though it has not reached the goal and the road ahead is clear.
The DSCIL network is a conditional imitation learning framework: a neural network is trained with driving records and sensor data, different navigation commands are realized through branch networks, and the visual perception backbone is ResNet. The network input comprises multiple forward frames instead of only the current one, and no longer includes the current vehicle speed; the network output becomes the driving speed and the steering wheel angle, rather than throttle and brake values directly. For longitudinal control of the vehicle, proportional control of the vehicle speed is used, adjusting the throttle and brake according to the error between the current speed and the desired speed output by the network.
The DSCIL model can be divided into a visual perception network, a branch decision network and a vehicle longitudinal control part.
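For concreteness, a minimal PyTorch sketch of this architecture follows. It is an illustrative reconstruction from the description above, not the patent's code: the projection of the 128-unit LSTM output back to 512 dimensions and the 512-to-64 reduction of the ResNet18 side features are our assumptions, made so that the stated sizes (4 × 512 → 512, two 64-dimensional side features, a 640-dimensional joint vector) line up.

```python
import torch
import torch.nn as nn
from torchvision import models


class DSCIL(nn.Module):
    """Illustrative sketch of the DSCIL architecture described above."""

    def __init__(self, num_branches=4):
        super().__init__()
        resnet34 = models.resnet34(pretrained=True)
        self.forward_net = nn.Sequential(*list(resnet34.children())[:-1])
        resnet18 = models.resnet18(pretrained=True)  # shared by both sides
        self.side_net = nn.Sequential(*list(resnet18.children())[:-1])
        self.side_proj = nn.Linear(512, 64)   # assumption: 512 -> 64
        self.lstm = nn.LSTM(input_size=512, hidden_size=128, batch_first=True)
        self.fuse_proj = nn.Linear(128, 512)  # assumption: 128 -> 512
        # One decision branch per navigation command:
        # 0 road following, 1 straight, 2 left, 3 right.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Linear(640, 256), nn.ReLU(),
                nn.Linear(256, 256), nn.ReLU(),
                nn.Dropout(p=0.5),
                nn.Linear(256, 2),  # (vehicle speed, steering angle)
            )
            for _ in range(num_branches)
        ])

    def forward(self, frames, left_img, right_img, command):
        # frames: (B, 4, 3, H, W), the forward views of the last second.
        b, t = frames.shape[:2]
        f = self.forward_net(frames.flatten(0, 1)).flatten(1)  # (B*4, 512)
        _, (h, _) = self.lstm(f.view(b, t, -1))                # h: (1, B, 128)
        fused = self.fuse_proj(h[-1])                          # (B, 512)
        left = self.side_proj(self.side_net(left_img).flatten(1))    # (B, 64)
        right = self.side_proj(self.side_net(right_img).flatten(1))  # (B, 64)
        joint = torch.cat([fused, left, right], dim=1)         # (B, 640)
        # Activate only the branch selected by each sample's command.
        out = torch.stack([self.branches[int(c)](joint[i])
                           for i, c in enumerate(command)])
        return out  # (B, 2)
```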
2. Visual perception network
In the static surround-vision imitation learning network, experiments confirmed that visual information from the sides of the car benefits driving decisions, so the invention adopts ResNet18 as the feature extraction network for the side images. An LSTM is not used to fuse features of the lateral visual information because its effect on driving is limited: the side views mainly serve static obstacle-avoidance tasks such as lane keeping. For the forward images, in view of GPU memory and model size, the DSCIL model adopts a 34-layer ResNet to extract features, still with a feature dimension of 512. The network structure of ResNet34 is shown in FIG. 3; the numbers in the left boxes indicate how many of the modules on the right are stacked, and the structures of conv1, conv2_x, conv3_x, conv4_x and conv5_x are given in the brackets on the right. If the network input is a 224 × 224 × 3 array, the size is 112 × 112 × 64 after conv1; 56 × 56 × 64 after max pooling and 3 conv2 blocks; 28 × 28 × 128 after 4 conv3 blocks; 14 × 14 × 256 after 6 conv4 blocks; and 7 × 7 × 512 after 3 conv5 blocks. After mean pooling, the result is fully connected to a 512-dimensional vector, which is the image feature extracted by ResNet34.
For the motion state of dynamic obstacles ahead of the vehicle, the decision principle of a human driver is adopted: the motion state is perceived from historical multi-frame visual information. The DSCIL model performs environment perception with 4 consecutive forward frames covering the visual information of the last second: the forward images from 0.9 s ago, 0.6 s ago and 0.3 s ago, and the current forward image. From these four frames the driving speed and trajectory of vehicles in the image can be judged, which a single-frame model cannot do; a buffering sketch follows.
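Assembling this 4-frame input at run time only needs a short frame history. The sketch below is a hypothetical helper (the patent does not specify one); the 10 Hz camera rate is an assumption, chosen so that 0.3 s spacing maps to every third frame.

```python
from collections import deque


class FrameHistory:
    """Buffer the forward camera stream and emit the 4-frame DSCIL input:
    frames from 0.9 s, 0.6 s and 0.3 s ago plus the current frame.
    Assumes a 10 Hz camera, so 0.3 s corresponds to 3 frames."""

    def __init__(self, rate_hz=10, spacing_s=0.3, n_frames=4):
        self.step = int(rate_hz * spacing_s)  # frames per 0.3 s interval
        self.buf = deque(maxlen=self.step * (n_frames - 1) + 1)

    def push(self, frame):
        self.buf.append(frame)

    def ready(self):
        return len(self.buf) == self.buf.maxlen

    def sample(self):
        # Oldest to newest: t-0.9s, t-0.6s, t-0.3s, t.
        return [self.buf[i] for i in range(0, len(self.buf), self.step)]
```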
The four consecutive forward frames pass through the residual network to give four 512-dimensional feature vectors, which are fused by a single-layer LSTM with 128 hidden nodes; the resulting fusion feature is also 512-dimensional. The LSTM improves on the plain recurrent neural network in two ways: a new internal state c_t, which records history information up to time t, and a gating mechanism, which controls the rate of information transfer. The structure of the LSTM network is shown in FIG. 4, where h_t, c_t, c̃_t and x_t are respectively the external state, internal state, candidate state and network input at time t, and i_t, o_t and f_t are the input gate, output gate and forget gate at time t, each letting information through in a proportion within (0, 1). σ denotes the logistic function, with range (0, 1). The state h is updated at every step and acts as short-term memory, while the network parameters, whose update is much slower, can be regarded as long-term memory. The internal state c can hold key information at a certain moment for a period longer than short-term memory and shorter than long-term memory, hence the name long short-term memory.
The updating process of the LSTM network is as follows:
(1) first of all, the external state h of the last moment is usedt-1And input x of the current timetCalculating the input gate, output gate, forgetting gate and candidate states
Figure RE-GDA0003153000500000032
The calculation formula is shown in formulas 1-4, wherein W and U are parameter matrixes, and b is a bias parameter;
(2) use forgetting door ftAnd an input gate itTo update the memory cell ctThe update formula is shown as formula 5;
(3) using output gates otPassing the internal state to the external state htThe formula is shown in formula 6.
it=σ(Wixt+Uiht-1+bi) (1)
ot=σ(Woxt+Uoht-1+bo) (2)
ft=σ(Wfxt+Ufht-1+bf) (3)
Figure RE-GDA0003153000500000033
Figure RE-GDA0003153000500000034
ht=ot tanh(ct) (6)
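As a worked example, one LSTM step implementing equations 1-6 directly (a didactic sketch; in practice a library cell such as torch.nn.LSTM would be used, and the parameter dictionary p is our own convention):

```python
import torch


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM update following equations 1-6.
    p holds the weight matrices W*, U* and biases b* for the input (i),
    output (o) and forget (f) gates and the candidate state (c)."""
    sigma = torch.sigmoid
    i_t = sigma(p["Wi"] @ x_t + p["Ui"] @ h_prev + p["bi"])           # eq. 1
    o_t = sigma(p["Wo"] @ x_t + p["Uo"] @ h_prev + p["bo"])           # eq. 2
    f_t = sigma(p["Wf"] @ x_t + p["Uf"] @ h_prev + p["bf"])           # eq. 3
    c_tilde = torch.tanh(p["Wc"] @ x_t + p["Uc"] @ h_prev + p["bc"])  # eq. 4
    c_t = f_t * c_prev + i_t * c_tilde                                # eq. 5
    h_t = o_t * torch.tanh(c_t)                                       # eq. 6
    return h_t, c_t
```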
The 512-dimensional fusion feature obtained from the four forward frames is concatenated with the two 64-dimensional image features from the two side images to obtain the DSCIL model's description of the dynamic environment, namely the 640-dimensional dynamic environment feature vector. As the result of the model's visual perception, this vector is the input of the branch decision network.
3. Branch decision network
During driving there are four navigation commands: road following, turning left, turning right and going straight. Road following applies when the vehicle can only drive along the road, i.e. it is not at an intersection; at an intersection the vehicle has three choices: turn left, turn right or go straight. As shown in FIG. 2, different navigation commands correspond to different branch decision networks of the DSCIL model. The branch decision network used by the invention is likewise composed of several branches activated by the navigation command; each time only the corresponding branch is activated, and all branches have the same structure and share the 640-dimensional dynamic environment feature vector.
The structure of a single branch network is shown in FIG. 5: the input is the 640-dimensional dynamic environment feature vector, and a two-layer fully connected network with 256 nodes per layer finally outputs a 2-dimensional vector. The dynamic environment feature vector consists of the 512-dimensional forward fusion feature and the 64-dimensional left and right image features; the 2-dimensional output comprises the predicted vehicle speed and steering wheel angle. The steering value lies in (-1, 1) and the speed is a real number greater than 0. As in the branch decision networks above, the last fully connected layer uses dropout with probability 0.5 to avoid overfitting; a batched sketch of the command gating follows.
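One common way to realize the command-gated branching in a batched training step, sketched below under our own implementation assumptions (the patent does not give one), is to evaluate all branches and keep, per sample, only the output of the branch matching its navigation command:

```python
import torch


def commanded_output(branches, features, command):
    """features: (B, 640); command: (B,) long tensor in {0, 1, 2, 3}.
    Evaluates every branch, then selects for each sample the output of the
    branch matching its navigation command, so only that branch receives
    gradient from the loss."""
    all_out = torch.stack([b(features) for b in branches], dim=1)  # (B, 4, 2)
    idx = command.view(-1, 1, 1).expand(-1, 1, all_out.size(-1))   # (B, 1, 2)
    return all_out.gather(1, idx).squeeze(1)                       # (B, 2)
```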
4. Mathematical description and loss function
Suppose I^f_t, I^r_t and I^l_t denote the forward, right and left images at time t. The forward image fusion feature e^f_t at that time can be obtained from equation 7, where R_34 is the function corresponding to the 34-layer residual network and L is the function corresponding to the LSTM network.

e^f_t = L(R_34(I^f_{t-0.9}), R_34(I^f_{t-0.6}), R_34(I^f_{t-0.3}), R_34(I^f_t)) (7)

The feature vectors e^r_t and e^l_t of the right and left images at time t can be obtained from equations 8 and 9 respectively, where R_18 is the function corresponding to the 18-layer residual network. The dynamic environment feature J_t at time t is the concatenation of the forward image fusion feature e^f_t, the right image feature e^r_t and the left image feature e^l_t, as shown in equation 10.

e^r_t = R_18(I^r_t) (8)
e^l_t = R_18(I^l_t) (9)
J_t = [e^f_t; e^r_t; e^l_t] (10)

As shown in equation 11, assume the vehicle speed m_{t+1} at time t+1 and the steering wheel angle s_t at time t predicted by the network are a function F of the dynamic environment feature J_t and the navigation command c. Because different navigation commands correspond to different decision branch networks, denoting the branch for command c by A_c gives equation 12. Combining equations 10, 11 and 12 yields equation 13, the final mathematical description of the DSCIL network.

(m_{t+1}, s_t) = F(J_t, c) (11)
(m_{t+1}, s_t) = A_c(J_t) (12)
(m_{t+1}, s_t) = A_c([L(R_34(I^f_{t-0.9}), R_34(I^f_{t-0.6}), R_34(I^f_{t-0.3}), R_34(I^f_t)); R_18(I^r_t); R_18(I^l_t)]) (13)
Similarly, the loss function of the DSCIL model continues to use the L1 loss, as shown in equation 14, because Codevilla et al. found the L1 loss better suited to autonomous driving tasks than the L2 loss. Here ŝ_t is the prediction of the steering wheel angle s_t at time t, and m̂_{t+1} is the prediction of the vehicle speed m_{t+1} at time t+1. The first term of the loss is the steering wheel angle error and the second the vehicle speed error; the two errors are equally weighted at 0.5.

loss = 0.5·|s_t − ŝ_t| + 0.5·|m_{t+1} − m̂_{t+1}| (14)
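Equation 14 translates directly into code; a minimal sketch (the tensor layout is our assumption):

```python
import torch


def dscil_loss(pred, target):
    """Equation 14: equally weighted L1 errors on steering and speed.
    pred and target are (B, 2) tensors laid out as
    (speed m_{t+1}, steering s_t)."""
    speed_err = torch.abs(pred[:, 0] - target[:, 0]).mean()
    steer_err = torch.abs(pred[:, 1] - target[:, 1]).mean()
    return 0.5 * steer_err + 0.5 * speed_err
```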
The output of the branch decision network solves only the lateral control of the vehicle; for longitudinal control, the invention uses proportional-integral (PI) control based on the current vehicle speed and the expected speed at the next moment.
5. Vehicle longitudinal speed control
The network solves lateral control by outputting the steering wheel angle; to solve longitudinal control, the invention introduces proportional-integral (PI) control of the vehicle speed. The principle of PI control is shown in FIG. 6, where r_t is the setpoint of the system at time t, y_t is the output of the system at time t, and the deviation e_t is the difference between setpoint and output, as shown in equation 15. As shown in equation 16, u_t is the control quantity applied to the controlled object, consisting of a proportional and an integral term of the deviation e_t, where k_p is the proportional coefficient and k_i the integral coefficient. The proportional term follows the deviation, growing as it grows and shrinking as it shrinks, but on its own it cannot achieve offset-free control; the integral term, by accumulating the system deviation, effectively eliminates the static error.

e_t = r_t − y_t (15)
u_t = k_p·e_t + k_i·Σ_{j≤t} e_j (16)
Based on experimental tuning, the proportional coefficient for speed control is set to 0.25 and the integral coefficient to 0.2, giving the control quantity u_t of equation 17, where m_t is the vehicle speed at time t and m̂_t is the expected speed at time t. The relation between the longitudinal control quantity u_t and the throttle T_t at time t is given by equation 18: when u_t ≤ 0 the throttle is 0; when u_t > 0 the throttle equals 1.3 times u_t, capped at a maximum of 0.75. The relation between u_t and the brake b_t at time t is given by equation 19: when u_t > 0 the brake is 0; when u_t ≤ 0 the brake equals −0.35 times u_t, capped at a maximum of 1.

u_t = 0.25·(m̂_t − m_t) + 0.2·Σ_{j≤t} (m̂_j − m_j) (17)
T_t = 0 if u_t ≤ 0; T_t = min(1.3·u_t, 0.75) if u_t > 0 (18)
b_t = 0 if u_t > 0; b_t = min(−0.35·u_t, 1.0) if u_t ≤ 0 (19)
FIG. 7 shows the relation between the longitudinal control quantity u_t and the throttle T_t and brake b_t at time t: the horizontal axis is the control quantity u_t, the vertical axis is throttle/brake, the solid line shows how the throttle varies with the control quantity, and the dashed line shows the brake. When u_t is positive, the brake is 0 and the throttle is proportional to u_t within a certain range, limited by the threshold 0.75 beyond it; when u_t is negative, the throttle is 0 and the brake is proportional to u_t within a certain range, limited by the threshold 1.0 beyond it.
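Equations 15-19 combine into a compact longitudinal controller. The sketch below is an illustrative reading of them; the per-step call interface and the unbounded integral accumulator (no anti-windup is mentioned in the source) are assumptions.

```python
class PISpeedController:
    """Longitudinal control following equations 15-19: PI control of the
    speed error, then the piecewise mapping of u_t to throttle and brake."""

    def __init__(self, kp=0.25, ki=0.2):
        self.kp, self.ki = kp, ki
        self.integral = 0.0  # running sum of the speed error

    def step(self, desired_speed, current_speed):
        e_t = desired_speed - current_speed            # equation 15
        self.integral += e_t
        u_t = self.kp * e_t + self.ki * self.integral  # equations 16-17
        if u_t > 0:                                    # equation 18
            return min(1.3 * u_t, 0.75), 0.0           # (throttle, brake)
        return 0.0, min(-0.35 * u_t, 1.0)              # equation 19
```

At each control step the caller would pass the network's speed prediction and the measured speed, e.g. `throttle, brake = controller.step(m_hat, m)`.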
Advantageous effects
The dynamic imitation learning proposed by the invention is a supervised neural network that predicts the speed of dynamic obstacles in the field of view from multi-frame visual information, enabling safe driving in dynamic environments. A dynamic imitation learning model based on surround vision, DSCIL, is proposed, with 3 components: a visual perception network, a branch decision network and vehicle speed control. The DSCIL model is described mathematically, the structure and role of each module are analyzed, the logical relations among the modules are clarified, and the composition of the model's loss function is given.
To train and validate the model, a data set comprising 12 hours of driving records and image information was collected on the CARLA platform. Network training and validation were then carried out with the PyTorch deep learning framework to obtain the trained model. To test the driving level of the model, the CARLA benchmark containing 1200 driving tasks and the NoCrash benchmark containing 900 driving tasks were run; numerous experimental results demonstrate that the surround-vision-based dynamic imitation learning model DSCIL drives well under different driving conditions, with significant advantages over many classical end-to-end autonomous driving models, especially in dynamic environments. The DSCIL model eliminates the spurious mapping between low speed and low acceleration, completely removing low acceleration caused by low speed; more importantly, it extracts dynamic obstacle states from multi-frame forward visual information, improving the performance of the model in dynamic driving environments.
Drawings
FIG. 1 is a schematic diagram of two types of mock learning;
FIG. 2 is a dynamic surround vision condition imitation learning model (DSCIL model architecture);
FIG. 3 is a ResNet34 layer network structure;
FIG. 4 is a diagram of an LSTM network architecture;
FIG. 5 is a single branch decision network;
FIG. 6 is a proportional-integral control schematic;
FIG. 7 is a relationship diagram of throttle/brake and control quantities of a vehicle;
FIG. 8 is the vehicle speed distribution;
FIG. 9 is a multi-model CARLA benchmark test result;
FIG. 10 shows the multi-model NoCrash benchmark test results.
Detailed Description
Because the model is an end-to-end model based on surround vision, the data in the data set are preprocessed to strengthen the generalization ability of the model and speed up convergence of the loss function.
Vehicle speed in the data set is in m/s; its distribution is shown in FIG. 8, where the horizontal axis is vehicle speed and the vertical axis the number of samples. Most sample speeds do not exceed 11 m/s, so all speeds are normalized by dividing by 12, bringing the normalized speed distribution into the range 0 to 1.
To handle the challenge that different lighting environments pose to a visual model, the usual practice is to inject noise into the data-set images with a certain probability. The experiments of the invention follow the data preprocessing of the CIL model for image transformations, including changes of contrast, brightness and hue, and the addition of Gaussian blur, Gaussian noise, salt-and-pepper noise and region dropout (masking a random set of rectangles in the image, each covering about 1% of the image area).
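A possible augmentation pipeline in this spirit, using the imgaug library (all probabilities and magnitudes here are illustrative assumptions, not the patent's values):

```python
from imgaug import augmenters as iaa

# Illustrative pipeline mirroring the preprocessing described above;
# every probability and magnitude is an assumption.
augmenter = iaa.Sequential([
    iaa.Sometimes(0.3, iaa.GaussianBlur(sigma=(0.0, 1.5))),
    iaa.Sometimes(0.3, iaa.AdditiveGaussianNoise(scale=(0, 0.05 * 255))),
    iaa.Sometimes(0.3, iaa.SaltAndPepper(0.05)),
    iaa.Sometimes(0.3, iaa.LinearContrast((0.8, 1.2))),        # contrast
    iaa.Sometimes(0.3, iaa.Multiply((0.8, 1.2))),              # brightness
    iaa.Sometimes(0.3, iaa.AddToHueAndSaturation((-20, 20))),  # hue
    # Region dropout: random rectangles of roughly 1% of the image area.
    iaa.Sometimes(0.3, iaa.CoarseDropout(0.05, size_percent=0.10)),
], random_order=True)

augmented = augmenter(images=batch_of_images)  # uint8 HxWxC arrays
```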
The training framework is PyTorch 0.4.1 and the network optimizer is again Adam; the parameter update is given in equations 20-22. Adam's initial learning rate is 0.0003, the initial values of m and v are 0, beta1 is 0.9, beta2 is 0.999 and eps is 1e-8.
m = beta1·m + (1 − beta1)·dw (20)
v = beta2·v + (1 − beta2)·dw² (21)
w = w − lr·m / (√v + eps) (22)
The training BatchSize is set to 64 and the model is saved every 10000 iterations; if the loss of 2 consecutive saved models on the validation set does not decrease, training stops and the model with the smallest loss value is selected as the test model.
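A sketch of this training configuration (PyTorch; `loader`, `val_loader` and the `evaluate` helper are assumed to exist, and the checkpoint naming is ours):

```python
import torch

model = DSCIL()  # architecture sketch from above
optimizer = torch.optim.Adam(model.parameters(), lr=0.0003,
                             betas=(0.9, 0.999), eps=1e-8)

best_val, bad_checks, it = float("inf"), 0, 0
for frames, left, right, cmd, target in loader:  # batches of 64 samples
    optimizer.zero_grad()
    loss = dscil_loss(model(frames, left, right, cmd), target)
    loss.backward()
    optimizer.step()
    it += 1
    if it % 10000 == 0:                          # save every 10000 iterations
        torch.save(model.state_dict(), "dscil_%d.pt" % it)
        val_loss = evaluate(model, val_loader)   # assumed validation helper
        if val_loss < best_val:
            best_val, bad_checks = val_loss, 0
        else:
            bad_checks += 1
        if bad_checks >= 2:                      # 2 checks without improvement
            break
```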
Random initialization of the network weights markedly reduces vanishing and exploding gradients at the start of training; the experiments of the invention again use Xavier initialization, which makes the network parameters follow a uniform distribution with expectation 0. The residual networks in the model are initialized with weights pre-trained on the ImageNet data set, which shortens training time and reduces the randomness of initialization.
The CARLA benchmark comprises 1200 driving tasks under 4 different driving conditions. The controlled vehicle need not obey traffic lights, and collisions during driving do not directly affect the experimental result; each driving task has a time limit set by the distance to the goal, and failing to reach the goal within the limit counts as failure. FIG. 9 shows the results of testing the DSCIL model and five other classical end-to-end driving models on the CARLA benchmark; the numbers in the table give the percentage of tasks of the corresponding type completed, larger numbers mean a higher driving level, and bold numbers mark the best results.
As can be seen from FIG. 9, almost all models perform best in the New Weather environment, worst in the New Town environment, and generally better in the Training environment than in New Weather & Town. Across the task types, the completion rate falls as task difficulty rises. Among the different end-to-end models, the DSCIL model proposed by the invention achieves the best performance in almost all environments and all task types. In particular, DSCIL is best in all dynamic environments, which fully illustrates its advantage there: unlike the other, static end-to-end models, it has in principle the capacity to perceive dynamic obstacles.
Note that the test results of the other models come from the work of Codevilla et al. proposing the CILRS model, trained on the CARLA100 data set, which was collected with the same method as our data set and differs only in sensor configuration. Because tests in CARLA 0.8.4 carry some randomness, each model's results in FIG. 9 are the best of 5 tests.
By 2019, with the development of end-to-end autonomous driving algorithms, most models could obtain fairly good results on the CARLA benchmark. On the other hand, the CARLA benchmark does not account for the effect of collisions on safe driving, so it cannot fully reflect a model's driving ability in a real environment: when a driving algorithm is tested in real traffic, a human driver intervenes to avoid an accident whenever a collision or other danger becomes likely, so the frequency or proportion of human interventions is a key indicator of the level of autonomy. To approximate the real environment as closely as possible, Codevilla et al. proposed the NoCrash benchmark, whose main difference from the CARLA benchmark is that a sufficiently strong collision during driving fails the task. FIG. 10 shows the results of the DSCIL model and four other classical end-to-end driving models on the NoCrash benchmark, trained on the same CARLA100 data set as in FIG. 9 and tested in CARLA 0.8.4; FIG. 10 gives the mean and variance of three runs, each result being the percentage of tasks of the corresponding type completed, with larger numbers indicating stronger driving ability and bold numbers marking the best results.
The NoCrash benchmark has the same four lighting and weather conditions as the CARLA benchmark: the training environment, the new weather environment, the new town environment, and the new weather & new town environment. The map of the training and new-weather environments is Town01; the map of the new-town environments is Town02.
The NoCrash benchmark comprises tasks under three traffic conditions: Empty, Regular and Dense. Empty means the map contains no dynamic obstacles such as other vehicles or pedestrians; Regular means a medium number of such obstacles; Dense means a large number, making traffic very congested. The three task types rise in difficulty in turn and test the model's avoidance of dynamic obstacles and its handling of sudden traffic situations. For each weather and lighting condition and each task type, the NoCrash benchmark includes 25 tasks, each consisting of a pair of start and end points, so the NoCrash benchmark includes 900 different driving tasks in total. A task fails if a sufficiently strong collision occurs during driving, or if the goal is not reached within a time limit that depends on the path length. The NoCrash benchmark can therefore fully verify safe driving ability under different weather and lighting, which the CARLA benchmark cannot.
As can be seen in FIG. 10, all models slide sharply on the NoCrash benchmark compared with the CARLA benchmark, which shows that the NoCrash benchmark is much harder. Across the three traffic conditions, driving performance is highest in the Empty environment with no dynamic obstacles, worst in the heavily congested Dense environment, and in between for the Regular environment. Among the end-to-end models, the DSCIL model proposed by the invention achieves the best performance on most driving tasks; in dynamic environments in particular its performance improves greatly over the other models, fully demonstrating its reliability there. Analyzing the model structures, DSCIL is a dynamic model while the others are static models without the ability to estimate the speed of dynamic obstacles, so their poor showing in dynamic environments is no surprise. For imitation-learned models, the data set contains few dynamic obstacles, so the models do not learn enough about driving in environments with many dynamic obstacles; this is a common reason all models perform poorly in the Dense environment.
The first three task types of the CARLA benchmark and the Empty tasks of the NoCrash benchmark contain no dynamic obstacles, and the experiments show that the proposed DSCIL model accurately predicts the steering wheel angle and speed in static environments and drives well. In a static environment the DSCIL model behaves like the SCIL model, fusing the side-view visual features of the vehicle to improve perception and understanding of the environment. Moreover, because the DSCIL model does not take speed as a model input to predict throttle and brake, the spurious mapping, i.e. the correlation between low speed and low acceleration present in the data set, is avoided entirely by the model structure; the DSCIL model thus completely eliminates stopping caused by low speed during driving.
The dynamic navigation tasks of the CARLA benchmark and the Regular and Dense tasks of the NoCrash benchmark contain dynamic obstacles, and the experiments show that the proposed DSCIL model accurately predicts the steering wheel angle and speed in dynamic environments and drives safely. Unlike other models, DSCIL can perceive the speed and motion trajectory of a dynamic obstacle through the features of consecutive frames, which static models cannot do. Sensing the speed of dynamic obstacles is a necessary condition for safe driving in a dynamic environment, and this is DSCIL's advantage over the other models.
It is to be noted that, in the present invention, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (4)

1. A vehicle control dynamic imitation learning algorithm based on surround vision, characterized in that: a dynamic surround-vision imitation learning network is provided to improve driving performance; the proposed model is named DSCIL and can be divided into a visual perception network, a branch decision network and a vehicle longitudinal control part; the visual perception network obtains four 512-dimensional feature vectors from 4 forward frames through a ResNet34 network and one 512-dimensional feature vector through a single-layer LSTM network, obtains 64-dimensional feature vectors from the left and right current frame images through the same ResNet18 network, and concatenates the 512-dimensional and the two 64-dimensional vectors into a 640-dimensional joint feature vector; the joint feature vector predicts the vehicle speed and steering wheel angle through 3 fully connected layers, the LSTM network having 128 hidden nodes; the branch networks are selectively activated according to the navigation information, only the branch corresponding to the navigation information being activated each time to predict speed and steering wheel angle, and the navigation information has four states: road following, going straight, turning left and turning right.
2. The surround-vision-based vehicle control dynamic imitation learning algorithm of claim 1, characterized in that: the DSCIL network is a conditional imitation learning framework; a neural network is trained with driving records and sensor data, different navigation commands are realized through branch networks, and the visual perception network is ResNet; the network input comprises multiple forward frames instead of only the current one and no longer includes the current vehicle speed; the network output is the driving speed and the steering wheel angle, the throttle and brake values no longer being output directly; for longitudinal control of the vehicle, the speed is controlled with proportional control, adjusting the throttle and brake according to the error between the current speed and the desired speed output by the network.
3. The surround-vision-based vehicle control dynamic imitation learning algorithm of claim 1, characterized in that: the DSCIL model performs environment perception with 4 consecutive forward frames using the visual information within 1 second, the four frames being the forward image from 0.9 second earlier, from 0.6 second earlier, from 0.3 second earlier, and at the current moment; from these four frames the driving speed and trajectory of vehicles in the image can be judged.
4. The surround-vision-based vehicle control dynamic imitation learning algorithm of claim 1, characterized in that: the LSTM network improves on the recurrent neural network in two respects, a new internal state c_t and a gating mechanism; c_t records history information up to time t, and the gates control the rate of information transfer.
Application CN202110369507.4A, filed 2021-04-06 (priority 2021-04-06): Vehicle control dynamic simulation learning algorithm based on surround vision. Publication CN113486568A (en); status: Withdrawn.

Priority Applications (1)

Application Number: CN202110369507.4A; Priority Date: 2021-04-06; Filing Date: 2021-04-06; Title: Vehicle control dynamic simulation learning algorithm based on surround vision

Publications (1)

Publication Number: CN113486568A; Publication Date: 2021-10-08

Family

ID=77932688

Family Applications (1)

Application Number: CN202110369507.4A; Title: Vehicle control dynamic simulation learning algorithm based on surround vision

Country Status (1)

Country: CN; Publication: CN113486568A (en)

Cited By (2)

* Cited by examiner, † Cited by third party

Publication number | Priority date | Publication date | Assignee | Title
CN114035575A (en) * | 2021-11-04 | 2022-02-11 | Nanjing University of Science and Technology | Unmanned vehicle motion planning method and system based on semantic segmentation
CN114035575B (en) * | 2021-11-04 | 2023-03-31 | Nanjing University of Science and Technology | Unmanned vehicle motion planning method and system based on semantic segmentation


Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
WW01 | Invention patent application withdrawn after publication (application publication date: 2021-10-08)