Disclosure of Invention
In order to overcome the defects of existing road adhesion coefficient estimation methods, the invention provides a local road adhesion coefficient estimation method based on image segmentation.
The technical steps of the invention are as follows:
Step 1: offline pre-training of the image segmentation network, specifically comprising: a. collecting road surface images under different weather conditions using CARLA software; b. locally labeling the collected road surface images to form a data set for local road surface adhesion coefficient estimation; c. building a deep learning network model for image segmentation; d. training the image segmentation network end-to-end on the data set for local road surface adhesion coefficient estimation.
Step 2: acquiring real-time road surface images and estimating the local adhesion coefficient of the road surface in real time, specifically comprising: a. acquiring real-time road surface images with a vehicle-mounted camera; b. classifying the acquired images with the pre-trained image segmentation network and locating the different categories to form a real-time road condition map; c. estimating the local road surface adhesion coefficients of the real-time road condition map according to the road surface types.
Further, in step 1a, road surface images under different weather conditions are collected with CARLA simulation software as the collection tool; the collected images include local water accumulation on the road surface, local snow accumulation on the road surface, local icing on the road surface, normal asphalt road surface, and the like.
Further, the deep learning network model for image segmentation comprises a basic network built from residual structures, to which a height-driven attention module is added; the semantic segmentation neural network model is named H-ResNet, where H denotes the height-driven attention module and ResNet denotes the basic network built from residual structures.
Further, the deep learning network model is a semantic segmentation network built with the TensorFlow, Keras, Caffe2, PyTorch, or MXNet deep learning framework.
Further, the training method is back propagation with batch gradient descent, stochastic gradient descent, or mini-batch gradient descent, using a single GPU or multiple GPUs.
Further, the vehicle-mounted camera in step 2a is a network camera or a USB camera.
Further, in step 2b, classifying the acquired images and locating the different categories means that dry asphalt is taken as the background road surface, and that water accumulation areas, snow accumulation areas, icing areas, and the like, or any random combination thereof, are distinguished and their distribution over the asphalt road surface is obtained.
The invention has the beneficial effects that:
1. The method can predict the type of road surface about to be contacted and obtain the distribution of different local surfaces on the same asphalt pavement; by means of deep learning it makes up for the shortcomings of dynamics-based and vision-based methods, providing a prerequisite for the path planning and decision making of intelligent vehicles and intelligent machines.
2. An offline pre-trained image segmentation network is adopted, giving good real-time performance and improving system safety to a great extent.
3. CARLA software is used to acquire the data set required for pre-training, greatly reducing the time and economic cost of acquiring image data.
Detailed Description
The invention will be further explained with reference to the drawings. It should be understood that the specific examples described herein are intended to be illustrative only and are not intended to be limiting.
The general technical process of the local road surface adhesion coefficient estimation method based on image segmentation is shown in the attached figure 1, and comprises the following steps:
Step 1: offline pre-training of the image segmentation network, specifically comprising: a. collecting road surface images under different weather conditions using CARLA software; b. locally labeling the collected road surface images to form a data set for local road surface adhesion coefficient estimation; c. building a deep learning network framework for image segmentation; d. training the image segmentation network end-to-end on the data set for local road surface adhesion coefficient estimation; e. porting the trained network model to the on-board unit of an intelligent vehicle or intelligent machine via ROS software.
Step 2: acquiring real-time road surface images and estimating the local adhesion coefficient of the road surface in real time, specifically comprising: a. acquiring real-time road surface images with a vehicle-mounted camera; b. classifying the acquired images with the pre-trained image segmentation network and locating the different categories to form a real-time road condition map; c. estimating the local road surface adhesion coefficients of the real-time road condition map according to the road surface types.
The specific implementation procedure for steps 1 and 2 above is as follows:
The scene weather conditions are set in CARLA software, the image-collecting vehicle is driven in first-person view over asphalt pavements under the different weather conditions, and road surface images are collected, ensuring that each single-frame image contains two or more of the road surface types listed in Table 1. At this stage, the collected data are made as diverse as possible in road scene, illumination, weather condition, and road structure. Road scenes include but are not limited to highway asphalt pavement, urban downtown asphalt pavement, and suburban asphalt pavement; weather conditions include but are not limited to sunny, rainy, and snowy weather; lighting conditions include but are not limited to morning, noon, dusk, and night.
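As a minimal sketch of how such a collection run might be scripted with the CARLA Python API, the snippet below sets a rainy scene, spawns an autopiloted vehicle, and saves frames from a forward-facing camera; the weather values, camera placement, and output path are illustrative assumptions, not the patent's actual configuration.

```python
# Illustrative road-image collection with the CARLA Python API.
# Weather values, camera placement, and output paths are assumptions.
import carla

client = carla.Client('localhost', 2000)
client.set_timeout(10.0)
world = client.get_world()

# Example weather: heavy rain with standing water on the road surface.
world.set_weather(carla.WeatherParameters(
    cloudiness=80.0, precipitation=90.0, precipitation_deposits=70.0))

blueprint_library = world.get_blueprint_library()
vehicle_bp = blueprint_library.filter('vehicle.*')[0]
spawn_point = world.get_map().get_spawn_points()[0]
vehicle = world.spawn_actor(vehicle_bp, spawn_point)
vehicle.set_autopilot(True)

# Forward-facing RGB camera approximating a first-person driving view.
camera_bp = blueprint_library.find('sensor.camera.rgb')
camera = world.spawn_actor(
    camera_bp, carla.Transform(carla.Location(x=1.5, z=1.7)),
    attach_to=vehicle)
camera.listen(lambda image: image.save_to_disk('out/%06d.png' % image.frame))
```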
Local labeling of the images: the collected road surface images are locally labeled with the road surface types and the corresponding road surface adhesion coefficients given in Table 1, forming the data set required for pre-training.
TABLE 1 (road surface types and corresponding road surface adhesion coefficients)
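The contents of Table 1 are not reproduced in this text. As a purely hypothetical sketch of how such a lookup could be encoded, the mapping below uses typical illustrative adhesion coefficient values; they are assumptions, not the values of the patent's Table 1.

```python
# Hypothetical encoding of a Table-1-style lookup. The adhesion
# coefficient values are typical illustrative figures, NOT the values
# of the patent's Table 1, which is not reproduced in this text.
import numpy as np

ROAD_CLASSES = {
    0: ('dry_asphalt', 0.8),   # background road surface (assumed value)
    1: ('water_patch', 0.5),   # local water accumulation (assumed value)
    2: ('snow_patch',  0.25),  # local snow accumulation (assumed value)
    3: ('ice_patch',   0.1),   # local icing (assumed value)
}

def adhesion_map(class_map: np.ndarray) -> np.ndarray:
    """Map a per-pixel class map (values 0-3) to per-pixel adhesion coefficients."""
    lut = np.array([mu for _, mu in ROAD_CLASSES.values()], dtype=np.float32)
    return lut[class_map]
```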
The semantic segmentation neural network model comprises a basic network built from residual structures, to which a height-driven attention module is added; the network is named H-ResNet, where H denotes the height-driven attention module and ResNet denotes the basic network built from residual structures. The algorithm pipeline comprises a data import module, a data preprocessing module, a neural network forward propagation module, an activation function, a loss function, a back propagation module, and an optimization module.
As shown in fig. 3, the residual structure unit mainly comprises a shortcut connection and an identity mapping. X denotes the feature input, which is identity-mapped to the output through the shortcut connection on the right; "weight layer" denotes a convolution weight layer, "relu" denotes the activation function, and F(X) denotes the residual learned by passing X through the convolution weight layers. Let H(X) denote the feature that input X finally learns; since the unit is designed to learn the residual instead, the feature actually to be learned becomes F(X) = H(X) − X. This is done because residual learning is easier than learning the original features directly. When the residual F(X) = 0, the convolution weight layers perform only an identity mapping, so at the very least the performance of the network does not degrade; in practice F(X) ≠ 0, which allows the convolution weight layers to learn new features on top of the input features.
In this example, the residual structure is adopted to prevent vanishing gradients during pre-training and thus train the network better; convolution operations with different kernel sizes are applied multiple times within the residual structure to better extract image features, and the ReLU nonlinear activation function is adopted to speed up the convergence of neural network training.
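A minimal PyTorch sketch of the residual unit just described, with two convolution weight layers learning the residual F(X) and an identity shortcut adding X back; the channel count and kernel sizes are illustrative assumptions.

```python
# Minimal PyTorch sketch of the residual unit: two convolution weight
# layers learn the residual F(X), which is added back to the identity-
# mapped input X. Channel count and kernel sizes are illustrative.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        residual = self.conv2(self.relu(self.conv1(x)))  # F(X)
        return self.relu(x + residual)                   # H(X) = F(X) + X
```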
In this example, the convolution operation is formulated as follows, where w(x, y) denotes a convolution kernel of size m × n, f(x, y) is the image, and ⋆ denotes the convolution operation:

w(x, y) ⋆ f(x, y) = Σ_{s=−a}^{a} Σ_{t=−b}^{b} w(s, t) f(x − s, y − t)    (1)

where a and b bound the summation over the kernel in the image width and length directions (for an m × n kernel, a = (m − 1)/2 and b = (n − 1)/2), and s and t index the pixel positions in the width and length directions, respectively.
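As a small illustration of formula (1), the sketch below evaluates the convolution at a single pixel; the image and kernel are arbitrary example data.

```python
# Evaluate formula (1) at pixel (x, y): sum over the kernel indices
# s in [-a, a] and t in [-b, b] of w(s, t) * f(x - s, y - t).
import numpy as np

def conv_at(f, w, x, y):
    m, n = w.shape
    a, b = (m - 1) // 2, (n - 1) // 2
    return sum(w[s + a, t + b] * f[x - s, y - t]
               for s in range(-a, a + 1) for t in range(-b, b + 1))

f = np.arange(25, dtype=float).reshape(5, 5)   # example 5x5 image
w = np.ones((3, 3)) / 9.0                      # 3x3 mean filter
print(conv_at(f, w, 2, 2))                     # mean of the 3x3 patch -> 12.0
```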
In this example, the ReLU activation function is the nonlinear function f(x) = max(0, x): the function value is 0 when the abscissa x is less than or equal to 0, and equals x when x is greater than 0.
In this embodiment, a height-driven attention module is added to the basic network to improve recognition accuracy while maintaining real-time recognition.
The height-driven attention module is motivated by analyzing how different scene categories occupy pixels at different height levels of a single-frame image in real urban scenes. As shown in fig. 2, a single-frame image is divided into three height levels: upper, middle, and lower. The dominant pixels in the upper level belong to the sky; the dominant pixels in the middle level belong to vehicles, pedestrians, buildings, and the like; and the dominant pixels in the lower level belong to the road surface, i.e., the region of interest. The height-driven attention module can attend directly to the region of interest in the image, improving recognition accuracy and speeding up recognition.
The height-driven attention module comprises width pooling, down-sampling, computation of a height-driven attention feature map, and insertion of feature position coding. The width pooling operation obtains a feature map along the image width direction, with average pooling as the pooling mode. Not all feature maps obtained by width pooling are necessary, so down-sampling is applied to remove the unnecessary ones. The down-sampled feature maps are then processed by convolution, as in formula (1), to capture their adjacency relations. The final operation, inserting feature position coding, provides prior information about the vertical position of specific objects: the position codes are generated by sine and cosine functions of different frequencies and added element-wise to the feature vectors at the corresponding positions.
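The text does not specify exact layer dimensions, so the following PyTorch sketch is one plausible reading of the module: average pooling over the width, a 1×1 convolution as the down-sampling step, a sinusoidal position code over the height axis, and a sigmoid attention map that re-weights the input features. All dimensions and the 1×1-convolution choice are assumptions.

```python
# One plausible reading of the height-driven attention module.
# Dimensions and layer choices are assumptions, not the patent's.
import torch
import torch.nn as nn

class HeightDrivenAttention(nn.Module):
    def __init__(self, channels, reduced=16):
        super().__init__()
        self.down = nn.Conv2d(channels, reduced, kernel_size=1)   # down-sampling
        self.up = nn.Conv2d(reduced, channels, kernel_size=1)
        self.sigmoid = nn.Sigmoid()

    @staticmethod
    def position_code(height, dim, device):
        # Sinusoidal encoding over the vertical (height) positions.
        pos = torch.arange(height, device=device, dtype=torch.float32)
        idx = torch.arange(dim, device=device, dtype=torch.float32)
        freq = torch.pow(10000.0, -2.0 * (idx // 2) / dim)
        angles = pos[:, None] * freq[None, :]                 # (H, dim)
        code = torch.where(idx[None, :] % 2 == 0,
                           torch.sin(angles), torch.cos(angles))
        return code.t()[None, :, :, None]                     # (1, dim, H, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        pooled = x.mean(dim=3, keepdim=True)       # width pooling -> (B, C, H, 1)
        feat = self.down(pooled)                   # remove unnecessary features
        feat = feat + self.position_code(h, feat.shape[1], x.device)
        attn = self.sigmoid(self.up(feat))         # height attention map
        return x * attn                            # broadcast over the width
```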
In this embodiment, the overall structure of H-ResNet is shown in fig. 4, where each ResNet stage denotes a residual structure and H denotes a height-driven attention module. Recognition becomes more accurate as the number of ResNet stages grows; however, more stages also increase the computational load of the network and reduce real-time performance. The number of height-driven attention modules grows with the number of ResNet stages; in this example, with 4 ResNet stages, 3 height-driven attention modules are used. Each height-driven attention module is inserted between two adjacent residual stages, so prior information on vertical position is acquired once after each residual stage, and acquiring it multiple times yields more accurate road surface state information.
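Continuing the sketches above (and reusing the ResidualBlock and HeightDrivenAttention classes from them), the layout described here might be assembled as follows; the stage composition and the segmentation head are illustrative placeholders.

```python
# Sketch of the H-ResNet layout described above: four ResNet stages
# with a height-driven attention module between each pair of adjacent
# stages (4 stages, 3 H modules). Stage contents are placeholders;
# a real stage would stack more residual blocks with downsampling.
class HResNet(nn.Module):
    def __init__(self, channels=64, num_classes=4):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, kernel_size=3, padding=1)
        self.stages = nn.ModuleList(
            [nn.Sequential(ResidualBlock(channels), ResidualBlock(channels))
             for _ in range(4)])
        self.attn = nn.ModuleList(
            [HeightDrivenAttention(channels) for _ in range(3)])
        self.head = nn.Conv2d(channels, num_classes, kernel_size=1)

    def forward(self, x):
        x = self.stem(x)
        for i, stage in enumerate(self.stages):
            x = stage(x)
            if i < 3:                  # H module between adjacent stages
                x = self.attn[i](x)
        return self.head(x)            # per-pixel class logits
```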
In this example, the deep learning framework is TensorFlow, Keras, Caffe2, PyTorch, or MXNet.
In this example, the hardware configuration of the experimental platform used to train the semantic segmentation model is a GeForce GTX 1080Ti GPU and an i7-9700K CPU with 64 GB of memory. In terms of software, the platform runs the 64-bit operating system Ubuntu 18.04. The network model is built with the mainstream deep learning framework PyTorch and the Python language, and high-performance parallel computation is performed with the CUDA parallel computing architecture and the cuDNN GPU acceleration library.
In this example, training uses Focal Loss as the training loss function, as shown in equation (7). Focal Loss modifies the standard cross-entropy loss to reduce the weight of samples that are already well classified and increase the weight of samples that are hard to classify, so that during training the model quickly focuses on the difficult, relatively rare samples, addressing the problem of class imbalance.

FL = −(1/N) Σ_{i=1}^{N} [ α y_i (1 − p_i)^γ log(p_i) + (1 − α)(1 − y_i) p_i^γ log(1 − p_i) ]    (7)

where N is the number of samples used for network training, i is the sample index, y_i is the label of each training sample (y_i = 1 for a positive sample), α is a weight parameter taking values in [0, 1], γ is also a hyperparameter, and p_i, taking values in [0, 1], is the predicted probability that sample i is positive (y_i = 1).
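A sketch of the binary focal loss of equation (7) under the variable definitions above, assuming labels y_i ∈ {0, 1}:

```python
# Binary focal loss per equation (7): p_i is the predicted probability
# that sample i is positive, alpha balances the classes, and gamma
# down-weights well-classified samples. Labels assumed in {0, 1}.
import torch

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """p, y: tensors of shape (N,); y in {0, 1}; p in (0, 1)."""
    p = p.clamp(eps, 1.0 - eps)
    pos = -alpha * y * (1.0 - p) ** gamma * torch.log(p)
    neg = -(1.0 - alpha) * (1.0 - y) * p ** gamma * torch.log(1.0 - p)
    return (pos + neg).mean()
```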
in this example, the low-level and high-level semantic information of the image is learned most quickly in order to ensure that the model is stable. The learning rate in model training is selected as the exponentially decaying learning rate, and the formula is shown in fig. 8.
decayed_lr = init_lr × decay_rate^(global_step / decay_steps)    (8)
In the formula: init_lr — the initially set learning rate
decay_rate — the decay coefficient
global_step — the number of iteration rounds
decay_steps — the decay speed
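Formula (8) can be written directly as code; an equivalent per-step PyTorch scheduler via LambdaLR is also shown. The optimizer setup below is a placeholder assumption.

```python
# Formula (8) as code, plus a PyTorch scheduler equivalent.
import torch

def decayed_lr(init_lr, decay_rate, global_step, decay_steps):
    return init_lr * decay_rate ** (global_step / decay_steps)

# LambdaLR multiplies the initial lr by the returned factor each step.
params = [torch.nn.Parameter(torch.zeros(1))]   # placeholder parameters
optimizer = torch.optim.SGD(params, lr=0.001)
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: 0.95 ** (step / 50))
```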
In this example, the number of images input at each training iteration is set to 8, and data enhancement is applied to the input images by random scale transformation, random angle rotation, image flipping, and the like. The initial learning rate init_lr is set to 0.001, the decay coefficient decay_rate to 0.95, the number of iteration rounds global_step to 5400, and the decay speed decay_steps to 50.
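One possible realisation of this configuration with torchvision transforms is sketched below; note that for semantic segmentation the same geometric transforms would have to be applied jointly to the image and its label mask, which this simplified sketch omits.

```python
# Illustrative training configuration matching the parameters above:
# batch size 8, random scaling, random rotation, and horizontal flips.
# The crop size and parameter ranges are assumptions.
import torchvision.transforms as T

train_transform = T.Compose([
    T.RandomResizedCrop(512, scale=(0.5, 1.0)),   # random scale transform
    T.RandomRotation(degrees=10),                 # random angle rotation
    T.RandomHorizontalFlip(),                     # image flipping
    T.ToTensor(),
])
BATCH_SIZE = 8
```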
The collected data set is input into the constructed semantic segmentation neural network for end-to-end training, and the trained model is integrated into the on-board unit of an intelligent vehicle or other intelligent machine via ROS software.
A real-time image is acquired with the vehicle-mounted network camera or USB camera and input into the semantic segmentation model integrated in the on-board unit, which distinguishes the categories of the different road surface areas in real time and locates their distribution, yielding a road surface condition distribution map.
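A hedged sketch of that real-time step with OpenCV and PyTorch is shown below; the model object and preprocessing are assumed from the training sketches above, and the ROS integration is omitted.

```python
# Sketch of the real-time inference step: grab a frame from a USB
# camera with OpenCV, run the pre-trained segmentation model, and
# return a per-pixel road-condition class map. `model` is assumed
# to be a trained H-ResNet-style network from the sketches above.
import cv2
import torch

def infer_road_condition(model, device='cuda'):
    cap = cv2.VideoCapture(0)                 # vehicle-mounted USB camera
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise RuntimeError('camera frame not available')
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    x = torch.from_numpy(rgb).permute(2, 0, 1).float().div(255.0)[None].to(device)
    with torch.no_grad():
        logits = model(x)
    return logits.argmax(dim=1)[0].cpu().numpy()   # per-pixel class map
```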
In this example, the camera is mounted at the windshield to avoid environmental interference that would degrade the quality of the acquired images.
The series of detailed descriptions above are merely specific illustrations of feasible embodiments of the invention; they are not intended to limit its scope of protection, and all equivalent means or modifications that do not depart from the technical spirit of the invention shall be included within its scope of protection.