Disclosure of Invention
In order to overcome the defects of existing road adhesion coefficient estimation methods, the invention provides a local road adhesion coefficient estimation method based on image segmentation.
The technical steps of the invention are as follows:
Step 1: offline pre-training of the image segmentation network, specifically comprising: a. collecting road surface images under different weather conditions using CARLA software; b. locally labeling the collected road surface images to form a data set for local road surface adhesion coefficient estimation; c. building a deep learning network model for image segmentation; d. training the image segmentation network end-to-end on the data set for local road surface adhesion coefficient estimation.
Step 2: acquiring real-time road surface images and estimating the local adhesion coefficient of the road surface in real time, specifically comprising: a. acquiring real-time road surface images with a vehicle-mounted camera; b. classifying the acquired images with the pre-trained image segmentation network and locating the different categories to form a real-time road condition map; c. estimating the local road surface adhesion coefficients of the real-time road condition map according to the road surface types.
Further, in step 1a, road surface images under different weather conditions are collected with CARLA simulation software as the collection tool; the collected images include local water accumulation on the road surface, local snow accumulation on the road surface, local icing on the road surface, normal asphalt road surface, and the like.
Further, the deep learning network model for image segmentation comprises a basic network built from residual structures, to which a height-driven attention module is added; the semantic segmentation neural network model is named H-ResNet, where H denotes the height-driven attention module and ResNet denotes the basic network built from residual structures.
Further, the deep learning network model is a semantic segmentation network built with the TensorFlow, Keras, Caffe2, PyTorch, or MXNet deep learning framework.
Further, the training method is back propagation with batch gradient descent, stochastic gradient descent, or mini-batch gradient descent, using a single GPU or multiple GPUs.
Further, the vehicle-mounted camera in step 2a is a network camera or a USB camera.
Further, in step 2b, classifying the acquired images and locating the different categories means that dry asphalt is taken as the background road surface, and that water accumulation areas, snow accumulation areas, icing areas, and the like, or any random combination thereof, are distinguished and their distribution over the asphalt road surface is obtained.
The invention has the beneficial effects that:
1. The method can predict the type of road surface about to be contacted and obtain the distribution of different local surfaces on the same asphalt pavement; by means of deep learning it makes up for the shortcomings of dynamics-based and vision-based methods, providing a prerequisite for the path planning and decision making of intelligent vehicles and intelligent machines.
2. An offline pre-trained image segmentation network is adopted, giving good real-time performance and improving system safety to a great extent.
3. CARLA software is used to acquire the data set required for pre-training, greatly reducing the time and economic cost of acquiring image data.
Detailed Description
The invention will be further explained with reference to the drawings. It should be understood that the specific examples described herein are intended to be illustrative only and are not intended to be limiting.
The general technical process of the local road surface adhesion coefficient estimation method based on image segmentation is shown in the attached figure 1, and comprises the following steps:
Step 1: offline pre-training of the image segmentation network, specifically comprising: a. collecting road surface images under different weather conditions using CARLA software; b. locally labeling the collected road surface images to form a data set for local road surface adhesion coefficient estimation; c. building a deep learning network framework for image segmentation; d. training the image segmentation network end-to-end on the data set for local road surface adhesion coefficient estimation; e. porting the trained network model to the on-board unit of an intelligent vehicle or intelligent machine via ROS software.
Step 2: acquiring real-time road surface images and estimating the local adhesion coefficient of the road surface in real time, specifically comprising: a. acquiring real-time road surface images with a vehicle-mounted camera; b. classifying the acquired images with the pre-trained image segmentation network and locating the different categories to form a real-time road condition map; c. estimating the local road surface adhesion coefficients of the real-time road condition map according to the road surface types.
The specific implementation procedure for steps 1 and 2 above is as follows:
The scene weather conditions are set in CARLA software, the image-collecting vehicle is driven in first-person view over asphalt pavements under the different weather conditions, and road surface images are collected, ensuring that each single-frame image contains two or more of the road surface types listed in Table 1. At this stage, the collected data are made as diverse as possible in road scene, illumination, weather condition, and road structure. Road scenes include but are not limited to highway asphalt pavement, urban downtown asphalt pavement, and suburban asphalt pavement; weather conditions include but are not limited to sunny, rainy, and snowy weather; lighting conditions include but are not limited to morning, noon, dusk, and night.
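As a minimal sketch of how such a collection run might be scripted with the CARLA Python API, the snippet below sets a rainy scene, spawns an autopiloted vehicle, and saves frames from a forward-facing camera; the weather values, camera placement, and output path are illustrative assumptions, not the patent's actual configuration.

```python
# Illustrative road-image collection with the CARLA Python API.
# Weather values, camera placement, and output paths are assumptions.
import carla

client = carla.Client('localhost', 2000)
client.set_timeout(10.0)
world = client.get_world()

# Example weather: heavy rain with standing water on the road surface.
world.set_weather(carla.WeatherParameters(
    cloudiness=80.0, precipitation=90.0, precipitation_deposits=70.0))

blueprint_library = world.get_blueprint_library()
vehicle_bp = blueprint_library.filter('vehicle.*')[0]
spawn_point = world.get_map().get_spawn_points()[0]
vehicle = world.spawn_actor(vehicle_bp, spawn_point)
vehicle.set_autopilot(True)

# Forward-facing RGB camera approximating a first-person driving view.
camera_bp = blueprint_library.find('sensor.camera.rgb')
camera = world.spawn_actor(
    camera_bp, carla.Transform(carla.Location(x=1.5, z=1.7)),
    attach_to=vehicle)
camera.listen(lambda image: image.save_to_disk('out/%06d.png' % image.frame))
```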
Local labeling of the images: the collected road surface images are locally labeled with the road surface types and the corresponding road surface adhesion coefficients given in Table 1, forming the data set required for pre-training.
TABLE 1 (road surface types and corresponding road surface adhesion coefficients)
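The contents of Table 1 are not reproduced in this text. As a purely hypothetical sketch of how such a lookup could be encoded, the mapping below uses typical illustrative adhesion coefficient values; they are assumptions, not the values of the patent's Table 1.

```python
# Hypothetical encoding of a Table-1-style lookup. The adhesion
# coefficient values are typical illustrative figures, NOT the values
# of the patent's Table 1, which is not reproduced in this text.
import numpy as np

ROAD_CLASSES = {
    0: ('dry_asphalt', 0.8),   # background road surface (assumed value)
    1: ('water_patch', 0.5),   # local water accumulation (assumed value)
    2: ('snow_patch',  0.25),  # local snow accumulation (assumed value)
    3: ('ice_patch',   0.1),   # local icing (assumed value)
}

def adhesion_map(class_map: np.ndarray) -> np.ndarray:
    """Map a per-pixel class map (values 0-3) to per-pixel adhesion coefficients."""
    lut = np.array([mu for _, mu in ROAD_CLASSES.values()], dtype=np.float32)
    return lut[class_map]
```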
The semantic segmentation neural network model comprises a basic network built from residual structures, to which a height-driven attention module is added; the network is named H-ResNet, where H denotes the height-driven attention module and ResNet denotes the basic network built from residual structures. The algorithm pipeline comprises a data import module, a data preprocessing module, a neural network forward propagation module, an activation function, a loss function, a back propagation module, and an optimization module.
As shown in fig. 3, the residual structure unit mainly comprises a shortcut connection and an identity mapping. X denotes the feature input, which is identity-mapped to the output through the shortcut connection on the right; "weight layer" denotes a convolution weight layer, "relu" denotes the activation function, and F(X) denotes the residual learned by passing X through the convolution weight layers. Let H(X) denote the feature that input X finally learns; since the unit is designed to learn the residual instead, the feature actually to be learned becomes F(X) = H(X) − X. This is done because residual learning is easier than learning the original features directly. When the residual F(X) = 0, the convolution weight layers perform only an identity mapping, so at the very least the performance of the network does not degrade; in practice F(X) ≠ 0, which allows the convolution weight layers to learn new features on top of the input features.
In this example, the residual structure is adopted to prevent vanishing gradients during pre-training and thus train the network better; convolution operations with different kernel sizes are applied multiple times within the residual structure to better extract image features, and the ReLU nonlinear activation function is adopted to speed up the convergence of neural network training.
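A minimal PyTorch sketch of the residual unit just described, with two convolution weight layers learning the residual F(X) and an identity shortcut adding X back; the channel count and kernel sizes are illustrative assumptions.

```python
# Minimal PyTorch sketch of the residual unit: two convolution weight
# layers learn the residual F(X), which is added back to the identity-
# mapped input X. Channel count and kernel sizes are illustrative.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        residual = self.conv2(self.relu(self.conv1(x)))  # F(X)
        return self.relu(x + residual)                   # H(X) = F(X) + X
```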
In this example, the convolution operation is formulated as follows, where w(x, y) denotes a convolution kernel of size m × n, f(x, y) is the image, and ⋆ denotes the convolution operation:

w(x, y) ⋆ f(x, y) = Σ_{s=−a}^{a} Σ_{t=−b}^{b} w(s, t) f(x − s, y − t)    (1)

where a and b bound the summation over the kernel in the image width and length directions (for an m × n kernel, a = (m − 1)/2 and b = (n − 1)/2), and s and t index the pixel positions in the width and length directions, respectively.
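As a small illustration of formula (1), the sketch below evaluates the convolution at a single pixel; the image and kernel are arbitrary example data.

```python
# Evaluate formula (1) at pixel (x, y): sum over the kernel indices
# s in [-a, a] and t in [-b, b] of w(s, t) * f(x - s, y - t).
import numpy as np

def conv_at(f, w, x, y):
    m, n = w.shape
    a, b = (m - 1) // 2, (n - 1) // 2
    return sum(w[s + a, t + b] * f[x - s, y - t]
               for s in range(-a, a + 1) for t in range(-b, b + 1))

f = np.arange(25, dtype=float).reshape(5, 5)   # example 5x5 image
w = np.ones((3, 3)) / 9.0                      # 3x3 mean filter
print(conv_at(f, w, 2, 2))                     # mean of the 3x3 patch -> 12.0
```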
In this example, the ReLU activation function is the nonlinear function f(x) = max(0, x): the function value is 0 when the abscissa x is less than or equal to 0, and equals x when x is greater than 0.
In this embodiment, a height-driven attention module is added to the basic network to improve recognition accuracy while maintaining real-time recognition.
The height-driven attention module is motivated by analyzing how different scene categories occupy pixels at different height levels of a single-frame image in real urban scenes. As shown in fig. 2, a single-frame image is divided into three height levels: upper, middle, and lower. The dominant pixels in the upper level belong to the sky; the dominant pixels in the middle level belong to vehicles, pedestrians, buildings, and the like; and the dominant pixels in the lower level belong to the road surface, i.e., the region of interest. The height-driven attention module can attend directly to the region of interest in the image, improving recognition accuracy and speeding up recognition.
The height-driven attention module comprises width pooling, down-sampling, computation of a height-driven attention feature map, and insertion of feature position coding. The width pooling operation obtains a feature map along the image width direction, with average pooling as the pooling mode. Not all feature maps obtained by width pooling are necessary, so down-sampling is applied to remove the unnecessary ones. The down-sampled feature maps are then processed by convolution, as in formula (1), to capture their adjacency relations. The final operation, inserting feature position coding, provides prior information about the vertical position of specific objects: the position codes are generated by sine and cosine functions of different frequencies and added element-wise to the feature vectors at the corresponding positions.
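The text does not specify exact layer dimensions, so the following PyTorch sketch is one plausible reading of the module: average pooling over the width, a 1×1 convolution as the down-sampling step, a sinusoidal position code over the height axis, and a sigmoid attention map that re-weights the input features. All dimensions and the 1×1-convolution choice are assumptions.

```python
# One plausible reading of the height-driven attention module.
# Dimensions and layer choices are assumptions, not the patent's.
import torch
import torch.nn as nn

class HeightDrivenAttention(nn.Module):
    def __init__(self, channels, reduced=16):
        super().__init__()
        self.down = nn.Conv2d(channels, reduced, kernel_size=1)   # down-sampling
        self.up = nn.Conv2d(reduced, channels, kernel_size=1)
        self.sigmoid = nn.Sigmoid()

    @staticmethod
    def position_code(height, dim, device):
        # Sinusoidal encoding over the vertical (height) positions.
        pos = torch.arange(height, device=device, dtype=torch.float32)
        idx = torch.arange(dim, device=device, dtype=torch.float32)
        freq = torch.pow(10000.0, -2.0 * (idx // 2) / dim)
        angles = pos[:, None] * freq[None, :]                 # (H, dim)
        code = torch.where(idx[None, :] % 2 == 0,
                           torch.sin(angles), torch.cos(angles))
        return code.t()[None, :, :, None]                     # (1, dim, H, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        pooled = x.mean(dim=3, keepdim=True)       # width pooling -> (B, C, H, 1)
        feat = self.down(pooled)                   # remove unnecessary features
        feat = feat + self.position_code(h, feat.shape[1], x.device)
        attn = self.sigmoid(self.up(feat))         # height attention map
        return x * attn                            # broadcast over the width
```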
In this embodiment, the overall structure of H-ResNet is shown in fig. 4, where each ResNet stage denotes a residual structure and H denotes a height-driven attention module. Recognition becomes more accurate as the number of ResNet stages grows; however, more stages also increase the computational load of the network and reduce real-time performance. The number of height-driven attention modules grows with the number of ResNet stages; in this example, with 4 ResNet stages, 3 height-driven attention modules are used. Each height-driven attention module is inserted between two adjacent residual stages, so prior information on vertical position is acquired once after each residual stage, and acquiring it multiple times yields more accurate road surface state information.
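Continuing the sketches above (and reusing the ResidualBlock and HeightDrivenAttention classes from them), the layout described here might be assembled as follows; the stage composition and the segmentation head are illustrative placeholders.

```python
# Sketch of the H-ResNet layout described above: four ResNet stages
# with a height-driven attention module between each pair of adjacent
# stages (4 stages, 3 H modules). Stage contents are placeholders;
# a real stage would stack more residual blocks with downsampling.
class HResNet(nn.Module):
    def __init__(self, channels=64, num_classes=4):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, kernel_size=3, padding=1)
        self.stages = nn.ModuleList(
            [nn.Sequential(ResidualBlock(channels), ResidualBlock(channels))
             for _ in range(4)])
        self.attn = nn.ModuleList(
            [HeightDrivenAttention(channels) for _ in range(3)])
        self.head = nn.Conv2d(channels, num_classes, kernel_size=1)

    def forward(self, x):
        x = self.stem(x)
        for i, stage in enumerate(self.stages):
            x = stage(x)
            if i < 3:                  # H module between adjacent stages
                x = self.attn[i](x)
        return self.head(x)            # per-pixel class logits
```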
In this example, the deep learning framework is TensorFlow, Keras, Caffe2, PyTorch, or MXNet.
In this example, the hardware configuration of the experimental platform used to train the semantic segmentation model is a GeForce GTX 1080Ti GPU and an i7-9700K CPU with 64 GB of memory. In terms of software, the platform runs the 64-bit operating system Ubuntu 18.04. The network model is built with the mainstream deep learning framework PyTorch and the Python language, and high-performance parallel computation is performed with the CUDA parallel computing architecture and the cuDNN GPU acceleration library.
In this example, training uses Focal Loss as the training loss function, as shown in equation (7). Focal Loss modifies the standard cross-entropy loss to reduce the weight of samples that are already well classified and increase the weight of samples that are hard to classify, so that during training the model quickly focuses on the difficult, relatively rare samples, addressing the problem of class imbalance.

FL = −(1/N) Σ_{i=1}^{N} [ α y_i (1 − p_i)^γ log(p_i) + (1 − α)(1 − y_i) p_i^γ log(1 − p_i) ]    (7)

where N is the number of samples used for network training, i is the sample index, y_i is the label of each training sample (y_i = 1 for a positive sample), α is a weight parameter taking values in [0, 1], γ is also a hyperparameter, and p_i, taking values in [0, 1], is the predicted probability that sample i is positive (y_i = 1).
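A sketch of the binary focal loss of equation (7) under the variable definitions above, assuming labels y_i ∈ {0, 1}:

```python
# Binary focal loss per equation (7): p_i is the predicted probability
# that sample i is positive, alpha balances the classes, and gamma
# down-weights well-classified samples. Labels assumed in {0, 1}.
import torch

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """p, y: tensors of shape (N,); y in {0, 1}; p in (0, 1)."""
    p = p.clamp(eps, 1.0 - eps)
    pos = -alpha * y * (1.0 - p) ** gamma * torch.log(p)
    neg = -(1.0 - alpha) * (1.0 - y) * p ** gamma * torch.log(1.0 - p)
    return (pos + neg).mean()
```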
in this example, the low-level and high-level semantic information of the image is learned most quickly in order to ensure that the model is stable. The learning rate in model training is selected as the exponentially decaying learning rate, and the formula is shown in fig. 8.
decayed_lr = init_lr × decay_rate^(global_step / decay_steps)    (8)
In the formula: init_lr — the initially set learning rate
decay_rate — the decay coefficient
global_step — the number of iteration rounds
decay_steps — the decay speed
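Formula (8) can be written directly as code; an equivalent per-step PyTorch scheduler via LambdaLR is also shown. The optimizer setup below is a placeholder assumption.

```python
# Formula (8) as code, plus a PyTorch scheduler equivalent.
import torch

def decayed_lr(init_lr, decay_rate, global_step, decay_steps):
    return init_lr * decay_rate ** (global_step / decay_steps)

# LambdaLR multiplies the initial lr by the returned factor each step.
params = [torch.nn.Parameter(torch.zeros(1))]   # placeholder parameters
optimizer = torch.optim.SGD(params, lr=0.001)
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: 0.95 ** (step / 50))
```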
In this example, the number of images input at each training iteration is set to 8, and data enhancement is applied to the input images by random scale transformation, random angle rotation, image flipping, and the like. The initial learning rate init_lr is set to 0.001, the decay coefficient decay_rate to 0.95, the number of iteration rounds global_step to 5400, and the decay speed decay_steps to 50.
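One possible realisation of this configuration with torchvision transforms is sketched below; note that for semantic segmentation the same geometric transforms would have to be applied jointly to the image and its label mask, which this simplified sketch omits.

```python
# Illustrative training configuration matching the parameters above:
# batch size 8, random scaling, random rotation, and horizontal flips.
# The crop size and parameter ranges are assumptions.
import torchvision.transforms as T

train_transform = T.Compose([
    T.RandomResizedCrop(512, scale=(0.5, 1.0)),   # random scale transform
    T.RandomRotation(degrees=10),                 # random angle rotation
    T.RandomHorizontalFlip(),                     # image flipping
    T.ToTensor(),
])
BATCH_SIZE = 8
```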
The collected data set is input into the constructed semantic segmentation neural network for end-to-end training, and the trained model is integrated into the on-board unit of an intelligent vehicle or other intelligent machine via ROS software.
A real-time image is acquired with the vehicle-mounted network camera or USB camera and input into the semantic segmentation model integrated in the on-board unit, which distinguishes the categories of the different road surface areas in real time and locates their distribution, yielding a road surface condition distribution map.
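A hedged sketch of that real-time step with OpenCV and PyTorch is shown below; the model object and preprocessing are assumed from the training sketches above, and the ROS integration is omitted.

```python
# Sketch of the real-time inference step: grab a frame from a USB
# camera with OpenCV, run the pre-trained segmentation model, and
# return a per-pixel road-condition class map. `model` is assumed
# to be a trained H-ResNet-style network from the sketches above.
import cv2
import torch

def infer_road_condition(model, device='cuda'):
    cap = cv2.VideoCapture(0)                 # vehicle-mounted USB camera
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise RuntimeError('camera frame not available')
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    x = torch.from_numpy(rgb).permute(2, 0, 1).float().div(255.0)[None].to(device)
    with torch.no_grad():
        logits = model(x)
    return logits.argmax(dim=1)[0].cpu().numpy()   # per-pixel class map
```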
In this example, the camera is mounted at the windshield to avoid environmental interference that would degrade the quality of the acquired images.
The series of detailed descriptions above are merely specific illustrations of feasible embodiments of the invention; they are not intended to limit its scope of protection, and all equivalent means or modifications that do not depart from the technical spirit of the invention shall be included within its scope of protection.