CN115376003A

CN115376003A - Road surface crack segmentation method based on U-Net network and CBAM attention mechanism

Info

Publication number: CN115376003A
Application number: CN202210849517.2A
Authority: CN
Inventors: 杨亚龙; 苏亮亮; 张公泉; 汪明月; 徐文晶; 牛震; 胡奇志; 丁磊; 解静平
Original assignee: Anhui Jianzhu University
Current assignee: Anhui Jianzhu University
Priority date: 2022-07-19
Filing date: 2022-07-19
Publication date: 2022-11-22

Abstract

The invention discloses a pavement crack segmentation method based on a U-Net network and a CBAM attention mechanism, comprising the following steps: S1, obtaining a pavement crack data set comprising a pavement crack training set, a pavement crack validation set and a pavement crack test set; and S2, performing data augmentation on the pavement crack training set of step S1 only, with no data augmentation of the pavement crack validation set or the pavement crack test set, to obtain a pavement crack sample set, using the pavement crack validation set obtained in step S1 to select the optimal model from the trained crack detection network, and using the pavement crack test set to test the network performance. The invention improves the feature extraction capability and makes the model training result more accurate.

Description

Road surface crack segmentation method based on U-Net network and CBAM attention mechanism

Technical Field

The invention relates to the technical field of image recognition, in particular to a pavement crack segmentation method based on a U-Net network and a CBAM attention mechanism.

Background

Cracks are one of the key indicators of structural condition for road safety and maintenance. In recent years, deep convolutional neural networks have shown superior performance on many computer vision problems, including pavement crack detection. Pixel-level pavement crack detection segments crack pixels from the background and thereby provides the geometric characteristics of cracks, which is important for identifying the type and severity of cracks. However, when a deep convolutional neural network is applied to pixel-level pavement crack segmentation, it suffers from information loss and from the imbalance between the numbers of crack and non-crack pixels. Therefore, how to provide a pavement crack segmentation method based on a U-Net network and a CBAM attention mechanism is a problem to be solved by those skilled in the art.

Disclosure of Invention

The invention aims to provide a pavement crack segmentation method based on a U-Net network and a CBAM attention mechanism, which improves the feature extraction capability and simultaneously enables the model training result to be more accurate.

The pavement crack segmentation method based on the U-Net network and the CBAM attention mechanism comprises the following steps:

S1, obtaining a pavement crack data set, wherein the pavement crack data set comprises a pavement crack training set, a pavement crack validation set and a pavement crack test set;

S2, performing data augmentation on the pavement crack training set of step S1, with no data augmentation performed on the pavement crack validation set or the pavement crack test set, to obtain a pavement crack sample set, wherein the pavement crack validation set obtained in step S1 is used to select the optimal model from the trained crack detection network, and the pavement crack test set is used to test the performance of the network model;

S3, constructing an improved U-Net convolutional neural network segmentation model that introduces a CBAM (Convolutional Block Attention Module) attention mechanism, wherein the CBAM attention mechanism is fused into the cross-layer (skip) connections between down-sampling and up-sampling, a dilated convolution module combining serial and parallel connections is fused into the last feature map of the down-sampling path, and a depthwise separable convolution module is used in the up-sampling path;

S4, inputting the labelled pavement crack training set pictures from step S2 into the pavement crack segmentation model based on the U-Net network and the CBAM attention mechanism constructed in step S3 for training, to obtain a trained model;

S5, using the pavement crack validation set obtained in step S1 to select the optimal model from the trained crack detection network, and inputting the pavement crack test set into the trained model for testing, to obtain a pavement crack test result;

and S6, comparing the pavement crack test result obtained in step S5 with the crack positions in the label pictures corresponding to the pavement crack test set, calculating evaluation indexes, and evaluating the test result.

Preferably, the collected pavement crack data set of step S1 includes a real image of the pavement crack and a label image corresponding to the real image.

Preferably, the step S2 specifically includes the following steps:

S21, performing data augmentation on the pavement crack training set by rotating, translating, flipping and similarly transforming all training samples in the pavement crack training set, wherein no data augmentation is performed on the pavement crack validation set or the pavement crack test set;

S22, resizing all training samples and label samples in the pavement crack training set augmented in step S21 to facilitate loss calculation, and removing training samples with excessive noise from the data set;

and S23, using the training samples remaining after the removal of excessively noisy samples in step S22 to train the network, while using the pavement crack validation set to select the model with the best performance and the pavement crack test set to test the performance of the network model.

Preferably, the rotation refers to randomly rotating the image to a plurality of designated angles, and changing the orientation of the image content, and the translation refers to translating the image in a horizontal direction or a vertical direction on the image plane.

Preferably, the convolutional neural network segmentation model in step S3 adds a CBAM attention mechanism module, a depthwise separable residual convolution module and a dilated convolution module on the basis of the U-Net network.

Preferably, the CBAM attention module includes an independent channel attention module and an independent spatial attention module, which compute attention weights over the channel dimension and the spatial dimension respectively.

Preferably, in step S4 the pavement crack segmentation model based on the U-Net network and the CBAM attention mechanism is trained with an Adam optimizer, using a binary cross entropy function as the loss function of the training process;

wherein the binary cross entropy function is:

$$L_{BCE\text{-}loss} = -\frac{1}{N}\sum_{i=1}^{N}\left[L_i \log y_i + (1 - L_i)\log(1 - y_i)\right]$$

where L_{BCE-loss} is the loss value, N is the total number of pixels in a given image, and L_i and y_i are respectively the label value and the predicted probability value of the i-th pixel.
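As a minimal sketch of this loss, assuming the standard mean binary cross entropy over the N pixels of one image and NumPy arrays for the label map and the predicted probability map, the computation could look as follows. The function name `bce_loss` and the clipping constant `eps` are illustrative choices and are not part of the disclosure.

```python
import numpy as np

def bce_loss(labels, probs, eps=1e-7):
    """Mean binary cross entropy over all N pixels of one image."""
    labels = np.asarray(labels, dtype=np.float64).ravel()
    probs = np.asarray(probs, dtype=np.float64).ravel()
    # Clip so that log(0) never occurs (eps is an assumed safeguard).
    probs = np.clip(probs, eps, 1.0 - eps)
    per_pixel = labels * np.log(probs) + (1.0 - labels) * np.log(1.0 - probs)
    return float(-per_pixel.mean())

# Toy 2 x 2 "image": one crack pixel (label 1), three background pixels.
labels = np.array([[1, 0], [0, 0]])
good = np.array([[0.9, 0.1], [0.1, 0.1]])   # confident and correct
bad = np.array([[0.1, 0.9], [0.9, 0.9]])    # confident and wrong
loss_good = bce_loss(labels, good)
loss_bad = bce_loss(labels, bad)
```

A prediction that assigns high probability to the correct class of every pixel gives a small loss, while a confidently wrong prediction is penalised heavily.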

Preferably, the step of calculating the evaluation index in step S6 includes:

S61, binarizing the image with a set threshold;

S62, evaluating the model and calculating the average value of each evaluation index, namely precision, recall, F1 score, accuracy and intersection over union (IoU);

and S63, combining all indexes to select the model that performs best on the pavement crack validation set.

The invention has the beneficial effects that:

(1) In the CBAM attention mechanism, the channel attention module (CAM) judges which channels of the multi-channel feature map are important and increases the weights of those channels, and the spatial attention module (SAM) judges which regions in the spatial domain of the feature map are important and increases the weights of the feature values in those regions, so that the model training result is more accurate.

(2) The depthwise separable residual convolution module improves the computational efficiency of the model and fuses shallow and deep features, improving the feature extraction capability; at the same time, features are extracted with dilated convolutions of different dilation rates, so that the receptive field of the feature points is enlarged without reducing the resolution of the feature map.

Drawings

In the drawings:

FIG. 1 is a basic structure diagram of pavement crack segmentation in a pavement crack segmentation method based on a U-Net network and a CBAM attention mechanism provided by the invention;

FIG. 2 is a block diagram of CBAM attention in the road surface crack segmentation method based on U-Net network and CBAM attention mechanism according to the present invention;

FIG. 3 is a structure diagram of the channel attention module within the CBAM attention module in the pavement crack segmentation method based on the U-Net network and the CBAM attention mechanism provided by the invention;

FIG. 4 is a structure diagram of the spatial attention module within the CBAM attention module in the pavement crack segmentation method based on the U-Net network and the CBAM attention mechanism provided by the invention;

FIG. 5 is a structure diagram of the depthwise separable residual convolution module in the pavement crack segmentation method based on the U-Net network and the CBAM attention mechanism provided by the invention;

FIG. 6 is a structure diagram of the dilated convolution module in the pavement crack segmentation method based on the U-Net network and the CBAM attention mechanism provided by the invention;

FIG. 7 is a comparison of the crack detection results of the pavement crack segmentation method based on the U-Net network and the CBAM attention mechanism with those of the U-Net, CrackSegNet and DeepCrack methods.

Detailed Description

Referring to fig. 1, a pavement crack segmentation method based on a U-Net network and a CBAM attention mechanism includes the following steps:

S1, obtaining a pavement crack data set, wherein the pavement crack data set comprises a pavement crack training set, a pavement crack validation set and a pavement crack test set, and the pavement crack data set comprises real pictures of pavement cracks and the label pictures corresponding to the real pictures;

S2, performing data augmentation on the pavement crack training set of step S1, with no data augmentation performed on the pavement crack validation set or the pavement crack test set, to obtain a pavement crack sample set, wherein the pavement crack validation set obtained in step S1 is used to select the optimal model from the trained crack detection network;

the method specifically comprises the following steps:

S21, performing data augmentation on the pavement crack training set by rotating, translating, flipping and similarly transforming all training samples in the pavement crack training set, wherein no data augmentation is performed on the pavement crack validation set or the pavement crack test set;

S22, resizing all samples, namely those of the pavement crack training set, the pavement crack validation set and the pavement crack test set, together with their label samples, to facilitate loss calculation, and removing samples with excessive noise from the data set;

and S23, using the training samples remaining after the removal of excessively noisy samples in step S22 to train the network, while using the pavement crack validation set to select the model with the best performance and the pavement crack test set to test the performance of the network model.

The rotation refers to randomly rotating the image to a plurality of specified angles to change the orientation of the image content, and the translation refers to translating the image in the horizontal direction or translating the image in the vertical direction on the image plane.

In the embodiment of the invention, during training each crack image is rotated to 8 different angles in steps of 45 degrees, is further augmented by translation, flipping and similar operations, and is resized to 320 × 320.
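The augmentation of the training set can be sketched as below. This is only an illustration under several assumptions that the description does not fix: images are single-channel NumPy arrays already resized to 320 × 320, the 8 angles are taken as 0, 45, ..., 315 degrees, rotation uses nearest-neighbour sampling with zero fill, and the 10-pixel translation offset is hypothetical. The helper names `rotate_nearest`, `translate` and `augment_pair` are invented for this sketch. The one point that matters is that each transform is applied identically to a picture and to its label picture.

```python
import numpy as np

def rotate_nearest(img, angle_deg):
    """Rotate a 2-D array about its centre with nearest-neighbour sampling."""
    h, w = img.shape
    theta = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse mapping: for every output pixel, look up its source pixel.
    src_x = np.cos(theta) * (xs - cx) + np.sin(theta) * (ys - cy) + cx
    src_y = -np.sin(theta) * (xs - cx) + np.cos(theta) * (ys - cy) + cy
    src_x = np.rint(src_x).astype(int)
    src_y = np.rint(src_y).astype(int)
    inside = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(img)          # pixels with no source stay zero
    out[inside] = img[src_y[inside], src_x[inside]]
    return out

def translate(img, dy, dx):
    """Shift by (dy, dx) pixels, filling uncovered pixels with zeros."""
    h, w = img.shape
    out = np.zeros_like(img)
    src = img[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    top, left = max(0, dy), max(0, dx)
    out[top:top + src.shape[0], left:left + src.shape[1]] = src
    return out

def augment_pair(image, label):
    """Apply identical transforms to a training picture and its label mask."""
    pairs = []
    for k in range(8):                               # 8 angles, 45-degree steps
        angle = 45 * k
        pairs.append((rotate_nearest(image, angle), rotate_nearest(label, angle)))
    pairs.append((np.fliplr(image), np.fliplr(label)))           # horizontal flip
    pairs.append((np.flipud(image), np.flipud(label)))           # vertical flip
    pairs.append((translate(image, 0, 10), translate(label, 0, 10)))   # horizontal shift
    pairs.append((translate(image, 10, 0), translate(label, 10, 0)))   # vertical shift
    return pairs

rng = np.random.default_rng(0)
image = rng.random((320, 320))                 # stand-in for a crack picture
label = np.zeros((320, 320), dtype=np.uint8)
np.fill_diagonal(label, 1)                     # synthetic diagonal "crack"
pairs = augment_pair(image, label)
```

Nearest-neighbour sampling keeps the label mask binary after rotation, which interpolating methods would not guarantee. The validation and test sets would bypass `augment_pair` entirely.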

S3, constructing an improved U-Net convolutional neural network segmentation model that introduces a CBAM attention mechanism, wherein the CBAM attention mechanism is fused into the cross-layer (skip) connections between down-sampling and up-sampling, a dilated convolution module combining serial and parallel connections is fused into the last feature map of the down-sampling path, and a depthwise separable convolution module is used in the up-sampling path;

the convolutional neural network segmentation model in step S3 adds a CBAM attention mechanism module, a depthwise separable residual convolution module and a dilated convolution module on the basis of the U-Net network.

Referring to fig. 2, in the embodiment of the invention the CBAM attention module is constructed as follows. The CBAM comprises 2 independent sub-modules, a channel attention module and a spatial attention module, which compute attention weights over the channel dimension and the spatial dimension respectively. This not only saves parameters and computation but also allows the CBAM to be integrated into an existing network architecture as a plug-and-play module. The channel attention module judges which channels of the multi-channel feature map play a dominant role in crack segmentation, and the spatial attention module judges which regions in the spatial domain of the feature map are more likely to belong to crack regions.
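A compact NumPy sketch of the two sub-modules is given below for illustration. The description does not spell out their internals, so the details here follow the commonly published CBAM design and should be read as assumptions: average and max pooling feeding a shared two-layer MLP for channel attention, a 7 × 7 convolution over channel-pooled maps for spatial attention, channel attention applied before spatial attention, and a reduction ratio of 4. The weights are random placeholders rather than trained values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """x: (C, H, W). Returns per-channel weights of shape (C, 1, 1)."""
    avg = x.mean(axis=(1, 2))            # global average pooling -> (C,)
    mx = x.max(axis=(1, 2))              # global max pooling -> (C,)

    def mlp(v):
        # Shared two-layer MLP with a ReLU in between.
        return w2 @ np.maximum(w1 @ v, 0.0)

    return sigmoid(mlp(avg) + mlp(mx))[:, None, None]

def spatial_attention(x, kernel):
    """x: (C, H, W), kernel: (2, k, k). Returns weights of shape (1, H, W)."""
    pooled = np.stack([x.mean(axis=0), x.max(axis=0)])    # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    h, w = x.shape[1:]
    out = np.zeros((h, w))
    for i in range(h):                   # plain loops keep the sketch readable
        for j in range(w):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)[None]

def cbam(x, w1, w2, kernel):
    x = x * channel_attention(x, w1, w2)       # reweight channels first
    return x * spatial_attention(x, kernel)    # then reweight spatial positions

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))           # toy feature map: C=8, H=W=16
w1 = rng.standard_normal((2, 8)) * 0.1         # reduction ratio 4 (assumed)
w2 = rng.standard_normal((8, 2)) * 0.1
kernel = rng.standard_normal((2, 7, 7)) * 0.1
ca = channel_attention(x, w1, w2)
sa = spatial_attention(x * ca, kernel)
y = cbam(x, w1, w2, kernel)
```

Because both attention maps only rescale the input, the output has the same shape as the input, which is what lets the module be dropped into a skip connection without changing the surrounding layers.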

Referring to fig. 5, the depthwise separable residual module is constructed as follows. The module draws on the MobileNet and ResNet structures and is applied in the up-sampling stage; it introduces a residual structure that fuses shallow and deep features, improving the feature extraction capability. The depthwise separable residual convolution module consists of two parts. The first part applies one convolution with a 5 × 5 kernel, a stride of 1 and a padding of 2. The second part applies, in order, a convolution with a 1 × 1 kernel and a stride of 1, a group normalization, a ReLU activation function, a convolution with a 5 × 5 kernel, a stride of 1 and a padding of 2, a second group normalization, and a second ReLU activation function, which yields the output of the second part. The outputs of the first part and the second part are added to obtain the output of the depthwise separable residual convolution module.
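Two properties of this module can be checked with simple arithmetic, sketched below. The first is that the stated kernel, stride and padding values leave the spatial size unchanged, so the two parts can be added element-wise. The second is the weight saving of a depthwise separable convolution relative to an ordinary one. The channel count of 64 is a hypothetical value chosen for the example, and the parameter formulas are the standard ones (bias omitted), not figures taken from the description.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution (standard formula)."""
    return (size + 2 * padding - kernel) // stride + 1

def standard_conv_params(c_in, c_out, kernel):
    """Weights in an ordinary convolution (bias omitted)."""
    return kernel * kernel * c_in * c_out

def depthwise_separable_params(c_in, c_out, kernel):
    """One k x k depthwise filter per input channel plus a 1 x 1 pointwise conv."""
    return kernel * kernel * c_in + c_in * c_out

# 5 x 5, stride 1, padding 2 and 1 x 1, stride 1 both preserve a 320-pixel side.
size_5x5 = conv_output_size(320, kernel=5, stride=1, padding=2)
size_1x1 = conv_output_size(320, kernel=1, stride=1, padding=0)

# Hypothetical layer with 64 input and 64 output channels.
std = standard_conv_params(64, 64, 5)
sep = depthwise_separable_params(64, 64, 5)
ratio = sep / std
```

With these example numbers the separable form needs 5,696 weights against 102,400 for the ordinary convolution, roughly 5.6 percent, which is the source of the efficiency gain attributed to the module.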


Constructing the dilated convolution module: the dilated convolution module connects convolutions with different dilation rates in a combination of series and parallel, so that the receptive field of the feature points is enlarged without reducing the resolution of the feature map. Specifically, the feature map is passed through 3 parallel layers of dilated convolutions with different dilation rates: the 1st layer uses one dilated convolution with a dilation rate of 1; the 2nd layer uses two dilated convolutions with dilation rates of 1 and 2 respectively; and the 3rd layer uses three dilated convolutions with dilation rates of 1, 2 and 3 respectively. The input feature map and the output feature maps of the 3 layers are then added to obtain the output feature map of the dilated convolution module. All dilated convolutions use a 3 × 3 kernel and a stride of 1.
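The structure of this module can be illustrated with a single-channel NumPy sketch. It assumes zero padding equal to the dilation rate (which keeps the spatial size), independent weights in each branch, and no normalization or activation between convolutions; none of these details is stated in the description. The helper `receptive_field` uses the standard rule that a stride-1 convolution with kernel k and dilation r widens the receptive field by (k - 1) * r.

```python
import numpy as np

def dilated_conv3x3(x, weight, rate):
    """3 x 3 dilated convolution, stride 1, zero padding = rate (size-preserving)."""
    h, w = x.shape
    padded = np.pad(x, rate)
    out = np.zeros((h, w), dtype=float)
    for a in range(3):
        for b in range(3):
            # Tap (a, b) reads the input at offset ((a - 1) * rate, (b - 1) * rate).
            out += weight[a, b] * padded[a * rate:a * rate + h, b * rate:b * rate + w]
    return out

BRANCH_RATES = [(1,), (1, 2), (1, 2, 3)]   # three parallel branches, serial inside

def dilated_block(x, weights):
    """Sum of the input and the outputs of the three branches."""
    out = x.astype(float)
    for rates, branch_weights in zip(BRANCH_RATES, weights):
        y = x.astype(float)
        for rate, wgt in zip(rates, branch_weights):
            y = dilated_conv3x3(y, wgt, rate)
        out = out + y
    return out

def receptive_field(rates, kernel=3):
    """Receptive field of stride-1 dilated convolutions applied in series."""
    return 1 + sum((kernel - 1) * r for r in rates)

rng = np.random.default_rng(0)
x = rng.standard_normal((20, 20))          # toy last-layer feature map
weights = [[rng.standard_normal((3, 3)) for _ in rates] for rates in BRANCH_RATES]
out = dilated_block(x, weights)
fields = [receptive_field(r) for r in BRANCH_RATES]
```

Under these assumptions the three branches see 3 × 3, 7 × 7 and 13 × 13 neighbourhoods of the input while the output keeps the 20 × 20 size of the input, which is the behaviour the module is meant to provide.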


The convolutional neural network segmentation model includes a contraction path for feature extraction and an expansion path for upsampling.

S4, inputting the labelled pavement crack training set pictures from step S2 into the pavement crack segmentation model based on the U-Net network and the CBAM attention mechanism constructed in step S3 for training, to obtain a trained model;

the method specifically comprises the following steps:

S41, importing the pavement crack training data set into the model, wherein the data set is first augmented and the model is then trained directly on the augmented data set;

S42, setting the batch size, the number of training iterations and the number of steps per iteration for the training process, and using the binary cross entropy as the loss function of the model;

and S43, continuously optimizing the parameters of the model according to the value of the loss function, thereby indirectly monitoring the feasibility of the model, to finally obtain a feasible model.

In step S4, the pavement crack segmentation model based on the U-Net network and the CBAM attention mechanism is trained with an Adam optimizer, using a binary cross entropy function as the loss function of the training process;

wherein the binary cross entropy function is:

$$L_{BCE\text{-}loss} = -\frac{1}{N}\sum_{i=1}^{N}\left[L_i \log y_i + (1 - L_i)\log(1 - y_i)\right]$$

where L_{BCE-loss} is the loss value, N is the total number of pixels in a given image, and L_i and y_i are respectively the label value and the predicted probability value of the i-th pixel.
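The training procedure, Adam updates driven by the binary cross entropy with the best parameters kept according to a validation set, can be sketched on a toy problem. In the sketch below a per-sample logistic model stands in for the segmentation network, which is a deliberate simplification and not the network of the invention; the data are synthetic, and the learning rate, epoch count and Adam constants are assumed values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(labels, probs, eps=1e-7):
    probs = np.clip(probs, eps, 1.0 - eps)
    return float(-np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs)))

def train_with_adam(x_train, y_train, x_val, y_val, epochs=200,
                    lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    w = np.zeros(x_train.shape[1])
    m = np.zeros_like(w)                      # first-moment estimate
    v = np.zeros_like(w)                      # second-moment estimate
    best_w, best_val = w.copy(), bce(y_val, sigmoid(x_val @ w))
    history = []
    for t in range(1, epochs + 1):
        probs = sigmoid(x_train @ w)
        grad = x_train.T @ (probs - y_train) / len(y_train)   # gradient of mean BCE
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad ** 2
        m_hat = m / (1 - beta1 ** t)          # bias correction
        v_hat = v / (1 - beta2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
        val_loss = bce(y_val, sigmoid(x_val @ w))
        history.append(val_loss)
        if val_loss < best_val:               # keep the best model on validation data
            best_w, best_val = w.copy(), val_loss
    return best_w, best_val, history

# Synthetic data: 300 training and 100 validation samples with 3 features.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])
x = rng.standard_normal((400, 3))
y = (rng.random(400) < sigmoid(x @ true_w)).astype(float)
best_w, best_val, history = train_with_adam(x[:300], y[:300], x[300:], y[300:])
```

The validation data never contribute to the gradient; they are used only to decide which set of parameters to keep, which mirrors the role the validation set plays in the method.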

S5, using the pavement crack validation set obtained in step S1 to select the optimal model from the trained crack detection network, and inputting the pavement crack test set into the trained model for testing, to obtain a pavement crack test result;

and S6, comparing the pavement crack test result obtained in step S5 with the crack positions in the label pictures corresponding to the pavement crack test set, calculating evaluation indexes, and evaluating the test result.

The evaluation index calculation steps are as follows:

S61, binarizing the image with a set threshold;

S62, evaluating the model and calculating the average value of each of the following evaluation indexes:

$$P = \frac{TP}{TP + FP}$$

wherein TP represents the number of pixels for which the prediction is a positive sample and the label is a positive sample, and FP represents the number of pixels for which the prediction is a positive sample but the label is a negative sample;

$$R = \frac{TP}{TP + FN}$$

wherein FN represents the number of pixels for which the prediction is a negative sample but the label is a positive sample;

$$F1 = \frac{2 \times P \times R}{P + R}$$

wherein P represents the precision, R represents the recall, and F1 represents the F1 score;

$$A = \frac{TP + TN}{TP + TN + FP + FN}$$

wherein TN represents the number of pixels for which the prediction is a negative sample and the label is also a negative sample, and A represents the accuracy;

$$IoU = \frac{TP}{TP + FP + FN}$$

wherein IoU represents the intersection over union;

and S63, combining all indexes to select the model that performs best on the pavement crack validation set.
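The evaluation steps can be sketched in NumPy as follows, assuming the standard confusion-matrix definitions of precision, recall, F1 score, accuracy and intersection over union. The threshold of 0.5 and the convention of returning 0 when a denominator is zero are assumptions made for this example; the description does not give a threshold value. The function names are illustrative.

```python
import numpy as np

def crack_metrics(prob_map, label, threshold=0.5):
    """Per-image pixel-level metrics after binarising the prediction (S61)."""
    pred = np.asarray(prob_map) >= threshold
    truth = np.asarray(label).astype(bool)
    tp = int(np.sum(pred & truth))      # predicted crack, labelled crack
    fp = int(np.sum(pred & ~truth))     # predicted crack, labelled background
    fn = int(np.sum(~pred & truth))     # predicted background, labelled crack
    tn = int(np.sum(~pred & ~truth))    # predicted background, labelled background

    def ratio(num, den):
        return num / den if den else 0.0   # assumed convention for empty cases

    p = ratio(tp, tp + fp)
    r = ratio(tp, tp + fn)
    return {
        "precision": p,
        "recall": r,
        "f1": ratio(2 * p * r, p + r),
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "iou": ratio(tp, tp + fp + fn),
    }

def mean_metrics(prob_maps, labels, threshold=0.5):
    """Average every index over a set of images (S62)."""
    per_image = [crack_metrics(p, l, threshold) for p, l in zip(prob_maps, labels)]
    return {k: float(np.mean([m[k] for m in per_image])) for k in per_image[0]}

# Toy 2 x 2 example: TP = 2, FP = 1, FN = 0, TN = 1.
prob = np.array([[0.9, 0.8], [0.2, 0.6]])
label = np.array([[1, 0], [0, 1]])
scores = crack_metrics(prob, label)
averaged = mean_metrics([prob, prob], [label, label])
```

For the toy example the precision is 2/3, the recall is 1, the F1 score is 0.8, the accuracy is 0.75 and the IoU is 2/3. Model selection in S63 would compare such averaged scores across candidate models on the validation set.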

In the embodiment of the invention, the crack detection network is trained using the public Crack500 pavement crack data set. To process the data set, all training samples and label samples are first resized to 320 × 320 and samples with excessive noise are removed; the remaining qualified samples, together with their corresponding labels, are then divided into a pavement crack training set of 1840 pictures, a pavement crack validation set of 348 pictures and a pavement crack test set of 1124 pictures. The model with the best performance is selected using the pavement crack validation set, and the model performance is tested using the pavement crack test set.

For performance comparison, the invention designs a comparison experiment on the Crack500 pavement crack test set and selects three methods, U-Net, DeepCrack and CrackSegNet, as references, wherein U-Net is a method based on down-sampling and up-sampling, and DeepCrack and CrackSegNet are semantic segmentation methods based on an encoder-decoder structure.

TABLE 1 results on the Crack500 pavement Crack test set

The experimental results are shown in Table 1 above. They indicate that the crack detection method based on the improved U-Net network with the CBAM attention mechanism effectively improves the crack detection accuracy.

In the CBAM attention mechanism, the channel attention module (CAM) judges which channels of the multi-channel feature map are important and increases the weights of those channels, and the spatial attention module (SAM) judges which regions in the spatial domain of the feature map are important and increases the weights of the feature values in those regions, so that the model training result is more accurate.

The depthwise separable residual convolution module improves the computational efficiency of the model and fuses shallow and deep features, improving the feature extraction capability; at the same time, features are extracted with dilated convolutions of different dilation rates, so that the receptive field of the feature points is enlarged without reducing the resolution of the feature map.

Claims

1. A pavement crack segmentation method based on a U-Net network and a CBAM attention mechanism is characterized by comprising the following steps:

S2, performing data augmentation on the pavement crack training set of step S1, with no data augmentation performed on the pavement crack validation set or the pavement crack test set, to obtain a pavement crack sample set, then using the pavement crack validation set obtained in step S1 to select the optimal model from the trained crack detection network, and using the pavement crack test set to test the network performance;

S4, inputting the labelled pavement crack training set pictures from step S2 into the pavement crack segmentation model based on the U-Net network and the CBAM attention mechanism constructed in step S3 for training, to obtain a trained model;

2. The method for segmenting the road surface cracks based on the U-Net network and the CBAM attention mechanism according to claim 1, wherein the step S1 of collecting the road surface crack data set comprises real pictures of the road surface cracks and corresponding label pictures.

3. The road surface crack segmentation method based on the U-Net network and the CBAM attention mechanism as claimed in claim 1, wherein the step S2 specifically includes the following method steps:

4. The method for segmenting the road surface cracks based on the U-Net network and the CBAM attention mechanism as claimed in claim 3, wherein the rotation means randomly rotating the image to a plurality of specified angles, changing the orientation of the image content, and the translation means translating the image in a horizontal direction or a vertical direction on the image plane.

5. The method for segmenting the road surface crack based on the U-Net network and the CBAM attention mechanism according to claim 1, wherein the convolutional neural network segmentation model in step S3 adds a CBAM attention mechanism module, a depthwise separable residual convolution module and a dilated convolution module on the basis of the U-Net network.

6. The method for road surface crack segmentation based on the U-Net network and the CBAM attention mechanism as claimed in claim 5, wherein the CBAM attention module comprises an independent channel attention module and an independent spatial attention module, which compute attention weights over the channel dimension and the spatial dimension respectively.

7. The road surface crack segmentation method based on the U-Net network and the CBAM attention mechanism of claim 6, wherein the training of the road surface crack segmentation model based on the U-Net network and the CBAM attention mechanism in the step S4 specifically comprises the steps of training by adopting an Adam optimizer, and adopting a binary cross entropy function as a loss function of a training process;

wherein the binary cross entropy function is:

8. The road surface crack segmentation method based on the U-Net network and the CBAM attention mechanism as claimed in claim 1, wherein the step of calculating the evaluation index in the step S6 is as follows: