CN108876780B - Bridge crack image crack detection method under complex background - Google Patents


Info

Publication number
CN108876780B
CN108876780B (application CN201810669942.7A)
Authority
CN
China
Prior art keywords
layer
convolution
crack
layers
bridge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810669942.7A
Other languages
Chinese (zh)
Other versions
CN108876780A (en
Inventor
李良福
孙瑞赟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shaanxi Normal University
Original Assignee
Shaanxi Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shaanxi Normal University filed Critical Shaanxi Normal University
Priority to CN201810669942.7A priority Critical patent/CN108876780B/en
Publication of CN108876780A publication Critical patent/CN108876780A/en
Application granted granted Critical
Publication of CN108876780B publication Critical patent/CN108876780B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0004Industrial image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning

Abstract

The invention relates to a method for detecting cracks in bridge crack images under a complex background. First, a bridge pavement crack image generation model based on the principle of the deep convolutional generative adversarial network is proposed and used to amplify the data set; then a bridge pavement crack image segmentation model based on semantic segmentation is constructed according to the characteristics of cracks; finally, cracks are extracted from the crack images with the bridge pavement crack image segmentation model. The data set amplification effectively alleviates the under-fitting caused by insufficient training data, and the accuracy and recall rate are improved by 79.4% and 74.7%, respectively. Compared with existing semantic segmentation algorithms, the algorithm has fewer parameters, shorter training time, and improved accuracy, recall and F1 score, all reaching more than 92%. Compared with existing bridge pavement crack detection and segmentation algorithms, the algorithm is better suited to detecting and segmenting bridge pavement cracks under complex backgrounds, with a stronger recognition effect and better generalization ability.

Description

Bridge crack image crack detection method under complex background
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a bridge crack image crack detection method under a complex background.
Background
Transportation is a basic need and prerequisite of economic development, a foundation of modern society and a sign of its civilization, and the infrastructure and vital link of industrial development; it underpins the growth of the national economy and the progress of society. According to the 2017 Statistical Bulletin on National Economic and Social Development, China newly built 6,796 kilometers of highway and put 2,182 kilometers of new high-speed railway into operation, forming the world's largest highway network and high-speed rail operating network and markedly strengthening its international influence. In modern traffic construction, bridges account for 50% of the high-speed railways already built, and for 81.5% of the Beijing-Shanghai high-speed railway. Construction and maintenance have always been central to highway and bridge management in China. With the continuous progress of the transportation industry, the rapid growth of road traffic places enormous pressure on the safe operation of highway bridges, accelerating the aging of bridges and roads and creating potential safety hazards. The quality and safety of bridges concern both the nation and individual households. Therefore, bridge maintenance and management are the key to ensuring safe operation. Research shows that most damage to concrete bridges is caused by cracks, so the detection and segmentation of concrete bridge cracks is of great importance.
Detecting and segmenting bridge pavement cracks by effective means plays an important role in ensuring the safety and normal operation of public transportation, and has long received wide attention from academia and engineering circles at home and abroad. From traditional image processing to today's deep learning, scholars at home and abroad have continually applied new techniques to bridge pavement crack detection and segmentation and obtained excellent research results. Among traditional image processing algorithms, pavement crack segmentation based on thresholding is the earliest studied and the simplest approach. Oh H et al. proposed an iterative threshold segmentation algorithm [Oh H, Garrick N W, Achenie L E K. Segmentation algorithm using adaptive clustering for processing noisy pavement images. Imaging Technologies: Techniques and Applications in Civil Engineering, Second International Conference, 1998: 138-]; Li Yuan et al. proposed an improved Otsu algorithm based on the Hough transform [Li Yuan, Huang Quan, Hou Xin. Study of an improved Otsu algorithm based on the Hough transform for pavement crack detection [J]. Electronic Design Engineering, 2016, 24(05): 43-46]; Talab A M A et al. used the Otsu method and multiple filtering in image processing techniques to detect cracks in images [Talab A M A, Huang Z, Xi F, et al. Detection crack in image using Otsu method and multiple filtering in image processing techniques [J]. Optik - International Journal for Light and Electron Optics, 2016, 127(3): 1030-1033]; Fang Cui et al. achieved image crack detection with an improved K-Means algorithm [Fang C, Zhe L, Li Y. Images crack detection technology based on improved K-Means algorithm [J]. Journal of Multimedia, 2014, 9(6)]; Ren Liang et al. proposed a pavement crack connection algorithm based on the Prim minimum spanning tree [Ren Liang, Xu Zhigang, Zhao Xiangmo, et al. Pavement crack connection algorithm based on Prim minimum spanning tree [J]. Computer Engineering, 2015, 41(01): 31-36+43]; Zhang Jing et al. proposed a percolation model with multi-scale input images to compensate for the deficiency of single-scale feature extraction [Zhang J, Nie H Y, Yu Q. Bridge crack detection based on percolation model with multi-scale input image [J]. Computer Engineering, 2017, 43(02): 273-279]. These traditional algorithms all require manual setting and adjustment of parameters and therefore depend heavily on manual operation.
In recent years, machine learning and deep learning have become hot spots in the rapid development of artificial intelligence, and many scholars have successfully combined them with bridge crack detection. For example, L. Zhang et al. used a deep convolutional neural network to detect road cracks [Zhang L, Yang F, Zhang Y D, et al. Road crack detection using deep convolutional neural network [C]//IEEE International Conference on Image Processing. IEEE, 2016]; Fu-Chen Chen et al. used the NB-CNN network, which fuses a convolutional neural network with naive Bayes data fusion, for crack detection [Chen F C, Jahanshahi M. NB-CNN: Deep learning-based crack detection using convolutional neural network and naive Bayes data fusion [J]. IEEE Transactions on Industrial Electronics, 2018, 65(5): 4392-4400]; Wendy D. Fisher et al. skillfully used a support vector machine to complete crack extraction [Fisher W, Camp T, Krzhizhanovskaya V. Crack detection in earth dam and levee passive seismic data using support vector machines [J]. Procedia Computer Science, 2016, 80: 577-586]; Yong Shi et al. used random structured forests to perform automatic pavement crack extraction [Shi Y, Cui L, Qi Z, et al. Automatic road crack detection using random structured forests [J]. IEEE Transactions on Intelligent Transportation Systems, 2016, 17(12): 3434-3445]. These image-processing-based bridge pavement crack detection algorithms obtain good experimental results, but the images they use have high contrast, low noise and simple scenes, without obstacles such as fallen leaves, water stains, lane lines and shadows; the complexity of bridge pavement images is therefore underestimated, and the requirements of engineering applications are difficult to meet. In the fields of computer vision and image processing, semantic segmentation [Chen Hongxiang. Image semantic segmentation based on convolutional neural networks [D]. Zhejiang University, 2016], as a precise pixel-level segmentation method, has been widely used. Network models such as SegNet [Badrinarayanan V, Kendall A, Cipolla R. SegNet: A deep convolutional encoder-decoder architecture for scene segmentation [J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2015, PP(99): 2481-2495], FCN [Shelhamer E, Long J, Darrell T. Fully convolutional networks for semantic segmentation [J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2014, 39(4): 640-651], DeepLab [Chen L C, Papandreou G, Kokkinos I, et al. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs [J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2018, 40(4): 834-848] and DenseNet [Iandola F, Moskewicz M, Karayev S, et al. DenseNet: Implementing efficient convnet descriptor pyramids [J]. Eprint Arxiv, 2014] have been shown to achieve state-of-the-art results for multi-class classification on several classical data sets. However, if an existing semantic segmentation model is applied directly to bridge pavement crack detection, a satisfactory effect cannot be achieved, because cracks have a linear topological structure, occupy only a small area of the whole image, and the background texture is complex with many obstacles.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a bridge crack image crack detection method under a complex background. The technical problem to be solved by the invention is realized by the following technical scheme: a bridge crack image crack detection method under a complex background comprises the following steps: step 1, carrying out geometric transformation, spatial filtering and linear transformation on an acquired original bridge crack image, and then carrying out data set amplification through a generation sub-model and a discrimination sub-model;
step 2, inputting the amplified image into a segmentation model for training, wherein the specific method comprises the following steps:
step 2.1: performing a 5x5 convolution once on the amplified image;
step 2.2: inputting the convolution result into a DenseBlock containing 4 layers;
step 2.3: performing a Transition Down operation on the result of step 2.2 to reduce the resolution of the crack image;
step 2.4: setting the number of layers of the DenseBlock to 5, 7 and 10 in turn, and repeating steps 2.2 and 2.3 three times;
step 2.5: inputting the result of step 2.4 into a Bottleneck composed of 12 layers, completing the down-sampling, and performing the connection operation of the multiple features;
step 2.6: inputting the output of the previous layer into an up-sampling path consisting of a Transition Up and a DenseBlock, wherein the DenseBlock has 10 layers, corresponding to the down-sampling;
step 2.7: setting the number of layers of the DenseBlock in step 2.6 to 7, 5 and 4 in turn, and repeating step 2.6 three times;
step 2.8: performing 1x1 convolution operation on the output result of the step 2.7;
step 2.9: inputting the result of the step 2.8 into a softmax layer for judgment, and outputting the probability of cracks and non-cracks;
step 3: after the training in step 2 is finished, carrying out crack extraction on the image of the crack to be detected through the trained segmentation model.
Compared with the prior art, the invention has the following beneficial effects: the data set amplification effectively alleviates the under-fitting caused by insufficient training data, and the accuracy and recall rate are improved by 79.4% and 74.7%, respectively. Compared with existing semantic segmentation algorithms, the method has fewer parameters, shorter training time, and improved accuracy, recall and F1 score, all reaching more than 92%. Compared with traditional crack detection algorithms and existing deep learning algorithms, the method is not affected by road noise and obstacles, the crack segmentation results are noise-free, and the cracks are clear and complete; the algorithm is better suited to detecting and segmenting bridge pavement cracks under complex backgrounds, with a stronger recognition effect and better generalization ability.
Drawings
FIG. 1 is a schematic diagram of the expansion of a bridge fracture image dataset according to the present invention.
FIG. 2 is a diagram illustrating the structure of a generation submodel according to the present invention.
FIG. 3 is a diagram illustrating the structure of the discrimination submodel of the present invention.
FIG. 4 is a schematic diagram of a DenseBlock structure containing 4 layers of a Dense Convolutional Network.
FIG. 5 is a schematic diagram of a segmentation model structure according to the present invention.
FIG. 6 is a schematic diagram of the layer structure of the segmentation model layers according to the present invention.
FIG. 7 is a structural diagram of the segmentation model Transition Down according to the present invention.
FIG. 8 is a schematic structural diagram of the segmentation model Transition Up of the present invention.
FIG. 9 is a schematic process diagram of the crack extraction method for the high-resolution bridge crack image according to the invention.
FIG. 10 is a visual comparison of the BCIGM and DCGAN generated fractures of the present invention.
Fig. 11 is a visual comparison of the inventive ReLU activation function and the SeLU activation function to generate a fracture image.
FIG. 12 is a visual comparison of the results of the present invention with or without data set amplification experiments.
FIG. 13 is a graph comparing the present invention with a mainstream fracture segmentation algorithm.
FIG. 14 is a visual chart of the effect of the detection result of the present invention.
FIG. 15 is a graph of the accuracy of the present invention.
FIG. 16 is a Loss graph of the present invention.
FIG. 17 is a schematic flow chart of the method of the present invention.
Detailed Description
The present invention will be described in further detail with reference to specific examples, but the embodiments of the present invention are not limited thereto.
As shown in fig. 17, this embodiment provides a method for detecting cracks in bridge crack images under a complex background, comprising the following steps. Step 1: perform geometric transformation, spatial filtering and linear transformation on the acquired original bridge crack images, and then perform data set amplification through a generation sub-model and a discrimination sub-model. The generation sub-model sequentially comprises a fully connected layer, a dimension-conversion layer, a first transposed convolutional layer, a second transposed convolutional layer, a third transposed convolutional layer, a fourth transposed convolutional layer and a fifth transposed convolutional layer; the convolution kernels of the first, second, third, fourth and fifth transposed convolutional layers are all 5x5 with a stride of 2, and the numbers of convolution kernels are 512, 256, 128, 64 and 3 in turn; the first, second, third and fourth transposed convolutional layers all use the SeLU activation function.
The discrimination sub-model sequentially comprises a first convolutional layer, a second convolutional layer, a third convolutional layer, a fourth convolutional layer, a fifth convolutional layer, a sixth convolutional layer, a seventh convolutional layer and a Sigmoid activation function layer; the convolution kernels of the first, second, third, fourth, fifth and sixth convolutional layers are all 5x5 with a stride of 2, and the numbers of convolution kernels are 64, 128, 256, 512, 1024 and 2048 in turn; the convolution kernel of the seventh convolutional layer is 1x1.
The first, second, third, fourth, fifth, sixth and seventh convolutional layers are all fully convolutional layers.
In step 1, this embodiment proposes a Bridge Crack Image Generation Model (BCIGM) based on a deep convolutional generative adversarial network. The principle of the bridge crack image generation model is the same as that of a generative adversarial network: it consists of a generation sub-model G and a discrimination sub-model D in an adversarial relationship. The generation sub-model G learns the probability distribution of the existing real data samples in order to generate samples G(z) that follow the distribution of the real samples as closely as possible, and the discrimination sub-model D judges whether its input comes from a real sample or from a generated sample G(z); it is essentially a binary classification model.
As shown in fig. 1, fig. 2 and fig. 3, this embodiment provides a bridge crack image generation model based on a deep convolutional generative adversarial network, referred to as the BCIGM for short.
Data set expansion: training the BCIGM requires thousands of images as samples, which would still be a serious problem if they were all acquired manually, so the expansion of the data set is done in two steps. First, a small amount of data is expanded by three image processing methods: geometric transformation, spatial filtering and linear transformation of the images; then, a large number of crack images are generated with the BCIGM. After the first step of expansion, 9,362 images are manually selected as the training set of the BCIGM. Part of the images expanded by the image processing algorithms are shown in fig. 1, where the first column, fig. 1(a), shows the original images, and the second to fifth columns show the images after horizontal flipping (fig. 1(b)), vertical flipping (fig. 1(c)), linear transformation (fig. 1(d)) and spatial filtering (fig. 1(e)).
The process of training the BCIGM continuously and alternately updates G and D. When the generation sub-model G is fixed, the discrimination sub-model D is optimized so that its output tends to 1 when a real data sample is input and tends to 0 when a generated sample is input; when the discrimination sub-model D is fixed, the generation sub-model G is optimized so that its generated samples are output with high probability after passing through D. When D and G reach a Nash equilibrium, the samples produced by G can no longer be judged to be real or generated. The above process is expressed as the following formula (5):
min_G max_D V(D, G) = E_{x~p_data(x)}[log D(x)] + E_{z~p_z(z)}[log(1 - D(G(z)))]    (5)
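For illustration, a minimal TensorFlow sketch of the alternating update of G and D expressed by formula (5) is given below; the Adam optimizer, its learning rate and the use of binary cross-entropy are assumptions, since the text only specifies the adversarial objective.

```python
# Minimal sketch of the alternating G/D update described above (TensorFlow 2 / Keras).
# The optimizer choice and learning rate are assumptions; the patent only fixes the
# adversarial objective of formula (5).
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=False)
g_opt = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
d_opt = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)

@tf.function
def train_step(generator, discriminator, real_images, noise_dim=100):
    batch = tf.shape(real_images)[0]
    z = tf.random.normal([batch, noise_dim])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_images = generator(z, training=True)
        d_real = discriminator(real_images, training=True)
        d_fake = discriminator(fake_images, training=True)
        # D: push real samples toward 1 and generated samples toward 0.
        d_loss = bce(tf.ones_like(d_real), d_real) + bce(tf.zeros_like(d_fake), d_fake)
        # G: push D's output on generated samples toward 1.
        g_loss = bce(tf.ones_like(d_fake), d_fake)
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
    return d_loss, g_loss
```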
the BCIGM comprises a generation sub-model and a discrimination sub-model, wherein the generation sub-model sequentially comprises a full connection layer, a dimension conversion layer, a first transposition convolution layer, a second transposition convolution layer, a third transposition convolution layer, a fourth transposition convolution layer and a fifth transposition convolution layer; the sizes of convolution kernels of the first transposition convolutional layer, the second transposition convolutional layer, the third transposition convolutional layer, the fourth transposition convolutional layer and the fifth transposition convolutional layer are all 5x5, the step length is 2, and the number of convolution kernels is 512, 256, 128, 64 and 3 in sequence;
the first, second, third and fourth transposed convolutional layers are all convolved with the SeLU activation function.
The generation submodel is used for generating a crack image of 256x256x3 by upsampling the input noise, and mainly comprises a transposed convolution. The specific process is as follows: firstly, 100-dimensional noise is input, then dimension conversion is carried out after full connection of layers, the noise is converted into 1024 feature maps of 8x8, finally, transposition convolution with five convolution kernels of 5x5, step size of 2, the number of the convolution kernels of 512, 256, 128, 64 and 3 is carried out in sequence, and SeLU activation functions [ Klambauer G, Unterthiner T, Mayr A, et al. The reason why BCIGM selects the SeLU activation function instead of the Relu activation function and Batch Normalization is: first, the SeLU introduces the property of self-normalization so that neuron excitation values can automatically converge to zero mean and unit variance, and even in the presence of noise and perturbations, will converge to zero mean and unit variance after forward propagation through many layers. In addition, for excitation values that do not approximate unit variance, the variance has a supremum and a infimum, so gradient extinction and gradient explosion are unlikely to occur, which greatly increases the stability of the BCIGM. The concrete model is shown in fig. 2.
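A minimal Keras sketch of this generation sub-model is given below; the kernel sizes, strides, filter counts and the 8x8x1024 reshape follow the description, while the tanh output activation is an assumption not stated in the text.

```python
# A sketch of the generation sub-model described above (TensorFlow 2 / Keras).
import tensorflow as tf
from tensorflow.keras import layers

def build_generator(noise_dim=100):
    model = tf.keras.Sequential(name="BCIGM_generator")
    model.add(layers.Dense(8 * 8 * 1024, input_shape=(noise_dim,)))  # fully connected layer
    model.add(layers.Reshape((8, 8, 1024)))                          # dimension-conversion layer
    for filters in (512, 256, 128, 64):                              # four SeLU transposed convolutions
        model.add(layers.Conv2DTranspose(filters, kernel_size=5, strides=2,
                                         padding="same", activation="selu"))
    # fifth transposed convolution: 3 kernels, producing the 256x256x3 crack image
    # (tanh output range is an assumption)
    model.add(layers.Conv2DTranspose(3, kernel_size=5, strides=2,
                                     padding="same", activation="tanh"))
    return model
```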
The discrimination sub-model sequentially comprises a first convolutional layer, a second convolutional layer, a third convolutional layer, a fourth convolutional layer, a fifth convolutional layer, a sixth convolutional layer, a seventh convolutional layer and a Sigmoid activation function layer; the convolution kernels of the first, second, third, fourth, fifth and sixth convolutional layers are all 5x5 with a stride of 2, and the numbers of convolution kernels are 64, 128, 256, 512, 1024 and 2048 in turn; the convolution kernel of the seventh convolutional layer is 1x1.
The function of the discrimination sub-model is to judge, through feature extraction, whether an input sample is a real sample; it is composed of a fully convolutional network. The specific process is as follows: first, a 256x256x3 sample image is input, covering both real samples and generated samples; then six convolutions with 5x5 kernels, stride 2 and 64, 128, 256, 512, 1024 and 2048 kernels are applied in turn; next, a convolution with a 1x1 kernel is applied; finally, the probability of the input sample is mapped through a Sigmoid activation function. The 1x1 convolution kernel is added because it reduces dimensionality without changing the feature map size, reducing the number of parameters and thus the computation time. The concrete model is shown in fig. 3.
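A minimal Keras sketch of this discrimination sub-model is given below; the six 5x5/stride-2 convolutions, the 1x1 convolution and the Sigmoid output follow the text, while the LeakyReLU activations between convolutions and the global pooling before the Sigmoid are assumptions, since the text does not name them.

```python
# A sketch of the discrimination sub-model described above (TensorFlow 2 / Keras).
import tensorflow as tf
from tensorflow.keras import layers

def build_discriminator():
    model = tf.keras.Sequential(name="BCIGM_discriminator")
    model.add(layers.Conv2D(64, kernel_size=5, strides=2, padding="same",
                            input_shape=(256, 256, 3)))
    model.add(layers.LeakyReLU(0.2))                        # intermediate activation (assumption)
    for filters in (128, 256, 512, 1024, 2048):
        model.add(layers.Conv2D(filters, kernel_size=5, strides=2, padding="same"))
        model.add(layers.LeakyReLU(0.2))
    model.add(layers.Conv2D(1, kernel_size=1))              # 1x1 convolution
    model.add(layers.GlobalAveragePooling2D())              # collapse spatial dims (assumption)
    model.add(layers.Activation("sigmoid"))                 # probability that the input is real
    return model
```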
Step 2, inputting the amplified image into a segmentation model for training, wherein the specific method comprises the following steps:
step 2.1: performing a 5x5 convolution once on the amplified image;
step 2.2: inputting the convolution result into a DenseBlock containing 4 layers;
step 2.3: performing a Transition Down operation on the result of step 2.2 to reduce the resolution of the crack image;
step 2.4: setting the number of layers of the DenseBlock to 5, 7 and 10 in turn, and repeating steps 2.2 and 2.3 three times;
step 2.5: inputting the result of step 2.4 into a Bottleneck composed of 12 layers, completing the down-sampling, and performing the connection operation of the multiple features;
step 2.6: inputting the output of the previous layer into an up-sampling path consisting of a Transition Up and a DenseBlock, wherein the DenseBlock has 10 layers, corresponding to the down-sampling;
step 2.7: setting the number of layers of the DenseBlock in step 2.6 to 7, 5 and 4 in turn, and repeating step 2.6 three times;
step 2.8: performing 1x1 convolution operation on the output result of the step 2.7;
step 2.9: inputting the result of the step 2.8 into a softmax layer for judgment, and outputting the probability of cracks and non-cracks;
step 3: after the training in step 2 is finished, carrying out crack extraction on the image of the crack to be detected through the trained segmentation model.
In step 2, the embodiment improves the FC-DenseNet103 Model, and proposes a Bridge-Crack-Image-Segmentation-Model (BCISM for short) based on semantic Segmentation, which is suitable for a complex background.
Image semantic segmentation is an important branch of artificial intelligence and a key component of image understanding in computer vision. It is known that, under certain conditions, the deeper the network, the more accurate the extracted features and the better the detection effect; in practice, however, deeper networks are more prone to vanishing gradients. In 2016, the Dense Convolutional Network proposed by Gao Huang et al. addressed this problem.
A Dense Convolutional Network (DenseNet) is a convolutional neural network with dense connections [Huang G, Liu Z, Maaten L V D, et al. Densely connected convolutional networks [C]//IEEE Conference on Computer Vision and Pattern Recognition, 2017]. In this network, any two layers are directly connected; that is, the input of each layer is the union of the outputs of all previous layers, and the feature maps learned by that layer are passed directly as input to all subsequent layers. DenseNet not only makes efficient use of the feature maps, but also effectively alleviates the vanishing-gradient problem in deep networks. This can be expressed as
X_l = H_l([X_0, X_1, ..., X_{l-1}])
wherein l represents the number of layers, X_l represents the output of layer l, and [X_0, X_1, ..., X_{l-1}] denotes linking the output feature maps of layers 0 to l-1 along the depth, namely Filter concatenation. The advantage of Filter concatenation is that it prevents the explosive demand for computing resources caused by increasing the number of layers, so that both the width and the depth of the network can be extended. H_l(.) consists of Batch Normalization, the ReLU activation function and a convolution operation. Because Filter concatenation requires the feature maps X_0, X_1, ..., X_{l-1} to have the same size, while the indispensable pooling operation changes the feature map size, the DenseBlock is proposed so that all feature maps within a DenseBlock have the same size; each layer of a DenseBlock increases the number of feature maps by k, and k controls the width of the network. Fig. 4 shows a DenseBlock comprising 4 layers.
The document [Jégou S, Drozdzal M, Vazquez D, et al. The one hundred layers tiramisu: Fully convolutional DenseNets for semantic segmentation. 2016: 1175-1183] proposes the FC-DenseNet103 model for semantic segmentation and achieves satisfactory results on the CamVid data set; however, when it is applied to bridge pavement crack extraction under complex backgrounds the results are not good, its parameters are numerous and its training time is long.
The BCISM consists of 74 convolutional layers in total and comprises a down-sampling path composed of DenseBlocks and Transition Down modules, an up-sampling path composed of DenseBlocks and Transition Up modules, and a Softmax function. The DenseBlocks contain 4, 5, 7, 10, 12, 10, 7, 5 and 4 layers in turn, and each layer consists of Batch Normalization, the ReLU activation function, a 3x3 convolution and Dropout. Dropout means that, during training of the deep learning network, neural network units are temporarily discarded from the network with a certain probability, so that each batch trains a different network; Dropout in this network is 0.2. A sketch of a DenseBlock built from such layers is given below.
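A minimal Keras sketch of one DenseBlock layer and the dense connections between layers follows; the BN, ReLU, 3x3 convolution, Dropout(0.2) order and the depth-wise concatenation follow the text, while the growth rate k = 16 is an assumption, since the text does not fix k.

```python
# A sketch of a DenseBlock layer and its dense connections (TensorFlow 2 / Keras functional API).
import tensorflow as tf
from tensorflow.keras import layers

def dense_layer(x, growth_rate=16, dropout_rate=0.2):
    y = layers.BatchNormalization()(x)
    y = layers.ReLU()(y)
    y = layers.Conv2D(growth_rate, kernel_size=3, padding="same")(y)
    y = layers.Dropout(dropout_rate)(y)
    return y

def dense_block(x, num_layers, growth_rate=16):
    new_features = []
    for _ in range(num_layers):
        y = dense_layer(x, growth_rate)          # each layer sees all previous outputs
        new_features.append(y)
        x = layers.Concatenate()([x, y])         # Filter concatenation along the depth axis
    # return the full concatenation and the features produced inside the block
    block_out = layers.Concatenate()(new_features) if len(new_features) > 1 else new_features[0]
    return x, block_out
```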
The role of the Transition Down is to reduce the spatial dimension of the feature maps. It consists of Batch Normalization, the ReLU activation function, a 1x1 convolution and a 2x2 pooling operation, where the 1x1 convolution preserves the number of feature maps and the 2x2 pooling operation reduces their resolution, compensating for the linear growth in the number of feature maps caused by the large increase in the number of network layers.
The Transition Up consists of a transposed convolution, whose role is to restore the spatial resolution of the input image. The transposed convolution is applied only to the feature maps of the last DenseBlock, because the last DenseBlock integrates the information of all previous DenseBlocks. The Softmax function outputs the probability of crack versus non-crack. The network structure parameters of the BCISM are shown in Table 1, and a sketch of the two transition modules follows the table:
Table 1. Network structure parameters of the BCISM (the table is rendered as an image in the original publication).
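A minimal Keras sketch of the Transition Down and Transition Up modules described above is given below; the 3x3 transposed-convolution kernel and the concatenation with the corresponding down-sampling feature maps are assumptions consistent with figs. 5 and 8 but not spelled out in the text.

```python
# A sketch of the Transition Down and Transition Up modules (TensorFlow 2 / Keras functional API).
import tensorflow as tf
from tensorflow.keras import layers

def transition_down(x):
    n_filters = x.shape[-1]                       # 1x1 convolution keeps the feature count
    y = layers.BatchNormalization()(x)
    y = layers.ReLU()(y)
    y = layers.Conv2D(n_filters, kernel_size=1, padding="same")(y)
    y = layers.MaxPooling2D(pool_size=2)(y)       # 2x2 pooling halves the resolution
    return y

def transition_up(block_features, skip_connection):
    # the transposed convolution restores spatial resolution and is applied only to the
    # features of the preceding DenseBlock, then joined with the down-sampling skip path
    n_filters = block_features.shape[-1]
    y = layers.Conv2DTranspose(n_filters, kernel_size=3, strides=2,
                               padding="same")(block_features)
    return layers.Concatenate()([y, skip_connection])
```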
Specifically, this embodiment provides a bridge crack image segmentation model based on semantic segmentation, as shown in fig. 5, fig. 6, fig. 7 and fig. 8. It comprises a down-sampling path composed of DenseBlocks and Transition Down modules, an up-sampling path composed of DenseBlocks and Transition Up modules, and a Softmax function; the up-sampling path is used to recover the spatial resolution of the input image, m denotes the number of feature maps, and c denotes the number of final classes.
The segmentation model consists of 74 convolutional layers in total: the first convolution acts directly on the input image; 26 convolutional layers lie in the down-sampling path composed of DenseBlocks; 12 convolutional layers lie in the Bottleneck; 26 convolutional layers lie in the up-sampling path composed of DenseBlocks; there are 4 Transition Down modules, each containing one convolution, and 4 Transition Up modules, each containing one transposed convolution; and the last layer of the network is a 1x1 convolution.
The last layer of the down-sampling path is called the Bottleneck. The Bottleneck is in fact a DenseBlock consisting of 12 layers; it alleviates gradient vanishing and greatly reduces the amount of computation.
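For illustration, a minimal sketch of how the down-sampling path, Bottleneck and up-sampling path can be assembled from the dense_block, transition_down and transition_up sketches above is given below; the initial filter count of 48, the growth rate of 16 and the 256x256x3 input resolution are assumptions consistent with, but not fixed by, the description.

```python
# A sketch of the overall BCISM encoder-decoder (TensorFlow 2 / Keras functional API),
# reusing dense_block, transition_down and transition_up from the earlier sketches.
import tensorflow as tf
from tensorflow.keras import layers

DOWN_LAYERS = (4, 5, 7, 10)      # DenseBlock depths on the down-sampling path
BOTTLENECK_LAYERS = 12
UP_LAYERS = (10, 7, 5, 4)        # DenseBlock depths on the up-sampling path

def build_bcism(input_shape=(256, 256, 3), n_classes=2, growth_rate=16):
    inputs = tf.keras.Input(shape=input_shape)
    x = layers.Conv2D(48, kernel_size=5, padding="same")(inputs)     # initial 5x5 convolution

    skips = []
    for depth in DOWN_LAYERS:                                        # steps 2.2-2.4
        x, _ = dense_block(x, depth, growth_rate)
        skips.append(x)
        x = transition_down(x)

    _, block_features = dense_block(x, BOTTLENECK_LAYERS, growth_rate)  # step 2.5 (Bottleneck)

    for depth, skip in zip(UP_LAYERS, reversed(skips)):               # steps 2.6-2.7
        x = transition_up(block_features, skip)
        x, block_features = dense_block(x, depth, growth_rate)

    x = layers.Conv2D(n_classes, kernel_size=1)(x)                    # step 2.8: 1x1 convolution
    outputs = layers.Softmax(axis=-1)(x)                              # step 2.9: crack / non-crack
    return tf.keras.Model(inputs, outputs, name="BCISM")
```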
A Dense Convolutional Network (DenseNet) is a convolutional neural network with dense connections. In this network, any two layers are directly connected; that is, the input of each layer is the union of the outputs of all previous layers, and the feature maps learned by that layer are passed directly as input to all subsequent layers. A conventional convolutional neural network with L layers has L connections, whereas DenseNet has L(L+1)/2 connections, expressed as the following formula (1):
X_l = H_l([X_0, X_1, ..., X_{l-1}])    (1)
wherein l represents the number of layers, X_l represents the output of layer l, and [X_0, X_1, ..., X_{l-1}] represents the concatenation of the output feature maps of layers 0 to l-1; H_l(.) represents the combination of Batch Normalization, ReLU, and a 3x3 convolution.
As is well known, to a certain extent, the deeper the network model, the better the results; however, the deeper the network, the harder it is to train, because during training the parameter changes of earlier layers affect the later layers, and this influence is amplified as the network depth increases. Convolutional networks are mostly trained with mini-batch gradient descent, so as the input data and the network parameters keep changing, the distribution of the input to each layer keeps changing as well, and the layers have to adapt continuously to the new data distribution, which makes the network hard to train and hard to fit. To address this problem, a Batch Normalization layer is introduced into the training process.
the Batch Normalization algorithm normalizes each layer of input in each iteration, and normalizes the distribution of input data into a distribution with a mean value of 0 and a variance of 1, as shown in formula (2):
Figure BDA0001708849810000141
wherein x iskRepresenting the k-dimension of the input data, E xk]Mean value representing k dimension
Figure BDA0001708849810000151
Represents the standard deviation;
the Batch Normalization algorithm sets two learnable variables, γ and β, as shown in equation (3),
Figure BDA0001708849810000152
gamma and beta are used for restoring the data distribution which should be learned by the previous layer, and y represents the data output value.
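A small numerical sketch of formulas (2) and (3) is given below, assuming NumPy and an arbitrary toy mini-batch; the values of γ and β are purely illustrative.

```python
# A minimal numerical sketch of Batch Normalization, formulas (2) and (3).
import numpy as np

x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])   # mini-batch of 3 samples, 2 dimensions
mean = x.mean(axis=0)                                 # E[x^(k)] per dimension
std = np.sqrt(x.var(axis=0) + 1e-5)                   # sqrt(Var[x^(k)]), with a small epsilon
x_hat = (x - mean) / std                              # formula (2): zero mean, unit variance
gamma, beta = np.array([1.5, 0.5]), np.array([0.0, -1.0])
y = gamma * x_hat + beta                              # formula (3): restore representational power
```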
In order to enhance the expressive power of the network, deep learning introduces continuous nonlinear activation functions; the ReLU (Rectified Linear Unit) function is computed as shown in formula (4):
ReLU(x)=max(0,x) (4)。
The ReLU activation function is generally considered to have a biological interpretation and has been shown to fit well, so the model uses ReLU as its activation function.
According to formula (1), connection operations must be performed on multiple output feature maps, which requires the feature maps to be the same size. Down-sampling layers are indispensable in a convolutional network, and their role is to reduce dimensionality by changing the feature map size. Therefore, to allow down-sampling while still completing the connection operations smoothly, the network is divided into several densely connected DenseBlocks, within each of which the feature maps have the same size.
Step 2 includes a DenseBlock module with 4 layers, where each layer of the DenseBlock consists of Batch Normalization, ReLU, a 3x3 convolution and Dropout. Dropout means that, during training of the deep learning network, neural network units are temporarily discarded from the network with a certain probability, so that each mini-batch trains a different network; here Dropout is 0.2. The Dropout layer effectively prevents over-fitting and improves the accuracy of the experiments.
The Transition Down operation is used to reduce the spatial dimension of the feature maps; it consists of Batch Normalization, ReLU, a 1x1 convolution and a 2x2 pooling operation, where the 1x1 convolution preserves the number of feature maps and the 2x2 pooling operation reduces their resolution. The number of feature maps grows linearly with the number of layers, and the pooling operation effectively reduces their resolution; the spatial resolution is therefore reduced by pooling to compensate for the increase in the number of feature maps caused by the increase in the number of layers.
The role of the Transition Up operation is to restore the spatial resolution of the input image; it consists of a transposed convolution, which is applied only to the feature maps of the last DenseBlock, because the last DenseBlock integrates the information of all previous DenseBlocks.
The segmentation model of this embodiment uses Filter concatenation to link the feature maps along the depth dimension.
In practical applications, bridge crack images have relatively high resolution, commonly 512x512, 1024x1024 or even 2048x2048, whereas the input images used in deep learning networks are generally of lower resolution: if a network is designed for high-resolution input, the huge data volume makes the training time long, and an overly deep or poorly designed network model may even fail to converge, so no ideal output can be obtained. This embodiment therefore also provides a segmentation algorithm for high-resolution crack images. The specific method is as follows: first, the high-resolution image is cut sequentially with a sliding-window algorithm, and the cut image blocks are labeled in order; then the image blocks are fed in order into the trained BCISM to complete crack extraction and are output in the original order; finally, the segmented blocks are stitched together in the original order to obtain the segmentation of the original image. Fig. 9 shows a schematic diagram of segmenting a 512x512 crack image: the original image is cut sequentially into (a), (b), (c) and (d), which are fed into the BCISM in turn, output as (a1), (b1), (c1) and (d1), and finally stitched into the segmented image.
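A minimal sketch of this crop-and-stitch procedure is given below, assuming NumPy arrays, a 256x256 window and a trained Keras model such as the BCISM sketch above; taking the argmax over the two output channels to obtain the crack mask is an assumption.

```python
# A sketch of the high-resolution sliding-window segmentation described above.
import numpy as np

def segment_high_resolution(image, model, tile=256):
    h, w, _ = image.shape                       # e.g. 512x512x3 or 1024x1024x3
    mask = np.zeros((h, w), dtype=np.uint8)
    for top in range(0, h, tile):               # cut the image into non-overlapping tiles
        for left in range(0, w, tile):
            block = image[top:top + tile, left:left + tile]
            probs = model.predict(block[np.newaxis, ...], verbose=0)[0]
            mask[top:top + tile, left:left + tile] = probs.argmax(axis=-1)  # 1 = crack pixel
    return mask                                 # tiles are written back in their original order
```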
Results and analysis of the experiments
1. Data set
In this embodiment, 983 bridge pavement crack images with and without obstacles were collected. The original resolution of the collected crack images is 2448x3264, and for convenience in the subsequent algorithms a series of processing steps is applied to the original images: first, the short edge of each collected image is scaled to 2048 and a central 2048x2048 region is cropped out; then the 2048x2048 image is down-sampled to a 1024x1024 image; next, a sliding-window algorithm cuts each 1024x1024 image into 16 non-overlapping images of size 256x256; finally, the data set is expanded through the bridge crack generation model. The proportion of each type of image in the final data set is shown in Table 2:
Table 2. Proportion of each type of image in the data set (the table is rendered as an image in the original publication).
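A sketch of the preprocessing pipeline described above is given below, assuming OpenCV; the interpolation methods are assumptions not specified in the text.

```python
# A sketch of the data preprocessing described above: scale short edge to 2048,
# center-crop 2048x2048, down-sample to 1024x1024, cut into 16 tiles of 256x256.
import cv2
import numpy as np

def preprocess(image_2448x3264):
    h, w = image_2448x3264.shape[:2]
    scale = 2048 / min(h, w)                               # scale the short edge to 2048
    resized = cv2.resize(image_2448x3264, (int(round(w * scale)), int(round(h * scale))))
    rh, rw = resized.shape[:2]
    top, left = (rh - 2048) // 2, (rw - 2048) // 2         # central 2048x2048 crop
    cropped = resized[top:top + 2048, left:left + 2048]
    small = cv2.resize(cropped, (1024, 1024), interpolation=cv2.INTER_AREA)  # down-sample
    # cut into 16 non-overlapping 256x256 blocks with a sliding window
    blocks = [small[r:r + 256, c:c + 256]
              for r in range(0, 1024, 256) for c in range(0, 1024, 256)]
    return blocks
```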
2. Experimental Environment
The program of the algorithm is developed in the Python language based on the mainstream deep learning open-source framework TensorFlow; the hardware environment for the experiments is an Intel i7 processor and an NVIDIA GeForce GTX 1070 graphics card; the software environment is the Ubuntu 16.04 LTS operating system.
3. Comparative experiment
In order to verify the effectiveness and accuracy of the algorithm proposed herein, four sets of comparative experiments were designed. The first set of experiments verifies the effect of the DCGAN model and the BCIGM in generating 256x256 crack images; it comprises three small comparisons. Experiment 1 is a visual comparison of the crack images generated by the DCGAN model and by the BCIGM; experiment 2 is a visual comparison of the crack images generated by the BCIGM with the SeLU activation function and with the ReLU activation function. For both, the first 4 generated images were selected as representatives. Experiment 3 compares the time per batch when training the BCIGM without and with the 1x1 convolution kernel; the time is computed by summing the running time of each batch and taking the average.
Fig. 10 shows the results of experiment 1: the first and second rows are crack images generated by DCGAN and the third and fourth rows are crack images generated by BCIGM; the first and second columns show the results at Epoch 01, the third and fourth columns at Epoch 03, the fifth and sixth columns at Epoch 16, and the seventh and eighth columns at Epoch 25. Fig. 11 shows the results of experiment 2: the first and second rows show the crack images generated with the ReLU activation function and the third and fourth rows those generated with the SeLU activation function; the first and second columns, fig. 11(a), show the results at Epoch 01, the third and fourth columns, fig. 11(b), at Epoch 03, the fifth and sixth columns, fig. 11(c), at Epoch 16, and the seventh and eighth columns, fig. 11(d), at Epoch 25. Table 3 shows the results of experiment 3. Observing the two sets of images in experiment 1, the crack images generated by the original DCGAN model show a severe grid phenomenon even by the 16th Epoch, whereas the crack images generated by the proposed BCIGM are clear, essentially free of the grid phenomenon, highly similar to the actually acquired crack images, and already show connected linear crack features at the 3rd Epoch. Observing experiment 2, the BCIGM with the ReLU activation function produces only discontinuous black crack points at the 3rd Epoch, whereas the BCIGM with the SeLU activation function already shows a continuous crack outline; in addition, the finally generated crack images are clearer and more consistent with crack images photographed in real scenes, and, more importantly, the stability of the model is improved and no model collapse occurred during testing. Observing Table 3, the training speed of the BCIGM becomes faster after adding the 1x1 convolution kernel; although the speed-up per batch is modest, the time saving is still worthwhile when the training data set is huge and the number of Epochs increases.
Table 3. Effect of the 1x1 convolution kernel on the training speed of the BCIGM (the table is rendered as an image in the original publication).
The second set of experiments verifies the effect of the data set amplification method on the BCISM. The specific procedure is as follows: first, the BCISM is trained directly on the 983 manually collected bridge pavement crack images, without the data set amplification method; then the data set is amplified with the method described above and the BCISM is trained on the amplified data set. Finally, 156 bridge pavement crack images are randomly selected from the test set to test the two trained models. The test results are evaluated from two angles. The first is a visual comparison of the crack segmentation results, shown in fig. 12, where the first column (a) is the original image, the second column (b) the label, the third column (c) the detection result without data set amplification, and the fourth column (d) the segmentation result after amplification. The second angle uses the quantitative indices accuracy Pre and recall Rec. Accuracy is computed with respect to the prediction: it indicates how many of the samples predicted as positive are truly positive, where the predicted positives include positives predicted as positive (TP) and negatives predicted as positive (FP). Recall is computed with respect to the original samples: it indicates how much of the positive class in the samples is correctly predicted, covering positives predicted as positive (TP) and positives predicted as negative (FN). Here, TP is the number of correctly detected and segmented crack-region pixels, FP the number of pixels wrongly judged as crack region, and FN the number of pixels belonging to the crack region but not detected and segmented. The accuracy Pre and recall Rec are calculated by the following formulas, and the experimental results are shown in Table 4.
Pre = TP / (TP + FP)
Rec = TP / (TP + FN)
Table 4. Effect of data set amplification on the BCISM (the table is rendered as an image in the original publication).
As can be seen from fig. 12, when the BCISM trained on the data set without amplification is tested, the segmented crack portions match the cracks in the original image very poorly and the segmentation effect is extremely bad; in some cases not a single crack pixel is detected, as in the fourth image of the third column of fig. 12. Table 4 shows that, because the first group of experiments did not expand the collected data, the training samples of the network model were severely insufficient, producing an under-fitting phenomenon that directly manifests as extremely low accuracy and recall; when the BCISM trained on the amplified data set is tested, the accuracy and recall improve markedly and the under-fitting of the network model is essentially eliminated. In summary, sufficient training samples are crucial to the success of model training, which also shows that the data set amplification method proposed herein is essential.
The third set of experiments compares the BCISM proposed herein with current mainstream semantic segmentation models. SegNet, FCN, DeepLab, FC-DenseNet56, FC-DenseNet67 and FC-DenseNet103 are selected as comparison models, and the comparison covers pre-training, parameter size, accuracy Pre, recall Rec, F1_Score and the time per image during training. The comparison results are shown in Table 5, where the accuracy Pre and recall Rec are as described in the second set of experiments, and F1_Score can be regarded as a weighted harmonic mean of the model's precision and recall, taking both into account. The formula for F1_Score is as follows.
F1_Score = (2 × Pre × Rec) / (Pre + Rec)
Table 5. Comparison of the BCISM with mainstream semantic segmentation models (the table is rendered as an image in the original publication).
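A minimal sketch of the pixel-level Pre, Rec and F1_Score computations defined above is given below, assuming NumPy masks in which 1 marks crack pixels and 0 marks background.

```python
# A sketch of the evaluation metrics Pre, Rec and F1_Score used in the experiments.
import numpy as np

def crack_metrics(prediction, label):
    tp = np.sum((prediction == 1) & (label == 1))   # crack pixels correctly segmented
    fp = np.sum((prediction == 1) & (label == 0))   # background wrongly marked as crack
    fn = np.sum((prediction == 0) & (label == 1))   # crack pixels missed
    pre = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * pre * rec / (pre + rec) if pre + rec else 0.0
    return pre, rec, f1
```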
From Table 5 it can be seen that: compared with the SegNet, FCN8 and DeepLab models, the BCISM needs no pre-training, has fewer parameters and less time, and its Pre, Rec and F1_Score are all greatly improved. Compared with FC-DenseNet56, although its parameters and time are slightly higher, the Pre, Rec and F1_Score of the BCISM all improve by about 4%; compared with FC-DenseNet67, the BCISM has fewer parameters and less time, and Pre, Rec and F1_Score all improve; compared with FC-DenseNet103, the BCISM has less than one third of the parameters and the time per image is reduced by about 0.1 second, while its Pre, Rec and F1_Score are similar, all above 92%. In summary, the BCISM achieves the most accurate detection results with fewer parameters and shorter training time, without pre-training.
The fourth set of experiments compares the proposed algorithm with existing crack extraction algorithms on the detection and segmentation of bridge pavement cracks under a complex background. To show that the proposed algorithm is better suited to complex backgrounds, the experiment proceeds as follows: first, typical images containing various obstacles are selected from the data set as the test set; then the proposed algorithm is compared with a threshold segmentation algorithm, a random structured forest algorithm and a deep learning algorithm. The experimental results are shown in fig. 13, where the first column (a) is the original image, the second column (b) the label, the third column (c) the result of the threshold segmentation algorithm, the fourth column (d) the result of the random structured forest algorithm, the fifth column (e) the result of the deep learning algorithm, and the sixth column (f) the result of the proposed algorithm.
As the experimental results show, the detection result of the threshold segmentation algorithm contains a great deal of noise, both from the complex texture of the road surface and from the presence of obstacles, as shown in the third column of fig. 13. The random structured forest algorithm improves greatly on threshold segmentation and the amount of noise is clearly reduced, but noise is still evident, as shown in the fourth column of fig. 13. The detection result of the deep learning algorithm is relatively satisfactory and contains only a small amount of noise, but the extracted cracks can be incomplete, which is evident when comparing the second and third images in the fifth column of fig. 13. The bridge pavement crack detection and segmentation algorithm proposed herein achieves an ideal effect under a complex background: the extracted cracks are complete, noise-free and match the labels closely, as shown in the sixth column of fig. 13.
In this embodiment, crack detection and segmentation are finally performed on bridge pavement crack images from the test set with different texture backgrounds and different obstacles; part of the experimental results is shown in fig. 14, where the first (a), fourth (d) and seventh (g) columns are the original images, the second (b), fifth (e) and eighth (h) columns are the labels, and the third (c), sixth (f) and ninth (i) columns are the crack segmentation results of the proposed algorithm. In addition, fig. 15 shows the accuracy curve of the algorithm and fig. 16 the loss curve.

Claims (9)

1. A method for detecting a crack of a bridge crack image under a complex background is characterized by comprising the following steps: the method comprises the following steps: step 1, carrying out geometric transformation, spatial filtering and linear transformation on an acquired original bridge crack image, and then carrying out data set amplification through a generation sub-model and a discrimination sub-model; the generation submodel sequentially comprises a full-connection layer, a dimension conversion layer, a first transposition convolutional layer, a second transposition convolutional layer, a third transposition convolutional layer, a fourth transposition convolutional layer and a fifth transposition convolutional layer; the sizes of convolution kernels of the first transposition convolutional layer, the second transposition convolutional layer, the third transposition convolutional layer, the fourth transposition convolutional layer and the fifth transposition convolutional layer are all 5x5, the step length is 2, and the number of convolution kernels is 512, 256, 128, 64 and 3 in sequence;
the distinguishing submodel sequentially comprises a first convolution layer, a second convolution layer, a third convolution layer, a fourth convolution layer, a fifth convolution layer, a sixth convolution layer, a seventh convolution layer and a Sigmoid activation function layer; the sizes of convolution kernels of the first convolution layer, the second convolution layer, the third convolution layer, the fourth convolution layer, the fifth convolution layer and the sixth convolution layer are all 5x5, the steps are all 2, and the number of convolution kernels is 64, 128, 256, 512, 1024 and 2048 in sequence; the convolution kernel size of the seventh convolution layer is 1x 1;
step 2, inputting the amplified image into a segmentation model for training, wherein the specific method comprises the following steps:
step 2.1, performing 5x5 convolution on the amplified image for once;
step 2.2: inputting the convolution result into a DenseBlock containing 4 layers;
step 2.3: performing Transition Down operation on the result of the step 2.2, and reducing the resolution of the crack image;
step 2.4: the number of layers of the DenseBlock is sequentially set to be 5 layers, 7 layers and 10 layers, and the steps 2.2 and 2.3 are sequentially repeated for 3 times;
step 2.5: inputting the result of the step 2.4 into a Bottleneck consisting of 12 layers, completing all down-sampling, and performing connection operation of a plurality of characteristics;
step 2.6: inputting an upper layer output result into an Up-sampling channel consisting of Transition Up and DenseBlock, wherein the number of layers in the down-sampling corresponding to DenseBlock is 10;
step 2.7: setting the number of layers of the DenseBlock in the step 2.6 as 7, 5 and 4 in sequence, and repeating the step 2.6 for 3 times;
step 2.8: performing 1x1 convolution operation on the output result of the step 2.7;
step 2.9: inputting the result of the step 2.8 into a softmax layer for judgment, and outputting the probability of cracks and non-cracks;
and step 3: and 2, after the training in the step 2 is finished, inputting the image of the crack to be detected into the trained segmentation model for crack extraction.
2. The method for detecting the crack of the bridge crack image under the complex background according to claim 1, characterized in that: the first, second, third, and fourth transposed convolutional layers are all transposed convolved using the SeLU activation function.
3. The method for detecting the crack of the bridge crack image under the complex background according to the claim 1 or 2, characterized in that: the first, second, third, fourth, fifth, sixth, and seventh convolutional layers are all fully convolutional layers.
4. The method for detecting the crack of the bridge crack image under the complex background according to claim 1, characterized in that: each layer of the DenseBlock includes Batch Normalization, the ReLU activation function, a 3x3 convolution, and Dropout.
5. The method for detecting the crack of the bridge crack image under the complex background according to claim 1, characterized in that: the Transition Down consists of Batch Normalization, ReLU activation function, 1x1 convolution and 2x2 pooling operations.
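A minimal PyTorch sketch of the Transition Down of claim 5, assuming the 2x2 pooling is max pooling and that the 1x1 convolution keeps the channel count (neither detail is fixed by the claim):

```python
import torch
import torch.nn as nn


class TransitionDown(nn.Module):
    """Transition Down as in claim 5: Batch Normalization, ReLU,
    1x1 convolution and 2x2 pooling; halves the spatial resolution (step 2.3)."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.MaxPool2d(kernel_size=2),
        )

    def forward(self, x):
        return self.body(x)


x = torch.randn(1, 112, 64, 64)
print(TransitionDown(112)(x).shape)  # (1, 112, 32, 32)
```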
6. The method for detecting the crack of the bridge crack image under the complex background according to claim 4 or 5, characterized in that the specific Batch Normalization algorithm is as follows:
the Batch Normalization algorithm normalizes each layer of input in each iteration, and normalizes the distribution of input data into a distribution with a mean value of 0 and a variance of 1, as shown in formula (2):
x̂^(k) = (x^(k) - E[x^(k)]) / sqrt(Var[x^(k)])    (2)

wherein x^(k) represents the k-th dimension of the input data, E[x^(k)] represents the mean of the k-th dimension, and sqrt(Var[x^(k)]) represents the standard deviation;

the two learnable variables γ and β of the Batch Normalization algorithm are applied as in formula (3),

y^(k) = γ^(k) x̂^(k) + β^(k)    (3)

where γ and β are used to restore the data distribution learned by the previous layer.
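A small NumPy illustration of formulas (2) and (3) on a mini-batch; the constant eps is the usual numerical-stability term, an implementation detail not given in the claim:

```python
import numpy as np


def batch_norm(x, gamma, beta, eps=1e-5):
    """Standardize each dimension of the mini-batch x of shape (N, K) as in formula (2),
    then apply the learnable scale/shift gamma and beta as in formula (3)."""
    mean = x.mean(axis=0)               # E[x^(k)]
    std = np.sqrt(x.var(axis=0) + eps)  # sqrt(Var[x^(k)])
    x_hat = (x - mean) / std            # formula (2): mean 0, variance 1 per dimension
    return gamma * x_hat + beta         # formula (3): restore the learned distribution


x = np.random.randn(8, 4) * 3.0 + 1.0
y = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0).round(3), y.std(axis=0).round(3))  # ~0 and ~1 in every dimension
```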
7. The method for detecting the crack of the bridge crack image under the complex background according to claim 4 or 5, characterized in that the ReLU activation function is a continuous nonlinear activation function, calculated as shown in formula (4):
ReLU(x)=max(0,x) (4)。
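For example, formula (4) applied element-wise with NumPy:

```python
import numpy as np

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(np.maximum(0.0, x))  # ReLU of formula (4): [0., 0., 0., 1.5, 3.]
```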
8. The method for detecting the crack of the bridge crack image under the complex background according to claim 4 or 5, characterized in that: each Transition Down includes a convolution and each Transition Up includes a transposed convolution.
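A minimal PyTorch sketch of a matching Transition Up; only the use of a transposed convolution follows the claim, while the kernel size of 3 and stride of 2 used here are assumptions:

```python
import torch
import torch.nn as nn


class TransitionUp(nn.Module):
    """Transition Up of the up-sampling path (claim 8): a transposed convolution
    that doubles the spatial resolution reduced by the Transition Down."""
    def __init__(self, channels):
        super().__init__()
        self.up = nn.ConvTranspose2d(channels, channels, kernel_size=3, stride=2,
                                     padding=1, output_padding=1)

    def forward(self, x):
        return self.up(x)


x = torch.randn(1, 112, 32, 32)
print(TransitionUp(112)(x).shape)  # (1, 112, 64, 64)
```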
9. The method for detecting the crack of the bridge crack image under the complex background according to claim 4 or 5, characterized in that: after the down-sampling is completed in step 2.5, the concatenation operation is performed on the output features, specifically as in formula (1):
X_l = H_l([X_0, X_1, …, X_{l-1}])    (1)

wherein l represents the layer index, X_l represents the output of layer l, [X_0, X_1, …, X_{l-1}] represents the concatenation of the output feature maps of layers 0 to l-1, and H_l(·) represents the combination of Batch Normalization, ReLU and a 3x3 convolution.
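A small PyTorch illustration of formula (1): each H_l is the stated combination of Batch Normalization, ReLU and a 3x3 convolution, and its input is the concatenation of all earlier feature maps; the channel counts and the growth rate of 16 are illustrative assumptions.

```python
import torch
import torch.nn as nn


def H(in_channels, growth_rate=16):
    """H_l(.) of formula (1): Batch Normalization, ReLU and a 3x3 convolution."""
    return nn.Sequential(
        nn.BatchNorm2d(in_channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_channels, growth_rate, kernel_size=3, padding=1),
    )


x0 = torch.randn(1, 48, 32, 32)  # X_0: input of the block
features = [x0]
for l in range(1, 5):            # X_1 ... X_4
    h_l = H(sum(f.shape[1] for f in features))
    x_l = h_l(torch.cat(features, dim=1))  # X_l = H_l([X_0, X_1, ..., X_{l-1}])
    features.append(x_l)
print(torch.cat(features, dim=1).shape)    # all outputs connected: (1, 48 + 4*16, 32, 32)
```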
CN201810669942.7A 2018-06-26 2018-06-26 Bridge crack image crack detection method under complex background Active CN108876780B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810669942.7A CN108876780B (en) 2018-06-26 2018-06-26 Bridge crack image crack detection method under complex background

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810669942.7A CN108876780B (en) 2018-06-26 2018-06-26 Bridge crack image crack detection method under complex background

Publications (2)

Publication Number Publication Date
CN108876780A CN108876780A (en) 2018-11-23
CN108876780B true CN108876780B (en) 2020-11-10

Family

ID=64295804

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810669942.7A Active CN108876780B (en) 2018-06-26 2018-06-26 Bridge crack image crack detection method under complex background

Country Status (1)

Country Link
CN (1) CN108876780B (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109978032B (en) * 2019-03-15 2022-12-06 西安电子科技大学 Bridge crack detection method based on space pyramid cavity convolution network
CN109919942B (en) * 2019-04-04 2020-01-14 哈尔滨工业大学 Bridge crack intelligent detection method based on high-precision noise reduction theory
CN110147323B (en) * 2019-04-24 2023-05-23 北京百度网讯科技有限公司 Intelligent change checking method and device based on generation countermeasure network
CN110120038B (en) * 2019-05-07 2021-02-26 重庆同枥信息技术有限公司 Pavement crack defect detection method based on countermeasure generation network
CN110197477A (en) * 2019-05-07 2019-09-03 北京邮电大学 The method, apparatus and system of pavement crack detection
CN110349122A (en) * 2019-06-10 2019-10-18 长安大学 A kind of pavement crack recognition methods based on depth convolution fused neural network
CN110322442A (en) * 2019-07-11 2019-10-11 福州大学 A kind of building surface crack detecting method based on SegNet
CN110766662B (en) * 2019-09-26 2022-10-04 湖北三环锻造有限公司 Forging surface crack detection method based on multi-scale and multi-layer feature learning
CN111311538B (en) * 2019-12-28 2023-06-06 北京工业大学 Multi-scale lightweight road pavement detection method based on convolutional neural network
CN111612799A (en) * 2020-05-15 2020-09-01 中南大学 Face data pair-oriented incomplete reticulate pattern face repairing method and system and storage medium
CN111861978B (en) * 2020-05-29 2023-10-31 陕西师范大学 Bridge crack example segmentation method based on Faster R-CNN
CN111612787B (en) * 2020-06-19 2021-09-14 国网湖南省电力有限公司 Concrete crack high-resolution image lossless semantic segmentation method and device and storage medium
CN111861906B (en) * 2020-06-22 2023-10-31 长安大学 Pavement crack image virtual augmentation model establishment and image virtual augmentation method
CN112348770B (en) * 2020-09-09 2024-03-01 陕西师范大学 Bridge crack detection method based on multi-resolution convolutional network
CN112419244B (en) * 2020-11-11 2022-11-01 浙江大学 Concrete crack segmentation method and device
CN112836729A (en) * 2021-01-19 2021-05-25 中南大学 Construction method of image classification model and image classification method
CN113436169B (en) * 2021-06-25 2023-12-19 东北大学 Industrial equipment surface crack detection method and system based on semi-supervised semantic segmentation
CN114359272A (en) * 2022-03-11 2022-04-15 科大天工智能装备技术(天津)有限公司 DenseNet-based bridge steel cable breakage detection method and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106228512A (en) * 2016-07-19 2016-12-14 北京工业大学 Based on learning rate adaptive convolutional neural networks image super-resolution rebuilding method
CN106910185A (en) * 2017-01-13 2017-06-30 陕西师范大学 A kind of DBCC disaggregated models and construction method based on CNN deep learnings
WO2017126367A1 (en) * 2016-01-22 2017-07-27 富士フイルム株式会社 Crack information editing device, crack information editing method, and crack information editing program

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10147017B2 (en) * 2014-06-20 2018-12-04 Qualcomm Incorporated Systems and methods for obtaining structural information from a digital image

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017126367A1 (en) * 2016-01-22 2017-07-27 富士フイルム株式会社 Crack information editing device, crack information editing method, and crack information editing program
CN106228512A (en) * 2016-07-19 2016-12-14 北京工业大学 Based on learning rate adaptive convolutional neural networks image super-resolution rebuilding method
CN106910185A (en) * 2017-01-13 2017-06-30 陕西师范大学 A kind of DBCC disaggregated models and construction method based on CNN deep learnings

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A New Image-Based Method for Concrete Bridge Bottom Crack Detection; Xuhang Tong, et al.; 2011 International Conference on Image Analysis and Signal Processing; 2011-10-23; 1-4 *
Implementation and Application of a Bridge Crack Monitoring System; Cheng Wenyuan; Inner Mongolia Highway and Transport; 2012-04-30 (No. 127); 43-44 *

Also Published As

Publication number Publication date
CN108876780A (en) 2018-11-23

Similar Documents

Publication Publication Date Title
CN108876780B (en) Bridge crack image crack detection method under complex background
Ali et al. Structural crack detection using deep convolutional neural networks
Kou et al. Development of a YOLO-V3-based model for detecting defects on steel strip surface
Xu et al. Automatic defect detection and segmentation of tunnel surface using modified Mask R-CNN
Wang et al. RENet: Rectangular convolution pyramid and edge enhancement network for salient object detection of pavement cracks
Liang et al. Traffic sign detection via improved sparse R-CNN for autonomous vehicles
Li et al. Sewer pipe defect detection via deep learning with local and global feature fusion
CN109902806A (en) Method is determined based on the noise image object boundary frame of convolutional neural networks
CN107945153A (en) A kind of road surface crack detection method based on deep learning
CN108460764A (en) The ultrasonoscopy intelligent scissor method enhanced based on automatic context and data
CN111222580A (en) High-precision crack detection method
CN109840483B (en) Landslide crack detection and identification method and device
CN112488025B (en) Double-temporal remote sensing image semantic change detection method based on multi-modal feature fusion
CN112907598B (en) Method for detecting falsification of document and certificate images based on attention CNN
CN113435424B (en) Method and system for identifying destroying granularity of confidential medium
CN105069447A (en) Facial expression identification method
CN104978567A (en) Vehicle detection method based on scenario classification
CN111611861B (en) Image change detection method based on multi-scale feature association
CN114333070A (en) Examinee abnormal behavior detection method based on deep learning
CN106650647A (en) Vehicle detection method and system based on cascading of traditional algorithm and deep learning algorithm
CN111882620A (en) Road drivable area segmentation method based on multi-scale information
CN112560895A (en) Bridge crack detection method based on improved PSPNet network
CN112597996B (en) Method for detecting traffic sign significance in natural scene based on task driving
Zhou et al. A novel object detection method in city aerial image based on deformable convolutional networks
CN108717522A (en) A kind of human body target tracking method based on deep learning and correlation filtering

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant