CN111721770A - Automatic crack detection method based on frequency division convolution - Google Patents

Automatic crack detection method based on frequency division convolution

Info

Publication number
CN111721770A
Authority
CN
China
Prior art keywords
convolution
neural network
frequency
deep
training
Legal status
Pending
Application number
CN202010540557.XA
Other languages
Chinese (zh)
Inventor
范衠
陈颖
李冲
Current Assignee
Shantou University
Original Assignee
Shantou University
Priority date
2020-06-12
Filing date
2020-06-12
Publication date
2020-09-29
Application filed by Shantou University
Priority to CN202010540557.XA
Publication of CN111721770A

Classifications

    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01N: INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N 21/00: Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N 21/84: Systems specially adapted for particular applications
    • G01N 21/88: Investigating the presence of flaws or contamination
    • G01N 21/8851: Scan or image signal processing specially adapted therefor, e.g. for scan signal adjustment, for detecting different kinds of defects, for compensating for structures, markings, edges
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01N: INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N 21/00: Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N 21/84: Systems specially adapted for particular applications
    • G01N 21/88: Investigating the presence of flaws or contamination
    • G01N 21/8851: Scan or image signal processing specially adapted therefor, e.g. for scan signal adjustment, for detecting different kinds of defects, for compensating for structures, markings, edges
    • G01N 2021/8887: Scan or image signal processing specially adapted therefor, based on image processing techniques

Abstract

The embodiment of the invention discloses an automatic crack detection method based on frequency division convolution, which comprises the following steps: shooting road images with a camera and creating a training set and a test set of road crack images; creating a deep convolutional neural network comprising frequency division convolution, frequency division transposed convolution, a hole convolution module and a jump connection structure; training the deep convolutional neural network with the created training set; and testing the trained deep convolutional neural network model with the test set and outputting a crack image. The method offers a simple detection process, high detection efficiency, low labor intensity, portability and strong operability.

Description

Automatic crack detection method based on frequency division convolution
Technical Field
The invention relates to the field of structural health detection and evaluation, in particular to an automatic crack detection method based on frequency division convolution.
Background
With the rapid development of China's economy, the construction of the national road network has advanced quickly, and the integrity and evenness of the road surface are important factors in ensuring safe vehicle travel on highways. Cracks are an important sign of road damage: an uneven or cracked road surface seriously affects the service life of the road and the safety of drivers, so the health condition of roads and bridges needs to be evaluated regularly, which makes the detection of road and bridge cracks highly important.
At present, crack detection for roads and bridges relies mainly on traditional image processing algorithms and human visual inspection. Crack detection and identification by the human eye alone is inefficient. Traditional image processing methods mainly detect cracks against background images of uniform material and texture and cannot perform crack detection directly on color images. Road crack detection based on a deep learning framework can process color images, can work end to end, and does not require sliding-window processing by the convolutional neural network; it can therefore achieve automatic detection of road cracks. How to improve the efficiency and effect of pavement crack detection is thus a technical problem to be overcome in the field of pavement crack detection.
Disclosure of Invention
Based on the above, the invention aims to provide an automatic crack detection method based on frequency division convolution. The method can solve the problems of low positioning accuracy and large error found in crack detection by human observation and traditional image processing.
In order to solve the above-mentioned problems of the prior art, an embodiment of the present invention provides an automatic crack detection method based on frequency division convolution, which specifically includes the following steps:
s1, shooting a road image by a camera, and creating a training set and a test set of the road crack image;
s2, creating a deep convolution neural network comprising a frequency division convolution, a frequency division transposition convolution, a hole convolution module and a jump connection structure;
s3, training the deep convolutional neural network by using the established training set;
S4, testing the trained deep convolution neural network model by using the test set, and outputting a crack image.
Further, the step S1 specifically includes:
S11, shooting crack images by using an intelligent terminal, or using public crack image data sets such as CFD and AigleRN, and dividing the crack images into a training set and a test set;
S12, constructing a crack image database from the collected surface crack images of different structures, performing data enhancement on the constructed database to expand the data set, manually labelling the crack regions of the crack images in the expanded database, and then dividing the images in the database into a training set and a test set.
Further, the step S2 specifically includes:
S21, building a deep neural network structure model: determining the number of encoder and decoder layers in the deep convolutional neural network, the numbers of high-frequency and low-frequency feature maps in each frequency division convolution layer, the number of pooling layers, the sampling kernel size and training stride of each pooling layer, the number of frequency division transposed convolution layers, the numbers of high-frequency and low-frequency feature maps in each deconvolution layer, the connection mode of the jump connections, and the hole (dilation) rates in the hole convolution module;
S22, selecting a training strategy for the deep neural network: the cross-entropy loss is selected as the cost function and ReLU as the activation function, a weight decay regularization term is added to the loss, and dropout is added to the convolution layers to reduce overfitting; the SGD optimization algorithm is used in training the deep neural network;
S23, building the frequency division convolution layer: X = {X^H, X^L} and Y = {Y^H, Y^L} denote the input and output, where Y^L = Y^{H→L} + Y^{L→L} and Y^H = Y^{H→H} + Y^{L→H} denote the low- and high-frequency components of the output, and W^H = [W^{H→H}, W^{L→H}], W^L = [W^{H→L}, W^{L→L}] denote the convolution kernels between the frequency components; the high- and low-frequency updates of the frequency division convolution operation are

$$Y^{H}_{p,q} = \sigma\Big(\sum_{m,n} {W^{H\to H}_{m,n}}^{\top} X^{H}_{p+m,\,q+n} + \sum_{m,n} {W^{L\to H}_{m,n}}^{\top} X^{L}_{\lfloor p/2\rfloor+m,\,\lfloor q/2\rfloor+n} + b\Big)$$

$$Y^{L}_{p,q} = \sigma\Big(\sum_{m,n} {W^{L\to L}_{m,n}}^{\top} X^{L}_{p+m,\,q+n} + \sum_{m,n} {W^{H\to L}_{m,n}}^{\top} X^{H}_{2p+m,\,2q+n} + b\Big)$$

wherein (p, q) denotes the position of a pixel, k denotes the convolution kernel size, σ(·) denotes the activation function, b denotes the bias, X^H and X^L denote the high- and low-frequency feature maps of the input, Y^H and Y^L denote the high- and low-frequency feature maps of the output, H→L, L→H, H→H and L→L denote feature maps converted from high to low, low to high, high to high and low to low frequency respectively, and m and n determine the range of the local receptive field centered at the pixel (p, q) on the input X;
S24, building the frequency division transposed convolution layer: X = {X^H, X^L} and Ŷ = {Ŷ^H, Ŷ^L} denote the input and output, where Ŷ^H = Ŷ^{H→H} + Ŷ^{L→H} and Ŷ^L = Ŷ^{H→L} + Ŷ^{L→L} denote the high- and low-frequency components of the transposed-convolution output, and W^H = [W^{H→H}, W^{L→H}], W^L = [W^{H→L}, W^{L→L}] denote the high- and low-frequency convolution kernels; the high- and low-frequency updates of the frequency division transposed convolution operation mirror those of the frequency division convolution in S23, with each kernel applied as a transposed (up-sampling) convolution, wherein Ŷ^H and Ŷ^L respectively denote the high- and low-frequency feature maps of the frequency division transposed convolution output, the values of m and n determine the range of the local receptive field centered at the pixel (p, q) on the input X, and k denotes the convolution kernel size;
S25, connecting the encoder and the decoder in the deep convolutional neural network through jump connections;
S26, in the deep convolutional neural network, connecting the input image to the encoder part and to all encoder stages through jump connections, so that the transfer of image information can be realized;
S27, in the hole convolution module of the deep convolutional neural network, the input of the module is the feature map output by the last convolution layer of the encoder; the module is composed of convolution layers with different hole (dilation) rates, and its output is obtained by superposing and fusing the feature maps produced by the convolutions with different hole rates;
S28, realizing the deep neural network structure with a deep learning library such as Caffe, TensorFlow or PyTorch, carrying out model training with the divided training set and test set, learning the parameters of the deep neural network by continuously reducing the value of the loss function, and determining the parameter values of the deep neural network model.
Further, the step S3 specifically includes:
S31, training the deep convolutional neural network by using the training set according to the steps S21, S22, S23, S24, S25, S26, S27 and S28, continuously optimizing the parameters of the neural network through back-propagation, reducing the value of the loss function, optimizing the network, and realizing end-to-end training.
Further, the step S4 specifically includes:
s41, testing the trained neural network model by using a test set according to the step S31;
S42, normalizing the output value of the neural network model and outputting a probability map of the crack image.
Drawings
FIG. 1 is a flow chart of the automatic crack detection method based on frequency division convolution according to the present invention;
FIG. 2 is a flow chart of a deep convolutional neural network model according to an embodiment of the present invention;
FIG. 3 is a flow diagram of a frequency-division convolution model according to an embodiment of the present invention;
FIG. 4 is a flow diagram of a frequency division transpose convolution model according to an embodiment of the present invention;
FIG. 5 is a flow diagram of a hole convolution module in accordance with an embodiment of the present invention;
FIG. 6 is a diagram of the output of the deep convolutional neural network in accordance with an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings.
The experimental environment of the embodiment of the invention is an outdoor environment comprising a laboratory building, walls and highway road surfaces. In this embodiment, the crack images are taken from public areas of this outdoor environment.
In this embodiment, a PC with an Nvidia graphics card is used. The implementation runs on Ubuntu, a TensorFlow platform is set up, and the open-source software library of TensorFlow is adopted.
Referring to FIG. 1, an automatic crack detection method based on frequency division convolution according to an embodiment of the present invention includes the following steps:
S1, shooting road images by using a camera, and creating a training set and a test set of road crack images.
In the present example, the public data set CFD is used, which contains 118 original color images and 118 label images. The data set is divided into a training set and a test set: the training set contains 100 original color images and the corresponding 100 label images, and the test set contains 18 original color images and the corresponding 18 label images.
Meanwhile, in order to expand the amount of image data, data enhancement is performed on the crack images in the CFD data set: the original color images and label images in each split are rotated and cropped to increase the number of crack images in the embodiment of the invention.
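As an illustration, the following is a minimal Python sketch of such a rotate-and-crop augmentation; the rotation angles, crop size and file naming used here are assumptions and are not specified by the patent.

```python
# A hypothetical sketch of the rotate-and-crop augmentation described above;
# the rotation angles, crop size and file naming are illustrative assumptions,
# not values specified by the patent.
import os
from PIL import Image

ANGLES = [90, 180, 270]   # assumed rotation angles
CROP = 256                # assumed crop size in pixels

def augment_pair(image_path, label_path, out_dir):
    """Rotate and crop one original color image together with its label image."""
    img, lab = Image.open(image_path), Image.open(label_path)
    stem = os.path.splitext(os.path.basename(image_path))[0]
    for angle in ANGLES:
        img_r = img.rotate(angle, expand=True)
        lab_r = lab.rotate(angle, expand=True)
        box = (0, 0, CROP, CROP)   # one fixed crop; a real pipeline would sample several
        img_r.crop(box).save(os.path.join(out_dir, f"{stem}_r{angle}.png"))
        lab_r.crop(box).save(os.path.join(out_dir, f"{stem}_r{angle}_label.png"))
```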
S2, creating a deep convolutional neural network comprising an encoder, a decoder, a hole convolution module and a jump connection structure.
The deep convolutional neural network model adopted in the embodiment of the invention is based on a U-net model, and the network model is improved. Please refer to fig. 2 for a flowchart of a deep convolutional neural network model used in an embodiment of the present invention.
The establishment of the deep neural network model structure comprises determining the number of encoder and decoder layers in the deep convolutional neural network, the numbers of high-frequency and low-frequency feature maps in each frequency division convolution layer, the number of pooling layers, the sampling kernel size and training stride of each pooling layer, the number of frequency division transposed convolution layers, the numbers of high-frequency and low-frequency feature maps in each deconvolution layer, the connection mode of the jump connections, and the hole (dilation) rates in the hole convolution module.
Selecting a training strategy for the deep neural network: the cross-entropy loss is selected as the cost function and ReLU as the activation function; meanwhile, a weight decay regularization term is added to the loss, dropout is added to the convolution layers to reduce overfitting, and the SGD optimization algorithm is used in training the deep neural network.
In the embodiment of the present invention, the frequency division convolution layer (as shown in FIG. 3) is built: X = {X^H, X^L} and Y = {Y^H, Y^L} denote the input and output, where Y^L = Y^{H→L} + Y^{L→L} and Y^H = Y^{H→H} + Y^{L→H} denote the low- and high-frequency components of the output, and W^H = [W^{H→H}, W^{L→H}], W^L = [W^{H→L}, W^{L→L}] denote the convolution kernels between the frequency components. The high- and low-frequency updates of the frequency division convolution operation are

$$Y^{H}_{p,q} = \sigma\Big(\sum_{m,n} {W^{H\to H}_{m,n}}^{\top} X^{H}_{p+m,\,q+n} + \sum_{m,n} {W^{L\to H}_{m,n}}^{\top} X^{L}_{\lfloor p/2\rfloor+m,\,\lfloor q/2\rfloor+n} + b\Big)$$

$$Y^{L}_{p,q} = \sigma\Big(\sum_{m,n} {W^{L\to L}_{m,n}}^{\top} X^{L}_{p+m,\,q+n} + \sum_{m,n} {W^{H\to L}_{m,n}}^{\top} X^{H}_{2p+m,\,2q+n} + b\Big)$$

wherein (p, q) denotes the position of a pixel, k denotes the convolution kernel size, σ(·) denotes the activation function, b denotes the bias, X^H and X^L denote the high- and low-frequency feature maps of the input, Y^H and Y^L denote the high- and low-frequency feature maps of the output, H→L, L→H, H→H and L→L denote feature maps converted from high to low, low to high, high to high and low to low frequency respectively, and m and n determine the range of the local receptive field centered at the pixel (p, q) on the input X.
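To make the operation concrete, the following is a minimal PyTorch sketch of a frequency division (octave) convolution layer with the four kernels W^{H→H}, W^{H→L}, W^{L→H} and W^{L→L}; the channel split ratio alpha, the kernel size, and the pooling/up-sampling choices for the cross-frequency paths are illustrative assumptions rather than values fixed by the patent.

```python
# A minimal PyTorch sketch of a frequency division (octave) convolution layer
# consistent with the split described above. The channel split ratio `alpha`,
# the kernel size, and the use of average pooling / nearest-neighbor
# up-sampling for the cross-frequency paths are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class OctaveConv(nn.Module):
    def __init__(self, in_ch, out_ch, k=3, alpha=0.5):
        super().__init__()
        in_l, out_l = int(alpha * in_ch), int(alpha * out_ch)
        in_h, out_h = in_ch - in_l, out_ch - out_l
        p = k // 2
        # the four kernels W^{H->H}, W^{H->L}, W^{L->H}, W^{L->L}
        self.h2h = nn.Conv2d(in_h, out_h, k, padding=p)
        self.h2l = nn.Conv2d(in_h, out_l, k, padding=p)
        self.l2h = nn.Conv2d(in_l, out_h, k, padding=p)
        self.l2l = nn.Conv2d(in_l, out_l, k, padding=p)

    def forward(self, x_h, x_l):
        # Y^H = Y^{H->H} + Y^{L->H}: the low-frequency branch is up-sampled
        y_h = self.h2h(x_h) + F.interpolate(self.l2h(x_l), scale_factor=2,
                                            mode="nearest")
        # Y^L = Y^{L->L} + Y^{H->L}: the high-frequency branch is down-sampled
        y_l = self.l2l(x_l) + self.h2l(F.avg_pool2d(x_h, 2))
        return F.relu(y_h), F.relu(y_l)

# usage: a 128x128 high-frequency map and its 64x64 low-frequency counterpart
x_h, x_l = torch.randn(1, 8, 128, 128), torch.randn(1, 8, 64, 64)
y_h, y_l = OctaveConv(16, 32)(x_h, x_l)
```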
in the embodiment of the invention, the frequency division transpose convolution layer (as shown in fig. 4) is built, wherein X is { X ═ XH,XLAnd
Figure BDA0002537019710000063
representing an input and an output, wherein
Figure BDA0002537019710000064
And
Figure BDA0002537019710000065
representing variation of the output of the frequency-dividing transpose, WH=[WH→H,WL→H],WL=[WH→L,WL→L]The variation of the high and low frequencies of the convolution is expressed, and the variation of the high and low frequencies in the frequency division transposition convolution operation is expressed by the following formula:
Figure BDA0002537019710000066
Figure BDA0002537019710000067
Figure BDA0002537019710000068
and
Figure BDA0002537019710000069
the high-frequency characteristic diagram and the low-frequency characteristic diagram respectively represent a frequency division transposition convolution output characteristic diagram, the values of m and n are used for determining the range of a local receptive field taking (p, q) as the central point of a pixel on an input X, and k represents the size of a convolution kernel.
In the embodiment of the invention, the activation function adopted by the convolution layers in the deep neural network model is ReLU, and a sigmoid activation function is applied to the output of the last layer to produce the prediction. The loss function used in the embodiment of the invention is a cross-entropy loss with hyper-parameters α and β, where y denotes the true value of the label data and ŷ denotes the value predicted for the original image by the deep network. Meanwhile, the embodiment of the invention uses the Adam optimization algorithm with a learning rate of 0.001 to minimize the loss function.
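As a hedged illustration of this objective, the following sketch shows a weighted cross-entropy on the sigmoid output together with the stated Adam setting; the exact weighting used in the patent is not reproduced here.

```python
# A hedged sketch of the training objective described above: a cross-entropy
# on the sigmoid output, re-weighted by the two hyper-parameters alpha and
# beta (one common way such a weighting is realized; the patent's exact
# formula is not reproduced), minimized with Adam at learning rate 0.001.
import torch

def weighted_cross_entropy(logits, target, alpha=1.0, beta=1.0, eps=1e-7):
    """logits: raw network output; target: {0, 1} crack label map."""
    p = torch.sigmoid(logits).clamp(eps, 1 - eps)
    loss = -(alpha * target * torch.log(p)
             + beta * (1 - target) * torch.log(1 - p))
    return loss.mean()

# optimizer as stated in the embodiment (Adam, learning rate 0.001):
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```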
In the embodiment of the invention, the encoder part and the decoder part of the U-net structure in the deep convolutional neural network are connected through jump (skip) connections; the jump connection transfers the texture information of the image to the decoder, thereby avoiding the loss of image features caused by the pooling layers or down-sampling.
Meanwhile, in the deep convolutional neural network, the input image is connected to the encoder part and to all encoder stages through jump connections, so that image information is transferred: after a series of convolutions and pooling operations, the feature maps still retain the original characteristic information of the input image fed in through the jump connections, which avoids the loss of image texture information.
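A minimal sketch of such a jump connection is given below; fusing by channel-wise concatenation is an assumption, since the patent only states that the connections transfer image information.

```python
# A minimal sketch of the jump (skip) connections described above: encoder
# feature maps and a resized copy of the input image are concatenated onto
# the decoder feature map of matching resolution. Concatenation along the
# channel axis is an assumption.
import torch
import torch.nn.functional as F

def skip_concat(decoder_feat, encoder_feat, input_image):
    """Fuse decoder features with encoder features and the raw input image."""
    h, w = decoder_feat.shape[-2:]
    img = F.interpolate(input_image, size=(h, w), mode="bilinear",
                        align_corners=False)
    return torch.cat([decoder_feat, encoder_feat, img], dim=1)
```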
The deep learning library used for the deep neural network in the embodiment of the invention is TensorFlow. Using this library, cross-validation is carried out on the divided training set and validation set, the parameters of the deep neural network are learned by continuously reducing the loss function, and the values of the parameters in the deep neural network model are determined.
In the hole convolution module of the deep convolutional neural network (as shown in FIG. 5), the input is the feature map output by the last convolution layer of the encoder, and the output of the hole convolution module is obtained by superposing and fusing the feature maps produced by convolutions with different hole (dilation) rates.
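The following sketch illustrates one way such a hole convolution module can be realized; the dilation rates and the use of summation for the fusion are assumptions.

```python
# A sketch of the hole (dilated) convolution module: parallel 3x3 convolutions
# with different dilation ("hole") rates are applied to the last encoder
# feature map and their outputs are fused by summation. The rates (1, 2, 4)
# and the use of summation rather than concatenation are assumptions.
import torch.nn as nn

class HoleConvModule(nn.Module):
    def __init__(self, channels, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=r, dilation=r)
             for r in rates]
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # superpose (sum) the feature maps produced at each dilation rate
        return self.act(sum(branch(x) for branch in self.branches))
```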
The deep convolutional neural network structure is realized by using a deep learning library such as Caffe or TensorFlow, model training is carried out with the divided training set and validation set, the parameters of the deep neural network are learned by continuously reducing the value of the loss function, and the parameter values in the deep neural network model are determined.
S3, training the deep convolutional neural network by using the created training set.
The deep convolutional neural network is trained by utilizing a training set, parameters of the neural network are continuously optimized through back propagation, the value of a loss function is reduced, the network is optimized, and end-to-end training is realized.
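A skeletal sketch of this end-to-end training step is given below; the model, data loader and loss are placeholders standing in for the components described above.

```python
# A skeletal end-to-end training loop for the step above. `model` and
# `train_loader` are placeholders for the network and data pipeline sketched
# earlier; they are not objects defined by the patent.
import torch
import torch.nn.functional as F

def train(model, train_loader, epochs=50, lr=1e-3, device="cuda"):
    model.to(device).train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(epochs):
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device).float()
            logits = model(images)
            loss = F.binary_cross_entropy_with_logits(logits, labels)
            optimizer.zero_grad()
            loss.backward()        # back-propagation
            optimizer.step()       # parameter update
```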
S4, testing the trained deep convolutional neural network model by using the test set, and outputting a crack image.
The trained neural network model is tested with the test set, the output values of the neural network model are then normalized, and a probability map of the crack image is output. Referring to FIG. 6, from left to right it shows: the original images, the labels, and the prediction results.
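A short sketch of this test step is given below, assuming the normalization is the sigmoid applied to the network's raw outputs.

```python
# A sketch of the test step: the trained network's raw outputs are passed
# through a sigmoid so that each pixel value is normalized to a crack
# probability in [0, 1], giving the probability map that is output.
import torch

@torch.no_grad()
def predict_probability_map(model, image, device="cuda"):
    model.to(device).eval()
    logits = model(image.to(device))
    return torch.sigmoid(logits).cpu()   # per-pixel crack probability map
```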
The above examples only represent the preferred embodiments of the present invention, and the description thereof is more specific and detailed, but not to be construed as limiting the scope of the present invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (5)

1. An automatic crack detection method based on frequency division convolution is characterized by specifically comprising the following steps of:
s1, shooting a road image by a camera, and creating a training set and a test set of the road crack image;
s2, creating a deep convolution neural network comprising a frequency division convolution, a frequency division transposition convolution, a hole convolution module and a jump connection structure;
s3, training the deep convolutional neural network by using the established training set;
S4, testing the trained deep convolution neural network model by using the test set, and outputting a crack image.
2. The method according to claim 1, wherein the step S1 specifically includes:
S11, shooting crack images by using an intelligent terminal, or using the public crack image data sets CFD and AigleRN, and dividing the crack images into a training set and a test set;
S12, constructing a crack image database from the collected surface crack images of different structures, performing data enhancement on the constructed database to expand the data set, manually labelling the crack regions of the crack images in the expanded database, and then dividing the images in the database into a training set and a test set.
3. The method according to claim 2, wherein the step S2 specifically includes:
S21, building a deep neural network structure model: determining the number of encoder and decoder layers in the deep convolutional neural network, the numbers of high-frequency and low-frequency feature maps in each frequency division convolution layer, the number of pooling layers, the sampling kernel size and training stride of each pooling layer, the number of frequency division transposed convolution layers, the numbers of high-frequency and low-frequency feature maps in each deconvolution layer, the connection mode of the jump connections, and the hole (dilation) rates in the hole convolution module;
S22, selecting a training strategy for the deep neural network: the cross-entropy loss is selected as the cost function and ReLU as the activation function, a weight decay regularization term is added to the loss, and dropout is added to the convolution layers to reduce overfitting; the SGD optimization algorithm is used in training the deep neural network;
S23, building the frequency division convolution layer: X = {X^H, X^L} and Y = {Y^H, Y^L} denote the input and output, where Y^L = Y^{H→L} + Y^{L→L} and Y^H = Y^{H→H} + Y^{L→H} denote the low- and high-frequency components of the output, and W^H = [W^{H→H}, W^{L→H}], W^L = [W^{H→L}, W^{L→L}] denote the convolution kernels between the frequency components; the high- and low-frequency updates of the frequency division convolution operation are

$$Y^{H}_{p,q} = \sigma\Big(\sum_{m,n} {W^{H\to H}_{m,n}}^{\top} X^{H}_{p+m,\,q+n} + \sum_{m,n} {W^{L\to H}_{m,n}}^{\top} X^{L}_{\lfloor p/2\rfloor+m,\,\lfloor q/2\rfloor+n} + b\Big)$$

$$Y^{L}_{p,q} = \sigma\Big(\sum_{m,n} {W^{L\to L}_{m,n}}^{\top} X^{L}_{p+m,\,q+n} + \sum_{m,n} {W^{H\to L}_{m,n}}^{\top} X^{H}_{2p+m,\,2q+n} + b\Big)$$

wherein (p, q) denotes the position of a pixel, k denotes the convolution kernel size, σ(·) denotes the activation function, b denotes the bias, X^H and X^L denote the high- and low-frequency feature maps of the input, Y^H and Y^L denote the high- and low-frequency feature maps of the output, H→L, L→H, H→H and L→L denote feature maps converted from high to low, low to high, high to high and low to low frequency respectively, and m and n determine the range of the local receptive field centered at the pixel (p, q) on the input X;
S24, building the frequency division transposed convolution layer: X = {X^H, X^L} and Ŷ = {Ŷ^H, Ŷ^L} denote the input and output, where Ŷ^H = Ŷ^{H→H} + Ŷ^{L→H} and Ŷ^L = Ŷ^{H→L} + Ŷ^{L→L} denote the high- and low-frequency components of the transposed-convolution output, and W^H = [W^{H→H}, W^{L→H}], W^L = [W^{H→L}, W^{L→L}] denote the high- and low-frequency convolution kernels; the high- and low-frequency updates of the frequency division transposed convolution operation mirror those of the frequency division convolution in S23, with each kernel applied as a transposed (up-sampling) convolution, wherein Ŷ^H and Ŷ^L respectively denote the high- and low-frequency feature maps of the frequency division transposed convolution output, the values of m and n determine the range of the local receptive field centered at the pixel (p, q) on the input X, and k denotes the convolution kernel size;
S25, connecting the encoder and the decoder in the deep convolutional neural network through jump connections;
S26, in the deep convolutional neural network, connecting the input image to the encoder part and to all encoder stages through jump connections, so that the transfer of image information can be realized;
S27, in the hole convolution module of the deep convolutional neural network, the input of the module is the feature map output by the last convolution layer of the encoder; the module is composed of convolution layers with different hole (dilation) rates, and its output is obtained by superposing and fusing the feature maps produced by the convolutions with different hole rates;
S28, realizing the deep neural network structure with a deep learning library such as Caffe, TensorFlow or PyTorch, carrying out model training with the divided training set and test set, learning the parameters of the deep neural network by continuously reducing the value of the loss function, and determining the parameter values of the deep neural network model.
4. The method according to claim 3, wherein the step S3 specifically includes:
S31, training the deep convolutional neural network by using the training set according to the steps S21, S22, S23, S24, S25, S26, S27 and S28, continuously optimizing the parameters of the neural network through back-propagation, reducing the value of the loss function, optimizing the network, and realizing end-to-end training.
5. The method according to claim 4, wherein the step S4 specifically includes:
s41, testing the trained neural network model by using a test set according to the step S31;
S42, normalizing the output value of the neural network model and outputting a probability map of the crack image.
CN202010540557.XA 2020-06-12 2020-06-12 Automatic crack detection method based on frequency division convolution Pending CN111721770A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010540557.XA CN111721770A (en) 2020-06-12 2020-06-12 Automatic crack detection method based on frequency division convolution

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010540557.XA CN111721770A (en) 2020-06-12 2020-06-12 Automatic crack detection method based on frequency division convolution

Publications (1)

Publication Number Publication Date
CN111721770A true CN111721770A (en) 2020-09-29

Family

ID=72566787

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010540557.XA Pending CN111721770A (en) 2020-06-12 2020-06-12 Automatic crack detection method based on frequency division convolution

Country Status (1)

Country Link
CN (1) CN111721770A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015047466A2 (en) * 2013-06-05 2015-04-02 Innersense, Inc. Bi-phasic applications of real & imaginary separation, and reintegration in the time domain
CN104574362A (en) * 2014-12-01 2015-04-29 汕头大学 Passive visual system-based molten pool edge extraction method
CN105844630A (en) * 2016-03-21 2016-08-10 西安电子科技大学 Binocular visual image super-resolution fusion de-noising method
CN110619309A (en) * 2019-09-19 2019-12-27 天津天地基业科技有限公司 Embedded platform face detection method based on octave convolution sum YOLOv3
CN111179244A (en) * 2019-12-25 2020-05-19 汕头大学 Automatic crack detection method based on cavity convolution

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
莫嘉杰: "Retinal blood vessel segmentation based on a frequency-division convolutional neural network", Wanfang dissertation database *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112233105A (en) * 2020-10-27 2021-01-15 江苏科博空间信息科技有限公司 Road crack detection method based on improved FCN
CN113506281A (en) * 2021-07-23 2021-10-15 西北工业大学 Bridge crack detection method based on deep learning framework
CN113506281B (en) * 2021-07-23 2024-02-27 西北工业大学 Bridge crack detection method based on deep learning framework

Similar Documents

Publication Publication Date Title
CN111179244B (en) Automatic crack detection method based on cavity convolution
CN111127449B (en) Automatic crack detection method based on encoder-decoder
CN111126258B (en) Image recognition method and related device
CN112668494A (en) Small sample change detection method based on multi-scale feature extraction
CN111738054B (en) Behavior anomaly detection method based on space-time self-encoder network and space-time CNN
CN113283356B (en) Multistage attention scale perception crowd counting method
CN111611861B (en) Image change detection method based on multi-scale feature association
CN110717886A (en) Pavement pool detection method based on machine vision in complex environment
CN113177560A (en) Universal lightweight deep learning vehicle detection method
CN111721770A (en) Automatic crack detection method based on frequency division convolution
CN112733693B (en) Multi-scale residual error road extraction method for global perception high-resolution remote sensing image
CN116596151B (en) Traffic flow prediction method and computing device based on time-space diagram attention
CN111199539A (en) Crack detection method based on integrated neural network
CN116524189A (en) High-resolution remote sensing image semantic segmentation method based on coding and decoding indexing edge characterization
CN115810149A (en) High-resolution remote sensing image building extraction method based on superpixel and image convolution
WO2022100607A1 (en) Method for determining neural network structure and apparatus thereof
CN112132867B (en) Remote sensing image change detection method and device
CN111160282B (en) Traffic light detection method based on binary Yolov3 network
CN111738324B (en) Multi-frequency and multi-scale fusion automatic crack detection method based on frequency division convolution
CN111079811A (en) Sampling method for multi-label classified data imbalance problem
CN115497006B (en) Urban remote sensing image change depth monitoring method and system based on dynamic mixing strategy
CN116778318A (en) Convolutional neural network remote sensing image road extraction model and method
CN115880557A (en) Pavement crack extraction method and device based on deep learning
CN115578693A (en) Construction safety early warning method and device based on significance neural network model
CN111091061B (en) Vehicle scratch detection method based on video analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200929