CN116416244A

CN116416244A - Crack detection method and system based on deep learning

Info

Publication number: CN116416244A
Application number: CN202310492746.8A
Authority: CN
Inventors: 刘国良; 石昌腾; 田国会
Original assignee: Shandong University
Current assignee: Shandong University
Priority date: 2023-04-26
Filing date: 2023-04-26
Publication date: 2023-07-11

Abstract

The invention provides a crack detection method and system based on deep learning, and relates to the technical field of crack detection. The method comprises the steps of obtaining an original image of the crack; taking a deep V < 3+ > model comprising an encoder module and a decoder module as a basic model, wherein the encoder module comprises a main feature extraction network and a pyramid part, the pyramid part of the encoder module is fused with an SA-Net attention module, and a convolution layer after the fusion of shallow features and deep features of the decoder is replaced by a depth separable convolution to build the SA-deep V < 3+ > model; inputting the original image of the crack into an SA-deep V < 3+ > model, extracting features and obtaining a crack prediction image. The invention has higher detection accuracy, higher efficiency and stronger generalization capability for cracks.

Description

Crack detection method and system based on deep learning

Technical Field

The invention belongs to the technical field of crack detection, and particularly relates to a crack detection method and system based on deep learning.

Background

The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.

Cracks are one of the common disasters, and exist not only in buildings but also on roads and airport roads. On a building, a general crack has no effect on the use of the building, but if the width of the crack exceeds a certain limit value, the crack becomes a harmful crack, and the existence of the harmful crack seriously affects the service life of the building. Similarly, on roads and airport roads, smaller cracks do not affect the use of roads and airport roads, but if the road cracks are not repaired in time, the road cracks are more likely to further aggravate road damage due to repeated load and severe weather effects, even structural damage may occur, and accidents further occur. Therefore, it is necessary to detect the crack more accurately, timely and efficiently.

The inventor finds that the traditional detection method can detect cracks, but has the advantages of lower efficiency, larger limitation, high data cost and low accuracy.

Disclosure of Invention

In order to overcome the defects of the prior art, the invention provides a crack detection method and a crack detection system based on deep learning, which are characterized in that an algorithm is more concentrated on crack pixels, and the crack detection accuracy is higher, the efficiency is higher and the generalization capability is stronger.

To achieve the above object, one or more embodiments of the present invention provide the following technical solutions:

the first aspect of the invention provides a crack detection method based on deep learning.

A crack detection method based on deep learning comprises the following steps:

acquiring an original image of the crack;

taking a deep V < 3+ > model comprising an encoder module and a decoder module as a basic model, wherein the encoder module comprises a main feature extraction network and a pyramid part, the pyramid part of the encoder module is fused with an SA-Net attention module, a convolution layer after fusion of shallow features and deep features in the decoder network is replaced by depth separable convolution, and the SA-deep V < 3+ > model is built, wherein the shallow features are features extracted by convolution with fewer times in the network, and the deep features are features extracted by convolution with more times in the network;

inputting the original image of the crack into an SA-deep V < 3+ > model, extracting features and obtaining a crack prediction image.

Further, inputting the original image of the crack into an SA-deep V < 3+ > model, extracting features and obtaining a crack prediction image, wherein the method specifically comprises the following steps:

inputting the original image of the crack into a main feature extraction network of an encoder, respectively extracting shallow features after 3-5 convolutions of the main feature extraction network, and extracting deep features after more than 14 convolutions of the main feature extraction network;

inputting deep features into the pyramid part, carrying out parallel sampling on the deep features by using cavity convolution with different sampling rates, and capturing the context of the image in a plurality of proportions to obtain a feature map after parallel sampling;

the SA-Net attention module is utilized to distribute attention weights to the feature graphs after parallel sampling, and the attention weights and the corresponding feature graphs are weighted, so that deep features after feature weighting are obtained;

upsampling the deep features weighted by the features, inputting the upsampled result and the shallow features into a decoder together for stacking, and performing depth separable convolution on the stacked features to obtain an effective feature map;

and up-sampling the effective feature map to obtain a crack prediction image.

Further, the SA-Net attention module is utilized to distribute attention weights to the feature graphs after parallel sampling, and the attention weights and the corresponding feature graphs are weighted, so that deep features after feature weighting are obtained, specifically:

the SA-Net firstly divides the feature map after parallel sampling into G groups to obtain G sub-features, each sub-feature is divided into two branches along the channel dimension, one branch is used for generating a space attention pattern, the other branch is used for generating a channel attention pattern, each sub-feature is captured in the training process, and a corresponding weight coefficient is generated for each sub-feature through the SA-Net attention module;

weighting each sub-feature and the corresponding weight coefficient to obtain a weighted sub-feature;

and (3) utilizing a shuffling mechanism to enable each weighted sub-feature to flow in the channel dimension, and finally integrating all the weighted sub-features in the channel dimension to obtain the processed integral feature, namely the deep feature.

Further, the depth separable convolution consists of a channel-by-channel convolution and a point-by-point convolution:

in the channel-by-channel convolution, one convolution kernel takes charge of one channel height, one channel is only convolved by one convolution kernel, and the number of channels of the characteristic map generated in the process is identical to the number of channels of the input;

the point-by-point convolution only performs weighted combination in the channel direction, and the number of feature images generated by generating effective feature images is determined by the number of convolution kernels.

Further, the loss function of the SA-deep V < 3+ > model adopts a Focal loss and a Dice loss, wherein the calculation formula of the Focal loss is as follows:

y and y' refer to a label value and a predicted value of the image, respectively; a is a balance factor; gamma is a regulator; the calculation formula of the Dice is:

y _i and y _i ' refers to a label value and a predicted value of an image, respectively; n refers to the total number of pixels in the image.

Further, the dynamic compensation weights are fused to the Focal loss:

wherein y and y' refer to the label value and the predicted value, beta, respectively, of the image ₁ And beta ₂ Is a dynamic compensation weight coefficient.

Further, beta ₁ And beta ₂ The calculation is performed by the following formula:

wherein F is _p Is false positive, F _n P is the total number of cracked pixel points in the image; s is S _n As the total number of pixels in the image,

is the percentage of split pixels in the whole image,/>

Is the percentage of non-cracked pixels in the entire image.

The second aspect of the invention provides a crack detection system based on deep learning.

A deep learning based fracture detection system, comprising:

an image acquisition module configured to: acquiring an original image of the crack;

a model building module configured to: taking a deep V < 3+ > model comprising an encoder module and a decoder module as a basic model, wherein the encoder module comprises a main feature extraction network and a pyramid part, the pyramid part of the encoder module is fused with an SA-Net attention module, a convolution layer after fusion of shallow features and deep features in the decoder network is replaced by depth separable convolution, and the SA-deep V < 3+ > model is built, wherein the shallow features are features extracted by convolution with fewer times in the network, and the deep features are features extracted by convolution with more times in the network;

a crack detection module configured to: inputting the original image of the crack into an SA-deep V < 3+ > model, extracting features and obtaining a crack prediction image.

A third aspect of the present invention provides a computer-readable storage medium having stored thereon a program which, when executed by a processor, implements the steps in the deep learning based crack detection method according to the first aspect of the present invention.

A fourth aspect of the invention provides an electronic device comprising a memory, a processor and a program stored on the memory and executable on the processor, the processor implementing the steps in the deep learning based fracture detection method according to the first aspect of the invention when the program is executed.

The one or more of the above technical solutions have the following beneficial effects:

the invention uses the fusion loss function of Focal loss and Dice loss after fusing dynamic compensation weights on the basis of SA-deep V < 3+ >, constructs the overall algorithm of DWF+SA-deep V < 3+ >, performs data training and testing on the basis of the algorithm, and obtains better crack extraction effect.

Additional aspects of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.

FIG. 1 is a diagram showing the overall structure of SA-deep V3+ according to the first embodiment.

Fig. 2 is a diagram showing the construction of the first embodiment SA-Net attention module.

Fig. 3 is a schematic diagram of a general convolution structure of the first embodiment.

Fig. 4 is a schematic diagram of a channel-by-channel convolution structure according to the first embodiment.

Fig. 5 is a schematic diagram of a point-by-point convolution structure of the first embodiment.

Fig. 6 (a) is an original picture of the crack of the first embodiment.

Fig. 6 (b) is a first embodiment crack signature.

FIG. 6 (c) shows the result of the first embodiment deep V3+ detection.

FIG. 6 (d) shows the DWF+SA-deep V3+ detection of the first embodiment.

Fig. 7 is a system configuration diagram of the second embodiment.

Detailed Description

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present invention.

Embodiments of the invention and features of the embodiments may be combined with each other without conflict.

The core structure of the original deep V < 3+ > neural network consists of two modules, namely a spatial pyramid pooling module and a coding and decoding module.

The specific characteristic extraction process is as follows:

inputting the image into a trunk feature extraction network, obtaining shallow layer features and deep layer features through the trunk feature extraction network, inputting the deep layer features into a cavity pyramid pooling module ASPP, respectively rolling and pooling with four cavity convolution layers and one pooling layer to obtain five feature layers, then splicing the feature layers, sending the feature images into a 1x1 convolution layer for operation, and then obtaining a new feature layer A through up-sampling.

The shallow layer features are directly subjected to 1x1 convolution to obtain dimension reduction features B, then are spliced with the dimension reduction features A, and finally output a prediction result with the same size as the original image through a 3x3 convolution layer and up-sampling.

In the invention, shallow features are extracted and acquired after convolution of a trunk feature extraction network for a few times, and deep features are extracted after convolution of the whole trunk feature network.

The invention has the overall conception that:

the traditional detection method can detect cracks, but has the advantages of lower efficiency, larger limitation, high data cost and low accuracy. With the development of computer power, the application of the deep learning method in crack detection is gradually increased, and meanwhile, the accuracy rate is higher, the efficiency is higher, and the generalization capability is strong. The invention provides a crack detection algorithm based on deep learning according to the current situation.

The invention mainly provides a new and more accurate crack detection algorithm based on the existing deep learning network framework, and the specific improvement content can be summarized into the following aspects:

1. the Sa-Net attention mechanism is added into the original deep V < 3+ > neural network pyramid module.

2. The loss function of the Diceloss and Focal loss fusion is used, so that loss calculation interference caused by small occupation of the crack prospect in the image is reduced, and the attention of the network to the crack is improved.

3. The dynamic compensation weight of the new loss function is provided, and can be used in combination with the loss functions such as a focus loss function, a cross entropy loss function and the like to compensate the problem that the ratio of the crack pixels in the whole image is too small in the crack detection process. And fusing the dynamic compensation weight with the Focal loss, using the dynamic compensation weight in combination with the Dice loss to replace an original cross entropy loss function, and forming a DWF+SA-deep V < 3+ > algorithm model together with the SA-deep V < 3+ >.

The algorithm improvement of the invention mainly aims to make the algorithm more concentrated on the crack pixels, and the practical experiment is carried out after the training on the data to obtain good effects.

Example 1

The embodiment discloses a crack detection method based on deep learning.

As shown in fig. 1, a crack detection method based on deep learning includes the following steps:

acquiring an original image of the crack;

taking a deep V < 3+ > model comprising an encoder module and a decoder module as a basic model, wherein the encoder module comprises a main feature extraction network and a pyramid part, the pyramid part of the encoder module is fused with an SA-Net attention module, a convolution layer after fusion of shallow features and deep features in the decoder network is replaced by depth separable convolution, and the SA-deep V < 3+ > model is built, wherein the shallow features are features extracted by 3-5 times of convolution in the network, and the deep features are features extracted by more than 14 times of convolution in the network;

inputting the original image of the crack into a main feature extraction network of an encoder, respectively extracting shallow features after convolution for a few times by the main feature extraction network, and extracting deep features after convolution for a plurality of times by the main feature extraction network;

and up-sampling the effective feature map to obtain a crack prediction image.

In this embodiment, the shallow features are features obtained by 3-5 times of convolution extraction in the network, and the deep features are features obtained by more than 14 times of convolution extraction in the network.

Specific:

integral network frame SA-deep V3+

To reduce the information loss caused by upsampling, the 3x3 convolution layer after the fusion of the decoder portion shallow features and deep features in the deep v3+ model is replaced with a depth separable convolution. Meanwhile, in order to make the algorithm pay more attention to the crack pixels, a pyramid part (ASPP) in the model is fused with an SA-Net attention module, and a specific network structure is shown in fig. 1.

The core structure of SA-deep V < 3+ > consists of two parts, namely an encoder module and a decoder module.

The main network of the encoder part model is mobilenet v2, and unlike the commonly used res net residual structure, the mobilenet v2 firstly performs dimension ascending on an input feature matrix through 1x1 convolution, increases the size of a channel, then performs convolution processing through a 3x3 deep convolution kernel (DW convolution), and finally performs dimension descending through a 1x1 convolution kernel. Meanwhile, the mobilenet v2 adopts Rectified Linear Unit (ReLU 6) activation function, when the input value is smaller than 0, the default is set to zero, and the input value is kept unchanged between [0,6], but when the input value is larger than 6, the output value is set to 6.

Meanwhile, for the spatial feature pyramid part in the encoder module, SA-Net is fused. For a given feature map, SA-Net first divides the given feature map into G groups, then captures each sub-feature during training, and generates a corresponding weight coefficient for each sub-feature through the attention module. In particular, it is that each sub-feature is split into two branches along the channel dimension, one branch being used to generate the spatial attention pattern and the other branch being used to generate the channel attention pattern, so that the model can focus more on meaningful parts. Where for the implementation of the channel attention mechanism, SA-Net does not use conventional Squeeze Excitation (SE), but rather performs global pooling first, then moves and scales the channel vector through a pair of parameters, and finally performs sigmoid activation on the value. The SA-Net also adds a shuffling mechanism, so that each group of information flows in the channel dimension, and finally the grouped information is integrated in the channel dimension, thereby achieving the processed integral characteristics. The SA-Net attention module architecture is shown in FIG. 2.

For the decoder part we replace the normal convolution with a depth separable convolution. Unlike the normal convolution approach, the depth separable convolution consists of a channel-by-channel convolution and a point-by-point convolution. In the channel-by-channel convolution, one convolution kernel is responsible for one channel high, one channel is only convolved by one convolution kernel, and the number of channels of the feature map generated by the process is identical to the number of channels of the input. The point-by-point convolution only performs weighted combination in the channel direction to generate new feature images, and the number of the generated feature images is determined by the number of convolution kernels. Thus, the depth separable convolution parameters and computational cost are smaller and the loss in the feature extraction process is smaller than in normal convolution. Exemplary diagrams of channel-by-channel and point-by-point convolution structures in normal convolution and depth separable convolution are shown in fig. 3, 4, and 5, respectively.

(II) loss function improvement

Through observation analysis of a large number of crack images, we find that, unlike traditional semantic segmentation, cracks tend to occupy a small part of the images, and therefore, for loss calculation, if a cross entropy loss function (Cross Entropy Loss) is used, training interference is caused due to the fact that the background occupies too large component.

For this case, the Focal Loss function (Focal Loss) can solve the problem of imbalance of foreground and background to a certain extent, so that a small number of target classes are weighted, and samples with wrong classification are weighted. The focus loss function is modified on the basis of the cross entropy loss function, and the calculation formula of the two-class cross entropy loss function is as follows:

y and y' refer to a label value and a predicted value of the image, respectively; the label value of the image indicates whether the image is actually a crack, if so, the value is 1, and the non-crack value is 0; the predicted value of the image represents the value output through the detection algorithm presented herein.

Focal loss is a modification based on a cross entropy loss function, and a factor gamma, gamma >0 is added on the original basis, so that the loss of an easy-to-classify sample is reduced, and the model focuses on difficult and wrong classified samples. Simultaneously changing the magnitude of γ also adjusts the rate at which simple sample weights decrease, as γ increases, the impact of the adjustment factor increases. In addition, a balance factor a is added to balance the uneven proportion of the positive and negative samples. The calculation formula of the Focal loss is as follows:

y and y' refer to the label value and the predicted value of the image, respectively.

Meanwhile, the invention also introduces a Dice, which is suitable for image binary segmentation, and can alleviate the problem of unbalanced number of positive and negative samples to a certain extent. For predictions with higher confidence, a lower Dice coefficient will be obtained, thus making Dice less, while for predictions with lower confidence, a higher Dice coefficient will be obtained, thus making Dice greater. The calculation formula of the Dice is as follows:

y _i and y _i ' refers to the label value and the predicted value of the image, respectively, and N refers to the total number of pixels in the image.

The single use of Dice loss has the problem of loss saturation, so the patent uses a mixture of Focal loss and Dice loss.

In addition, the invention provides a dynamic compensation weight based on the F1 score. The method mainly uses the ratio of false positive to false negative to true positive in the model training process as a weight coefficient, so that loss compensation is performed when the model prediction result is poor. The Focal loss calculation formula after fusing the dynamic compensation weight is as follows:

similarly, y and y' refer to the label value and the predicted value, β, respectively, of the image ₁ And beta ₂ Is a dynamic compensation weight coefficient.

β ₁ And beta ₂ The calculation can be performed by the following formula:

wherein F is _p Refers to false positive, F _n Refers to false negatives, and P refers to the total number of cracked pixels in the image. When the predicted false positives are more, the corresponding coefficient is increased, so that the corresponding prediction loss about the true crack pixels is increased, when the predicted false negatives are more, i.e. the number of the false-crack pixels which are misjudged as being non-crack pixels in the prediction process is more, the corresponding loss of the background pixels is increased, and when the final prediction result is gradually better, beta ₁ Gradually approach toAt 0.S is S _n As the total number of pixels in the image,

is the percentage of cracked pixels in the entire image. />

Is the percentage of non-cracked pixels in the entire image. Because the percentage of non-cracked pixels in the image is larger, the coefficient can compensate the problem that the prediction loss is smaller because the cracked pixels occupy smaller in the whole image.

(III) Overall algorithm Structure DWF+SA-deep V3+

The patent uses the fusion loss function of Focal loss and Dice loss after fusing dynamic compensation weights on the basis of SA-deep V < 3+ >, and constructs the overall algorithm of DWF+SA-deep V < 3+ >. Data training and testing are carried out on the basis of the algorithm, and a good crack extraction effect is obtained. The detection results of the original picture, the label graph and the deep v3+ on the common data set craackforest and the detection results of the DWF+SA-deep v3+ algorithm are shown in fig. 6 (a), 6 (b), 6 (c) and 6 (d).

The DWF+SA-deep V < 3+ > algorithm provided by the invention can effectively realize crack detection. The original algorithm and the DWF+SA-deep V3+ algorithm are trained and verified on the whole CrackForest data set, and compared to obtain a conclusion: the algorithm provided by the invention has better crack detection effect, the crack detection result is more complete and finer, the result is closer to a label image, the F1 Score (F1 Score) is improved by 0.057, the average cross-over ratio (MIOU) is improved by 0.07, the crack detection is carried out in actual use, the better effect is obtained, and the crack detection task can be well mastered. The quantitative results of the two algorithms are compared as shown in table 1.

Table 1 comparison of the results of the two algorithm tests

Example two

The embodiment discloses a crack detection system based on deep learning.

As shown in fig. 7, a crack detection system based on deep learning includes:

Example III

An object of the present embodiment is to provide a computer-readable storage medium.

A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps in a method for deep learning based crack detection as described in embodiment 1 of the present disclosure.

Example IV

An object of the present embodiment is to provide an electronic apparatus.

An electronic device comprising a memory, a processor and a program stored on the memory and executable on the processor, the processor implementing the steps in the deep learning based fracture detection method as described in embodiment 1 of the present disclosure when the program is executed.

The steps involved in the devices of the second, third and fourth embodiments correspond to those of the first embodiment of the method, and the detailed description of the embodiments can be found in the related description section of the first embodiment. The term "computer-readable storage medium" should be taken to include a single medium or multiple media including one or more sets of instructions; it should also be understood to include any medium capable of storing, encoding or carrying a set of instructions for execution by a processor and that cause the processor to perform any one of the methods of the present invention.

It will be appreciated by those skilled in the art that the modules or steps of the invention described above may be implemented by general-purpose computer means, alternatively they may be implemented by program code executable by computing means, whereby they may be stored in storage means for execution by computing means, or they may be made into individual integrated circuit modules separately, or a plurality of modules or steps in them may be made into a single integrated circuit module. The present invention is not limited to any specific combination of hardware and software.

While the foregoing description of the embodiments of the present invention has been presented in conjunction with the drawings, it should be understood that it is not intended to limit the scope of the invention, but rather, it is intended to cover all modifications or variations within the scope of the invention as defined by the claims of the present invention.

Claims

1. The crack detection method based on deep learning is characterized by comprising the following steps of:

acquiring an original image of the crack;

2. The method for detecting cracks based on deep learning according to claim 1, wherein the method for detecting cracks based on deep learning is characterized in that an original image of cracks is input into an SA-deep V < 3+ > model, and feature extraction is performed to obtain predicted images of the cracks, specifically:

and up-sampling the effective feature map to obtain a crack prediction image.

3. The method for detecting cracks based on deep learning according to claim 2, wherein the method for detecting cracks based on deep learning is characterized in that the SA-Net attention module is used for distributing attention weights to the feature graphs after parallel sampling, and weighting the attention weights and the corresponding feature graphs so as to obtain deep features after feature weighting, specifically comprising the following steps:

4. The method for deep learning based fracture detection of claim 2,

the depth separable convolution consists of a channel-by-channel convolution and a point-by-point convolution:

the point-by-point convolution only performs weighted combination in the channel direction to generate effective feature images, and the number of the generated effective feature images is determined by the number of convolution kernels.

5. The deep learning based fracture detection method of claim 1, wherein the loss function of the SA-deep v3+ model employs a Focal loss and a Dice loss, wherein the Focal loss has a calculation formula:

y _i and y _i ' tag value and prediction referring to an image, respectivelyA value; n refers to the total number of pixels in the image.

6. The deep learning based fracture detection method of claim 5, wherein the Focal loss is fused with dynamic compensation weights:

7. The deep learning based fracture detection method of claim 6, wherein β ₁ And beta ₂ The calculation is performed by the following formula:

is the percentage of split pixels in the whole image,/>

Is the percentage of non-cracked pixels in the entire image.

8. Crack detection system based on deep learning, its characterized in that: comprising the following steps:

9. A computer readable storage medium having a program stored thereon, which when executed by a processor, implements the steps of the deep learning based crack detection method as claimed in any one of claims 1-7.

10. An electronic device comprising a memory, a processor and a program stored on the memory and executable on the processor, wherein the processor implements the steps of the deep learning based fracture detection method of any one of claims 1-7 when the program is executed by the processor.