CN112651926A - Method and device for detecting cracks based on recursive attention mechanism

Info

Publication number: CN112651926A
Application number: CN202011412270.5A
Authority: CN
Inventors: 卢涛; 吴志豪; 张彦铎; 王彬; 王波; 陈圆; 阮小丽
Original assignee: Wuhan Institute of Technology
Current assignee: Wuhan Institute of Technology
Priority date: 2020-12-04
Filing date: 2020-12-04
Publication date: 2021-04-13

Abstract

The invention discloses a method for detecting cracks based on a recursive attention mechanism, comprising the following steps: acquiring a crack picture and downsampling it through the encoding part of a U-Net to obtain feature signals of downsampled pictures at different scales; inputting the feature signals at each scale into a corresponding RAM module, which comprises a recursion module and outputs the corresponding downsampled feature signals; inputting into the RAM module the layer-l feature signal of the encoding part and the layer-(l-1) gating signal of the decoding part of the U-Net; outputting from the RAM module the downsampled output signal of the encoding part and the feature signal of the salient region attended to by the gating signal; and constructing an up-sampling module of the U-Net, splicing the salient-region feature signal output by the RAM module with the up-sampled feature signal obtained by deconvolution, and then up-sampling to obtain a crack detection picture. By means of the recursive residual block and the attention module, the method integrates the RAM into the crack segmentation task and improves the performance of the image crack detection algorithm.

Description

Method and device for detecting cracks based on recursive attention mechanism

Technical Field

The invention belongs to the field of segmentation of computer vision images, and particularly relates to a method and a device for detecting cracks based on a recursive attention mechanism.

Background

Image crack detection is a special field of semantic segmentation. Its purpose is to locate cracks in a picture and to recognize abnormalities. For example, bridge-bottom or road crack detection can identify where cracks are and raise an alert for maintenance, and electroluminescence imaging can determine whether a crack is present in a solar cell or cell module. Crack detection is therefore widely used in practical scenarios and plays an important role.

In general, the crack detection task has a purpose similar to that of edge detection and semantic segmentation. In practice, edge detection operators (e.g., Sobel, Laplace, Canny and some enhanced versions thereof) are widely applied directly to crack segmentation in images. These edge-operator-based methods distinguish cracks from background by a gradient distribution that relies on an accurately selected threshold. Because global and local gradients may differ from each other, the selected threshold cannot correctly represent the gradient distribution of the entire image, and edge operators based on hand-crafted features therefore have limited performance on complex real-world images. Building on the success of deep Convolutional Neural Networks (CNN), some advanced semantic segmentation methods are applied end-to-end. Xie uses a CNN on the edge detection task to learn hierarchical image representations (HED) that guide boundary detection. Liu introduces Richer Convolutional Features (RCF) to improve detection performance by using rich texture information. Yang further uses a feature pyramid and a hierarchical boosting network to detect pavement cracks. That method learns multi-scale features from local patches up to the global image to detect edges and obtains satisfactory results on road cracks.

Recently, U-Net has been applied to a large number of segmentation tasks and has yielded impressive results. Because U-Net naturally uses multi-scale information, it performs very well on segmentation tasks. Cheng detected cracks at the pixel level through U-Net and achieved good results. However, unlike ordinary images containing rich texture information, bridge-bottom or road cracks appear randomly against a monotonous, simple background. Detecting cracks is therefore more challenging than semantically segmenting texture-rich images. Moreover, although multi-scale information is available when U-Net is used directly for segmentation, irrelevant information cannot be distinguished against a simple background, which causes errors on the predicted edges.

The U-Net approach thus has drawbacks: it neither focuses strongly on the target region nor suppresses the irrelevant background region, so the detection result is less than ideal. It is therefore important to learn more edge information through a deeper network, to acquire the features of sufficiently salient regions, and to suppress the response of irrelevant regions.

Disclosure of Invention

The invention aims to provide a method and a device for detecting cracks based on a recursive attention mechanism, which solve the problem that conventional picture crack detection algorithms cannot segment cracks well and produce segmentation results with many defects.

In order to solve the technical problems, the technical scheme of the invention is as follows: a method of crack detection based on a recursive attention mechanism, comprising the steps of:

S1, acquiring a crack picture, and downsampling the crack picture through the encoding part of the U-Net to obtain feature signals of downsampled pictures at different scales;

S2, respectively inputting the feature signals of the downsampled pictures at different scales into corresponding RAM modules, wherein each RAM module comprises a recursion module, and the corresponding downsampled feature signals are output through the recursion module;

S3, inputting into the RAM module the layer-l feature signal of the encoding part of the U-Net and the layer-(l-1) gating signal of the decoding part of the U-Net;

S4, outputting, through the RAM module, the downsampled output signal of the encoding part of the U-Net and the feature signal of the salient region attended to by the gating signal;

S5, constructing an up-sampling module of the U-Net, splicing the feature signal of the salient region output by the RAM module with the up-sampled feature signal obtained through a deconvolution operation, and then up-sampling to obtain a crack detection picture.

Furthermore, in the step S1, a trilinear interpolation method is used to perform downsampling on the crack picture in the U-Net network, so as to obtain feature signals of downsampled pictures with different scales.

Further, S3 also includes adding the layer-l feature signal of the encoding part and the layer-(l-1) gating signal of the decoding part of the U-Net, where the addition uses an additive attention mechanism, and the specific formula is as follows:

In the formula, σ₁ denotes the ReLU function; W_g, W_x and ψ are convolution operations; b_g and b_ψ are the bias terms corresponding to the convolutions; and t denotes the number of recursive convolution operations.

Further, in S4, the feature signal of the salient region is obtained as follows: the layer-l feature signal of the encoding part of the U-Net, passed through the skip connection, is added to the layer-(l-1) gating signal of the decoding part; the sum then passes through ReLU and sigmoid transformations to give the attention coefficient α_i, by which the feature signal is weighted. The specific formula is as follows:

In the formula, σ₂ is the sigmoid function and g is the gating signal.

Further, in S5, the decoding part of the U-Net uses deconvolution layers to gradually restore the details of the target and the corresponding spatial dimensions, and the features output by the RAM are spliced with the up-sampled features through the skip connection to construct the up-sampling module.

Further, in S5, the up-sampling specifically includes obtaining the feature signal of the salient region from the RAM module output, splicing it with the up-sampled features, and applying a 1x1 convolution and a sigmoid function to the spliced features to obtain the crack detection picture.
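The splicing and output step described for S5 can be sketched as follows. This is a minimal NumPy illustration, not the patented implementation: nearest-neighbour up-sampling stands in for the learned deconvolution layer, and the function names, channel counts and random weights are assumptions introduced only for this example.

```python
import numpy as np

def upsample_nearest2x(feat):
    """Double the spatial size of a (C, H, W) map.
    Stand-in for the learned deconvolution layer (assumption)."""
    return feat.repeat(2, axis=1).repeat(2, axis=2)

def fuse_and_predict(attended, decoder_feat, w_out, b_out):
    """Concatenate RAM features with up-sampled decoder features,
    then apply a 1x1 convolution and a sigmoid."""
    up = upsample_nearest2x(decoder_feat)            # (C_dec, 2H, 2W)
    fused = np.concatenate([attended, up], axis=0)   # splice along channels
    # A 1x1 convolution with one output channel is a per-pixel dot product.
    logits = np.einsum("c,chw->hw", w_out, fused) + b_out
    return 1.0 / (1.0 + np.exp(-logits))             # crack probability map

rng = np.random.default_rng(0)
attended = rng.normal(size=(2, 4, 4))      # hypothetical RAM output
decoder_feat = rng.normal(size=(3, 2, 2))  # hypothetical coarser decoder map
w_out = rng.normal(size=5)                 # 2 + 3 input channels
prob = fuse_and_predict(attended, decoder_feat, w_out, b_out=0.0)
flat = fuse_and_predict(attended, decoder_feat, np.zeros(5), b_out=0.0)
```

With all-zero weights and bias, every pixel of `flat` is 0.5, which is a quick sanity check that the sigmoid is applied per pixel.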

Further, the method also comprises the following steps:

S6, constructing a training module, and obtaining a crack detection model after iterating for a given number of times.

Still further, the method comprises the steps of:

S7, constructing a test module, and inputting the test image into the crack detection model to obtain a crack test picture.

The device for realizing the method for detecting cracks based on the recursive attention mechanism comprises a sample acquisition module, a plurality of RAM modules, a training module and a testing module, wherein:

the sample acquisition module is used for acquiring a crack picture, and downsampling the crack picture through a coding part of the U-Net to obtain characteristic signals of downsampled pictures with different scales; respectively inputting the characteristic signals of the downsampled pictures with different scales into corresponding RAM modules;

the RAM module comprises a recursion module and is used for outputting the corresponding downsampled feature signals through the recursion module; receiving as inputs the layer-l feature signal of the encoding part of the U-Net and the layer-(l-1) gating signal of the decoding part of the U-Net; outputting the downsampled output signal of the encoding part of the U-Net and the feature signal of the salient region attended to by the gating signal; and constructing an up-sampling module of the U-Net, splicing the feature signal of the salient region with the original up-sampled feature signal, and then up-sampling to obtain a crack detection picture;

the training module is used for obtaining a crack detection model after iteration for a given number of times;

and the test module is used for inputting the test image into the crack detection model to obtain a crack test picture.

A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.

Compared with the prior art, the invention has the beneficial effects that:

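The invention provides a method and a device for detecting cracks based on a recursive attention mechanism, in which the RAM is integrated into the crack segmentation task through the recursive residual block and the attention module, improving the performance of the image crack detection algorithm.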

Drawings

FIG. 1 is a flow chart of a method for crack detection based on a recursive attention mechanism in an embodiment of the invention;

FIG. 2 is a diagram of a crack detection network based on a recursive attention mechanism in an embodiment of the invention;

FIG. 3 is a block diagram of a RAM module in an embodiment of the invention;

FIG. 4 is a structural diagram of a recursive block in an embodiment of the present invention;

FIG. 5 is a table comparing different detection methods on the crack data sets in an embodiment of the invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.

The method for detecting the crack based on the recursive attention mechanism, disclosed by the embodiment of the invention, as shown in fig. 1, comprises the following steps:

S1, downsampling the input crack picture through the encoding part of the U-Net to obtain features at different scales;

S2, respectively inputting the features at different scales into the corresponding RAM modules, wherein the recursion module in each RAM outputs the corresponding downsampled features;

S3, the RAM has two inputs: one is the feature of encoding layer l, and the other is the gating signal of decoding layer l-1;

S4, the RAM module has two outputs. One is the downsampled output of the U-Net encoding part. For the other, the layer-l feature of the encoding part is passed through the skip connection into the attention mechanism and added to the gating signal from layer l-1 of the decoding part. Additive attention is used for this sum: although it is computationally more expensive than multiplicative attention, it is more accurate. A series of transformations such as ReLU, sigmoid and up-sampling then yields features that attend to the salient region and suppress irrelevant regions.

S5, constructing an up-sampling module of the U-Net, splicing the salient-region features output by the RAM with the original up-sampled features, then up-sampling, and finally outputting a finer crack detection picture.

S6, constructing a training module, and iterating for a given number of times to obtain an optimal crack detection model.

S7, constructing a test module, and inputting the test image into the optimal model to obtain a detected crack picture.

Further, in step S1: the crack picture is downsampled in the network using trilinear interpolation to obtain the features of downsampled pictures of different sizes.
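As a rough illustration of producing features at several scales, the sketch below builds an image pyramid by repeated 2x downsampling. It is an assumption-laden stand-in: the text names trilinear interpolation, whereas this example uses 2x2 block averaging, which coincides with bilinear interpolation at half resolution when pixel centres are aligned at half-pixel offsets. The function names and the number of levels are hypothetical.

```python
import numpy as np

def downsample2x(img):
    """Halve height and width by averaging each 2x2 block.
    Works for (H, W) or (H, W, C) arrays."""
    h2, w2 = img.shape[0] // 2, img.shape[1] // 2
    img = img[: h2 * 2, : w2 * 2]  # drop an odd trailing row or column
    blocks = img.reshape(h2, 2, w2, 2, *img.shape[2:])
    return blocks.mean(axis=(1, 3))

def build_pyramid(img, levels=4):
    """Return [full resolution, 1/2, 1/4, ...] with `levels` entries."""
    pyramid = [img.astype(np.float64)]
    for _ in range(levels - 1):
        pyramid.append(downsample2x(pyramid[-1]))
    return pyramid

image = np.arange(64, dtype=np.float64).reshape(8, 8)
pyramid = build_pyramid(image, levels=3)
shapes = [p.shape for p in pyramid]
```

In a trained network each scale would be produced by learned encoder layers rather than by fixed averaging; the sketch only shows how the spatial sizes relate across scales.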

Further, in step S2: the original picture is input into the RAM module and, after passing through the recursion module, is downsampled so that more edge features are learned.

Further, in step S3:

S3.1, unlike a common module, which has only one input and one output, the RAM module has two inputs and two outputs. One input is the layer-l feature of the U-Net encoding part, and the other input is the gating signal g of layer l-1 of the U-Net decoding part.

S3.2, for the two inputs of the RAM, the layer-l feature input of the U-Net encoding part is downsampled and then added to the layer-(l-1) gating signal of the decoding part. An additive attention mechanism is selected, and the specific formula is as follows:

In the formula, σ₁ denotes the ReLU function; W_g, W_x and ψ denote convolution operations; and b_g and b_ψ are the bias terms corresponding to the convolutions. Additive attention is chosen over multiplicative attention because, although computationally more expensive, it achieves higher accuracy than multiplicative attention in experiments.
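The formula itself does not appear in the text. One form consistent with the symbols defined here, assuming the standard additive attention gate, is given below. The layout, the feature symbol x_i^l (the layer-l encoder feature at pixel i), the gating symbol g_i and the name q_att are assumptions rather than content recovered from the source.

```latex
q_{att}^{l} = \psi^{T}\,\sigma_1\!\left(W_x^{T} x_i^{l} + W_g^{T} g_i + b_g\right) + b_\psi
```

Under this reading, the two inputs are first projected by W_x and W_g, summed with the bias b_g, passed through the ReLU σ₁, and then reduced to a single value per pixel by ψ and b_ψ.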

Preferably, step S4 includes:

S4.1, the RAM module has two outputs. One is the output feature downsampled by the recursion module in the U-Net encoder part. The other is the feature map obtained by adding the layer-l feature of the encoding part, passed through the U-Net skip connection, to the layer-(l-1) gating signal of the decoding part, applying ReLU and sigmoid transformations to obtain the attention coefficient α_i, and weighting the feature by α_i. The role of α_i is to highlight salient image regions and suppress task-irrelevant feature responses. The specific formula is as follows:

In the formula, σ₂ is the sigmoid function. The sigmoid is used because sequential use of the softmax function produces sparser activations at the output. g is the gating signal; it is not a single vector representing the global image but a grid signal conditioned on partial image spatial information, and the gating signal of each skip connection aggregates information from multiple imaging scales.
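The attention path described in S3.2 and S4.1 can be sketched in NumPy as below. This is an illustrative reading, not the patented implementation: the 1x1 convolutions are written as per-pixel linear maps, the gating signal is assumed to be already resized to the spatial grid of the encoder feature, and all shapes, names and random weights are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_gate(x, g, W_x, W_g, b_g, psi, b_psi):
    """Additive attention gate.
    x: (C_x, H, W) encoder feature, g: (C_g, H, W) gating signal,
    W_x: (C_int, C_x), W_g: (C_int, C_g), b_g: (C_int,),
    psi: (C_int,), b_psi: scalar."""
    proj_x = np.einsum("oc,chw->ohw", W_x, x)    # 1x1 conv on the feature
    proj_g = np.einsum("oc,chw->ohw", W_g, g)    # 1x1 conv on the gate
    hidden = relu(proj_x + proj_g + b_g[:, None, None])   # sigma_1
    q_att = np.einsum("o,ohw->hw", psi, hidden) + b_psi   # psi, b_psi
    alpha = sigmoid(q_att)                       # sigma_2, one value per pixel
    return alpha, x * alpha[None, :, :]          # weight the feature by alpha

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8, 8))
g = rng.normal(size=(6, 8, 8))
alpha, gated = attention_gate(
    x, g,
    W_x=rng.normal(size=(3, 4)), W_g=rng.normal(size=(3, 6)),
    b_g=np.zeros(3), psi=rng.normal(size=3), b_psi=0.0,
)
```

Because every α value lies strictly between 0 and 1, the gated feature never exceeds the input feature in magnitude: responses are kept where α is close to 1 and suppressed where it is close to 0.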

S4.2, in the RAM module, the features output by the recursion module are downsampled, where t in the recursion module denotes the number of recursive convolution operations. Experiments found that the detection mIoU is best when t is 2, in which case the module consists of a single convolutional layer followed by two recursive convolutional layers.
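The structure for t = 2 (one convolution followed by two recursive convolutions) can be sketched as below. The text does not give the recurrence, so this example assumes a common recurrent-convolution form in which the same kernel is reused at every step and the block input is re-injected before each recursive convolution. The single-channel setting and the function names are simplifications made for illustration.

```python
import numpy as np

def conv3x3(x, kernel):
    """Zero-padded 3x3 cross-correlation on a single-channel 2-D map."""
    h, w = x.shape
    padded = np.pad(x, 1)
    out = np.zeros_like(x, dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def recursive_conv(x, kernel, t=2):
    """One feed-forward convolution, then t recursive convolutions that
    share the kernel and re-inject the input x (assumed form)."""
    state = np.maximum(conv3x3(x, kernel), 0.0)
    for _ in range(t):
        state = np.maximum(conv3x3(x + state, kernel), 0.0)
    return state

identity = np.zeros((3, 3))
identity[1, 1] = 1.0          # kernel that copies its input
x = np.ones((4, 4))
out = recursive_conv(x, identity, t=2)
```

With the identity kernel the state grows as x, 2x, 3x over the three convolutions, which makes the effect of the two recursive steps easy to check by hand.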

Preferably, step S5 includes: in the decoding part of the U-Net, the decoder uses network layers such as deconvolution layers to gradually restore the details of the target and the corresponding spatial dimensions, and the features output by the RAM are spliced with the up-sampled features through the skip connection, so that finer features are obtained and more details are restored.

Preferably, step S6 includes: obtaining an optimal crack detection model after iterating for the given number of rounds.

Further, step S7 includes: given a crack image to be tested, outputting the picture after crack detection.

The invention also provides a device for realizing the method for detecting cracks based on the recursive attention mechanism, which comprises:

the sample acquisition module is used for collecting cracked images for training and testing and marked ground truth images for crack segmentation;

the RAM module comprises a recursion module and is used for outputting the corresponding downsampled feature signals through the recursion module; receiving as inputs the layer-l feature signal of the encoding part of the U-Net and the layer-(l-1) gating signal of the decoding part of the U-Net; outputting the downsampled output signal of the encoding part of the U-Net and the feature signal of the salient region attended to by the gating signal; and constructing an up-sampling module of the U-Net, splicing the feature signal of the salient region with the original up-sampled feature signal, and then up-sampling to obtain a crack detection picture;

the training module is used for taking as input the images with cracks and the corresponding crack segmentation ground truth images to train the model, and for obtaining an optimal model after training for a given number of times;

the test module is used for taking as input a crack image to be tested and outputting the segmented crack detection image.

The experiments used three public crack data sets: CRACK500, GAPs38 and CFD. CRACK500 contains 500 pavement crack images, each approximately 2,000x1,500 pixels in size. Because the number of images is limited, each image is cropped into 16 non-overlapping image regions, and only the regions containing more than 1,000 crack pixels are kept. This yields 1,896 training images, 348 validation images and 1,124 test images.
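The cropping rule for CRACK500 can be sketched as follows. The 4x4 grid layout is an assumption (the text only states 16 non-overlapping regions), as are the function name, the binary {0, 1} mask convention and the synthetic example data.

```python
import numpy as np

def crop_grid(image, mask, rows=4, cols=4, min_crack_pixels=1000):
    """Split an image and its binary crack mask into rows*cols
    non-overlapping tiles; keep tiles with more than
    min_crack_pixels crack pixels."""
    h, w = mask.shape
    tile_h, tile_w = h // rows, w // cols
    kept = []
    for r in range(rows):
        for c in range(cols):
            ys = slice(r * tile_h, (r + 1) * tile_h)
            xs = slice(c * tile_w, (c + 1) * tile_w)
            tile_mask = mask[ys, xs]
            if int(tile_mask.sum()) > min_crack_pixels:
                kept.append((image[ys, xs], tile_mask))
    return kept

# Synthetic 2,000 x 1,500 example: one tile with 1,200 crack pixels
# and one tile with only 500.
image = np.zeros((1500, 2000), dtype=np.uint8)
mask = np.zeros((1500, 2000), dtype=np.uint8)
mask[0:40, 0:30] = 1         # 1,200 crack pixels in tile (0, 0)
mask[400:410, 600:650] = 1   # 500 crack pixels in tile (1, 1)
tiles = crop_grid(image, mask)
```

Only the first tile survives the filter, since the second contains fewer than the required number of crack pixels.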

To demonstrate the effectiveness of the method, the invention provides experimental data comparing it with other crack detection algorithms. The comparison results are shown in FIG. 5; the experimental data in the table show that the method of the invention obtains higher scores than the comparison methods, that is, it outperforms them.
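The description names mIoU as the measure used to choose t. Assuming that a comparison like the one in FIG. 5 relies on a similar score, a minimal mean intersection-over-union computation for crack/background label maps might look like this; the function name and the two-class setting are assumptions, and the metrics actually listed in FIG. 5 are not given in the text.

```python
import numpy as np

def mean_iou(pred, target, num_classes=2):
    """Mean intersection-over-union across classes that occur in
    either the prediction or the ground truth."""
    ious = []
    for cls in range(num_classes):
        p = pred == cls
        t = target == cls
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both maps
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))

target = np.array([[0, 0], [1, 1]])  # 1 = crack, 0 = background
pred = np.array([[0, 1], [1, 1]])
score = mean_iou(pred, target)
```

Here the background IoU is 1/2 and the crack IoU is 2/3, so the mean is 7/12.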

The above-described method according to the present invention can be implemented in hardware or firmware, or as software or computer code storable in a recording medium such as a CD-ROM, a RAM, a floppy disk, a hard disk, or a magneto-optical disk, or as computer code originally stored in a remote recording medium or a non-transitory machine-readable medium, downloaded through a network and stored in a local recording medium, so that the method described herein can be processed by such software stored on a recording medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware such as an ASIC or FPGA. It is understood that the computer, processor, microprocessor controller or programmable hardware includes storage components (e.g., RAM, ROM, flash memory, etc.) that can store or receive software or computer code that, when accessed and executed by the computer, processor or hardware, implements the method for crack detection based on the recursive attention mechanism described herein. Further, when a general-purpose computer accesses code for implementing the processes shown herein, execution of the code transforms the general-purpose computer into a special-purpose computer for performing the processes shown herein.

It should be noted that, according to the implementation requirement, each step/component described in the present application can be divided into more steps/components, and two or more steps/components or partial operations of the steps/components can be combined into new steps/components to achieve the purpose of the present invention.

It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. A method of crack detection based on a recursive attention mechanism, comprising the steps of:

2. The method for crack detection based on the recursive attention mechanism as claimed in claim 1, wherein the crack picture is downsampled in the U-Net network using a trilinear interpolation method in S1, so as to obtain the feature signals of the downsampled picture at different scales.

3. The method for crack detection based on the recursive attention mechanism as claimed in claim 2, wherein S3 further includes adding the layer-l feature signal of the encoding part and the layer-(l-1) gating signal of the decoding part of the U-Net, the addition uses an additive attention mechanism, and the specific formula is as follows:

In the formula, σ₁ denotes the ReLU function; W_g, W_x and ψ are convolution operations; b_g and b_ψ are the bias terms corresponding to the convolutions; and t denotes the number of recursive convolution operations.

4. The method for crack detection based on the recursive attention mechanism as claimed in claim 3, wherein in S4, the feature signal of the salient region is obtained by adding the layer-l feature signal of the encoding part of the U-Net, passed through the skip connection, to the layer-(l-1) gating signal of the decoding part, applying ReLU and sigmoid transformations to the sum to give the attention coefficient α_i, and weighting the feature signal by α_i, and the specific formula is as follows:

In the formula, σ₂ is the sigmoid function and g is the gating signal.

5. The method for crack detection based on the recursive attention mechanism as claimed in claim 4, wherein in step S5, the decoding part of the U-Net uses deconvolution layers to gradually restore the details of the target and the corresponding spatial dimensions, and the features output by the RAM are spliced with the up-sampled features through the skip connection to construct the up-sampling module.

6. The method for crack detection based on the recursive attention mechanism as claimed in claim 5, wherein in step S5, the up-sampling comprises obtaining the feature signal of the salient region from the RAM module output, splicing it with the up-sampled features, and applying a 1x1 convolution and a sigmoid function to the spliced features to obtain the crack detection picture.

7. The method of crack detection based on a recursive attention mechanism according to claim 6, further comprising the steps of:

8. The method of crack detection based on a recursive attention mechanism according to claim 7, further comprising the steps of:

9. An apparatus for implementing the method for crack detection based on the recursive attention mechanism as claimed in claim 8, characterized by comprising a sample acquisition module, a plurality of RAM modules, a training module and a testing module, wherein:

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 8.