CN112734739A - Visual building crack identification method based on attention mechanism and ResNet fusion - Google Patents
- Publication number
- CN112734739A (application CN202110065534.2A)
- Authority
- CN
- China
- Prior art keywords
- building
- picture
- attention mechanism
- crack
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title claims abstract description 34
- 230000007246 mechanism Effects 0.000 title claims abstract description 29
- 230000000007 visual effect Effects 0.000 title claims abstract description 14
- 230000004927 fusion Effects 0.000 title claims abstract description 11
- 238000004422 calculation algorithm Methods 0.000 claims abstract description 16
- 238000013528 artificial neural network Methods 0.000 claims abstract description 14
- 230000002146 bilateral effect Effects 0.000 claims abstract description 12
- 238000007781 pre-processing Methods 0.000 claims abstract description 11
- 230000004913 activation Effects 0.000 claims abstract description 9
- 238000005520 cutting process Methods 0.000 claims abstract description 8
- 238000001914 filtration Methods 0.000 claims abstract description 8
- 238000001514 detection method Methods 0.000 claims description 16
- 238000010586 diagram Methods 0.000 claims description 14
- 238000012800 visualization Methods 0.000 claims description 14
- 238000012549 training Methods 0.000 claims description 9
- 230000002708 enhancing effect Effects 0.000 claims description 6
- 238000004364 calculation method Methods 0.000 claims description 4
- 238000011176 pooling Methods 0.000 claims description 4
- 239000013598 vector Substances 0.000 claims description 4
- 230000003213 activating effect Effects 0.000 claims description 3
- 230000003044 adaptive effect Effects 0.000 claims description 3
- 238000010606 normalization Methods 0.000 claims description 3
- 238000005457 optimization Methods 0.000 claims description 3
- 230000008569 process Effects 0.000 abstract description 3
- 230000006870 function Effects 0.000 description 9
- 238000004590 computer program Methods 0.000 description 7
- 238000012545 processing Methods 0.000 description 7
- 238000013527 convolutional neural network Methods 0.000 description 5
- 238000007689 inspection Methods 0.000 description 5
- 230000009286 beneficial effect Effects 0.000 description 3
- 238000003062 neural network model Methods 0.000 description 3
- 238000003860 storage Methods 0.000 description 3
- 230000004048 modification Effects 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 230000008859 change Effects 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 238000003708 edge detection Methods 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 230000036541 health Effects 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 238000003672 processing method Methods 0.000 description 1
- 230000001902 propagating effect Effects 0.000 description 1
- 230000002194 synthesizing effect Effects 0.000 description 1
- 238000007794 visualization technique Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0004—Industrial image inspection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/20—Image enhancement or restoration using local operators
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/40—Image enhancement or restoration using histogram techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20024—Filtering details
- G06T2207/20028—Bilateral filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30108—Industrial image inspection
- G06T2207/30132—Masonry; Concrete
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The invention relates to a visual building crack identification method based on the fusion of an attention mechanism and ResNet. The method comprises the following steps: (1) collecting building crack images with an unmanned aerial vehicle to construct a crack data set; (2) performing data preprocessing and data enhancement on the crack images using histogram equalization, bilateral filtering, random center cropping, and similar methods; (3) establishing a building crack identification model that combines an attention mechanism with a deep residual neural network; (4) adopting a gradient-weighted class activation heat map algorithm to visualize the image recognition result layer by layer, adjusting the network structure and network parameters according to the visualization result, and building the final model; (5) detecting images in the actual field with the adjusted and optimized model. The method can quickly and accurately identify cracks, effectively break the black-box mechanism of the neural network during identification, and provide a visual basis for adjusting the network structure.
Description
Technical Field
The invention relates to the technical field of computer image processing, in particular to a visual building crack identification method based on attention mechanism and ResNet fusion.
Background
In the engineering construction process, quality and safety inspection of buildings is a very important link, and detection of cracks in building outer walls is particularly important. Cracks not only affect the function and appearance of a building, but also reduce structural safety and degrade seismic performance. Rapid and accurate detection and identification of building cracks is therefore a pressing problem in the field of structural health monitoring.
At present, manual periodic inspection is commonly used to detect and identify building cracks. However, this approach is highly subjective, time-consuming and labor-intensive, costly, and inefficient. In addition, high-rise buildings and buildings with complex structures are difficult for inspectors to observe visually, making missed and false crack detections likely.
In recent years, crack recognition methods based on image processing technology have attracted much attention, but conventional image processing methods, represented by edge detection, suffer low detection accuracy because of shadows and low-contrast noise in crack images. Intelligent crack identification models based on deep neural networks are now widely researched. However, existing deep neural network models for crack detection have limited identification accuracy, cannot overcome the black-box nature of the neural network model, and do not allow the recognition result of each layer in the network to be visualized layer by layer. The optimal network model is therefore difficult to determine, and most existing crack detection networks are designed empirically.
Disclosure of Invention
The invention aims to overcome the problems of existing methods by providing a visual building crack identification method based on the fusion of an attention mechanism and ResNet. The method improves identification accuracy, breaks the black-box mechanism of conventional neural network crack identification, and aids the visual construction of a network model. In addition, the method operates on images acquired by an unmanned aerial vehicle, and the intelligent algorithm can rapidly and accurately identify cracks in building outer walls, reducing manual inspection cost and greatly improving detection efficiency.
In order to achieve the purpose, the technical scheme of the invention is as follows: a visual building crack identification method based on attention mechanism and ResNet fusion comprises the following steps:
Step S1, collecting a preset number of building outer wall pictures with an unmanned aerial vehicle, labeling the collected pictures as two classes of samples (with or without cracks), and constructing a training data set;
Step S2, preprocessing and enhancing the building outer wall pictures by histogram equalization, bilateral filtering, and random center cropping;
Step S3, establishing a deep residual neural network crack identification model based on an attention mechanism;
Step S4, feeding the data processed in step S2 into the model established in step S3 for training to obtain a preliminary building crack identification model;
Step S5, adopting the gradient-weighted class activation heat map algorithm to visualize each convolutional layer, adjusting the network structure and network parameters according to each layer's visualization of the sensitive regions of the feature maps, and retraining the model to obtain the optimal building crack identification model;
Step S6, constructing a detection data set from newly acquired unmanned aerial vehicle pictures of building outer walls, preprocessing the detection data set with histogram equalization and bilateral filtering, and passing the preprocessed pictures into the optimal building crack identification model; performing crack identification and layered model visualization on the building outer wall pictures, and outputting the picture identification result and the layered model visualization result.
In an embodiment of the present invention, the building outer wall picture preprocessing and image enhancement method in step S2 includes: performing histogram equalization on the three RGB channels of the building outer wall picture separately, then recombining the three channel vectors to obtain a picture with enhanced information content; denoising the picture with a bilateral filter, which removes noise without blurring the picture; randomly cropping the center of the picture into pictures of different sizes and aspect ratios within a set proportion range, then scaling the cropped pictures to 224 × 224; randomly flipping a given image horizontally with a given probability; and normalizing according to the mean and variance of the given RGB three-channel image.
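The per-channel histogram equalization in step S2 can be sketched as follows. This is a minimal numpy illustration, not the patent's implementation; in practice library routines such as OpenCV's `cv2.equalizeHist` and `cv2.bilateralFilter` would typically perform the equalization and bilateral-filtering steps.

```python
import numpy as np

def equalize_channel(channel):
    """Histogram-equalize one uint8 channel (values 0-255).
    Assumes the channel is not perfectly uniform."""
    hist = np.bincount(channel.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0].min()
    # Map each grey level through the normalised cumulative histogram.
    lut = np.clip(np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255), 0, 255)
    return lut.astype(np.uint8)[channel]

def preprocess(rgb):
    """Equalize the three RGB channels independently, then recombine them."""
    return np.stack([equalize_channel(rgb[..., c]) for c in range(3)], axis=-1)
```

Applied to a low-contrast crack photograph, this stretches each channel's intensity range before the bilateral-filtering, cropping, and normalization steps described above.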
In an embodiment of the present invention, in step S3, the attention-based deep residual neural network consists of a ResNet18 network plus AM attention modules: an AM attention module is inserted after each of the four middle layers of ResNet18. Each AM attention module consists of a channel attention module and a spatial attention module; the AM module sequentially infers attention maps along two independent dimensions, channel and spatial, and multiplies each attention map with the input feature map for adaptive feature refinement.
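A minimal numpy sketch of such a channel-plus-spatial attention module. The patent publishes no code; CBAM-style modules of this kind normally use a shared two-layer MLP for channel attention and a 7×7 convolution for spatial attention — here the spatial convolution is simplified to a per-pixel 1×1 combination, and the weights `w1`, `w2`, `conv_w` are hypothetical placeholders for learned parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """x: (C, H, W). Shared two-layer MLP over avg- and max-pooled channel descriptors."""
    avg, mx = x.mean(axis=(1, 2)), x.max(axis=(1, 2))        # both shape (C,)
    att = sigmoid(w2 @ np.maximum(w1 @ avg, 0.0) + w2 @ np.maximum(w1 @ mx, 0.0))
    return x * att[:, None, None]                            # reweight each channel

def spatial_attention(x, conv_w):
    """Channel-wise avg/max maps combined per pixel (1x1 stand-in for a 7x7 conv)."""
    mask = sigmoid(conv_w[0] * x.mean(axis=0) + conv_w[1] * x.max(axis=0))
    return x * mask[None]                                    # reweight each location

def am_block(x, w1, w2, conv_w):
    """AM module: channel attention first, then spatial attention, as described in step S3."""
    return spatial_attention(channel_attention(x, w1, w2), conv_w)
```

Because both gates are sigmoids, each stage scales the feature map by values in (0, 1), letting the network emphasize crack-relevant channels and locations.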
In an embodiment of the present invention, in step S5, the specific steps of visualizing each convolutional layer with the gradient-weighted class activation heat map algorithm are as follows:
Step S51, after obtaining the initial building crack identification model, calculate the partial derivative of the network's final output $y^c$ (the input to the Softmax layer) with respect to each pixel of the feature map:

$$\frac{\partial y^{c}}{\partial A_{ij}^{k}}$$

wherein $y^c$ is the score corresponding to crack class $c$, $A^k$ is the $k$-th feature map output by the last convolutional layer, and $i$ and $j$ are the width and height indices of each pixel in the feature map;

Step S52, after the partial derivative of $y^c$ with respect to each pixel of the $k$-th feature map is obtained, apply global average pooling over the width and height dimensions to obtain the weight coefficient $\alpha_{k}^{c}$ of the $k$-th feature map for class $c$:

$$\alpha_{k}^{c}=\frac{1}{Z}\sum_{i}\sum_{j}\frac{\partial y^{c}}{\partial A_{ij}^{k}}$$

wherein $Z$ is the number of pixels in the feature map and $A_{ij}^{k}$ is the value at position $(i, j)$ of the $k$-th feature map; the class activation map is then the ReLU-rectified weighted sum of the feature maps:

$$L^{c}=\mathrm{ReLU}\Big(\sum_{k}\alpha_{k}^{c}A^{k}\Big),\qquad \mathrm{ReLU}(x)=\max(0,x).$$
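Steps S51–S52 can be sketched numerically. This is an assumed minimal numpy version of the standard Grad-CAM computation — global average pooling of the class gradients to obtain the per-map weights, followed by the ReLU-rectified weighted sum of feature maps; in a real pipeline the gradients would come from automatic differentiation of the trained network.

```python
import numpy as np

def grad_cam(feature_maps, grads):
    """feature_maps, grads: arrays of shape (K, H, W) holding A^k and dy^c/dA^k.
    Returns the ReLU-rectified class activation map for class c."""
    weights = grads.mean(axis=(1, 2))                  # alpha^c_k via global average pooling
    cam = np.tensordot(weights, feature_maps, axes=1)  # sum_k alpha^c_k * A^k, shape (H, W)
    return np.maximum(cam, 0.0)                        # ReLU keeps positive evidence only
```

A location whose weighted sum is negative collapses to zero, which is exactly the role of the ReLU in the formula: only features that positively support the crack class survive in the heat map.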
Compared with the prior art, the invention has the following beneficial effects:
1. The method adopts a deep residual learning algorithm, making crack identification on building outer walls more intelligent and reducing manual inspection cost.
2. Acquiring building outer wall images with an unmanned aerial vehicle overcomes detection difficulties caused by harsh geographic environments, so the method is highly applicable and easy to popularize.
3. Using a deep residual neural network as the basic architecture effectively mitigates the drop in identification accuracy that deep network models suffer as depth increases.
4. The AM module weights the feature map from two perspectives, channel and spatial, so the network quickly attends to the image regions that matter, accelerating model training and convergence.
5. The AM attention module is a lightweight, general-purpose module that can be seamlessly integrated into any convolutional neural network architecture with negligible overhead and trained end to end with the base convolutional neural network.
6. The Gradcam algorithm can be seamlessly integrated into the model and can visualize the residual blocks layer by layer without changing the model structure, yielding the gradient-weighted class activation heat map. A high-resolution, concept-specific guided gradient-weighted class activation heat map (Guided Gradcam) is obtained through back-propagation.
7. The feature maps obtained by the Gradcam visualization algorithm break the black-box mechanism of the convolutional neural network and enhance model interpretability. The visualization result of each layer provides a theoretical basis for efficiently adjusting the network structure.
8. The algorithm of the invention is simple to implement and runs fast.
Drawings
FIG. 1 is an architecture diagram of the attention-based deep residual neural network building crack identification and visualization method proposed by the invention;
FIG. 2 shows one convolution block of the attention-based deep residual neural network crack identification method; the whole network contains four such blocks;
FIG. 3 is a flow chart of the gradient-weighted class activation heat map visualization algorithm of the present invention;
FIG. 4 is a first example of the algorithm provided by the present invention identifying a crack in a building outer wall;
FIG. 5 is a second example of the algorithm provided by the present invention identifying a crack in a building outer wall.
Detailed Description
The technical scheme of the invention is specifically explained below with reference to the accompanying drawings.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
As shown in FIG. 1, the invention provides a visual building crack identification method based on the fusion of an attention mechanism and ResNet, comprising the following steps:
Step S1, collecting a preset number of building outer wall pictures with an unmanned aerial vehicle, labeling the collected pictures as two classes of samples (with or without cracks), and constructing a training data set;
Step S2, preprocessing and enhancing the building outer wall pictures by histogram equalization, bilateral filtering, and random center cropping;
Step S3, establishing a deep residual neural network crack identification model based on an attention mechanism;
Step S4, feeding the data processed in step S2 into the model established in step S3 for training to obtain a preliminary building crack identification model;
Step S5, adopting the gradient-weighted class activation heat map algorithm to visualize each convolutional layer, adjusting the network structure and network parameters according to each layer's visualization of the sensitive regions of the feature maps, and retraining the model to obtain the optimal building crack identification model;
Step S6, constructing a detection data set from newly acquired unmanned aerial vehicle pictures of building outer walls, preprocessing the detection data set with histogram equalization and bilateral filtering, and passing the preprocessed pictures into the optimal building crack identification model; performing crack identification and layered model visualization on the building outer wall pictures, and outputting the picture identification result and the layered model visualization result.
In the embodiment of the present invention, the building outer wall image preprocessing and enhancement method in step S2 is as follows: histogram equalization is performed on the three RGB channels of the picture, and the three channel vectors are then recombined to obtain a building outer wall image with enhanced information content; the image is denoised with a bilateral filter, removing noise without blurring the picture; the center of the picture is randomly cropped into pictures of different sizes and aspect ratios within a set proportion range, and the cropped pictures are scaled to 224 × 224; a given image is randomly flipped horizontally with a given probability; and normalization is performed according to the mean and variance of the given RGB three-channel image.
In the embodiment of the present invention, the preprocessing in step S2 performs gray-level histogram equalization on each RGB component of the color building outer wall image and then recombines the three channel vectors to obtain an image with enhanced information content. The bilateral filter smooths the building outer wall image by assigning each pixel a value that jointly weights spatial distance and color difference, so noise is removed effectively while edge information is well preserved.
In the embodiment of the present invention, the crack sample data enhancement in step S2 randomly crops the center of the picture into pictures of different sizes and aspect ratios within a set proportion range and scales the cropped pictures to 224 × 224; randomly flips a given image horizontally with a given probability; and normalizes according to the mean and variance of the given RGB three-channel image.
In the present embodiment, the attention-based deep residual neural network in step S3 consists of the ResNet18 network plus AM attention modules. The ResNet18 network is shown in FIG. 2. The AM attention module consists of a channel attention module and a spatial attention module, which sequentially infer attention maps along two independent dimensions (channel and spatial) and multiply them with the input feature map for adaptive feature refinement, as shown in FIG. 3. An AM module is inserted after each of the four middle layers of ResNet18; since AM is a lightweight general-purpose module, its overhead is negligible, it can be seamlessly integrated into any convolutional neural network architecture, and it can be trained end to end with the base convolutional neural network.
In the present embodiment, ResNet18 consists of 4 residual blocks, 18 convolutional layers, 18 ReLU layers, 2 pooling layers, and 1 fully connected layer; the 4 AM modules are placed after the 4 residual blocks respectively to perform channel and spatial attention operations (FIG. 2).
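The placement of the four AM modules behind the four residual blocks can be sketched structurally. This is a hedged illustration with generic callables standing in for the actual residual blocks and attention modules; the real model would be built in a deep learning framework.

```python
def build_am_resnet18(resnet_blocks, am_modules):
    """Interleave the 4 residual stages of ResNet18 with the 4 AM attention modules.
    Both arguments are length-4 lists of callables mapping feature map -> feature map."""
    assert len(resnet_blocks) == len(am_modules) == 4
    def forward(x):
        for block, am in zip(resnet_blocks, am_modules):
            x = am(block(x))  # each residual stage's output is refined by its AM module
        return x
    return forward
```

The design choice is that attention follows each stage rather than sitting inside the residual branch, so the refined map feeds directly into the next stage.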
In the present embodiment, the gradient-weighted class activation heat map visualization algorithm in step S5 works as follows. Given an image and a target class as input, the image is forwarded through the model to obtain the raw score of that class. The gradient of the target class is set to 1 and the gradients of all other classes are set to zero; a convolutional layer to visualize is selected, and the signal is then back-propagated to the convolutional feature maps of interest. Combining the weight information of the feature maps in the layers preceding the selected convolutional layer yields the gradient-weighted class activation heat map (Gradcam), in which the darker regions are the areas of the feature map the model is sensitive to and thus the basis of the model's decision, as shown in FIG. 4(a) and FIG. 5(a). Finally, the heat map is multiplied pointwise by the result of guided back-propagation to obtain a high-resolution, concept-specific guided gradient-weighted class activation heat map (Guided Gradcam), which further reveals fine-grained, pixel-level gradient information about the crack, as shown in FIG. 4(b) and FIG. 5(b).
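The final pointwise product that yields Guided Gradcam can be sketched as follows — a minimal numpy illustration; the nearest-neighbour upsampling here stands in for whatever interpolation an implementation uses to bring the coarse heat map up to input resolution.

```python
import numpy as np

def upsample_nearest(cam, out_h, out_w):
    """Nearest-neighbour resize of the coarse (H, W) CAM to the input resolution."""
    h, w = cam.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return cam[np.ix_(rows, cols)]

def guided_grad_cam(cam, guided_backprop):
    """Pointwise product of the upsampled heat map with the guided-backprop
    saliency (out_h, out_w, 3), as described for Guided Gradcam above."""
    up = upsample_nearest(cam, *guided_backprop.shape[:2])
    return guided_backprop * up[..., None]
```

Regions where the coarse heat map is zero are suppressed entirely, so only pixel-level gradients inside the class-relevant regions survive.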
In an embodiment of the present invention, a gradient-weighted class activation heat map algorithm (GradCam) is adopted; the specific steps for visualizing each convolutional layer are as follows:
Step S51, after obtaining the preliminary building crack identification model, calculate the partial derivative of the network's final output $y^c$ (the input to the Softmax layer) with respect to each pixel of the feature map:

$$\frac{\partial y^{c}}{\partial A_{ij}^{k}}$$

wherein $y^c$ is the score corresponding to crack class $c$, $A^k$ is the $k$-th feature map output by the last convolutional layer, and $i$ and $j$ are the width and height indices of each pixel in the feature map;

Step S52, after the partial derivative of $y^c$ with respect to each pixel of the $k$-th feature map is obtained, apply global average pooling over the width and height dimensions to obtain the weight coefficient $\alpha_{k}^{c}$ of the $k$-th feature map for class $c$:

$$\alpha_{k}^{c}=\frac{1}{Z}\sum_{i}\sum_{j}\frac{\partial y^{c}}{\partial A_{ij}^{k}}$$

wherein $Z$ is the number of pixels in the feature map and $A_{ij}^{k}$ is the value at position $(i, j)$ of the $k$-th feature map; the class activation map is then the ReLU-rectified weighted sum of the feature maps:

$$L^{c}=\mathrm{ReLU}\Big(\sum_{k}\alpha_{k}^{c}A^{k}\Big),\qquad \mathrm{ReLU}(x)=\max(0,x).$$
as will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing is directed to preferred embodiments of the present invention; other and further embodiments may be devised without departing from its basic scope, which is determined by the claims that follow. Any simple modification, equivalent change, or variation of the above embodiments according to the technical essence of the present invention falls within the protection scope of the technical solution of the invention.
Claims (4)
1. A visual building crack identification method based on attention mechanism and ResNet fusion is characterized by comprising the following steps:
Step S1, collecting a preset number of building outer wall pictures with an unmanned aerial vehicle, labeling the collected pictures as two classes of samples (with or without cracks), and constructing a training data set;
Step S2, preprocessing and enhancing the building outer wall pictures by histogram equalization, bilateral filtering, and random center cropping;
Step S3, establishing a ResNet deep residual neural network crack identification model based on an attention mechanism;
Step S4, feeding the data processed in step S2 into the model established in step S3 for training to obtain a preliminary building crack identification model;
Step S5, adopting the gradient-weighted class activation heat map algorithm to visualize each convolutional layer, adjusting the network structure and network parameters according to each layer's visualization of the sensitive regions of the feature maps, and retraining the model to obtain the optimal building crack identification model;
Step S6, constructing a detection data set from newly acquired unmanned aerial vehicle pictures of building outer walls, preprocessing the detection data set with histogram equalization and bilateral filtering, and passing the preprocessed pictures into the optimal building crack identification model; performing crack identification and layered model visualization on the building outer wall pictures, and outputting the picture identification result and the layered model visualization result.
2. The visual building crack identification method based on the fusion of the attention mechanism and ResNet according to claim 1, wherein the building outer wall picture preprocessing and image enhancement method in step S2 comprises: performing histogram equalization on the three RGB channels of the building outer wall picture separately, then recombining the three channel vectors to obtain a picture with enhanced information content; denoising the picture with a bilateral filter, which removes noise without blurring the picture; randomly cropping the center of the picture into pictures of different sizes and aspect ratios within a set proportion range, then scaling the cropped pictures to 224 × 224; randomly flipping a given image horizontally with a given probability; and normalizing according to the mean and variance of the given RGB three-channel image.
3. The visual building crack identification method based on the fusion of the attention mechanism and ResNet according to claim 1, wherein in step S3 the attention-based deep residual neural network consists of a ResNet18 network plus AM attention modules, that is, an AM attention module is inserted after each of the four middle layers of ResNet18; the AM attention module consists of a channel attention module and a spatial attention module, sequentially infers attention maps along two independent dimensions, channel and spatial, and multiplies each attention map with the input feature map for adaptive feature refinement.
4. The method for visually identifying building cracks based on the fusion of the attention mechanism and ResNet according to claim 1, wherein in step S5 each convolutional layer is visualized with the gradient-weighted class activation heatmap (Grad-CAM) algorithm, in the following specific steps:
step S51, after obtaining the initial building crack identification model, calculating the partial derivative of the output value y^c of the final layer of the network, namely the layer before the Softmax layer, with respect to each pixel of the feature map:

∂y^c / ∂A^k_(i,j)

wherein y^c is the score corresponding to crack class c, A^k is the k-th feature map output by the last convolutional layer, and i and j are respectively the width and height indices of a pixel in the feature map;
step S52, after the partial derivative of y^c with respect to each pixel of the k-th feature map is obtained, taking the global average pooling over the width and height dimensions to obtain the weight coefficient α^c_k of the k-th feature map for class c:

α^c_k = (1/Z) Σ_i Σ_j ∂y^c / ∂A^k_(i,j)

wherein Z is the number of pixels in the feature map and A^k_(i,j) is the value at position (i, j) of the k-th feature map;
step S53, passing the weighted combination of the feature maps through a ReLU to obtain the class activation heatmap:

L^c = ReLU( Σ_k α^c_k · A^k )

wherein the calculation formula of the ReLU function is: ReLU(x) = max(0, x).
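Steps S51–S52 and the ReLU-weighted combination can be sketched as follows, given activations and gradients that are assumed to be precomputed (obtaining the gradients themselves requires a framework hook on the last convolutional layer, which is omitted here):

```python
import numpy as np

def grad_cam(feature_maps, grads):
    """
    feature_maps: (K, H, W) activations A^k of the last conv layer.
    grads:        (K, H, W) gradients  dy^c / dA^k_(i,j) for class c.
    Returns the (H, W) Grad-CAM heatmap L^c.
    """
    # Step S52: global average pooling of the gradients over width and
    # height yields one weight alpha^c_k per feature map.
    alpha = grads.mean(axis=(1, 2))                              # (K,)
    # Weighted sum of feature maps, then ReLU, so that only regions
    # with a positive influence on the crack score survive.
    cam = np.maximum((alpha[:, None, None] * feature_maps).sum(axis=0), 0.0)
    return cam
```

The resulting heatmap is typically upsampled to the input resolution and overlaid on the wall picture, which is what the claim's "model layer-wise visualization" displays.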
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110065534.2A CN112734739B (en) | 2021-01-18 | 2021-01-18 | Visual building crack identification method based on attention mechanism and ResNet fusion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112734739A true CN112734739A (en) | 2021-04-30 |
CN112734739B CN112734739B (en) | 2022-07-08 |
Family
ID=75592192
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110065534.2A Active CN112734739B (en) | 2021-01-18 | 2021-01-18 | Visual building crack identification method based on attention mechanism and ResNet fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112734739B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2018195001A (en) * | 2017-05-16 | 2018-12-06 | 株式会社パスコ | Linear graphic extraction device, linear graphic extraction program, and neural network learning method |
KR20200063364A (en) * | 2018-11-23 | 2020-06-05 | 네이버 주식회사 | Method and system for visualizing classification result of deep neural network for prediction of disease prognosis through time series medical data |
CN111401177A (en) * | 2020-03-09 | 2020-07-10 | 山东大学 | End-to-end behavior recognition method and system based on adaptive space-time attention mechanism |
CN111784679A (en) * | 2020-07-06 | 2020-10-16 | 金陵科技学院 | Retaining wall crack identification method based on CNN and SVM |
CN111860106A (en) * | 2020-05-28 | 2020-10-30 | 江苏东印智慧工程技术研究院有限公司 | Unsupervised bridge crack identification method |
CN112053354A (en) * | 2020-09-15 | 2020-12-08 | 上海应用技术大学 | Track slab crack detection method |
CN112130200A (en) * | 2020-09-23 | 2020-12-25 | 电子科技大学 | Fault identification method based on grad-CAM attention guidance |
Non-Patent Citations (4)
Title |
---|
NAV RAJ BHATT: "Post-earthquake damage assessment of masonry walls based on convolutional neural networks", China Master's Theses Full-text Database, Engineering Science and Technology II * |
RAMPRASAATH R. SELVARAJU ET AL.: "Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization", 《 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV)》 * |
TING DENG ET AL.: "Generate adversarial examples by spatially perturbing on the meaningful area", 《PATTERN RECOGNITION LETTERS》 * |
叶亮 (YE LIANG): "Aerial visual detection and identification method for ground fissures in coal mine goaf areas", China Master's Theses Full-text Database, Engineering Science and Technology I * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113744205A (en) * | 2021-08-17 | 2021-12-03 | 哈尔滨工业大学(威海) | End-to-end road crack detection system |
CN113744205B (en) * | 2021-08-17 | 2024-02-06 | 哈尔滨工业大学(威海) | End-to-end road crack detection system |
CN114120046A (en) * | 2022-01-25 | 2022-03-01 | 武汉理工大学 | Lightweight engineering structure crack identification method and system based on phantom convolution |
CN115660647A (en) * | 2022-11-05 | 2023-01-31 | 一鸣建设集团有限公司 | Maintenance method for building outer wall |
CN117274788A (en) * | 2023-10-07 | 2023-12-22 | 南开大学 | Sonar image target positioning method, system, electronic equipment and storage medium |
CN117274788B (en) * | 2023-10-07 | 2024-04-30 | 南开大学 | Sonar image target positioning method, system, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN112734739B (en) | 2022-07-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112734739B (en) | Visual building crack identification method based on attention mechanism and ResNet fusion | |
CN112270249B (en) | Target pose estimation method integrating RGB-D visual characteristics | |
CN111444809B (en) | Power transmission line abnormal target detection method based on improved YOLOv3 | |
CN110348376B (en) | Pedestrian real-time detection method based on neural network | |
CN108647585B (en) | Traffic identifier detection method based on multi-scale circulation attention network | |
CN108960135B (en) | Dense ship target accurate detection method based on high-resolution remote sensing image | |
CN114743119B (en) | High-speed rail contact net hanger nut defect detection method based on unmanned aerial vehicle | |
CN111738206B (en) | Excavator detection method for unmanned aerial vehicle inspection based on CenterNet | |
CN111860143B (en) | Real-time flame detection method for inspection robot | |
CN111242026A (en) | Remote sensing image target detection method based on spatial hierarchy perception module and metric learning | |
CN111860175B (en) | Unmanned aerial vehicle image vehicle detection method and device based on lightweight network | |
CN111414807A (en) | Tidal water identification and crisis early warning method based on YO L O technology | |
CN112818871B (en) | Target detection method of full fusion neural network based on half-packet convolution | |
CN112562255A (en) | Intelligent image detection method for cable channel smoke and fire condition in low-light-level environment | |
CN113298024A (en) | Unmanned aerial vehicle ground small target identification method based on lightweight neural network | |
CN111652297B (en) | Fault picture generation method for image detection model training | |
CN111582102B (en) | Remote sensing data refined classification method and device based on multi-mode end-to-end network | |
CN116485885A (en) | Method for removing dynamic feature points at front end of visual SLAM based on deep learning | |
CN114332739A (en) | Smoke detection method based on moving target detection and deep learning technology | |
CN115272826A (en) | Image identification method, device and system based on convolutional neural network | |
CN114399734A (en) | Forest fire early warning method based on visual information | |
CN116452983B (en) | Quick discovering method for land landform change based on unmanned aerial vehicle aerial image | |
CN116385758A (en) | Detection method for damage to surface of conveyor belt based on YOLOv5 network | |
CN115578624A (en) | Agricultural disease and pest model construction method, detection method and device | |
CN116092019A (en) | Ship periphery abnormal object monitoring system, storage medium thereof and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||