CN116823800A - Bridge concrete crack detection method based on deep learning under complex background - Google Patents
Bridge concrete crack detection method based on deep learning under complex background
- Publication number
- CN116823800A CN116823800A CN202310874021.5A CN202310874021A CN116823800A CN 116823800 A CN116823800 A CN 116823800A CN 202310874021 A CN202310874021 A CN 202310874021A CN 116823800 A CN116823800 A CN 116823800A
- Authority
- CN
- China
- Prior art keywords
- convolution
- bridge
- crack
- input
- output
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0004—Industrial image inspection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0004—Industrial image inspection
- G06T7/0008—Industrial image inspection checking presence/absence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/70—Labelling scene content, e.g. deriving syntactic or semantic representations
Abstract
The invention discloses a bridge concrete crack detection method based on deep learning under a complex background, which uses a bridge crack recognition and detection model trained by a deep learning method to segment, recognize and predict crack regions in a bridge crack image. The bridge crack recognition and detection model extracts a high-level semantic feature map of the bridge crack image through an attention fusion feature extraction network, extracts a position contour feature map of the bridge crack image through a shallow feature extraction network, and then predicts the crack segmentation recognition and detection result of the bridge crack image from the fusion of the two feature maps, thereby reducing misjudgment of complex background pixels and accurately locating the crack region. The method can segment, recognize and extract bridge concrete cracks under a complex background more quickly and accurately, and thus solves problems such as low crack segmentation and recognition speed and insufficient accuracy caused by excessive background noise.
Description
Technical Field
The invention relates to the technical field of bridge structure detection and the technical field of neural networks, in particular to a method for detecting bridge concrete cracks under a complex background based on deep learning.
Background
With the development and improvement of China's infrastructure and transportation system, the number of bridges has increased rapidly, and more and more bridges are gradually entering the maintenance and repair stage. The most common damage to bridges is cracking of the concrete, and more than 76% of damage to concrete bridges is caused by cracks. Cracks form mainly for the following reasons: because the tensile strength of concrete is low, long-term excessive loading causes uneven stress and cracking; because the concrete structure is usually exposed outdoors, it deforms under external temperature changes and is frequently corroded by rainwater, which produces cracks; and the quality of construction materials and the construction workmanship directly affect the quality and service life of the concrete structure. The location and morphology of cracks in a bridge can provide a great deal of information about internal damage, degradation and potential risk to the structure. Therefore, detecting cracks in the concrete structure of a bridge is a necessary and important task for evaluating the health status of the bridge and for subsequent structural maintenance and repair.
At present, common bridge concrete crack detection methods fall into two categories: manual inspection and image-vision-based detection. Manual inspection relies mainly on human observation, with inspectors working from telescopes, ropes, erected towers, bridge inspection trucks and the like; however, manual inspection is inefficient, prone to missed detections, and suffers from high labor cost and high safety risk. With the development of unmanned aerial vehicle technology, the entire bridge surface can be imaged by a camera carried on an unmanned aerial vehicle, and the acquired images are then analyzed by image-based visual recognition methods to complete bridge crack detection.
Image-vision methods are efficient but technically more involved, and have become an important research direction for bridge concrete crack detection. In 2017, ZHANG et al. established an effective backbone convolutional neural network (CNN) architecture that can detect 3D asphalt surface cracks at the pixel level, but it has no pooling layer. Huang Qi et al. recognized cracks using gray-value threshold segmentation combined with interference rejection based on a combination of binary and gray-scale images, but the approach is unsuitable when the crack differs little from the background. Ruan Xiaoli introduced the idea of treating the crack region as a connected region and filtering out non-cracks according to characteristic crack parameters, so that the width of smaller cracks can be identified, but the method is still limited when the gray-scale difference is small. In 2018, YANG et al. introduced a fully convolutional network (FCN) to solve both crack identification and crack measurement, but the measurement error was large. Wang Sen et al. constructed a novel CrackFCN model, achieving high-precision crack detection and fewer false markings in more complex backgrounds, but the processing efficiency is still not high enough. In 2019, Zhou Ying et al. combined crack-fragment stitching with image processing methods to achieve high-precision measurement of crack width, but identification of cracks in more complex backgrounds remains to be improved.
Therefore, how to segment and recognize bridge concrete cracks from a complex background more quickly and accurately remains a problem to be further researched and solved.
Disclosure of Invention
Aiming at the defects in the prior art, the invention aims to provide a method for detecting the bridge concrete cracks under the complex background based on deep learning, which can realize the segmentation and identification of the bridge concrete cracks under the complex background more rapidly and more accurately.
To solve the above technical problems, the invention adopts the following technical scheme:
the method comprises the steps of obtaining a bridge crack image to be processed, inputting the bridge crack image to a pre-trained bridge crack recognition and detection model, and obtaining a crack segmentation recognition and detection result of the bridge crack image to be processed;
the bridge crack identification and detection model comprises an attention fusion feature extraction network based on a U-Net network framework and a shallow feature extraction network based on a multi-scale convolution attention module and a transducer; the bridge crack identification and detection model respectively takes an input bridge crack image as the input of a multi-scale convolution attention module and a shallow feature extraction network, extracts a high-level semantic feature map of the bridge crack image through the attention fusion feature extraction network, extracts a position outline feature map of the bridge crack image through the shallow feature extraction network, and then multiplies the high-level semantic feature map and the position outline feature map of the bridge crack image to generate a crack segmentation identification and detection result serving as an output bridge crack image in a fusion mode.
In the method for detecting bridge concrete cracks under a complex background based on deep learning, as a preferred scheme, the attention fusion feature extraction network comprises four coding layers, a bottleneck layer and four decoding layers which are sequentially connected;
each coding layer comprises a cascaded multi-scale convolution attention module and strip pooling module; wherein the input of the coding layer is used as the input of the multi-scale convolution attention module, the output of the multi-scale convolution attention module in the coding layer is used as the input of the strip pooling module, and the output of the strip pooling module is used as the output of the coding layer;
each decoding layer comprises two cascaded convolution layers and a convolution up-sampling module; wherein the input of the decoding layer is used as the input of a first convolution layer, the output of the first convolution layer is used as the input of a second convolution layer, the output of the second convolution layer is used as the input of a convolution up-sampling module, and the output of the convolution up-sampling module is used as the output of the decoding layer;
the bottleneck layer comprises two cascaded convolution layers;
the bridge crack image input to the attention fusion feature extraction network is used as the input of the first coding layer; the output of each coding layer serves as the input of the next layer it connects to in the attention fusion feature extraction network; the output of the fourth coding layer is used as the input of the bottleneck layer, and after convolution up-sampling, the output of the bottleneck layer is spliced with the output of the multi-scale convolution attention module in the fourth coding layer to serve as the input of the fourth decoding layer; the input of each decoding layer is the splice of the output of the multi-scale convolution attention module in the corresponding coding layer and the output of the preceding decoding layer in the attention fusion feature extraction network; the output of the first decoding layer serves as the overall output of the attention fusion feature extraction network.
In the method for detecting bridge concrete cracks under a complex background based on deep learning, as a preferred scheme, the shallow feature extraction network comprises two multi-scale convolution attention modules, a strip pooling module and a Transformer module;
the bridge crack image input to the shallow feature extraction network is respectively used as the input of a first multi-scale convolution attention module and of the Transformer module, the output of the first multi-scale convolution attention module is used as the input of the strip pooling module, the output of the strip pooling module is spliced with the output of the Transformer module to serve as the input of a second multi-scale convolution attention module, and the output of the second multi-scale convolution attention module is used as the overall output of the shallow feature extraction network.
In the method for detecting the bridge concrete cracks under the complex background based on deep learning, as a preferred scheme, the processing procedure of the multi-scale convolution attention module comprises the following steps:
after a 5×5 depth-wise convolution is performed on the input feature map, depth-wise strip convolutions are performed on the resulting depth convolution map along two branches, where one branch performs 1×7 and 7×1 strip convolutions and the other branch performs 1×11 and 11×1 strip convolutions; the two feature maps of different scales obtained by the strip convolutions of the two branches are spliced with the depth convolution map, and a multi-channel feature map is obtained by a convolution with kernel size 1; finally, the multi-channel feature map is multiplied with the input feature map to obtain the attention feature map as the output.
In the method for detecting the bridge concrete cracks under the complex background based on deep learning, as a preferred scheme, the processing process of the strip pooling module comprises the following steps:
pixel-width strip pooling and pixel-height strip pooling are performed on the input feature map to obtain, respectively, a pixel-width pooled feature map of size 1×W and a pixel-height pooled feature map of size H×1, where H and W are the pixel height and pixel width of the input feature map; the pixel-width pooled feature map is then expanded by a one-dimensional convolution into an expansion feature map of size H×W based on pixel width, and the pixel-height pooled feature map is expanded by a one-dimensional convolution into an expansion feature map of size H×W based on pixel height; the two expansion feature maps are superposed and fused, and then a convolution with kernel size 1 and a Sigmoid activation function are applied in turn to obtain an activation feature map; finally, the activation feature map is multiplied with the input feature map to obtain the pooled down-sampling feature map as the output.
In the method for detecting the bridge concrete cracks under the complex background based on deep learning, as a preferable scheme, the bridge crack identification and detection model is trained in the following manner:
and taking the bridge crack sample image with the crack region segmentation mask mark which is finished in advance as a training sample to form a training sample set, inputting the bridge crack identification detection model, constructing a total loss function comprising cross entropy loss and Dice loss for evaluating the bridge crack identification performance, optimizing and updating model parameters of the bridge crack identification detection model with the minimum total loss function as a target, and further training the bridge crack identification detection model.
In the method for detecting the bridge concrete cracks under the complex background based on deep learning, as a preferable scheme, the total loss function is as follows:
L_Total = L_CE + λ·L_Dice;
where L_Total denotes the total loss function, L_CE denotes the cross-entropy loss, and L_Dice denotes the Dice loss; g_i denotes the true segmentation mask value at the i-th pixel position in the bridge crack sample image, p_i denotes the predicted segmentation mask value at the i-th pixel position output by the bridge crack recognition and detection model, i ∈ {1, 2, …, H×W}, where H and W are respectively the pixel height and pixel width of the bridge crack sample image; λ is a weight coefficient.
Compared with the prior art, the invention has the following beneficial technical effects:
1. The method uses a deep learning approach to train the bridge crack recognition and detection model and, after training, performs segmentation recognition prediction on the crack regions in bridge crack images, so that segmentation recognition and detection of bridge concrete cracks under a complex background can be realized rapidly.
2. The bridge crack recognition and detection model used in the method extracts the high-level semantic feature map of the bridge crack image through the attention fusion feature extraction network and the position contour feature map through the shallow feature extraction network, and then predicts the crack segmentation recognition and detection result of the bridge crack image from the fusion of the two feature maps; the wide-channel characteristics of the attention fusion feature extraction network and the crack-detail enhancement of the shallow feature extraction network are thus used together, misjudgment of complex background pixels is reduced, and the crack region is located accurately.
3. The bridge crack recognition and detection model is trained with a combination of the cross-entropy loss and the Dice loss, which improves the training effect and recognition performance of the model.
4. The method for detecting bridge concrete cracks under a complex background based on deep learning can segment, recognize and extract bridge concrete cracks under a complex background more quickly and accurately, thereby solving problems such as low crack segmentation and recognition speed and insufficient accuracy caused by excessive background noise.
Drawings
FIG. 1 is a schematic diagram of a bridge crack identification and detection model used in the method of the present invention.
Fig. 2 is a schematic diagram of the structure of the attention fusion feature extraction network.
Fig. 3 is a schematic diagram of a processing flow of the multi-scale convolution attention module MSCA.
Fig. 4 is a schematic diagram of the processing flow of the strip pooling module SP.
Fig. 5 is a schematic structural diagram of a shallow feature extraction network.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more clear, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It will be apparent that the described embodiments are some, but not all, embodiments of the invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention will be described in further detail with reference to the drawings and the specific examples.
The invention provides a method for detecting bridge concrete cracks under a complex background based on deep learning.
The bridge crack recognition and detection model used in the method comprises an attention fusion feature extraction network based on a U-Net network framework and a shallow feature extraction network based on a multi-scale convolution attention module and a Transformer; the bridge crack recognition and detection model respectively takes the input bridge crack image as the input of a multi-scale convolution attention module and of the shallow feature extraction network, extracts a high-level semantic feature map of the bridge crack image through the attention fusion feature extraction network, extracts a position contour feature map of the bridge crack image through the shallow feature extraction network, and then multiplies the high-level semantic feature map and the position contour feature map of the bridge crack image to fuse them and generate the crack segmentation recognition and detection result as the output for the bridge crack image.
The method for detecting the bridge concrete cracks under the complex background based on deep learning is described in more detail below.
1. Image preprocessing
In order to better apply the method of the invention to bridge concrete crack detection, the bridge crack images collected in bridge inspection reports are preferably screened and unified to the same resolution. Specifically, bridge crack pictures in which stains, shadows, graffiti, painted lines and the like obstruct detection can first be screened out of the bridge inspection report manually; then, where the picture sizes and resolutions differ, processing methods such as image interpolation algorithms can be used to unify the bridge crack pictures to the same resolution.
After this preprocessing, the images can be used for bridge concrete crack detection with the method of the invention. Meanwhile, after effective crack-region segmentation and recognition has been performed on the preprocessed bridge crack pictures, the crack regions can be given segmentation mask labels and the pictures can be used as bridge crack sample images for the bridge crack recognition and detection model, serving as training samples or as test and validation samples.
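For illustration only, a minimal preprocessing sketch is given below (assuming OpenCV is available); the directory layout, file extension and the 512×512 target resolution are assumptions of the sketch, since the text above does not fix a particular resolution, and the manual screening of stained or graffitied pictures is done by hand as described.

```python
# Minimal preprocessing sketch: skip unreadable pictures and unify the
# resolution of the remaining bridge crack pictures by interpolation.
import cv2
import glob
import os

TARGET_SIZE = (512, 512)  # assumed unified resolution (not fixed by the text above)

def preprocess_images(src_dir: str, dst_dir: str) -> None:
    os.makedirs(dst_dir, exist_ok=True)
    for path in glob.glob(os.path.join(src_dir, "*.jpg")):
        img = cv2.imread(path)
        if img is None:  # skip corrupted or unreadable pictures
            continue
        # unify the resolution with an interpolation algorithm, as described above
        img = cv2.resize(img, TARGET_SIZE, interpolation=cv2.INTER_LINEAR)
        cv2.imwrite(os.path.join(dst_dir, os.path.basename(path)), img)
```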
2. Bridge crack recognition and detection model
After training, the bridge crack recognition and detection model is used to perform crack segmentation recognition and detection on the bridge crack image to be processed. The framework of the bridge crack recognition and detection model used in the method of the invention is shown in fig. 1 and mainly comprises an attention fusion feature extraction network and a shallow feature extraction network, which are described in detail below.
2.1, attention fusion feature extraction network
The attention fusion feature extraction network is based on a U-Net network framework and is used for extracting the high-level semantic feature map of the bridge crack image. In practical implementation, the attention fusion feature extraction network is obtained by modifying the U-Net network framework: a strip pooling module (SP) is introduced into the U-Net network to replace the conventional convolution down-sampling in the coding structure, so as to collect rich context information, explore crack information in complex scenes and reduce misjudgment of background pixels; meanwhile, the conventional convolution blocks in the coding structure of the conventional U-Net network are replaced by the multi-scale convolution attention module (Multi-Scale Convolutional Attention, MSCA), which aggregates local information while establishing relations between different channels, evokes spatial attention to focus on crack information, further suppresses interference from complex backgrounds such as painted lines and graffiti, and finally achieves interaction of multi-scale information and improves the accuracy of crack detection.
Specifically, the structure of the attention fusion feature extraction network is based on a U-Net network architecture, and as shown in fig. 2, includes four coding layers, one bottleneck layer and four decoding layers connected in sequence.
Each coding layer comprises a cascaded multi-scale convolution attention module MSCA and strip pooling module SP; the input of the coding layer is taken as the input of the multi-scale convolution attention module MSCA, the output of the multi-scale convolution attention module MSCA in the coding layer is taken as the input of the strip pooling module, and the output of the strip pooling module SP is taken as the output of the coding layer.
Each decoding layer comprises two cascaded convolution layers Conv and a convolution up-sampling module UC; wherein, the input of the decoding layer is taken as the input of a first convolution layer Conv1, the output of the first convolution layer Conv1 is taken as the input of a second convolution layer Conv2, the output of the second convolution layer Conv2 is taken as the input of a convolution up-sampling module UC, and the output of the convolution up-sampling module UC is taken as the output of the decoding layer.
The bottleneck layer comprises two concatenated convolutional layers Conv.
As shown in fig. 2, the bridge crack image input to the attention fusion feature extraction network serves as the input of the first coding layer; the output of each coding layer serves as the input of the next layer it connects to in the attention fusion feature extraction network; the output of the fourth coding layer is used as the input of the bottleneck layer, and after convolution up-sampling, the output of the bottleneck layer is spliced with the output of the multi-scale convolution attention module in the fourth coding layer to serve as the input of the fourth decoding layer; the input of each decoding layer is the splice of the output of the multi-scale convolution attention module in the corresponding coding layer and the output of the preceding decoding layer in the attention fusion feature extraction network; the output of the first decoding layer serves as the overall output of the attention fusion feature extraction network.
The processing procedure of the multi-scale convolution attention module MSCA is shown in fig. 3, specifically: after a 5×5 depth-wise convolution is performed on the input feature map, depth-wise strip convolutions are performed on the resulting depth convolution map along two branches, where one branch performs 1×7 and 7×1 strip convolutions and the other branch performs 1×11 and 11×1 strip convolutions; the two feature maps of different scales obtained by the strip convolutions of the two branches are spliced with the depth convolution map, and a multi-channel feature map is obtained by a convolution with kernel size 1; finally, the multi-channel feature map is multiplied with the input feature map to obtain the attention feature map as the output.
Simply stacking conventional convolution blocks does not make the network focus on more crack pixels. First, the multi-scale convolution attention module MSCA evokes spatial attention through simple element-wise multiplication, which increases the network's attention to cracks, reduces the extraction of useless information, and effectively suppresses interference from background information in the picture. In addition, the multi-scale convolution attention module contains depth-wise convolution branches that aggregate local crack feature information and can still extract abstract crack features when the image background is complex and partially occluded, which strengthens the feature extraction capability of the coding structure. Second, the lightweight strip convolutions reduce the computational cost while favoring the extraction of strip-shaped objects, which makes them well suited to crack detection. Therefore, the invention replaces the standard convolution blocks in the coding structure of the U-Net network with the multi-scale convolution attention module MSCA to enhance the crack features.
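For illustration, a minimal PyTorch sketch of the multi-scale convolution attention module MSCA described above is given below; the splicing of the three maps followed by a kernel-1 fusion convolution follows the description, while the padding values and class name are assumptions of the sketch.

```python
import torch
import torch.nn as nn

class MSCA(nn.Module):
    """Multi-scale convolution attention module, following the description above."""
    def __init__(self, channels: int):
        super().__init__()
        # 5x5 depth-wise convolution on the input feature map
        self.dw5 = nn.Conv2d(channels, channels, 5, padding=2, groups=channels)
        # branch 1: 1x7 followed by 7x1 depth-wise strip convolutions
        self.branch7 = nn.Sequential(
            nn.Conv2d(channels, channels, (1, 7), padding=(0, 3), groups=channels),
            nn.Conv2d(channels, channels, (7, 1), padding=(3, 0), groups=channels))
        # branch 2: 1x11 followed by 11x1 depth-wise strip convolutions
        self.branch11 = nn.Sequential(
            nn.Conv2d(channels, channels, (1, 11), padding=(0, 5), groups=channels),
            nn.Conv2d(channels, channels, (11, 1), padding=(5, 0), groups=channels))
        # kernel-1 convolution fusing the spliced maps into a multi-channel attention map
        self.fuse = nn.Conv2d(3 * channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        d = self.dw5(x)
        attn = self.fuse(torch.cat([d, self.branch7(d), self.branch11(d)], dim=1))
        return attn * x  # element-wise multiplication with the input feature map
```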
The processing procedure of the strip pooling module SP is shown in fig. 4, specifically: pixel-width strip pooling and pixel-height strip pooling are performed on the input feature map to obtain, respectively, a pixel-width pooled feature map of size 1×W and a pixel-height pooled feature map of size H×1, where H and W are the pixel height and pixel width of the input feature map; the pixel-width pooled feature map is then expanded by a one-dimensional convolution into an expansion feature map of size H×W based on pixel width, and the pixel-height pooled feature map is expanded by a one-dimensional convolution into an expansion feature map of size H×W based on pixel height; the two expansion feature maps are superposed and fused, and then a convolution with kernel size 1 and a Sigmoid activation function are applied in turn to obtain an activation feature map; finally, the activation feature map is multiplied with the input feature map to obtain the pooled down-sampling feature map as the output.
Cracks occupy only a small area of the whole picture and are long and thin; moreover, ordinary down-sampling limits the context information of the crack and cannot cope well with backgrounds that contain obstacles such as painted lines and graffiti. Based on these considerations, the invention introduces the strip pooling module SP into the coding structure to better capture local context information and thereby reduce misjudgment of background pixels.
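For illustration, a minimal PyTorch sketch of the strip pooling module SP described above is given below; the kernel size of the one-dimensional expansion convolutions (3 here) is an assumption, and any subsequent spatial down-sampling applied by the coding layer is outside this sketch.

```python
import torch
import torch.nn as nn

class StripPooling(nn.Module):
    """Strip pooling module, following the description above."""
    def __init__(self, channels: int):
        super().__init__()
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # pixel-width strip pooling -> 1 x W
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # pixel-height strip pooling -> H x 1
        self.conv_w = nn.Conv2d(channels, channels, (1, 3), padding=(0, 1))
        self.conv_h = nn.Conv2d(channels, channels, (3, 1), padding=(1, 0))
        self.fuse = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        # one-dimensional convolution, then expansion of each strip back to H x W
        xw = self.conv_w(self.pool_w(x)).expand(-1, -1, h, w)
        xh = self.conv_h(self.pool_h(x)).expand(-1, -1, h, w)
        attn = torch.sigmoid(self.fuse(xw + xh))  # superpose, kernel-1 conv, Sigmoid
        return attn * x                           # multiply with the input feature map
```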
The attention fusion feature extraction network processes the input bridge crack image as follows: the bridge crack image is processed in turn by the multi-scale convolution attention module MSCA and the strip pooling module SP of the first, second, third and fourth coding layers, enters the bottleneck layer for two convolution layers and is then up-sampled, and is subsequently processed by the convolution and up-sampling of the fourth, third, second and first decoding layers; during decoding, the up-sampled map from the preceding layer is spliced with the feature map output by the multi-scale convolution attention module in the coding layer corresponding to the decoding layer and then input to that decoding layer, and finally the high-level semantic feature map of the bridge crack image is obtained as output.
During this processing, the coding stage uses the multi-scale convolution attention modules MSCA and strip pooling modules SP of the four coding layers, and the resolution is reduced step by step through four levels of strip pooling to obtain image information at different scales; the crack feature information in the bridge crack image gradually changes from bottom-level information such as points, lines and gradients to contours and more abstract semantic information, so that the whole network completes a "from fine to coarse" extraction and combination of features and the extracted high-level semantic feature information is more comprehensive. On the other hand, in the decoding stage, simply up-sampling from low resolution to high resolution is insensitive to detail information; therefore, the up-sampled map of the preceding layer is spliced with the feature map output by the multi-scale convolution attention module in the coding layer corresponding to the decoding layer before the decoding layer processes it, so that the accurate gradient, point and line information extracted at the same level of the coding structure is passed directly to the decoding layer of the same level. This is equivalent to adding detail information within the general region of the crack target, so the attention fusion feature extraction network obtains a more accurate crack segmentation recognition result.
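For illustration, the wiring of the attention fusion feature extraction network can be sketched as follows, reusing the MSCA and StripPooling sketches above; the channel widths, the use of a stride-2 max pooling as the down-sampling step inside each coding layer, and the replacement of the first decoding layer's up-sampling by a 1×1 convolution (the feature map is already at full resolution there) are assumptions of the sketch.

```python
import torch
import torch.nn as nn

def double_conv(cin: int, cout: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU(inplace=True))

class EncoderLayer(nn.Module):
    """Coding layer: MSCA followed by strip pooling; the MSCA output is kept
    as the skip connection used by the corresponding decoding layer."""
    def __init__(self, cin: int, cout: int):
        super().__init__()
        self.proj = nn.Conv2d(cin, cout, 1)   # channel widening (assumption)
        self.msca = MSCA(cout)
        self.sp = StripPooling(cout)
        self.down = nn.MaxPool2d(2)           # stand-in for the down-sampling step

    def forward(self, x):
        skip = self.msca(self.proj(x))
        return self.down(self.sp(skip)), skip

class AttentionFusionNet(nn.Module):
    """Encoder-bottleneck-decoder wiring following Fig. 2."""
    def __init__(self, in_ch: int = 3, widths=(64, 128, 256, 512)):
        super().__init__()
        chs = [in_ch] + list(widths)
        self.encoders = nn.ModuleList([EncoderLayer(chs[i], chs[i + 1]) for i in range(4)])
        self.bottleneck = double_conv(widths[3], widths[3])
        self.bneck_up = nn.ConvTranspose2d(widths[3], widths[3], 2, stride=2)
        self.dec_convs = nn.ModuleList([double_conv(widths[i] * 2, widths[i]) for i in range(4)])
        self.dec_ups = nn.ModuleList(
            [nn.Conv2d(widths[0], widths[0], 1)] +  # first decoding layer: already full resolution
            [nn.ConvTranspose2d(widths[i], widths[i - 1], 2, stride=2) for i in range(1, 4)])

    def forward(self, x):
        skips = []
        for enc in self.encoders:
            x, skip = enc(x)
            skips.append(skip)
        x = self.bneck_up(self.bottleneck(x))
        for i in range(3, -1, -1):               # fourth decoding layer first
            x = self.dec_ups[i](self.dec_convs[i](torch.cat([skips[i], x], dim=1)))
        return x                                 # high-level semantic feature map
```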
2.2 shallow feature extraction network
The attention fusion feature extraction network mainly extracts high-level semantic features related to cracks in the bridge crack image. Such high-level semantic background information improves detection of larger structures, but crack boundaries may be lost or become discontinuous, and shadows and stains are easily misjudged as cracks. The shallow features preserved in the bridge crack image better reflect the detail of the crack and contain rich position and contour information; therefore a shallow branch is established to extract the shallow features of the crack and obtain its position and contour information so as to locate the crack more accurately.
Specifically, the structure of the shallow feature extraction network is shown in fig. 5 and comprises two multi-scale convolution attention modules, a strip pooling module and a Transformer module. The multi-scale convolution attention module MSCA and the strip pooling module SP are the same as described above; the Transformer module is a mature and commonly used network module in the field and is not described in detail here.
The bridge crack image input to the shallow feature extraction network is respectively used as the input of the first multi-scale convolution attention module MSCA1 and of the Transformer module; the output of the first multi-scale convolution attention module MSCA1 is used as the input of the strip pooling module SP, the output of the strip pooling module SP is spliced with the output of the Transformer module to serve as the input of the second multi-scale convolution attention module MSCA2, and the output of the second multi-scale convolution attention module MSCA2 is used as the overall output of the shallow feature extraction network.
The shallow feature extraction network processes the input bridge crack image as follows: the bridge crack image is processed along two branches, one branch passing in turn through the first multi-scale convolution attention module MSCA1 and the strip pooling module SP, and the other branch passing through the Transformer module for feature extraction; after the result maps of the two branches are spliced, they are processed by the second multi-scale convolution attention module MSCA2 to obtain the position contour feature map of the bridge crack image as output.
The shallow feature extraction network extracts the feature information of crack points and line contours in the bridge crack image through the first multi-scale convolution attention module MSCA1 and the strip pooling module SP, making good use of the multi-scale convolution attention module's ability to characterize local features; meanwhile, it extracts global position feature information in the bridge crack image through the Transformer module, making good use of the Transformer module's ability to model global context features. The two outputs are then spliced and passed through the second multi-scale convolution attention module MSCA2 for shallow fusion of the contour feature information and the position feature information, producing the position contour feature map of the bridge crack image; the contour and position information is thus used more efficiently and information redundancy is avoided.
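For illustration, a sketch of the shallow feature extraction network is given below, reusing the MSCA and StripPooling sketches above; the Transformer configuration (patch size, depth, number of heads) and the stem convolution that lifts the image to the working channel width are assumptions, since the text above only names the Transformer module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShallowBranch(nn.Module):
    """Shallow feature extraction network following Fig. 5."""
    def __init__(self, in_ch: int = 3, channels: int = 64,
                 patch: int = 8, depth: int = 2, heads: int = 4):
        super().__init__()
        self.stem = nn.Conv2d(in_ch, channels, 3, padding=1)  # assumed channel lift
        self.msca1 = MSCA(channels)
        self.sp = StripPooling(channels)
        self.patch_embed = nn.Conv2d(in_ch, channels, patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model=channels, nhead=heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=depth)
        self.msca2 = MSCA(2 * channels)

    def forward(self, x):
        # branch 1: MSCA1 then strip pooling -> crack point / line contour features
        local = self.sp(self.msca1(self.stem(x)))
        # branch 2: Transformer over patch tokens -> global position features
        tok = self.patch_embed(x)
        b, c, h, w = tok.shape
        glob = self.transformer(tok.flatten(2).transpose(1, 2))
        glob = glob.transpose(1, 2).reshape(b, c, h, w)
        glob = F.interpolate(glob, size=local.shape[-2:], mode="bilinear", align_corners=False)
        # splice the two branches, then MSCA2 -> position contour feature map
        return self.msca2(torch.cat([local, glob], dim=1))
```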
Finally, the high-level semantic feature map obtained by the attention fusion feature extraction network and the position contour feature map obtained by the shallow feature extraction network are fused by element-wise multiplication, and the bridge crack recognition and detection model predicts the crack segmentation recognition and detection result of the bridge crack image from the fusion result. The wide-channel characteristics of the attention fusion feature extraction network and the crack-detail enhancement of the shallow feature extraction network are thus used together, misjudgment of complex background pixels is reduced, and the crack region is located accurately, so that segmentation, recognition and extraction of bridge concrete cracks under a complex background are realized more quickly and accurately, solving problems such as low crack segmentation and recognition speed and insufficient accuracy caused by excessive background noise.
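For illustration, the top-level fusion of the two networks can be sketched as follows; the 1×1 projections that bring the two branch outputs to a common channel width, and the final 1×1 convolution with Sigmoid that maps the fused map to a per-pixel crack probability, are assumptions of the sketch.

```python
import torch
import torch.nn as nn

class BridgeCrackNet(nn.Module):
    """Bridge crack recognition and detection model: multiply the high-level
    semantic feature map and the position contour feature map, then predict
    the crack segmentation mask."""
    def __init__(self, in_ch: int = 3, fuse_ch: int = 64):
        super().__init__()
        self.deep = AttentionFusionNet(in_ch)      # high-level semantic feature map
        self.shallow = ShallowBranch(in_ch)        # position contour feature map
        self.proj_deep = nn.Conv2d(64, fuse_ch, 1)
        self.proj_shallow = nn.Conv2d(128, fuse_ch, 1)
        self.head = nn.Conv2d(fuse_ch, 1, 1)

    def forward(self, x):
        fused = self.proj_deep(self.deep(x)) * self.proj_shallow(self.shallow(x))
        return torch.sigmoid(self.head(fused))

# usage sketch: a 512x512 picture yields a 512x512 crack probability map
# model = BridgeCrackNet()
# mask = model(torch.randn(1, 3, 512, 512))
```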
3. Training of bridge crack recognition and detection model
The bridge crack recognition and detection model used in the method of the invention is trained as follows:
and taking the bridge crack sample image with the crack region segmentation mask mark which is finished in advance as a training sample to form a training sample set, inputting the bridge crack identification detection model, constructing a total loss function comprising cross entropy loss and Dice loss for evaluating the bridge crack identification performance, optimizing and updating model parameters of the bridge crack identification detection model with the minimum total loss function as a target, and further training the bridge crack identification detection model.
Wherein, the total loss function is:
L_Total = L_CE + λ·L_Dice;
where L_Total denotes the total loss function, L_CE denotes the cross-entropy loss, and L_Dice denotes the Dice loss; g_i denotes the true segmentation mask value at the i-th pixel position in the bridge crack sample image, p_i denotes the predicted segmentation mask value at the i-th pixel position output by the bridge crack recognition and detection model, i ∈ {1, 2, …, H×W}, where H and W are respectively the pixel height and pixel width of the bridge crack sample image; λ is a weight coefficient.
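The explicit expressions of the two loss terms are not reproduced above; the standard per-pixel forms consistent with the symbols defined here would be (an assumption of this note, with ε a small smoothing constant):

```latex
L_{CE} = -\frac{1}{H \times W} \sum_{i=1}^{H \times W}
          \left[ g_i \log p_i + (1 - g_i)\log(1 - p_i) \right],
\qquad
L_{Dice} = 1 - \frac{2\sum_{i=1}^{H \times W} g_i\, p_i + \varepsilon}
                    {\sum_{i=1}^{H \times W} g_i + \sum_{i=1}^{H \times W} p_i + \varepsilon},
\qquad
L_{Total} = L_{CE} + \lambda L_{Dice}.
```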
By training the bridge crack recognition and detection model with a combination of the cross-entropy loss and the Dice loss, the invention adapts better to complex crack morphologies of different sizes and shapes in bridge crack images and to bridge crack images with complex background interference such as painted lines and graffiti, thereby improving the training effect and recognition performance of the bridge crack recognition and detection model and finally realizing faster and more accurate segmentation and recognition of bridge concrete cracks under a complex background.
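For illustration, a minimal training-step sketch with the combined loss is given below; the choice of optimizer, learning rate, λ value and smoothing constant are assumptions of the sketch.

```python
import torch
import torch.nn.functional as F

def total_loss(pred, mask, lam: float = 1.0, eps: float = 1e-6):
    """L_Total = L_CE + lambda * L_Dice for predictions and masks of shape B x 1 x H x W."""
    ce = F.binary_cross_entropy(pred, mask)
    inter = (pred * mask).sum(dim=(1, 2, 3))
    dice = 1 - (2 * inter + eps) / (pred.sum(dim=(1, 2, 3)) + mask.sum(dim=(1, 2, 3)) + eps)
    return ce + lam * dice.mean()

# one training step over the pre-labelled sample set (illustrative)
# model = BridgeCrackNet()
# opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# for image, mask in train_loader:      # masks carry the crack-region segmentation labels
#     loss = total_loss(model(image), mask)
#     opt.zero_grad(); loss.backward(); opt.step()
```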
4. Overview of the invention
In summary, the method of the invention has the following technical advantages:
1. The method uses a deep learning approach to train the bridge crack recognition and detection model and, after training, performs segmentation recognition prediction on the crack regions in bridge crack images, so that segmentation recognition and detection of bridge concrete cracks under a complex background can be realized rapidly.
2. The bridge crack recognition and detection model used in the method extracts the high-level semantic feature map of the bridge crack image through the attention fusion feature extraction network and the position contour feature map through the shallow feature extraction network, and then predicts the crack segmentation recognition and detection result of the bridge crack image from the fusion of the two feature maps; the wide-channel characteristics of the attention fusion feature extraction network and the crack-detail enhancement of the shallow feature extraction network are thus used together, misjudgment of complex background pixels is reduced, and the crack region is located accurately.
3. The bridge crack recognition and detection model is trained with a combination of the cross-entropy loss and the Dice loss, which improves the training effect and recognition performance of the model.
4. The method for detecting bridge concrete cracks under a complex background based on deep learning can segment, recognize and extract bridge concrete cracks under a complex background more quickly and accurately, thereby solving problems such as low crack segmentation and recognition speed and insufficient accuracy caused by excessive background noise.
Finally, it is noted that while the present invention has been described with reference to the preferred embodiments thereof, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present invention as defined by the appended claims.
Claims (7)
1. A bridge concrete crack detection method based on deep learning under a complex background, characterized by: obtaining a bridge crack image to be processed, inputting the bridge crack image into a pre-trained bridge crack identification and detection model, and obtaining a crack segmentation identification and detection result of the bridge crack image to be processed;
the bridge crack identification and detection model comprises an attention fusion feature extraction network based on a U-Net network framework and a shallow feature extraction network based on a multi-scale convolution attention module and a Transformer; the bridge crack identification and detection model respectively takes an input bridge crack image as the input of a multi-scale convolution attention module and a shallow feature extraction network, extracts a high-level semantic feature map of the bridge crack image through the attention fusion feature extraction network, extracts a position contour feature map of the bridge crack image through the shallow feature extraction network, and then multiplies the high-level semantic feature map and the position contour feature map of the bridge crack image to fuse them and generate the crack segmentation identification and detection result as the output for the bridge crack image.
2. The method for detecting the bridge concrete cracks under the complex background based on the deep learning according to claim 1, wherein the attention fusion characteristic extraction network comprises four coding layers, one bottleneck layer and four decoding layers which are sequentially connected;
each coding layer comprises a cascaded multi-scale convolution attention module and strip pooling module; wherein the input of the coding layer is used as the input of the multi-scale convolution attention module, the output of the multi-scale convolution attention module in the coding layer is used as the input of the strip pooling module, and the output of the strip pooling module is used as the output of the coding layer;
each decoding layer comprises two cascaded convolution layers and a convolution up-sampling module; wherein the input of the decoding layer is used as the input of a first convolution layer, the output of the first convolution layer is used as the input of a second convolution layer, the output of the second convolution layer is used as the input of a convolution up-sampling module, and the output of the convolution up-sampling module is used as the output of the decoding layer;
the bottleneck layer comprises two cascaded convolution layers;
the bridge crack image input to the attention fusion feature extraction network is used as the input of the first coding layer; the output of each coding layer serves as the input of the next layer it connects to in the attention fusion feature extraction network; the output of the fourth coding layer is used as the input of the bottleneck layer, and after convolution up-sampling, the output of the bottleneck layer is spliced with the output of the multi-scale convolution attention module in the fourth coding layer to serve as the input of the fourth decoding layer; the input of each decoding layer is the splice of the output of the multi-scale convolution attention module in the corresponding coding layer and the output of the preceding decoding layer in the attention fusion feature extraction network; the output of the first decoding layer serves as the overall output of the attention fusion feature extraction network.
3. The method for detecting the bridge concrete cracks under the complex background based on deep learning according to claim 1, wherein the shallow feature extraction network comprises two multi-scale convolution attention modules, a strip pooling module and a Transformer module;
the bridge crack image input to the shallow feature extraction network is respectively used as the input of a first multi-scale convolution attention module and of the Transformer module, the output of the first multi-scale convolution attention module is used as the input of the strip pooling module, the output of the strip pooling module is spliced with the output of the Transformer module to serve as the input of a second multi-scale convolution attention module, and the output of the second multi-scale convolution attention module is used as the overall output of the shallow feature extraction network.
4. The method for detecting the bridge concrete cracks in the complex background based on deep learning according to claim 2 or 3, wherein the processing procedure of the multi-scale convolution attention module comprises the following steps:
after a 5×5 depth-wise convolution is performed on the input feature map, depth-wise strip convolutions are performed on the resulting depth convolution map along two branches, where one branch performs 1×7 and 7×1 strip convolutions and the other branch performs 1×11 and 11×1 strip convolutions; the two feature maps of different scales obtained by the strip convolutions of the two branches are spliced with the depth convolution map, and a multi-channel feature map is obtained by a convolution with kernel size 1; finally, the multi-channel feature map is multiplied with the input feature map to obtain the attention feature map as the output.
5. A method for detecting a bridge concrete crack in a complex background based on deep learning as claimed in claim 2 or 3, wherein the processing procedure of the strip pooling module comprises the following steps:
pixel-width strip pooling and pixel-height strip pooling are performed on the input feature map to obtain, respectively, a pixel-width pooled feature map of size 1×W and a pixel-height pooled feature map of size H×1, where H and W are the pixel height and pixel width of the input feature map; the pixel-width pooled feature map is then expanded by a one-dimensional convolution into an expansion feature map of size H×W based on pixel width, and the pixel-height pooled feature map is expanded by a one-dimensional convolution into an expansion feature map of size H×W based on pixel height; the two expansion feature maps are superposed and fused, and then a convolution with kernel size 1 and a Sigmoid activation function are applied in turn to obtain an activation feature map; finally, the activation feature map is multiplied with the input feature map to obtain the pooled down-sampling feature map as the output.
6. The method for detecting the bridge concrete cracks in the complex background based on deep learning according to claim 1, wherein the bridge crack identification and detection model is trained by the following modes:
and taking the bridge crack sample image with the crack region segmentation mask mark which is finished in advance as a training sample to form a training sample set, inputting the bridge crack identification detection model, constructing a total loss function comprising cross entropy loss and Dice loss for evaluating the bridge crack identification performance, optimizing and updating model parameters of the bridge crack identification detection model with the minimum total loss function as a target, and further training the bridge crack identification detection model.
7. The method for detecting the bridge concrete cracks under the complex background based on deep learning according to claim 6, wherein the total loss function is:
L_Total = L_CE + λ·L_Dice;
where L_Total denotes the total loss function, L_CE denotes the cross-entropy loss, and L_Dice denotes the Dice loss; g_i denotes the true segmentation mask value at the i-th pixel position in the bridge crack sample image, p_i denotes the predicted segmentation mask value at the i-th pixel position output by the bridge crack recognition and detection model, i ∈ {1, 2, …, H×W}, where H and W are respectively the pixel height and pixel width of the bridge crack sample image; λ is a weight coefficient.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310874021.5A CN116823800A (en) | 2023-07-17 | 2023-07-17 | Bridge concrete crack detection method based on deep learning under complex background |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310874021.5A CN116823800A (en) | 2023-07-17 | 2023-07-17 | Bridge concrete crack detection method based on deep learning under complex background |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116823800A true CN116823800A (en) | 2023-09-29 |
Family
ID=88141069
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310874021.5A Pending CN116823800A (en) | 2023-07-17 | 2023-07-17 | Bridge concrete crack detection method based on deep learning under complex background |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116823800A (en) |
- 2023-07-17: CN CN202310874021.5A patent/CN116823800A/en active Pending
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117291902A (en) * | 2023-10-17 | 2023-12-26 | 南京工业大学 | Detection method for pixel-level concrete cracks based on deep learning |
CN117291902B (en) * | 2023-10-17 | 2024-05-10 | 南京工业大学 | Detection method for pixel-level concrete cracks based on deep learning |
CN117292266A (en) * | 2023-11-24 | 2023-12-26 | 河海大学 | Method and device for detecting concrete cracks of main canal of irrigation area and storage medium |
CN117292266B (en) * | 2023-11-24 | 2024-03-22 | 河海大学 | Method and device for detecting concrete cracks of main canal of irrigation area and storage medium |
CN118096645A (en) * | 2023-12-20 | 2024-05-28 | 国网四川省电力公司电力科学研究院 | Method and system for detecting shock-damper defects based on guidance of structural feature extraction module |
CN117952977A (en) * | 2024-03-27 | 2024-04-30 | 山东泉海汽车科技有限公司 | Pavement crack identification method, device and medium based on improvement yolov s |
CN117952977B (en) * | 2024-03-27 | 2024-06-04 | 山东泉海汽车科技有限公司 | Pavement crack identification method, device and medium based on improvement yolov s |
CN117974740A (en) * | 2024-04-01 | 2024-05-03 | 南京师范大学 | Acupoint positioning method and robot based on aggregation type window self-attention mechanism |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |