CN113610778B - Bridge surface crack detection method and system based on semantic segmentation - Google Patents

Bridge surface crack detection method and system based on semantic segmentation

Info

Publication number
CN113610778B
CN113610778B
Authority
CN
China
Prior art keywords
crack
image
feature
bridge
pixel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110817766.9A
Other languages
Chinese (zh)
Other versions
CN113610778A (en)
Inventor
卢涛
饶茜雅
吴志豪
张彦铎
吴云韬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Institute of Technology
Original Assignee
Wuhan Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Institute of Technology filed Critical Wuhan Institute of Technology
Priority to CN202110817766.9A priority Critical patent/CN113610778B/en
Publication of CN113610778A publication Critical patent/CN113610778A/en
Application granted granted Critical
Publication of CN113610778B publication Critical patent/CN113610778B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06T 7/0004 Industrial image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20021 Dividing image into blocks, subimages or windows
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30108 Industrial image inspection
    • G06T 2207/30132 Masonry; Concrete
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30 Computing systems specially adapted for manufacturing

Abstract

The invention discloses a bridge surface crack detection method and system based on semantic segmentation, comprising the following steps: collecting bridge crack images, performing pixel-level semantic annotation on the images, and building a bridge crack segmentation data set; constructing a semantic segmentation network based on feature encoding and decoding, with jump connections based on crisscross attention used to combine high-level semantics with low-level fine-grained surface information; on the basis of the general classification loss, applying a class consistency loss so that the network maps each pixel in the image to an n-dimensional vector in the feature space, making the feature vectors of pixels of the same class close together and the feature vectors of pixels of different classes far apart; after obtaining the segmented crack image, extracting the crack morphological skeleton and eliminating short crack branches to calculate the crack length; and calculating the crack width in pixels along the skeleton direction. The method can segment bridge cracks quickly and accurately and makes the segmented crack structure more complete.

Description

Bridge surface crack detection method and system based on semantic segmentation
Technical Field
The invention belongs to the field of bridge crack detection, and particularly relates to a method and a system for detecting a bridge surface crack based on semantic segmentation.
Background
As important traffic infrastructure, bridges need to be inspected and maintained regularly, and cracks on the bridge surface are an important potential safety hazard, so crack detection is a key point of inspection. Traditional bridge surface crack detection is based on manual measurement or digital image processing. Manual crack length measurement is generally carried out in the field with a tape measure, while the width of concrete bridge cracks is usually measured with a crack width gauge; this requires a great deal of labor and cannot provide real-time detection. With the popularization of video monitoring equipment, digital image processing has been widely applied in production and daily life. In recent years many scholars at home and abroad have used digital image processing for target detection and target segmentation in place of traditional manual inspection and obtained good results. However, traditional digital image processing methods are strongly affected by the environment, their accuracy is limited, and further improvement is difficult. Researchers at home and abroad have therefore studied deep learning methods to solve this problem in the field of computer vision, and it has been shown that detection accuracy can be improved remarkably. Unlike general target detection, however, bridge surface crack detection requires the crack to be segmented and its actual size to be calculated for the real scene. Therefore, studying deep-learning-based methods that segment cracks while preserving their structural integrity and calculate the actual crack size as accurately as possible has both theoretical and practical significance.
Disclosure of Invention
The invention aims to provide a method and a system for detecting cracks on the surface of a bridge based on semantic segmentation, which use image semantic segmentation to detect bridge cracks and to improve the structural integrity of the segmented cracks.
In order to solve the technical problems, the technical scheme of the invention is as follows: a bridge surface crack detection method and system based on semantic segmentation comprises the following steps:
(1) Collecting bridge crack images, performing pixel-level semantic annotation on the collected images, and building a bridge surface crack data set;
(2) Constructing a feature encoder module, and extracting semantic features from the original picture through convolution operations and downsampling;
(3) Constructing a crisscross-attention-based feature decoder module, gradually recovering position information from the high-level semantic features through up-sampling and convolution, making jump connections with each corresponding layer of the encoder, and constructing two consecutive crisscross attention modules at the jump connections to extract contextual associations, so as to combine high-level semantic and low-level fine-grained surface information more effectively;
(4) Introducing a class consistency loss on the basis of the classification loss and training the network, so that the network maps each pixel in the image to an n-dimensional vector in the feature space, the feature vectors of pixels of the same class are close, the feature vectors of pixels of different classes are far apart, and a pixel-level classification result is obtained;
(5) Carrying out morphological skeleton extraction on the crack segmentation result output by the network, eliminating short crack branches to calculate the crack length, calculating the width of the image crack along the skeleton direction, and estimating the actual size of the crack according to the proportional relation.
In some alternative embodiments, in step (1), a bridge crack data set is created based on the characteristics of bridge surface cracks that are easily missed. The data set contains 600 original crack images from 10 bridges. Nine 300×300-pixel sub-images are randomly cropped from each original image, and the sub-images are then divided into positive and negative samples, representing crack images and crack-free images respectively. Positive samples include reticulated cracks, thin short cracks in the lower-edge tensile zone (with jitter blur), vertical cracks on the web (including low-contrast images), vertical cracks at the diaphragm (with complex background texture), and pier foundation concrete cracks (with water-stain interference). Negative samples include honeycomb pitting, spalled corners (highly similar to edges and corners), voids and holes, steel bar rust (with lighting shadows), sky, trees, water stains and shadows. Because the positive and negative samples are unbalanced (crack-free images far outnumber crack images), 2400 crack images and 3000 crack-free images are screened from the sub-images to produce the data set, and 800 images are taken as the test set (positive to negative sample ratio 2:3).
In some alternative embodiments, in step (2), a feature encoder is constructed. The encoding path of the encoder comprises three steps, denoted s1, s2 and s3, whose inputs are denoted e_i0, e_i1 and e_i2 respectively, where e_i0 is the original image. Each step uses the same operations: two convolution layers, each followed by a ReLU activation layer, and then a max pooling with stride 2 for downsampling, with the number of channels doubled at each downsampling. Each step of the encoder extracts image semantic features at a different scale, and the outputs of the steps are denoted e_o0, e_o1 and e_o2. The output of each step is the input of the next step, i.e. e_ok = e_i(k+1), k ∈ {0, 1}. Then e_o2 is fed into two convolution layers, each followed by a ReLU activation layer, and the output is taken as the decoder input;
in some alternative embodiments, in step (3):
the decoding path of the feature decoder is composed of three steps s 4 ,s 5 Sum s 6 The method comprises the following steps: an up-sampling and halving the number of channels convolutional layers, then two convolutional layers are used, each followed by a ReLU activation function. The output of each step is defined as d o0 ,d o1 ,d o2
In particular, a jump connection incorporating two consecutive crisscross attention modules is established at corresponding steps of the encoder and decoder to incorporate dense predictions of different granularity. The jump connection is established at s i And s 7-i Between i e 1,2,3, the corresponding feature maps extracted over multiple scales by the encoder are first connected to the output of each step of the decoder: encoder output e o2 Feature map and d o0 Splicing in the channel dimension, denoted as T 0 E, in the same way o1 And d o1 Splice T 1 ,e o0 And d o2 Spliced into T 2 As input to the cross attention group, input d to the decoder i0 ,d i1 ,d i2 The definition is as follows: croA (CroA (T) 0 )),CroA(CroA(T 1 )),CroA(CroA(T 2 ) CroA represents the crisscross attention module, i.e. the first moduleThe T' of the block output is taken as input to the second module. And acquiring the association relation from each feature point to the whole feature map through the crisscross attention module, and carrying out associated feature enhancement.
The crisscross attention module extracts context information in both the horizontal and vertical directions to enhance the pixel-wise feature representation. Each crisscross attention module operates as follows:
The feature map T is input, and two low-dimensional features Q and K are generated through 1×1 convolution layers. Q and K then generate the attention map A through a "similarity" operation. At each position u in the spatial dimension of Q, a vector Q_u is obtained; at the same time, the set Ω_u is obtained by extracting from K the feature vectors in the same row or the same column as position u, with Ω_{i,u} denoting the i-th element of Ω_u. The similarity operation is d_{i,u} = Q_u · Ω_{i,u}^T, d_{i,u} ∈ D, where d_{i,u} is the degree of correlation between the features Q_u and Ω_{i,u}, i = 1, …, H+W−1. Softmax is then applied to D along the channel dimension to obtain the attention map A;
Another 1×1 convolution layer is applied to T to generate V for feature adaptation. At each position u in the spatial dimension of V, a vector V_u and a set Φ_u are obtained, where Φ_u is the set of feature vectors in V located in the same row or column as position u. The context information is added to the local feature T through T'_u = Σ_i A_{i,u} Φ_{i,u} + T_u (the sum running over the H+W−1 positions in the same row and column as u) to enhance the pixel-wise representation, where the summation term extracts the context information, T'_u is the feature vector at position u in T', and A_{i,u} is the scalar value at channel i and position u of A. The module therefore has a broad contextual view and selectively aggregates context according to the spatial attention map.
The feature vectors are mapped to the required number of classes at the final layer of the decoder using a 1 x 1 convolution.
In some alternative embodiments, in step (4), a class consistency loss l_class is introduced on the basis of the classification loss l_seg. For a semantic segmentation task, pixels belonging to the same class should have similar features, while the features of pixels from different classes should differ more; this property is named class consistency. The class consistency loss makes the network map each pixel in the image to an n-dimensional vector in the feature space so that the feature vectors of pixels of the same class are close and the feature vectors of pixels of different classes are far apart. The final loss function is defined as l = l_seg + m·l_class, where l_class is the class consistency loss, l_class = α·l_var + β·l_dis + γ·l_reg, and m, α, β and γ are weight coefficients.
l_var is used to penalize features of the same class whose distances are large, l_dis is used to penalize features of different classes (c_a ≠ c_b) whose distances are small, and l_reg pulls all class features towards the mean point of the class to which they belong. Here C is the set of classes present in the mini-batch image, N_c is the number of valid elements belonging to class c ∈ C, h_i ∈ H is the feature vector at spatial location i, and μ_c is the mean feature of class c ∈ C.
Both l_var and l_dis are built on a piecewise distance function. For l_var, the distance between a feature h_i and its class mean μ_c is d_v = ‖μ_c − h_i‖; the distance term is a quadratic function when d_v is greater than δ_d, a linear function when d_v lies in (δ_v, δ_d], and zero when d_v ≤ δ_v. For l_dis, the distance between the class means μ_ca and μ_cb is d_d = ‖μ_ca − μ_cb‖; the distance term is a quadratic function when d_d is less than 2δ_d and zero when d_d is greater than 2δ_d.
δ_v and δ_d are the respective preset margins.
To reduce the computational effort, the size is first reduced using convolutional layers on the output of the crisscross attention module, and then the three above-mentioned losses are applied on feature maps with fewer channels.
In some alternative embodiments, step (5) comprises: skeletonizing the crack morphology, eliminating short crack branches, calculating the crack length and calculating the crack width.
The skeleton is defined by the maximum disk method, in which the target skeleton consists of the centers of all maximal inscribed disks in the target. The maximum disk method is expressed as S(A) = ∪_j S_j(A), where S_j(A) = (A ⊖ jB) − (A ⊖ jB) ∘ B, B is a structuring element, and (A ⊖ jB) represents j successive erosions of A by B.
For the short branches formed in the skeletonization process, the crack image is first traversed and the boundary points of a region are deleted iteratively. A boundary point is defined as a pixel whose value is 1 and which has at least one 8-neighbour with value 0, where 0 represents a point of the background region and 1 represents a point of the crack region; the 8-neighbourhood is used, and two steps are used to judge whether a boundary point is a deletion point. In step I, a contour point p1 is marked as a deletion point if the following conditions are met:
(a) 2 ≤ N(p1) ≤ 6; (b) T(p1) = 1; (c) p2 · p4 · p6 = 0; (d) p4 · p6 · p8 = 0,
where N(p1) denotes the number of non-zero neighbouring pixels of p1 and T(p1) denotes the number of 0-to-1 transitions in the ordered sequence p2, p3, …, p9;
In step II, conditions (a) and (b) above are kept, and conditions (c) and (d) are changed to: (c) p2 · p4 · p8 = 0; (d) p2 · p6 · p8 = 0.
The judging procedure is: in the first pass, the conditions of step I are applied; if any of conditions (a)–(d) of step I is violated, the value of p1 is left unchanged, otherwise p1 is marked as a deletion point, and after step I has been applied to all boundary points the values of the marked points are set to 0 and the points are deleted. In the second pass, the conditions of step II are applied to the result of the first pass with the same rule. The image formed by the remaining points is the trimmed crack skeleton.
Since the crack discontinuity problem no longer occurs after discontinuous cracks have been connected, the crack length can be calculated directly.
Based on the skeleton, a method for calculating the crack width in image pixels is applied. The skeleton is a single-pixel-wide image, so the tangential direction at each point on the skeleton line can be calculated from the skeleton image; after the tangent at each point of the crack is calculated, the normal perpendicular to the tangent on the skeleton line is computed, and the distance between the intersection points of the normal with the crack boundary is the width of the crack.
The crack size is estimated according to the proportional relation between the actual bridge crack and the crack in the picture: the segmented bridge crack measurements are finally multiplied by this proportional relation to obtain the actual length and width of the crack in the image.
According to another aspect of the present invention, there is provided a bridge surface crack detection system based on semantic segmentation, comprising:
the data acquisition module is used for manufacturing a bridge crack semantic segmentation data set;
the encoder module is used for extracting semantic features from the original picture through downsampling by convolution and pooling operations;
the feature decoder module based on cross attention is used for extracting context association, combining high-level semantic and low-level fine granularity surface layer information more effectively, performing jump connection on the features acquired from each corresponding layer of the encoder and the decoder, constructing two continuous cross attention modules at jump connection positions, and gradually recovering position information from the high-level semantic features through up-sampling and convolution;
the loss calculation module is used for introducing a type of consistency loss based on the classification loss and training a network to obtain a classification result of the pixel level;
and the crack calculation module is used for extracting a crack morphology framework, carrying out crack short branch elimination to calculate the crack length, and calculating the image crack width based on the morphology framework.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a bridge surface crack detection method and system based on semantic segmentation, which utilizes two crisscross attentions to combine high-level semantics and low-level fine granularity surface layer information more effectively at a jump joint of a network, and utilizes a type of consistency loss on the basis of classification loss so as to enable feature vectors of pixels belonging to the type to be close and feature vectors of pixels of different types to be far away. Global attention is achieved, thereby enhancing the feature expression capabilities of the network.
Drawings
FIG. 1 is a schematic flow diagram of a semantic segmentation bridge crack detection method based on crisscross attention and class consistency loss provided by an embodiment of the present invention;
FIG. 2 is a diagram of a network structure for detecting cracks on a bridge surface based on semantic segmentation according to an embodiment of the present invention;
FIG. 3 is a block diagram of a crisscross attention module provided by embodiments of the present invention;
FIG. 4 is a schematic diagram of a bridge surface crack detection system based on semantic segmentation according to an embodiment of the present invention;
fig. 5 is a graph showing a comparison of test results according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
The semantic segmentation bridge crack detection method based on the crisscross attention and the class consistency loss, which is disclosed by the embodiment of the invention, is shown in fig. 1, and comprises the following steps of:
s1: manufacturing a bridge surface crack data set;
in the embodiment of the invention, a bridge surface crack data set is manufactured according to the structural characteristics of the existence of the bridge surface crack which is easy to miss. The data set comprises 600 crack images of 10 bridges from the Wuhan in Hubei province, the selected images are from different scenes in order to improve the generalization capability of the model, and the scene range shot by the camera and the image size proportion under each scene are kept uniform for facilitating the calculation of the actual crack size.
Several equi-sized (300 x 300 pixels) minigrams were randomly cropped from each original. The image is divided into a positive sample and a negative sample, the positive and negative samples representing a cracked image and a non-cracked image, respectively. Positive samples included reticulated cracks, lower edge tensile zone thin short cracks (with jitter blur), vertical cracks on the web (including low contrast images), vertical cracks at the diaphragm (with complex background texture), pier foundation concrete cracks (with water stain interference). Negative examples include honeycomb pitting, flaking off corners (high similarity to corner edges), void holes, steel bar rust (with lighting shadows), sky, trees, water stains and shadows. Because the positive and negative samples are unbalanced (there are no crack images far more than there are crack images), 2400 crack images and 3000 crack-free images are screened from the small image to produce a data set, and 800 images are taken as a test set (positive and negative sample ratio 2:3).
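The cropping and labelling procedure above can be sketched as follows. This is a minimal illustration only: the folder layout, file naming and the label_fn() helper that decides whether a crop contains a crack (in practice derived from the pixel-level annotations) are assumptions, not details stated in the patent.

```python
import random
from pathlib import Path
from PIL import Image

CROP_SIZE = 300          # 300x300-pixel sub-images
CROPS_PER_IMAGE = 9      # nine random crops per original photograph

def crop_originals(src_dir: str, dst_dir: str, label_fn) -> None:
    """Randomly crop sub-images from every original and sort them into
    crack (positive) and no_crack (negative) folders."""
    dst = Path(dst_dir)
    (dst / "crack").mkdir(parents=True, exist_ok=True)
    (dst / "no_crack").mkdir(parents=True, exist_ok=True)
    for img_path in sorted(Path(src_dir).glob("*.jpg")):
        img = Image.open(img_path)
        w, h = img.size
        for k in range(CROPS_PER_IMAGE):
            x = random.randint(0, w - CROP_SIZE)
            y = random.randint(0, h - CROP_SIZE)
            patch = img.crop((x, y, x + CROP_SIZE, y + CROP_SIZE))
            sub = "crack" if label_fn(img_path, (x, y)) else "no_crack"
            patch.save(dst / sub / f"{img_path.stem}_{k}.png")
```

The screened 2400 crack and 3000 crack-free sub-images would then be split into training and test sets as described above.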
S2: constructing a feature encoder module, and extracting semantic features from an original picture through convolution operation and downsampling;
In the embodiment of the invention, a feature encoder is constructed. The encoding path of the encoder comprises three steps, denoted s1, s2 and s3, whose inputs are denoted e_i0, e_i1 and e_i2 respectively, where e_i0 is the original image. Each step uses the same operations: two 3×3 convolutions, each convolution layer followed by a ReLU activation layer, and then a max pooling with stride 2 for downsampling, with the number of channels doubled at each downsampling. Each step of the encoder extracts image semantic features at a different scale, and the outputs of the steps are denoted e_o0, e_o1 and e_o2. The output of each step is the input of the next step, i.e. e_ok = e_i(k+1), k ∈ {0, 1}. Then e_o2 is fed into two 3×3 convolution layers, each followed by a ReLU activation layer, and the output is taken as the decoder input;
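A minimal PyTorch sketch of this three-step encoder is given below. The starting channel width (64) and the placement of the skip features before pooling follow common U-Net practice and are assumptions, since the patent does not state them.

```python
import torch
import torch.nn as nn

def double_conv(in_ch: int, out_ch: int) -> nn.Sequential:
    # two 3x3 convolutions, each followed by a ReLU activation layer
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class Encoder(nn.Module):
    def __init__(self, in_ch: int = 3, base: int = 64):
        super().__init__()
        self.steps = nn.ModuleList([
            double_conv(in_ch, base),         # s1 -> e_o0
            double_conv(base, base * 2),      # s2 -> e_o1  (channels doubled)
            double_conv(base * 2, base * 4),  # s3 -> e_o2
        ])
        self.pool = nn.MaxPool2d(2, stride=2)       # stride-2 downsampling
        self.bottleneck = double_conv(base * 4, base * 8)

    def forward(self, x):
        feats = []            # e_o0, e_o1, e_o2 kept for the jump connections
        for step in self.steps:
            x = step(x)
            feats.append(x)
            x = self.pool(x)
        return feats, self.bottleneck(x)   # bottleneck output is the decoder input
```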
S3: Gradually recovering position information from the high-level semantic features through up-sampling and convolution, making jump connections with each corresponding layer of the encoder, constructing two consecutive crisscross attention modules at the jump connections to extract contextual associations, and combining high-level semantic and low-level fine-grained surface information more effectively;
In the embodiment of the present invention, as shown in fig. 2, the decoding path of the feature decoder consists of three steps, s4, s5 and s6, each comprising: an up-sampling with a 2×2 convolution that halves the number of channels, followed by two 3×3 convolutions, each followed by a ReLU activation function. The outputs of the steps are denoted d_o0, d_o1 and d_o2 respectively.
In particular, a jump connection incorporating two consecutive crisscross attention modules is established at the corresponding steps of the encoder and decoder to combine dense predictions of different granularity. The jump connection is established between s_i and s_(7-i), i ∈ {1, 2, 3}: the corresponding feature maps extracted by the encoder at multiple scales are first concatenated with the output of each step of the decoder. The encoder output feature map e_o2 is concatenated with d_o0 along the channel dimension, denoted T0; in the same way e_o1 and d_o1 are concatenated into T1, and e_o0 and d_o2 into T2. These serve as the inputs of the crisscross attention groups, and the decoder inputs d_i0, d_i1 and d_i2 are defined as CroA(CroA(T0)), CroA(CroA(T1)) and CroA(CroA(T2)), where CroA denotes the crisscross attention module, i.e. the T' output by the first module is taken as the input of the second module. Through the crisscross attention modules, the association from each feature point to the whole feature map is obtained and the associated features are enhanced.
In the embodiment of the invention, the local feature map T ∈ R^(C×W×H) is input to the crisscross attention module, where H, W and C are the height, width and number of channels of the feature map respectively. First, T is passed through two 1×1 convolutions to generate two low-dimensional features Q and K, Q, K ∈ R^(C'×W×H), where C' is less than C.
Q and K then generate the attention map A through a "similarity" operation. At each position u in the spatial dimension of Q, a vector Q_u is obtained; at the same time, the set Ω_u is obtained by extracting from K the feature vectors in the same row or the same column as position u, with Ω_{i,u} denoting the i-th element of Ω_u. The similarity operation is d_{i,u} = Q_u · Ω_{i,u}^T, d_{i,u} ∈ D, where d_{i,u} is the degree of correlation between the features Q_u and Ω_{i,u}, i = 1, …, H+W−1. Softmax is then applied to D along the channel dimension to obtain the attention map A. Another 1×1 convolution layer is applied to T to generate V ∈ R^(C×W×H) for feature adaptation. At each position u in the spatial dimension of V, a vector V_u and a set Φ_u are obtained, where Φ_u is the set of feature vectors in V located in the same row or column as position u. The context information is added to the local feature T through T'_u = Σ_i A_{i,u} Φ_{i,u} + T_u (the sum running over the H+W−1 positions in the same row and column as u) to enhance the pixel-wise representation, where the summation term extracts the context information, T'_u is the feature vector at position u in T', and A_{i,u} is the scalar value at channel i and position u of A. The module therefore has a broad contextual view and selectively aggregates context according to the spatial attention map. The structure of the crisscross attention module is shown in fig. 3.
A single crisscross attention captures context information in the horizontal and vertical directions, but a pixel has no connection to the pixels that do not lie on its crisscross path. Therefore two crisscross attention modules are used in succession to establish associations between arbitrary positions, so that full-image context information can be obtained from all pixels and new features with dense and rich context information are generated.
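A hedged PyTorch sketch of one CroA module is shown below, following the affinity and aggregation operations described above. The channel reduction to C/8 for Q and K, the learnable scale gamma on the aggregated context, and the negative-infinity diagonal (which stops a position being counted twice on its own row and column) are taken from the common public CCNet-style reference implementation and are assumptions rather than details stated in the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def _neg_inf_diag(b: int, h: int, w: int, device) -> torch.Tensor:
    # -inf on the diagonal of the vertical affinity so a position does not
    # attend to itself in both the row branch and the column branch.
    return torch.diag(torch.full((h,), float("-inf"), device=device)) \
                .unsqueeze(0).expand(b * w, h, h)

class CrissCrossAttention(nn.Module):
    """One CroA module: context along the row and column of every position."""
    def __init__(self, channels: int):
        super().__init__()
        self.q = nn.Conv2d(channels, channels // 8, 1)   # 1x1 conv -> Q
        self.k = nn.Conv2d(channels, channels // 8, 1)   # 1x1 conv -> K
        self.v = nn.Conv2d(channels, channels, 1)        # 1x1 conv -> V
        self.gamma = nn.Parameter(torch.zeros(1))        # learnable context scale

    def forward(self, t: torch.Tensor) -> torch.Tensor:
        b, _, h, w = t.shape
        q, k, v = self.q(t), self.k(t), self.v(t)
        # vertical (same-column) affinities: (b*w, h, h)
        q_h = q.permute(0, 3, 2, 1).reshape(b * w, h, -1)
        k_h = k.permute(0, 3, 1, 2).reshape(b * w, -1, h)
        e_h = torch.bmm(q_h, k_h) + _neg_inf_diag(b, h, w, t.device)
        # horizontal (same-row) affinities: (b*h, w, w)
        q_w = q.permute(0, 2, 3, 1).reshape(b * h, w, -1)
        k_w = k.permute(0, 2, 1, 3).reshape(b * h, -1, w)
        e_w = torch.bmm(q_w, k_w)
        # softmax over the H+W-1 candidates of each position -> attention map A
        att = F.softmax(torch.cat([
            e_h.view(b, w, h, h).permute(0, 2, 1, 3),   # (b, h, w, h)
            e_w.view(b, h, w, w),                       # (b, h, w, w)
        ], dim=3), dim=3)
        att_h = att[..., :h].permute(0, 2, 1, 3).reshape(b * w, h, h)
        att_w = att[..., h:].reshape(b * h, w, w)
        # aggregate V along the column and row, then add back to T
        v_h = v.permute(0, 3, 1, 2).reshape(b * w, -1, h)
        v_w = v.permute(0, 2, 1, 3).reshape(b * h, -1, w)
        out_h = torch.bmm(v_h, att_h.transpose(1, 2)) \
                     .view(b, w, -1, h).permute(0, 2, 3, 1)
        out_w = torch.bmm(v_w, att_w.transpose(1, 2)) \
                     .view(b, h, -1, w).permute(0, 2, 1, 3)
        return self.gamma * (out_h + out_w) + t          # T' = context + T
```

At each jump connection the module would be applied twice to the channel-wise concatenation of the encoder and decoder features, e.g. d_i = croa(croa(torch.cat([e_o, d_o], dim=1))).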
S4: introducing a class consistency loss on the basis of the classification loss and training a network to obtain a pixel-level classification result;
at a classification loss l seg Introduces class consistency loss l on the basis of class For semantic segmentation tasks, pixels belonging to the same class should have similar features, while pixel features from different classes differ more, and this feature is named class consistency. This class consistency penalty may cause the network to map each pixel in the image to an n-dimensional vector in the feature space such that feature vectors for pixels belonging to that class are close, feature vectors for pixels of different classes are far, and the final penalty function is defined as: l=l seg +ml class 。l class For class consistency loss, l class =αl var +βl dis +γl reg Where m, α, β and l are weight coefficients. In a specific implementation, m=1, α=β=1, γ=0.001 is set.
By using l var Penalizing the same class of features with larger distances,by using l dis Punishment of different class features with smaller distances, < +.>c a ≠c b The method comprises the steps of carrying out a first treatment on the surface of the By using l reg Pushing all class features to the mean point of the class in which they are located,/->Where C is a group of classes present in the small lot image, N c The number of active elements belonging to class c, c.epsilon. C, h i E H is the eigenvector of spatial location i. Mu (mu) c Is the average of class C e CFeatures.
Wherein the method comprises the steps ofIs a function of the distance between the segments,
feature h i And mu c Distance d of (2) v =‖μ c -h i II, when d v Greater than delta d Time of dayAs a quadratic function, when d v At (delta) vd ]Is a linear function when d v ≤δ v Time->Zero.
Wherein the method comprises the steps ofIs a function of the distance between the segments,
features (e.g. a character)And->Distance of->When d v Less than 2 delta d Time->As a quadratic function, when d v Greater than 2 delta d Time->Zero.
δ v And delta d Respectively, the margins are set, in a specific implementation, delta is set v =0.5,δ d =1.5。
To reduce the computational effort, the output of the crisscross attention module is first reduced in dimension using a convolution layer (in a specific implementation, the number of channels after dimension reduction is set to 16), and the three losses described above are then applied on the feature map with fewer channels.
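A sketch of the class consistency loss is given below. The exact piecewise distance functions are given in the original only as figures, so this sketch falls back on the commonly used hinged (clamped) forms of a discriminative embedding loss together with the margins and weights quoted in the text (δ_v = 0.5, δ_d = 1.5, α = β = 1, γ = 0.001); the regularisation term likewise uses the usual mean-norm form. Treat it as an approximation of the patented loss, not a faithful reproduction.

```python
import torch

def class_consistency_loss(feats: torch.Tensor, labels: torch.Tensor,
                           delta_v: float = 0.5, delta_d: float = 1.5,
                           alpha: float = 1.0, beta: float = 1.0,
                           gamma: float = 0.001) -> torch.Tensor:
    """feats: (N, D) pixel embeddings h_i, labels: (N,) class index per pixel."""
    classes = labels.unique()
    means, l_var = [], feats.new_zeros(())
    for c in classes:
        h_c = feats[labels == c]            # features of class c
        mu_c = h_c.mean(dim=0)              # class mean mu_c
        means.append(mu_c)
        d_v = (h_c - mu_c).norm(dim=1)      # ||mu_c - h_i||
        l_var = l_var + torch.clamp(d_v - delta_v, min=0).pow(2).mean()
    l_var = l_var / len(classes)
    mu = torch.stack(means)                 # (|C|, D)
    if len(classes) > 1:
        dist = torch.cdist(mu, mu)          # pairwise ||mu_ca - mu_cb||
        off_diag = ~torch.eye(len(classes), dtype=torch.bool, device=mu.device)
        l_dis = torch.clamp(2 * delta_d - dist[off_diag], min=0).pow(2).mean()
    else:
        l_dis = feats.new_zeros(())
    l_reg = mu.norm(dim=1).mean()
    return alpha * l_var + beta * l_dis + gamma * l_reg

# total loss as described above: l = l_seg + m * l_class, with m = 1
# l = cross_entropy + 1.0 * class_consistency_loss(feats, labels)
```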
S5: and (3) carrying out morphological skeleton extraction on the crack segmentation result output by the network, carrying out crack short branch elimination to calculate the crack length, calculating the width based on the image crack in the skeleton direction, and further estimating the actual size of the crack according to the proportional relation.
In the embodiment of the invention, the skeleton is defined by the maximum disk method: the target skeleton consists of the centers of all maximal inscribed disks in the target, and the maximum disk method is expressed as S(A) = ∪_j S_j(A), where S_j(A) = (A ⊖ jB) − (A ⊖ jB) ∘ B, B is a structuring element, and (A ⊖ jB) represents j successive erosions of A by B.
For the short branches formed in the skeletonization process, the crack image is first traversed and the boundary points of a region are deleted iteratively. A boundary point is defined as a pixel whose value is 1 and which has at least one 8-neighbour with value 0, where 0 represents a point of the background region and 1 represents a point of the crack region; the 8-neighbourhood is used, and two steps are used to judge whether a boundary point is a deletion point. In step I, a contour point p1 is marked as a deletion point if the following conditions are met:
(a) 2 ≤ N(p1) ≤ 6; (b) T(p1) = 1; (c) p2 · p4 · p6 = 0; (d) p4 · p6 · p8 = 0,
where N(p1) denotes the number of non-zero neighbouring pixels of p1 and T(p1) denotes the number of 0-to-1 transitions in the ordered sequence p2, p3, …, p9;
In step II, conditions (a) and (b) above are kept, and conditions (c) and (d) are changed to: (c) p2 · p4 · p8 = 0; (d) p2 · p6 · p8 = 0.
The judging procedure is: in the first pass, the conditions of step I are applied; if any of conditions (a)–(d) of step I is violated, the value of p1 is left unchanged, otherwise p1 is marked as a deletion point, and after step I has been applied to all boundary points the values of the marked points are set to 0 and the points are deleted. In the second pass, the conditions of step II are applied to the result of the first pass with the same rule. The image formed by the remaining points is the trimmed crack skeleton.
Because the crack discontinuity problem no longer occurs after discontinuous cracks have been connected, the crack length can be obtained directly.
Based on the skeleton, a method for calculating the crack width in image pixels is applied. The skeleton is a single-pixel-wide image, so the tangential direction of each point on the skeleton line can be calculated from the skeleton image. If M is any point on the skeleton image, there are two points with value 1 in the 8-neighbourhood of M; denoting these two neighbours N1 and N2, the tangential direction at M is taken as the average of the directions along MN1 and MN2. After the tangent of each point in the crack has been calculated, the normal perpendicular to the tangent on the skeleton line is computed, and the distance between the intersection points of the normal with the crack boundary is the width of the crack. Considering that the crack width changes slowly in an actual scene, in order to reduce the amount of computation and improve efficiency, points at a certain interval along the crack skeleton can be selected (the interval is determined comprehensively according to factors such as image resolution and crack type), and the quantized width is obtained at these points. Finally, the point with the largest quantized width is selected among them, the quantized width of each point on the skeleton segments before and after this point is computed, and the maximum of these quantized widths is the maximum width of the crack region in that segment.
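A sketch of this width measurement at a single skeleton point is given below: the tangent at M is derived from its two skeleton neighbours N1 and N2 (one reading of "the average of the MN1 and MN2 directions"), and the width is found by stepping along the normal in both directions until the crack mask is left. The helper name, the border guard and the max_half_width limit are assumptions for the example, and sub-pixel refinement is omitted.

```python
import numpy as np

def width_at(mask: np.ndarray, skel: np.ndarray, y: int, x: int,
             max_half_width: int = 100) -> float:
    """mask: binary crack image, skel: pruned skeleton, (y, x): skeleton point M."""
    h, w = skel.shape
    if not (1 <= y < h - 1 and 1 <= x < w - 1):
        return float("nan")                       # skip border points
    ys, xs = np.nonzero(skel[y - 1:y + 2, x - 1:x + 2])
    nbrs = [(dy - 1, dx - 1) for dy, dx in zip(ys, xs) if (dy, dx) != (1, 1)]
    if len(nbrs) != 2:
        return float("nan")                       # only simple path points handled
    (ay, ax), (by, bx) = nbrs
    ty, tx = (ay - by) / 2.0, (ax - bx) / 2.0     # tangent from N2 towards N1
    norm = np.hypot(ty, tx)
    ny, nx = -tx / norm, ty / norm                # unit normal, perpendicular to tangent
    width = 0.0
    for sign in (+1, -1):                         # walk both ways along the normal
        for step in range(1, max_half_width):
            py = int(round(y + sign * step * ny))
            px = int(round(x + sign * step * nx))
            inside = 0 <= py < h and 0 <= px < w and mask[py, px] > 0
            if not inside:
                width += step - 1                 # last step still inside the crack
                break
        else:
            width += max_half_width
    return width + 1.0                            # include the skeleton pixel itself
```

In the described procedure this measurement would be taken at sampled skeleton points, and the result converted to centimetres using the scene-to-image ratio.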
The proportional relation between the actual scene and the picture is calculated according to the measured data and the picture, and the segmented bridge crack measurements are finally multiplied by this proportional relation to estimate the actual length and width of the crack in the image.
The invention also provides a semantic segmentation bridge crack detection system based on crisscross attention and class consistency loss, as shown in fig. 4, comprising:
the data acquisition module is used for manufacturing a bridge crack semantic segmentation data set;
the encoder module is used for extracting semantic features from the original picture through downsampling by convolution and pooling operations;
the feature decoder module based on cross attention is used for extracting context association, combining high-level semantic and low-level fine granularity surface layer information more effectively, performing jump connection on the features acquired from each corresponding layer of the encoder and the decoder, constructing two continuous cross attention modules at jump connection positions, and gradually recovering position information from the high-level semantic features through up-sampling and convolution;
and the loss calculation module is used for introducing a type of consistency loss based on the classification loss and training the network to obtain a pixel-level classification result.
And the crack calculation module is used for extracting a crack morphology framework, carrying out crack short branch elimination to calculate the crack length, and calculating the image crack width based on the morphology framework.
The specific implementation of each module may refer to the description of the method embodiment, and the embodiment of the present invention will not be repeated.
In the test example of the invention, several classical segmentation algorithms, including the FCN algorithm, the RAU-Net algorithm and the Mask R-CNN algorithm, are selected to verify the effectiveness of the experiment, and verification is performed on the self-made bridge data set. The comparison between the method proposed by the invention and each classical algorithm is shown in Table 1.
Table 1 comparison of individual algorithm accuracy and time spent in bridge fracture dataset
Method mIoU Time/ms
FCN 0.432 123
Mask R-CNN 0.443 183
Ours 0.458 128
From the above table it can be seen that the invention achieves better accuracy while consuming less time. Compared with FCN the method has higher accuracy, and compared with RAU-Net it has higher speed. Therefore, the proposed algorithm achieves a better effect in both precision and speed. The segmentation results of each algorithm on the bridge crack data set are compared in fig. 5; as can be seen from fig. 5, the method provided by the invention obtains a more complete crack structure than the other algorithms.
Table 2 comparison of measured actual values and calculated values for two segmented pictures
Experiment Calculated length/cm Calculated width/cm Length error/% Width error/%
First image 48.04 1.01 3.66 3.78
Second image 114.88 0.3754 4.57 4.28
In order to detect the length and width of bridge cracks in an actual scene and to prevent changes in natural lighting conditions from influencing the experiment, an independent uniform light source is adopted and natural light is shielded, and about 30 photos are taken to verify the accuracy of the method. The practical results show that the average length measurement error of the 30 crack pictures is 2.76% and the average width measurement error is 4.34%. The experimental results show that the accuracy of calculating the crack length by skeletonization is high and meets the system design requirements, and the accuracy of calculating the maximum crack width by the crack width quantization model also meets the system requirements. For the width characteristics of cracks, the average width reflects the overall damage degree of the crack while the maximum width reflects the local damage degree, and considering both together better reflects the real damage condition of the crack.
It should be noted that each step/component described in the present application may be split into more steps/components, or two or more steps/components or part of the operations of the steps/components may be combined into new steps/components, as needed for implementation, to achieve the object of the present invention.
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (7)

1. The bridge surface crack detection method based on semantic segmentation is characterized by comprising the following steps of:
step 1, acquiring a bridge crack image, performing pixel-level semantic annotation on the acquired image, and manufacturing a bridge crack segmentation data set;
step 2, constructing a feature encoder module, and extracting multi-level semantic features from an original picture through multiple convolution operations and downsampling;
step 3, constructing a characteristic decoder module based on cross attention, gradually recovering position information from high-level semantic features in an original picture through up-sampling and convolution operation for a plurality of times, performing jump connection on each decoder layer corresponding to the encoder module, constructing two continuous cross attention modules at jump connection positions, extracting context association, and combining high-level semantic and low-level fine-grained surface layer information;
step 4, introducing a class consistency loss based on the classification loss, and training a network to map each pixel in the image to an n-dimensional vector in a feature space by the network so that the feature vectors of the pixels belonging to the class are close and the feature vectors of the pixels of different classes are far, thereby obtaining a classification result of the pixel level;
step 5, morphological skeleton extraction is carried out on the crack segmentation result output by the network, crack short branches are eliminated to calculate crack length, width is calculated based on image cracks in the skeleton direction, and the actual size of the cracks is estimated according to the proportional relation;
the process of constructing the feature decoder module based on the crisscross attention in the step 3 is as follows: the decoding path of the feature decoder module is composed of three steps s 4 ,s 5 Sum s 6 The method comprises the following steps: one up-sample and halving the number of channels in each up-sample, then using two convolution layers, each calculated to perform a ReLU activation function calculation, the output of each step is defined as d, respectively o0 ,d o1 ,d o2 The output of each step serves as the input for the next step;
the jump connection is established at s i And s 7-i Between i e 1,2,3, i.e. the corresponding feature map extracted by the encoder module over multiple scales is connected to the output of each step of the feature decoder module: encoder module output e o2 Feature map and d o0 Splicing in the channel dimension, denoted as T 0 E, in the same way o1 And d o1 Splice is marked as T 1 ,e o0 And d o2 Splice is marked as T 2 The T is 0 、T 1 And T 2 Respectively used as the input of three corresponding layers of the crisscross attention module;
the s is 4 The method comprises the following steps: at a classification loss l seg Is introduced on the basis of (a)Class consistency loss l class For semantic segmentation tasks, pixels belonging to the same class should have similar features, while the features of pixels from different classes differ more, this feature is named class consistency, which is a loss of class consistency for the network to map each pixel in the image to an n-dimensional vector in feature space to bring the feature vector of the pixel belonging to that class closer, the feature vector of the pixel of the different class farther away, and the final loss function/is defined as: l=l seg +ml class Where m is a weight coefficient.
2. The method for detecting the surface cracks of the bridge based on semantic segmentation according to claim 1, wherein the step 1 is specifically: collecting a plurality of different crack image original pictures of a plurality of bridges, randomly cutting out a plurality of square images with the same pixel size from each original picture, dividing the square images into a positive sample and a negative sample, wherein the positive sample and the negative sample respectively represent crack images and crack-free images, screening out a plurality of crack images and a plurality of crack-free images from the square images to manufacture a bridge crack segmentation data set, and taking a plurality of images in the bridge crack segmentation data set as a test set.
3. The method for detecting cracks on the surface of a bridge based on semantic segmentation according to claim 2, wherein the positive sample comprises a netlike crack image, a crack image with shake blur, a crack image with low contrast, a crack image with complex background texture and a crack image with water stain interference; the negative samples include honeycomb pitting, peeling off corners, cavity holes, steel bar rust, sky, trees, water stains and shadows.
4. The method for detecting bridge surface cracks based on semantic segmentation according to claim 1, wherein the step 2 is specifically: a feature encoder module is constructed, the encoding path of the encoder module comprising three consecutive steps denoted s1, s2 and s3, whose inputs are denoted e_i0, e_i1 and e_i2 respectively, where e_i0 is the original image; each step comprises: using two convolution layers, computing a ReLU activation function after each convolution layer, then performing a max pooling downsampling with stride 2, and doubling the number of channels at each downsampling; each step of the encoder module extracts image semantic features at a different scale, the outputs of the steps are denoted e_o0, e_o1 and e_o2, and the output of each step serves as the input of the next step, i.e. e_ok = e_i(k+1), k ∈ {0, 1}.
5. The bridge surface crack detection method based on semantic segmentation according to claim 1, wherein the working process of the crisscross attention module is as follows: a feature map T is input, two low-dimensional features Q and K are generated through 1×1 convolution layer calculations, and the attention map A is further generated through a "similarity" operation; at each position u in the spatial dimension of Q a vector Q_u is obtained, and at the same time the set Ω_u is obtained by extracting from K the feature vectors in the same row or column as position u, with Ω_{i,u} denoting the i-th element of Ω_u; the "similarity" operation is defined as d_{i,u} = Q_u · Ω_{i,u}^T, where d_{i,u} is the degree of correlation between the features Q_u and Ω_{i,u}, i = 1, …, H+W−1, and softmax is then applied to D along the channel dimension to obtain the attention map A;
another 1×1 convolution layer calculation is applied to T to generate V for feature adaptation; at each position u in the spatial dimension of V a vector V_u and a set Φ_u are obtained, the set Φ_u being the set of feature vectors in V located in the same row or column as position u; the context information is added to the local feature T through T'_u = Σ_i A_{i,u} Φ_{i,u} + T_u (the sum running over the H+W−1 positions in the same row and column as u) to enhance the pixel-wise representation, wherein the summation term is used for extracting the context information, T'_u is the feature vector at position u in T', and A_{i,u} is the scalar value at channel i and position u of A;
the feature vectors are mapped to the required number of classes at the final layer of the feature decoder module using a 1×1 convolution layer calculation.
6. The method for detecting bridge surface cracks based on semantic segmentation according to claim 1, wherein the step 5 is specifically: the skeleton is defined by the maximum disk method, in which the target skeleton consists of the centers of all maximal inscribed disks in the target, and the maximum disk method is expressed as S(A) = ∪_j S_j(A), where S_j(A) = (A ⊖ jB) − (A ⊖ jB) ∘ B, B is a structuring element, and (A ⊖ jB) represents j successive erosions of A by B;
for the short branches formed in the skeletonization process, the crack image is first traversed and the boundary points of a region are deleted iteratively, a boundary point being defined as a pixel with value 1 having at least one 8-neighbour with value 0, where 0 represents a point of the background region and 1 represents a point of the crack region; the 8-neighbourhood is used, and two steps are used to judge whether a boundary point is a deletion point; in step I, a contour point p1 is marked as a deletion point if the following conditions are met: (a) 2 ≤ N(p1) ≤ 6; (b) T(p1) = 1; (c) p2 · p4 · p6 = 0; (d) p4 · p6 · p8 = 0,
wherein N(p1) denotes the number of non-zero neighbouring pixels of p1 and T(p1) denotes the number of 0-to-1 transitions in the ordered sequence p2, p3, …, p9;
in step II, conditions (a) and (b) above are kept, and conditions (c) and (d) are changed to: (c) p2 · p4 · p8 = 0; (d) p2 · p6 · p8 = 0;
the judging method is: in the first pass the conditions of step I are applied; if any of conditions (a)–(d) of step I is violated, the value of p1 is left unchanged, otherwise p1 is marked as a deletion point, and after step I has been applied to all boundary points the values of the marked points are set to 0 and the points are deleted; in the second pass the conditions of step II are applied to the result of the first pass with the same rule; the image formed by the finally remaining points is the trimmed crack skeleton;
the crack length is directly calculated because the problem of crack discontinuity can not occur after the intermittent crack connection process;
based on the skeleton method, a method for calculating width pixels of an image crack is applied: because the image on the skeleton is a single-pixel image, the tangential direction of each point on the skeleton line is calculated according to the skeleton image, the tangential line of each point in the crack is calculated, and then the normal line perpendicular to the tangential line on the skeleton line is calculated, wherein the intersection point distance between the normal line and the crack boundary is the width of the crack;
estimating the crack size according to the proportion relation between actual bridge cracks and picture cracks: and according to the proportional relation between the actual scene and the photo, finally, multiplying the proportional relation by the segmented bridge crack image to estimate the actual length and width of the crack in the graph.
7. A system for utilizing a semantic segmentation based bridge surface crack detection method as set forth in claim 1, comprising:
the data acquisition module is used for acquiring bridge crack images and carrying out pixel-level semantic annotation on the acquired images to manufacture a bridge crack segmentation data set;
the encoder module is used for extracting multi-level semantic features from the original picture through multiple convolution operations and downsampling;
the characteristic decoder module based on the crisscross attention is used for gradually recovering position information from high-level semantic characteristics in an original picture through up-sampling and convolution for a plurality of times, performing jump connection with each corresponding layer decoder of the encoder module, constructing two continuous crisscross attention modules at jump connection positions, extracting context association, and combining high-level semantic and low-level fine-granularity surface layer information;
the loss calculation module is used for introducing a type of consistency loss based on the classification loss and training a network, so that each pixel in the image is mapped to an n-dimensional vector in a feature space by the network, the feature vectors of the pixels belonging to the type are close, the feature vectors of the pixels of different types are far away, and a classification result of the pixel level is obtained;
and the crack calculation module is used for extracting a morphological skeleton from the crack segmentation result output by the network, eliminating crack short branches to calculate the crack length, calculating the width based on the image crack in the skeleton direction, and estimating the actual size of the crack according to the proportional relation.
CN202110817766.9A 2021-07-20 2021-07-20 Bridge surface crack detection method and system based on semantic segmentation Active CN113610778B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110817766.9A CN113610778B (en) 2021-07-20 2021-07-20 Bridge surface crack detection method and system based on semantic segmentation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110817766.9A CN113610778B (en) 2021-07-20 2021-07-20 Bridge surface crack detection method and system based on semantic segmentation

Publications (2)

Publication Number Publication Date
CN113610778A CN113610778A (en) 2021-11-05
CN113610778B true CN113610778B (en) 2024-03-26

Family

ID=78337969

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110817766.9A Active CN113610778B (en) 2021-07-20 2021-07-20 Bridge surface crack detection method and system based on semantic segmentation

Country Status (1)

Country Link
CN (1) CN113610778B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114581764B (en) * 2021-12-24 2024-04-26 中交基础设施养护集团有限公司 Underground structure crack disease discriminating method based on deep learning algorithm
CN114240945B (en) * 2022-02-28 2022-05-10 科大天工智能装备技术(天津)有限公司 Bridge steel cable fracture detection method and system based on target segmentation
CN115239733B (en) * 2022-09-23 2023-01-03 深圳大学 Crack detection method and apparatus, terminal device and storage medium
CN115393725B (en) * 2022-10-26 2023-03-07 西南科技大学 Bridge crack identification method based on feature enhancement and semantic segmentation
CN117649154B (en) * 2024-01-29 2024-04-19 新疆三联工程建设有限责任公司 Concrete test block manufacturing whole process management system and method based on digitization

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108520516A (en) * 2018-04-09 2018-09-11 陕西师范大学 A kind of bridge pavement Crack Detection and dividing method based on semantic segmentation
CN112348770A (en) * 2020-09-09 2021-02-09 陕西师范大学 Bridge crack detection method based on multi-resolution convolution network
CN112560895A (en) * 2020-11-20 2021-03-26 陕西师范大学 Bridge crack detection method based on improved PSPNet network
CN112884747A (en) * 2021-02-28 2021-06-01 长安大学 Automatic bridge crack detection system integrating cyclic residual convolution and context extractor network


Also Published As

Publication number Publication date
CN113610778A (en) 2021-11-05

Similar Documents

Publication Publication Date Title
CN113610778B (en) Bridge surface crack detection method and system based on semantic segmentation
Ali et al. Attention-based generative adversarial network with internal damage segmentation using thermography
Ali et al. Structural crack detection using deep convolutional neural networks
CN110570371B (en) Image defogging method based on multi-scale residual error learning
CN113516135B (en) Remote sensing image building extraction and contour optimization method based on deep learning
CN115049936B (en) High-resolution remote sensing image-oriented boundary enhanced semantic segmentation method
CN111985552B (en) Method for detecting diseases of thin strip-shaped structure of airport pavement under complex background
CN115984850A (en) Lightweight remote sensing image semantic segmentation method based on improved Deeplabv3+
CN111524117A (en) Tunnel surface defect detection method based on characteristic pyramid network
CN110717886A (en) Pavement pool detection method based on machine vision in complex environment
CN111353396A (en) Concrete crack segmentation method based on SCSEOCUnet
CN114155474A (en) Damage identification technology based on video semantic segmentation algorithm
Kumar et al. Detection of concrete cracks using dual-channel deep convolutional network
CN115761563A (en) River surface flow velocity calculation method and system based on optical flow measurement and calculation
CN115170520A (en) Metal mesh defect detection method based on structure contrast information lamination
CN111104850A (en) Remote sensing image building automatic extraction method and system based on residual error network
Meng et al. A modified fully convolutional network for crack damage identification compared with conventional methods
CN114092467A (en) Scratch detection method and system based on lightweight convolutional neural network
Jiang et al. Automatic pixel-level detection and measurement of corrosion-related damages in dim steel box girders using Fusion-Attention-U-net
CN111951289B (en) Underwater sonar image data segmentation method based on BA-Unet
CN113989653A (en) Method for extracting geometric indexes and topological structures of river planes
CN113326846A (en) Rapid bridge apparent disease detection method based on machine vision
CN114612450B (en) Image detection segmentation method and system based on data augmentation machine vision and electronic equipment
CN112686204B (en) Video flow measurement method and device based on sparse pixel point tracking
Li et al. Classification of the qilou (arcade building) using a robust image processing framework based on the Faster R-CNN with ResNet50

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant