CN115984233A - Image crack detection method based on grouping normalization and shifting - Google Patents


Info

Publication number
CN115984233A
Authority
CN
China
Prior art date
Legal status
Pending
Application number
CN202310039698.7A
Other languages
Chinese (zh)
Inventor
张建明
黄凤翔
吕雅茹
Current Assignee
Changsha University of Science and Technology
Original Assignee
Changsha University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Changsha University of Science and Technology filed Critical Changsha University of Science and Technology
Priority to CN202310039698.7A priority Critical patent/CN115984233A/en
Publication of CN115984233A publication Critical patent/CN115984233A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses an image crack detection method based on grouping normalization and shifting, which enhances crack features through an attention mechanism built on channel-grouped normalization. Grouping normalization compresses the features along the spatial dimensions into a vector and normalizes it by channel group; after the grouping normalization is completed, a linear transformation compensates for the loss incurred during normalization and compression, and its learnable parameters weight the original feature map at the channel level, suppressing meaningless features and highlighting meaningful ones. The method can cope with the interference of various complex conditions in natural scenes, detects fine cracks accurately, and gives the model high robustness and generalization.

Description

Image crack detection method based on grouping normalization and shifting
Technical Field
The invention belongs to the field of image processing, and particularly relates to an image crack detection method based on grouping normalization and shifting.
Background
During the construction and service of infrastructure such as bridges and expressways, structural or non-structural damage inevitably occurs under the influence of construction materials, building processes, extreme weather, vehicle overloading and other factors, producing apparent defects such as honeycombing, exposed reinforcement, holes and cracks. Periodic inspection and condition assessment of this infrastructure is therefore necessary to ensure safety and serviceability. Traditional manual inspection is subjective, time-consuming and costly. In recent years a few algorithms have achieved good results, but shortcomings remain: interference such as lane-indication arrows on road surfaces degrades model performance and exposes a lack of generalization and robustness, and for cracks with thin texture the segmentation may be discontinuous or noisy.
At present, the mainstream crack detection method adopts an algorithm based on deep learning, and the deep learning method can obtain higher precision under the condition of enough computing resources. The fracture segmentation algorithm based on the deep convolutional neural network mainly improves fracture segmentation performance from two aspects: one idea is to design a network structure specially for crack detection, and the other idea is to design a special module to be embedded into the existing network structure according to crack characteristics.
Improving crack segmentation performance by designing the network structure is the more obvious route, but such purpose-built networks often make the model unwieldy: although performance improves, inference becomes slower and training demands large amounts of computing resources. Thanks to their simplicity and efficiency, attention mechanisms for convolutional neural networks are becoming increasingly popular in the field of crack detection. The SE network was the first to compress the feature map along the spatial dimensions and apply two linear transformations to the resulting feature vector to obtain a feature weight vector expressing the importance of each channel, with larger values indicating greater importance. Many variant attention mechanisms combining channel and spatial attention, such as CBAM and BAM, have since emerged. These attention mechanisms have also been applied to crack detection networks, further improving their performance without adding excessive computation.
Although many excellent crack segmentation algorithms exist, they still have shortcomings. Designing a new, efficient network requires balancing efficiency, accuracy and applicability. Self-designed crack segmentation networks usually cannot be pre-trained on ImageNet, so their results are slightly worse. Once a network becomes deep and its channel count grows, it slows down; DeepCrack, for example, shows that deeply supervised training of deep networks is an excellent idea but makes both training and inference very slow.
In addition, the attention added to previously designed networks merely combines existing networks or modules without considering suitability for the crack task, such as the background, lighting or noise interference present in crack segmentation. If a model simply combines other people's structures and modules, its accuracy suffers because it is not adapted to the crack task, and because the surface characteristics of civil structures are diverse, its robustness and generalization remain low.
Based on this, it is necessary to design an image crack detection method based on grouping normalization and shifting.
Disclosure of Invention
Technical problem (I)
Based on the technical defects, the invention provides an image crack detection method based on grouping normalization and shifting, which can not only cope with the interference of various complex conditions in natural scenes, but also better and finely detect fine cracks, and has high robustness and generalization of the model.
(II) technical scheme
The invention provides an image crack detection method based on grouping normalization and shifting, which improves the original SegNet network and specifically comprises the following steps:
step 1: for crack information in an image, convolution is adopted to extract features, an encoder in a network is divided into five parts, feature output of each convolution layer in each stage is used for splicing, and then dimension reduction is carried out through convolution to serve as output of the whole stage; the down-sampling adopts maximum pooling, index information of the down-sampled pixels is stored during the maximum pooling, the stored index of the encoder stage is used at the decoder stage, and the pixels are restored to the position indicated by the index during the up-sampling decoding; a residual structure is adopted at an encoder stage, and a residual structure is not adopted at a decoder stage;
step 2: divide the C channels into N groups {g_1, g_2, …, g_N}, each group containing C′ = ⌊C/N⌋ channels, where the function ⌊x⌋ denotes rounding x down; each group computes the mean μ_i and variance σ_i² over the weights of its C′ channels, where x_j denotes the j-th channel of the feature map x:

μ_i = (1/C′)·Σ_{j=1}^{C′} x_j,   σ_i² = (1/C′)·Σ_{j=1}^{C′} (x_j − μ_i)²   (1)

for each group g_i, the group normalization operation GN_i(t) is given by formula (2); all weight vectors are normalized group by group. After the weights on each channel in a group are normalized, a linear transformation is applied, with per-channel coefficients γ_t and β_t, where the subscript t ∈ {1, 2, …, C′}; the function GN_i(t) denotes the group normalization operation applied to the vector g_i of the i-th group:

GN_i(t) = γ_t·(x_t − μ_i)/√(σ_i² + ε) + β_t   (2)
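Formulas (1)-(2) can be checked numerically with a small stand-in (NumPy; the sizes and the γ_t = 1, β_t = 0 initialization are illustrative assumptions, not taken from the patent):

```python
import numpy as np

C, N = 8, 2                      # channels, groups
Cp = C // N                      # C' channels per group (floor division)
x = np.random.randn(C)           # the spatially-compressed feature vector
gamma = np.ones(Cp)              # learnable scale gamma_t, one per in-group channel
beta = np.zeros(Cp)              # learnable shift beta_t
eps = 1e-5

out = np.empty_like(x)
for i in range(N):
    g = x[i * Cp:(i + 1) * Cp]                   # group g_i
    mu, var = g.mean(), g.var()                  # formula (1) statistics
    out[i * Cp:(i + 1) * Cp] = gamma * (g - mu) / np.sqrt(var + eps) + beta  # formula (2)

print(out.reshape(N, Cp).mean(axis=1))           # each group has ~0 mean after normalization
```

With the identity affine parameters, every group comes out with zero mean and unit variance, which is what the learnable γ_t and β_t then rescale.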
and step 3: the group-normalized attention module is defined by formula (3), where x denotes the input feature map, the functions GAP() and GMP() denote global average pooling and global maximum pooling respectively, which compress the feature along the spatial dimensions into a vector, and ⊙ denotes channel-by-channel multiplication:

f(x) = Sigmoid(γ) ⊙ (GN(GAP(x)) + GN(GMP(x))) ⊙ x   (3)

where f(x) is the attention-weighted feature, γ is the vector of channel coefficients γ_t, and Sigmoid() is the S-shaped growth curve function.
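A minimal sketch of formula (3) (assuming PyTorch; the class name, group count and reuse of `nn.GroupNorm` for both pooled vectors are our assumptions, not the patent's reference implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupNormAttention(nn.Module):
    """f(x) = Sigmoid(gamma) * (GN(GAP(x)) + GN(GMP(x))) * x, channel-wise."""
    def __init__(self, channels: int, groups: int = 8):
        super().__init__()
        self.gn = nn.GroupNorm(groups, channels)   # carries gamma_t, beta_t

    def forward(self, x):
        _, c, _, _ = x.shape
        gap = F.adaptive_avg_pool2d(x, 1)          # GAP: compress space to a vector
        gmp = F.adaptive_max_pool2d(x, 1)          # GMP
        w = self.gn(gap) + self.gn(gmp)            # group-normalize both vectors
        gamma = self.gn.weight.view(1, c, 1, 1)    # learnable channel coefficients
        return torch.sigmoid(gamma) * w * x        # channel-by-channel reweighting

att = GroupNormAttention(64)
y = att(torch.randn(2, 64, 16, 16))
```

The attention output has the same shape as the input, so the module can be dropped after any encoder stage.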
Preferably, the method also comprises the following steps 4-6:
and 4, step 4: divide the feature map into S groups by channel, with S odd; each group of feature maps is shifted by d pixels along the x-axis, where d ∈ [−pad, pad] and pad = ⌊S/2⌋;
After the movement is finished, performing point convolution to map the features of which the spatial dimensions are not aligned to the space of the same dimension, and performing reverse movement operation on the feature graph after the point convolution is finished to align the feature graph so as to avoid misleading the model;
and 5: sensing crack features at longer distances by using depth convolution, and extracting the features of fine cracks;
step 6: and (4) repeating the operation of the step (4) along the y-axis direction, firstly grouping channels, then grouping feature maps, moving d pixels along the y-axis, performing point convolution mapping to a space with the same dimension, and finally recovering the feature maps by an inverse moving operation.
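Steps 4-6 can be sketched as below (assuming PyTorch; `torch.roll` stands in for the literal fill-shift-shear sequence, and the 1×1 pointwise and depthwise layers are illustrative placeholders):

```python
import torch
import torch.nn as nn

def shift_conv(x, point_conv, S=5, axis=-1):
    """Split channels into S groups, shift group k by d_k pixels,
    mix with a pointwise conv, then undo the shifts to realign."""
    pad = S // 2                                    # d ranges over [-pad, pad], S odd
    groups = torch.chunk(x, S, dim=1)
    shifted = [torch.roll(g, shifts=d, dims=axis)   # move each group by d pixels
               for g, d in zip(groups, range(-pad, pad + 1))]
    y = point_conv(torch.cat(shifted, dim=1))       # map misaligned maps to one space
    groups = torch.chunk(y, S, dim=1)
    back = [torch.roll(g, shifts=-d, dims=axis)     # inverse move: realign the maps
            for g, d in zip(groups, range(-pad, pad + 1))]
    return torch.cat(back, dim=1)

c = 40                                              # channel count divisible by S=5
pw = nn.Conv2d(c, c, kernel_size=1)                 # pointwise convolution
dw = nn.Conv2d(c, c, kernel_size=3, padding=1, groups=c)  # depthwise conv, step 5
x = torch.randn(1, c, 24, 24)
# step 4 along the x-axis, step 5 depthwise, step 6 repeated along the y-axis
out = shift_conv(dw(shift_conv(x, pw, axis=-1)), pw, axis=-2)
```

Because each channel group is shifted by a different offset before the pointwise mix, pixels at different positions land in the same receptive field without the gaps a large dilation rate would create.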
Preferably, the method also comprises the following step 7:
and 7: in the decoder, each layer uses a bilinear interpolation up-sampling predicted feature map, the up-sampled feature map is spliced with a feature map with the same size output by an encoder stage, the up-sampling uses the maximum inverse pooling operation, and the down-sampling index stored in the encoder stage is reused, so that the feature is effectively restored to a specific position, and finally a single-channel probability map is output.
Preferably, the method further comprises the following step 8:
and step 8: the loss calculation between the prediction probability map y and the truth mask y is shown in formulas (4) to (6), and the model finally outputs a single-channel probability map:
Figure BDA0004050537220000051
Figure BDA0004050537220000052
Loss total (y,y)= Loss BCE (y,y)+Loss Dice (y,y) (12)。
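A sketch of the combined loss (assuming PyTorch; the exact form of the background penalty that the description adds to the Dice loss is our assumption, implemented here as a symmetric Dice term on the inverted masks):

```python
import torch
import torch.nn.functional as F

def dice_term(p, y, eps=1.0):
    """Soft Dice loss term, formula (5), with smoothing constant eps."""
    return 1 - (2 * (p * y).sum() + eps) / (p.sum() + y.sum() + eps)

def total_loss(logits, target):
    p = torch.sigmoid(logits)
    bce = F.binary_cross_entropy_with_logits(logits, target)      # formula (4)
    # foreground Dice plus an assumed background penalty term
    dice = dice_term(p, target) + dice_term(1 - p, 1 - target)
    return bce + dice                                             # formula (6)

loss = total_loss(torch.randn(1, 1, 8, 8),
                  torch.randint(0, 2, (1, 1, 8, 8)).float())
```

The BCE term supervises pixel-wise semantics while the Dice terms balance the extremely unequal foreground/background areas of crack masks.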
preferably, step 1 further comprises:
in an encoder, the basic block of each stage is a "coding block-k", a structure built from k basic CGG blocks, where a CGG block is the sequence convolution - group normalization - GELU activation; a "coding block-2" is a module of 2 CGG blocks in series and a "coding block-3" consists of 3 CGG blocks, the encoder being formed of 2 coding block-2 and 3 coding block-3 modules in series; before the CGG blocks, a CBG doubles the number of channels, the CBG consisting of the basic sequence convolution - batch normalization - GELU activation, and after the CGG blocks have extracted features, a CBG block reduces the feature dimension to the required dimension.
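The CGG/CBG composition above can be sketched as follows (assuming PyTorch; kernel sizes, group count and the channel-doubling factor are our illustrative assumptions):

```python
import torch
import torch.nn as nn

def cgg(c_in, c_out, groups=32):
    """CGG block: Conv - GroupNorm - GELU."""
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                         nn.GroupNorm(groups, c_out), nn.GELU())

def cbg(c_in, c_out):
    """CBG block: Conv - BatchNorm - GELU."""
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                         nn.BatchNorm2d(c_out), nn.GELU())

def coding_block(c_in, c_out, k=2):
    # a CBG doubles the channels, k CGG blocks extract features,
    # and a final CBG reduces back to the required dimension
    return nn.Sequential(cbg(c_in, 2 * c_out),
                         *[cgg(2 * c_out, 2 * c_out) for _ in range(k)],
                         cbg(2 * c_out, c_out))

blk = coding_block(64, 128, k=2)     # a "coding block-2"
y = blk(torch.randn(1, 64, 32, 32))
```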
Preferably, step 1 further comprises:
in a decoder, the basic block of each stage is a "decoding block-k", whose core extraction part also consists of CGG blocks; a "decoding block-2" and a "decoding block-3" consist of 2 and 3 CGG blocks respectively, and the decoder is formed of 2 decoding block-2 and 3 decoding block-3 modules in series; a decoding block-k first performs an inverse max pooling operation, using the index information stored at the encoder stage to restore pixel positions during up-sampling; likewise, after the inverse max pooling, a CBG reduces the channel count of the feature map, the CGG blocks extract features, and at the output a CBG block adjusts the feature dimension to the dimension required by the invention; unlike the coding block-k, the decoding block-k does not use residual connections.
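A minimal sketch of a decoding block-k (assuming PyTorch; the class structure, kernel sizes and group count are ours — the patent specifies only the ordering: inverse max pooling, CBG reduction, k CGG blocks, output CBG, no residual):

```python
import torch
import torch.nn as nn

class DecodingBlock(nn.Module):
    def __init__(self, c_in, c_out, k=2, groups=32):
        super().__init__()
        self.unpool = nn.MaxUnpool2d(2, 2)          # uses encoder-stage indices
        self.reduce = nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                                    nn.BatchNorm2d(c_out), nn.GELU())   # CBG
        self.cggs = nn.Sequential(*[nn.Sequential(                       # k CGGs
            nn.Conv2d(c_out, c_out, 3, padding=1),
            nn.GroupNorm(groups, c_out), nn.GELU()) for _ in range(k)])
        self.out = nn.Sequential(nn.Conv2d(c_out, c_out, 3, padding=1),
                                 nn.BatchNorm2d(c_out), nn.GELU())       # CBG

    def forward(self, x, indices):
        # no residual connection in the decoder, per the description
        return self.out(self.cggs(self.reduce(self.unpool(x, indices))))

pool = nn.MaxPool2d(2, 2, return_indices=True)
x, idx = pool(torch.randn(1, 128, 32, 32))
y = DecodingBlock(128, 64)(x, idx)
```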
In another aspect, the present invention also discloses an image crack detection system based on group normalization and shift, including:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein:
the memory stores program instructions executable by the processor to invoke the program instructions capable of performing the group normalization and shift-based image crack detection method of any of the above.
In another aspect, the present invention also discloses a non-transitory computer-readable storage medium storing computer instructions that cause the computer to perform the group normalization and shift-based image crack detection method according to any one of the above.
(III) advantageous effects
Compared with the prior art, the image crack detection method based on grouping normalization and shifting has the following advantages:
(1) Compared with an original SegNet network model, the invention provides an image crack detection method based on grouping normalization attention of an improved SegNet semantic segmentation model, which has the following advantages: firstly, in order to deal with the interference of various complex conditions in a natural scene, the invention provides an attention mechanism based on grouping normalization, wherein the characteristics are compressed into vectors from spatial dimensions by the grouping normalization, and the normalization is carried out according to channel grouping; and the learnable parameters which change linearly in the grouping normalization are used for weighting from the channel level, so that the effects of inhibiting meaningless features and highlighting meaningful features are achieved. Secondly, in order to better detect the fine cracks, the invention provides a 'filling-moving-shearing' convolution module, all feature maps are divided into channels and are moved by different pixel values along the horizontal direction and the vertical direction, information of each part of the feature maps can focus on each other after the pixels are moved, and the model can acquire crack information at a longer distance during learning.
(2) In addition, the invention provides a grouping-normalization-based attention method and a "fill-shift-shear" convolution, and the two complement each other. The grouping normalization attention method GNAM improves the original SE module so that it suits crack features: the feature is compressed along the spatial dimensions by maximum pooling and average pooling into a vector with only the channel dimension, and that vector undergoes grouping normalization along the channel dimension; to compensate for the information lost during compression and normalization, a linear transformation follows the grouping normalization, and its learnable parameters are multiplied channel by channel with the normalized feature vector to form the weight vector of the attention mechanism, which acts on the original feature map. To compensate for the small receptive field of the model at limited depth, the designed "fill-shift-shear" convolution pads and shifts the feature map so that information at every position can attend to every other position, greatly enlarging the receptive field while avoiding the loss of tiny cracks that dilated convolution suffers at large dilation rates. In addition, the loss function uses the BCE loss together with a slightly modified Dice loss, which mines the foreground object better. The method performs well on the CrackForestDataset, DeepCrack, CrackTree260 and CrackLS315 data sets, and in particular achieves the best performance on every index of the CrackTree260 data set, demonstrating its excellent image crack detection performance.
Drawings
FIG. 1 is an overview of the encoder-decoder model in the image crack detection method of the present invention;
FIG. 2 is a flow diagram of an attention module based on group normalization;
FIG. 3 is a flow chart of a "fill-shift-shear" convolution;
FIG. 4 is a flow chart of feature variations in a "fill-shift-shear" convolution;
FIG. 5 is a graph comparing the performance of the process of the present invention with other processes on CrackForestDataset;
FIG. 6 is a comparison of the performance of the method of the present invention compared to other methods on a CrackTree 260;
FIG. 7 is a graph comparing the performance of the method of the present invention with other methods on CrackLS 315;
FIG. 8 is a graph comparing the performance of the method of the present invention with other methods on DeepCrack;
FIG. 9 is a graph comparing the predicted effect of the method of the present invention on a CrackForestDataset data set with other models; the method comprises the steps of sequentially obtaining an RAW original image, a GT truth mask image, an effect image predicted by an HED model, an effect image predicted by an RCF model, an effect image predicted by a SegNet model, an effect image predicted by a DeepCrack model and an effect image predicted by the method from top to bottom.
Fig. 10 is a comparison graph of the predicted effect of the crackle 260 data set and other models according to the method of the present invention, and includes RAW original graph, GT truth mask graph, effect graph predicted by HED model, effect graph predicted by RCF model, effect graph predicted by SegNet model, effect graph predicted by deep crack model, and effect graph predicted by the present invention from top to bottom.
Fig. 11 is a comparison graph of the predicted effect of the method of the present invention and other methods on the crackles 315 data set with other models, and sequentially shows, from top to bottom, the RAW original image, the GT true value mask image, the effect graph predicted by the HED model, the effect graph predicted by the RCF model, the effect graph predicted by the SegNet model, the effect graph predicted by the DeepCrack model, and the effect graph predicted by the present invention.
Fig. 12 is a comparison graph of the predicted effect of the method of the present invention and other methods on the DeepCrack data set with the predicted effect of other models, and the RAW original graph, the GT truth mask graph, the effect graph predicted by the HED model, the effect graph predicted by the RCF model, the effect graph predicted by the SegNet model, the effect graph predicted by the DeepCrack model, and the effect graph predicted by the present invention are sequentially from top to bottom.
FIG. 13 (a) is the P-R curve of the present method and other methods on CrackForestDataset; (b) the P-R curve of the present method and other methods on CrackTree260; (c) the P-R curve of the present method and other methods on CrackLS315; (d) the P-R curve of the present method and other methods on DeepCrack.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
In order to solve the problems of poor interference resistance, insufficient crack detection precision, low model robustness and generalization and the like, the network structure of the invention is improved on the SegNet network in the prior art, and as shown in fig. 1, the encoder-decoder model in the new SegNet network of the invention specifically comprises the following improvements:
(a) In the encoder, the basic block of each stage is a "coding block-k", a structure of k basic CGG blocks, where a CGG block is the sequence convolution (Conv) - Group Normalization (GN) - GELU activation; a "coding block-2" represents a module formed by connecting 2 CGG blocks in series, and a coding block-3 consists of 3 CGG blocks. Before the CGG block, a CBG is adopted for channel number doubling; the CBG consists of the basic sequence convolution (Conv) - batch normalization (BN) - GELU activation, because as the network deepens, the invention needs the richer semantic information obtained by increasing the number of channels. After the feature extraction of the CGG blocks is completed, a CBG block reduces the feature dimension to the dimension required by the invention, max pooling with stride 2 reduces the resolution, and the pixel position index information from down-sampling is retained so that pixel information can be restored to the indexed positions during up-sampling.
(b) In the decoder, the invention adopts a structure similar to a coding block-k, the basic block of the decoder stage is a decoding block-k, the core extraction part is also composed of CGG blocks, and a decoding block-2 and a decoding block-3 are respectively composed of 2 CGG blocks and 3 CGG blocks. In "decoding block-k" the invention first performs an inverse max pooling operation, using the index information stored at the encoder stage to restore the position of the pixel at the time of upsampling. And after the inverse maximum pooling operation, the invention reduces the channel number of the feature map by adopting CBG and extracts the features by CGG. And when outputting, the CBG block is used for adjusting the characteristic dimension to the dimension required by the invention. Unlike the "coding block-k" block, the "decoding block-k" block does not use residual concatenation, because in the decoder stage, the element-by-element addition of feature maps at different stages introduces noise, which degrades the performance of the model.
Therefore, the network result of fig. 1 designs a special embedded module aiming at the problems of crack characteristics, interference prevention and the like, and the network is divided into an encoder, a decoder and a residual connecting part. In the encoder part, the invention adopts a pyramid structure similar to a residual error network to extract the characteristics, and the whole network realizes five times of down sampling. Each stage feature fuses the output of all the convolution layers in the stage, so that the most abundant semantic information can be obtained in the last layer, and the lost semantic information during maximum pooling downsampling can be compensated to the maximum extent by adopting a residual error structure in the encoder stage. The residual structure is not used at the decoder stage because it introduces some noise, resulting in more noisy output mask patterns. In order to eliminate various noise interference conditions which can exist in noise and crack images introduced by using a residual structure in an encoder stage, the invention also provides an attention module based on grouping normalization. In order to compensate for information loss in the normalization and space compression processes, linear change is performed on the normalized vector, the finally obtained vector is the attention weight, and the weight represents the importance of information of each channel of the characteristic diagram. In addition, in the end stage of extracting features by an encoder, in order to further enlarge the receptive field and enhance the capturing capability of fine cracks, the invention also provides convolution based on filling-shifting-shearing. Firstly, filling certain pixels around the feature maps, and then enabling the information of the various positions of the feature maps to be mutually noticed by performing shifting operations in the vertical direction and the horizontal direction on the feature maps of different channels.
In another embodiment, the present invention further provides an image crack detection method based on group normalization and shift, which improves the existing SegNet network, and specifically includes the following steps:
feature extraction downsampling
Step 1: for crack information in an image, convolution is adopted to extract features, an encoder in a network is divided into five parts, feature output of each convolution layer in each stage is used for splicing, and then dimension reduction is carried out through convolution to serve as output of the whole stage, so that richer semantic information is carried when final encoding is completed; the down-sampling adopts maximum pooling, index information of the down-sampled pixels is stored during the maximum pooling, the stored index of the encoder stage is used at the decoder stage, the pixels are restored to the position indicated by the index during the up-sampling decoding, and more semantic information is reserved; furthermore, the residual structure is employed at the encoder stage, whereas it is not employed at the decoder stage.
Specifically, in step 1, the network structure of the present invention may preferably be the network structure specifically shown in fig. 1, and may also be other modified SegNet networks that only use a residual structure at the encoder stage. Because each stage feature in the network is fused with the output of all the convolution layers at the stage, the most abundant semantic information can be obtained at the last layer, a residual error structure is adopted at an encoder stage, the semantic information lost during maximum pooling down-sampling can be made up to the greatest extent, and the residual error structure is not adopted at a decoder stage, and because some noise is introduced into the residual error structure, the output mask map has more noise points.
Attention calculation based on grouping normalization
Step 2: divide the C dimension channels into N groups denoted as { g 1 ,g 2 ,…g N N =32 may be set as a default value, and the channels of each group are
Figure BDA0004050537220000121
Function->
Figure BDA0004050537220000122
Represents rounding down on x; the mean value μ is calculated for the weights in C' channels for each group i Sum variance σ i ,x j Represents the jth channel within the feature map x:
Figure BDA0004050537220000123
for each group g i Their group normalization operation GN i (t) as shown in formula (2), normalizing all weight vectors by taking a group as a unit; after the weights on each channel are normalized within the group, to compensate for possible information loss during the network learning process, a linear transformation is performed again, with the coefficient of each channel being γ t And beta t Wherein the subscript t e {1,2,3.. C' }, the function GN i (t) represents a vector g for the ith group i And (3) performing grouping planning operation:
Figure BDA0004050537220000131
Step 3: the group-normalized attention module is defined by formula (3), where x denotes the input feature map, the functions GAP() and GMP() denote global average pooling and global maximum pooling respectively, which compress the feature along the spatial dimensions into a vector, and ⊙ denotes the channel-by-channel multiplication that applies the computed weights to the corresponding feature maps and feature vectors:

f(x) = Sigmoid(γ) ⊙ (GN(GAP(x)) + GN(GMP(x))) ⊙ x   (3)

where f(x) is the attention-weighted feature and γ is the vector of channel coefficients γ_t. Sigmoid() is the S-shaped growth curve function; in information science, because it increases monotonically and its inverse function does too, the Sigmoid function is often used as the activation function of a neural network, mapping a variable into the interval (0, 1).
As shown in fig. 2, the attention mechanism based on channel grouping normalization in steps 1 to 3 enhances crack features: every layer of the symmetric U-shaped network normalizes features with the grouping normalization method, and the new grouping normalization attention is designed around the learnable parameters of grouping normalization so as to better fit the whole network architecture. Grouping normalization compresses the features along the spatial dimensions into a vector and normalizes it by channel group; after the grouping normalization is completed, a linear transformation compensates for the loss incurred during normalization and compression, and its learnable parameters weight the original feature map at the channel level, suppressing meaningless features and highlighting meaningful ones. This counteracts the noise introduced by the residual structure at the encoder stage and the various interference conditions that may exist in crack images, enhancing the anti-interference capability of crack detection.
"fill-shift-shear" convolution
Step 4: first divide the feature map into S groups by channel; for symmetry of the translation, S is taken as an odd number (S = 5 is a preferred setting); each group of feature maps is shifted by d pixels along the x-axis, where d ∈ [−pad, pad] and pad = ⌊S/2⌋.
After the movement is finished, because the feature maps are not aligned on the space, the feature maps with unaligned space dimensionality are mapped into the space with the same dimensionality by performing point convolution, and after the point convolution is finished, the feature maps are subjected to reverse movement operation to be aligned, so that misleading of the model is avoided.
And 5: depth convolution is used to sense longer distance crack features and extract fine crack features.
Step 6: and (4) repeating the operation of the step 4 along the y-axis direction, firstly grouping the channels, then grouping the feature maps, moving d pixels along the y-axis, performing point convolution mapping to a space with the same dimension, and finally restoring the feature maps by an inverse moving operation.
As shown in figs. 3-4, the resource-friendly "fill-shift-shear" convolution of steps 4-6 addresses a weakness of dilated convolution. In semantic segmentation, dilated convolution is a simple and effective way to enlarge the receptive field, but for thin crack features a large dilation rate puts large gaps between the learnable parameters of the kernel, losing the neighborhood information around a given pixel and with it the fine pixel information. The "fill-shift-shear" convolution of the present invention instead shifts the feature map, channel group by channel group, by different pixel values in the horizontal and vertical directions, so that pixel information at different positions can attend to each other and a larger receptive field is obtained. Unlike dilated convolution, this operation does not lose the information of fine cracks; and because the pixels at each position of the shifted channels are misaligned, the convolution learns to extract the cracks the model must attend to from misaligned feature maps, improving the robustness of the model, avoiding the loss of fine cracks caused by large dilation rates, and improving the detection accuracy of cracks.
Four, prediction of the image mask at the decoder side
Step 7: in the decoder, each layer up-samples the predicted feature map by bilinear interpolation and concatenates it with the same-size feature map output at the corresponding encoder stage; the up-sampling uses the max-unpooling operation and reuses the down-sampling indices stored at the encoder stage, so that features are effectively restored to their exact positions; finally a single-channel probability map is output.
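The index-preserving max pooling and the decoder-side max-unpooling described above can be sketched as follows for a single channel. This is an illustrative NumPy sketch under assumed 2×2 windows with stride 2, not the patented network code:

```python
import numpy as np

def max_pool_2x2_with_indices(x):
    """2x2, stride-2 max pooling over a single-channel map that also
    records the flat index of each maximum, mimicking the encoder's
    stored down-sampling indices."""
    h, w = x.shape
    pooled = np.zeros((h // 2, w // 2))
    idx = np.zeros((h // 2, w // 2), dtype=int)
    for i in range(h // 2):
        for j in range(w // 2):
            win = x[2 * i:2 * i + 2, 2 * j:2 * j + 2]
            k = int(np.argmax(win))          # position of the max in the window
            pooled[i, j] = win.flat[k]
            idx[i, j] = (2 * i + k // 2) * w + (2 * j + k % 2)
    return pooled, idx

def max_unpool_2x2(pooled, idx, out_shape):
    """Decoder-side inverse: each pooled value goes back to the exact
    position its stored index points at; every other position stays zero."""
    out = np.zeros(out_shape)
    out.flat[idx.ravel()] = pooled.ravel()
    return out
```

In the network, this is what restores each up-sampled activation to the position recorded at the encoder stage before the subsequent blocks refine the features.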
Five, calculation of loss function
Step 8: compute the loss between the predicted probability map ŷ and the ground-truth mask y as shown in formulas (4) to (6); the model finally outputs a single-channel probability map.
Loss_BCE(ŷ, y) = −(1/M)·Σ_{i=1}^{M} [y_i·log σ(ŷ_i) + (1 − y_i)·log(1 − σ(ŷ_i))]  (4)

Loss_Dice(ŷ, y) = 1 − (2·Σ_i ŷ_i·y_i + ε)/(Σ_i ŷ_i + Σ_i y_i + ε)  (5)

Loss_total(ŷ, y) = Loss_BCE(ŷ, y) + Loss_Dice(ŷ, y)  (6)

where M is the number of pixels, σ() denotes the sigmoid function, and ε is a smoothing constant.
Through step 8, the BCEWithLogits loss and the Dice loss are computed by formulas (4) and (5), respectively. The BCE term measures the loss between the output probability map and the binary mask and guides the model to learn the semantic features of cracks, while the Dice term guides the model to learn the crack edges more finely. The Dice loss normally measures only the foreground, but the crack characteristics make foreground and background extremely unbalanced; to address this, the Dice loss is slightly modified by adding a penalty term for the background, so that the model is trained and constrained from both foreground and background, which relieves the uneven treatment of the two. Formula (6), which combines the BCEWithLogits loss and the Dice loss, then yields a more accurate crack probability.
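A NumPy sketch of the combined loss of formulas (4)-(6) follows. Since the exact form of the background penalty added to the Dice loss is not spelled out here, the symmetric foreground-plus-background Dice term used below is an assumption:

```python
import numpy as np

def bce_with_logits(logits, target):
    """Numerically stable BCEWithLogits, averaged over all pixels."""
    return float(np.mean(np.maximum(logits, 0) - logits * target
                         + np.log1p(np.exp(-np.abs(logits)))))

def dice_loss(prob, target, eps=1e-6):
    """Dice loss on the foreground plus an analogous background term.
    The symmetric background term is an assumption standing in for the
    patent's (unspecified) background penalty."""
    def term(p, t):
        return 1.0 - (2.0 * np.sum(p * t) + eps) / (np.sum(p) + np.sum(t) + eps)
    return term(prob, target) + term(1.0 - prob, 1.0 - target)

def total_loss(logits, target):
    """Formula (6): the sum of the BCE and Dice terms."""
    prob = 1.0 / (1.0 + np.exp(-logits))     # sigmoid of the raw logits
    return bce_with_logits(logits, target) + dice_loss(prob, target)
```

With this form, a confident correct prediction drives both terms toward zero, while the background term keeps the loss informative even when crack pixels are a tiny fraction of the image.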
It should be noted that step 8 is not a necessary step of the method of the invention; it merely makes the trained network more robust.
In addition, to verify the performance of the crack detection network based on grouping-normalization attention and the "fill-shift-shear" convolution, the method of the invention was evaluated on four public datasets: CrackForest, DeepCrack, CrackTree260 and CrackLS315. FIG. 5 compares the performance of the method of the invention with other methods on CrackForest; FIG. 6 shows the comparison on the CrackTree260 dataset; FIG. 7 on CrackLS315; and FIG. 8 on DeepCrack. FIGS. 9-12 compare the segmentation results of different methods on the CFD, CrackTree260, CrackLS315 and DeepCrack datasets, respectively. FIG. 13 shows the P-R curves of the method of the invention and other methods on the four datasets.
The detection method and network therefore constitute a new U-shaped crack segmentation network; the improved SegNet obtains better Precision, Recall, F1-score, mIoU and IoU results on the public CrackTree260, DeepCrack, CFD and CrackLS315 datasets. Notably, on the CrackTree260 dataset the model of the invention achieves a mean intersection over union (mIoU) and an F1-score that are 1.27% and 0.94% higher, respectively, than those of the DeepCrack model.
The above image crack detection method based on grouping normalization and shifting can be converted into software program instructions and implemented either by running a software analysis system comprising a processor and a memory, or by computer instructions stored in a non-transitory computer-readable storage medium.
Finally, the above is only a preferred embodiment of the method of the invention and is not intended to limit its scope; any modification, equivalent replacement or improvement made within the spirit and principle of the invention shall fall within its protection scope.

Claims (8)

1. An image crack detection method based on grouping normalization and shifting, characterized in that the method improves the original SegNet network and specifically comprises the following steps:
step 1: for the crack information in an image, features are extracted by convolution; the encoder of the network is divided into five stages, the feature outputs of the convolution layers within each stage are concatenated and then reduced in dimension by convolution to serve as the output of the whole stage; the down-sampling adopts max pooling, and the indices of the down-sampled pixels are stored during the max pooling; the decoder stage uses the indices stored at the encoder stage, and during up-sampling decoding the pixels are restored to the positions indicated by the indices; a residual structure is adopted at the encoder stage but not at the decoder stage;
step 2: divide the C channels into N groups {g_1, g_2, …, g_N}; the number of channels in each group is
C' = ⌊C/N⌋, where the function ⌊x⌋ denotes rounding x down; each group computes the mean μ_i and the variance σ_i of the weights over its C' channels, x_j denoting the j-th channel of the feature map x:

μ_i = (1/C')·Σ_{j∈g_i} x_j,  σ_i² = (1/C')·Σ_{j∈g_i} (x_j − μ_i)²  (1)

for each group g_i, the group normalization operation GN_i(t) is shown in formula (2), and all weight vectors are normalized group by group; after the weights of each channel are normalized within its group, a linear transformation is applied with per-channel coefficients γ_t and β_t, where the subscript t ∈ {1, 2, …, C'}, and the function GN_i(t) denotes the group normalization of the i-th group vector g_i:

GN_i(t) = γ_t·(x_t − μ_i)/√(σ_i² + ε) + β_t  (2)

where ε is a small constant for numerical stability;
step 3: the group-normalized attention module is defined as formula (3), where x denotes the input feature map, the functions GAP() and GMP() denote global average pooling and global max pooling, respectively, which compress the feature along the spatial dimensions into a vector, and ⊙ denotes a channel-by-channel multiplication operation:
f(x)=Sigmoid(γ)⊙(GN(GAP(x))+GN(GMP(x)))⊙x (3)
wherein f(x) is the attention-weighted feature, γ is the vector of channel coefficients γ_t, and Sigmoid() is the S-shaped logistic growth curve function.
2. The image crack detection method based on grouping normalization and shifting according to claim 1, characterized by further comprising the following steps 4-6:
step 4: divide the feature map into S groups by channel, with S odd; each group of feature maps is shifted by d pixels along the x-axis, respectively, where d ∈ [−pad, pad],
pad = ⌊S/2⌋;
After moving, performing point convolution to map the features with unaligned spatial dimensions into a space with the same dimensions, and performing reverse moving operation on the feature graph after the point convolution is performed to align the feature graph so as to avoid misleading the model;
step 5: depth-wise convolution is used to perceive longer-range crack features and to extract the features of fine cracks;
step 6: repeat the operation of step 4 along the y-axis direction: first group the channels, then shift each group of feature maps by d pixels along the y-axis, map them by point convolution into a space of the same dimensionality, and finally restore the feature maps by an inverse shift operation.
3. The image crack detection method based on grouping normalization and shifting according to claim 2, characterized by further comprising the following step 7:
step 7: in the decoder, each layer up-samples the predicted feature map by bilinear interpolation and concatenates it with the same-size feature map output at the corresponding encoder stage; the up-sampling uses the max-unpooling operation and reuses the down-sampling indices stored at the encoder stage, so that the features are effectively restored to their exact positions, and finally a single-channel probability map is output.
4. The image crack detection method based on grouping normalization and shifting according to claim 3, characterized by further comprising the following step 8:
step 8: compute the loss between the predicted probability map ŷ and the ground-truth mask y as shown in formulas (4) to (6); the model finally outputs a single-channel probability map;
Loss_BCE(ŷ, y) = −(1/M)·Σ_{i=1}^{M} [y_i·log σ(ŷ_i) + (1 − y_i)·log(1 − σ(ŷ_i))]  (4)

Loss_Dice(ŷ, y) = 1 − (2·Σ_i ŷ_i·y_i + ε)/(Σ_i ŷ_i + Σ_i y_i + ε)  (5)

Loss_total(ŷ, y) = Loss_BCE(ŷ, y) + Loss_Dice(ŷ, y),  (6)

where M is the number of pixels, σ() denotes the sigmoid function, and ε is a smoothing constant.
5. The image crack detection method based on grouping normalization and shifting according to claim 4, characterized in that step 1 further comprises:
in the encoder, the basic block of each stage is a coding block-k, a structure consisting of k basic CGG blocks (convolution - group normalization - GELU activation); coding block-2 denotes a module of 2 CGG blocks connected in series, coding block-3 consists of 3 CGG blocks, and the encoder is formed by connecting 2 coding block-2 and 3 coding block-3 modules in series; before the CGG blocks, a CBG block, consisting of the basic blocks convolution - batch normalization - GELU activation, doubles the number of channels, and after the feature extraction of the CGG blocks is completed, a CBG block reduces the features to the required dimension.
6. The image crack detection method based on grouping normalization and shifting according to claim 5, characterized in that step 1 further comprises:
in the decoder, the basic block of each stage is a decoding block-k, whose core extraction part also consists of CGG blocks; decoding block-2 and decoding block-3 consist of 2 and 3 CGG blocks, respectively, and the decoder is formed by connecting 2 decoding block-2 and 3 decoding block-3 modules in series; within a decoding block-k, an inverse max-pooling operation is performed first, using the index information stored at the encoder stage to restore the pixel positions during up-sampling; likewise, after the inverse max pooling, a CBG block reduces the number of channels of the feature map, features are extracted by the CGG blocks, and at the output a CBG block adjusts the features to the required dimension; unlike the coding block-k, the decoding block-k does not use residual connections.
7. An image crack detection system based on grouping normalization and shifting, comprising:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein:
the memory stores program instructions executable by the processor, and by invoking the program instructions the processor is capable of performing the image crack detection method based on grouping normalization and shifting according to any one of claims 1-6.
8. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the image crack detection method based on grouping normalization and shifting according to any one of claims 1 to 6.
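For illustration only (this sketch is not part of the claims), the group-normalized attention module of formula (3) in claim 1 might be implemented along these lines; the contiguous channel grouping, the ε stabilizer, and the parameter shapes are assumptions:

```python
import numpy as np

def group_norm_vec(v, n_groups, gamma, beta, eps=1e-5):
    """Group-normalize a length-C vector: each contiguous group of about
    C // n_groups channels is standardized, then an affine transform with
    per-channel gamma and beta is applied (formulas (1)-(2))."""
    out = np.empty_like(v)
    for g in np.array_split(np.arange(v.size), n_groups):
        mu, var = v[g].mean(), v[g].var()
        out[g] = gamma[g] * (v[g] - mu) / np.sqrt(var + eps) + beta[g]
    return out

def gn_attention(x, n_groups, gamma, beta):
    """Formula (3): squeeze the (C, H, W) map with global average pooling
    and global max pooling, group-normalize both vectors, gate with
    sigmoid(gamma), and reweight the input channel-wise."""
    gap = x.mean(axis=(1, 2))                 # GAP(x): one value per channel
    gmp = x.max(axis=(1, 2))                  # GMP(x): one value per channel
    w = (group_norm_vec(gap, n_groups, gamma, beta)
         + group_norm_vec(gmp, n_groups, gamma, beta))
    gate = 1.0 / (1.0 + np.exp(-gamma))       # Sigmoid(gamma)
    return (gate * w)[:, None, None] * x      # channel-wise reweighting of x
```

Each channel of the output is the corresponding input channel scaled by a single learned weight, which is how the module suppresses meaningless features and highlights meaningful ones.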
CN202310039698.7A 2023-01-12 2023-01-12 Image crack detection method based on grouping normalization and shifting Pending CN115984233A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310039698.7A CN115984233A (en) 2023-01-12 2023-01-12 Image crack detection method based on grouping normalization and shifting

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310039698.7A CN115984233A (en) 2023-01-12 2023-01-12 Image crack detection method based on grouping normalization and shifting

Publications (1)

Publication Number Publication Date
CN115984233A true CN115984233A (en) 2023-04-18

Family

ID=85966620

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310039698.7A Pending CN115984233A (en) 2023-01-12 2023-01-12 Image crack detection method based on grouping normalization and shifting

Country Status (1)

Country Link
CN (1) CN115984233A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117115850A (en) * 2023-05-18 2023-11-24 南通大学 Lightweight pedestrian re-identification method based on off-line distillation
CN117557881A (en) * 2024-01-12 2024-02-13 城云科技(中国)有限公司 Road crack detection method based on feature map alignment and image-text matching and application thereof
CN117557881B (en) * 2024-01-12 2024-04-05 城云科技(中国)有限公司 Road crack detection method based on feature map alignment and image-text matching and application thereof

Similar Documents

Publication Publication Date Title
CN115984233A (en) Image crack detection method based on grouping normalization and shifting
CN110163815B (en) Low-illumination reduction method based on multi-stage variational self-encoder
CN109800692B (en) Visual SLAM loop detection method based on pre-training convolutional neural network
CN113888550B (en) Remote sensing image road segmentation method combining super-resolution and attention mechanism
CN112183258A (en) Remote sensing image road segmentation method based on context information and attention mechanism
CN112418049B (en) Water body change detection method based on high-resolution remote sensing image
CN104899835A (en) Super-resolution processing method for image based on blind fuzzy estimation and anchoring space mapping
CN112927253A (en) Rock core FIB-SEM image segmentation method based on convolutional neural network
CN116993737B (en) Lightweight fracture segmentation method based on convolutional neural network
CN114092824A (en) Remote sensing image road segmentation method combining intensive attention and parallel up-sampling
CN116778165A (en) Remote sensing image disaster detection method based on multi-scale self-adaptive semantic segmentation
CN114494081B (en) Unmanned aerial vehicle remote sensing mapping image enhancement method
CN118350984A (en) Image style migration method based on multi-level cascade structure
CN114998373A (en) Improved U-Net cloud picture segmentation method based on multi-scale loss function
CN112990041B (en) Remote sensing image building extraction method based on improved U-net
CN111488839B (en) Target detection method and target detection system
CN114022356A (en) River course flow water level remote sensing image super-resolution method and system based on wavelet domain
CN113192084A (en) Machine vision-based highway slope micro-displacement deformation monitoring method
CN116523888B (en) Pavement crack detection method, device, equipment and medium
CN116862765A (en) Medical image super-resolution reconstruction method and system
CN115115860A (en) Image feature point detection matching network based on deep learning
CN114565511A (en) Lightweight image registration method, system and device based on global homography estimation
CN112435200A (en) Infrared image data enhancement method applied to target detection
CN115880574B (en) Underwater optical image lightweight target identification method, device and medium
CN116758508B (en) Pavement marking detection method, system and terminal based on pixel difference expansion processing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination