CN116503366A

CN116503366A - Concrete crack detection method and system based on dynamic coordinate convolution

Info

Publication number: CN116503366A
Application number: CN202310498526.6A
Authority: CN
Inventors: 李杨; 张诗豪
Original assignee: Institute of Automation Shandong Academy of Sciences
Current assignee: Institute of Automation Shandong Academy of Sciences
Priority date: 2023-04-27
Filing date: 2023-04-27
Publication date: 2023-07-28

Abstract

The invention provides a concrete crack detection method and system based on dynamic coordinate convolution, relating to the technical field of target detection. The method comprises the following steps: acquiring a concrete surface crack image set and preprocessing the images to obtain a data set; training the constructed crack detection model on the data set using a mini-batch stochastic gradient descent optimization algorithm; and inputting an image of the concrete surface crack to be detected into the trained crack detection model for inference to obtain a crack detection result. The crack detection model is based on YOLOv5, with an added bottleneck residual structure based on the dynamic coordinate convolution method, which extracts crack features and identifies and classifies cracks on the concrete surface. By fusing dynamic coordinate convolution with the YOLOv5 framework, the invention greatly improves concrete crack detection precision while the number of model parameters remains almost unchanged.

Description

Concrete crack detection method and system based on dynamic coordinate convolution

Technical Field

The invention belongs to the technical field of target detection, and particularly relates to a method and a system for detecting concrete cracks based on dynamic coordinate convolution.

Background

The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.

Most bridges built in the world are concrete bridges, and they need to be inspected regularly during service so that appropriate maintenance measures can be taken. Cracks are one of the main defects of concrete bridges and are therefore an important item in their inspection and maintenance. Traditional manual inspection is inaccurate and inefficient, and if a bridge crack is not detected in time the bridge may collapse, causing great property loss.

At present, concrete crack detection systems installed at some sites have been used, together with on-site manual work, to inspect and repair a number of bridges, preventing collapses caused by cracking of bridge decks or pavements. However, existing detection systems as a whole still have a number of problems in application:

(1) Limited crack feature identification: the existing target detection algorithm is mainly aimed at object identification and classification, and may not sufficiently capture tiny features and details of the crack, which may lead to poor detection effect and challenges to accurate identification and positioning of the crack.

(2) Insufficient adaptability to complex backgrounds: in practical applications, cracks may occur in various complex backgrounds, such as different textures, lighting conditions, etc.; the existing target detection algorithms may not adapt well to these complex backgrounds, affecting the accuracy of crack detection.

(3) Real-time performance and computational efficiency: in crack detection, real-time performance and computational efficiency are critical, especially in large-scale infrastructure inspection; however, many existing target detection algorithms require a lot of computational resources and time, and may not meet real-time requirements.

(4) Robustness and generalization capability: the shape, size and direction of the crack may be greatly different, and the current target detection algorithm cannot show good robustness and generalization capability under these different conditions, so that the recognition result is limited.

Disclosure of Invention

In order to overcome the defects in the prior art, the invention provides a concrete crack detection method and system based on dynamic coordinate convolution, which fuse dynamic coordinate convolution with the YOLOv5 framework and greatly improve concrete crack detection precision while the number of model parameters remains almost unchanged.

To achieve the above object, one or more embodiments of the present invention provide the following technical solutions:

The first aspect of the invention provides a concrete crack detection method based on dynamic coordinate convolution.

A concrete crack detection method based on dynamic coordinate convolution comprises the following steps:

acquiring a concrete surface crack image set, and preprocessing the images to obtain a data set;

training the constructed crack detection model on the data set using a mini-batch stochastic gradient descent optimization algorithm;

inputting an image of the concrete surface crack to be detected into the trained crack detection model for inference to obtain a crack detection result;

wherein the crack detection model is based on YOLOv5 with an added bottleneck residual structure based on the dynamic coordinate convolution method, extracts crack features, and identifies and classifies concrete surface cracks.

Further, the preprocessing includes:

(1) Clustering the image set using the K-means method to generate prior frames (anchor boxes);

(2) Carrying out data augmentation on the image set based on the principle of positive and negative sample balance.

Further, the crack detection model comprises three parts, namely a backbone, a neck and a head;

the backbone is used for extracting image features;

the neck is used for feature fusion;

and the head is used for decoding and outputting the predicted frame coordinate information and the classification information of the crack.

Further, the backbone consists of Conv, C3, SPPF and a bottleneck residual structure, and the specific connection sequence is as follows: Conv, Conv, C3, Conv, C3, Conv, bottleneck residual structure, SPPF.

Further, the bottleneck residual structure comprises Conv and a residual module;

the input features are processed by Conv and the residual module, and the result is fused in the channel dimension with the residual term obtained by Conv to obtain the bottleneck residual image features.

Further, the residual module comprises Conv and a dynamic coordinate convolution module;

a hidden layer output is generated by Conv and is input into the dynamic coordinate convolution module to obtain image features with coordinate information.

Further, the dynamic coordinate convolution module comprises coordinate convolution, Conv and dynamic convolution which are sequentially connected;

the coordinate convolution adds two coordinate channels for the input features, and the coordinate channels are spliced with the input features through a Concat layer to extract the coordinate features;

the dynamic convolution calculates attention for the input features, dynamically integrates a plurality of parallel convolution kernels into one dynamic kernel according to the attention, and extracts dynamic features.

The second aspect of the invention provides a concrete crack detection system based on dynamic coordinate convolution.

A concrete crack detection system based on dynamic coordinate convolution comprises a data set construction module, a model training module and a crack inference module:

a data set construction module configured to: acquire a concrete surface crack image set, and preprocess the images to obtain a data set;

a model training module configured to: train the constructed crack detection model on the data set using a mini-batch stochastic gradient descent optimization algorithm;

a crack inference module configured to: input an image of the concrete surface crack to be detected into the trained crack detection model for inference to obtain a crack detection result.

A third aspect of the present invention provides a computer readable storage medium having stored thereon a program which when executed by a processor performs the steps of a method for concrete crack detection based on dynamic coordinate convolution according to the first aspect of the present invention.

A fourth aspect of the invention provides an electronic device comprising a memory, a processor and a program stored on the memory and executable on the processor, the processor implementing the steps in a method for detecting concrete cracks based on dynamic coordinate convolution according to the first aspect of the invention when executing the program.

The one or more of the above technical solutions have the following beneficial effects:

1. Taking deep learning as the theoretical basis, the dynamic coordinate convolution (CRConv) method is introduced to construct the crack detection model YOLOV5-CR. This effectively enhances detection performance across different crack shapes, sizes and directions, strengthens the model's ability to recognize crack features, and improves the precision of concrete crack recognition and detection. Cracks caused by bridge deck cracking can be detected in time at an early stage, and the improved system can respond quickly to the detection results, which plays an important role in measures such as preventing bridge collapse.

2. A crack detection model suited to this scene is constructed, and a complete database model adapted to the requirements is formed, improving the learning capacity of the coordinate convolution and the dynamic convolution.

3. Guided by the requirements, the network model algorithm is trained on a large data set to obtain more accurate crack image recognition and crack type prediction, greatly improving concrete crack recognition precision; the generalization capability of the model is improved by gradually increasing the number of samples, further optimizing the robustness and stability of the detection algorithm.

Additional aspects of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.

Fig. 1 is a flow chart of the method of the first embodiment.

Fig. 2 is a structure diagram of the crack detection model YOLOV5-CR of the first embodiment.

Fig. 3 is a structure diagram of the dynamic coordinate convolution (CRConv) of the first embodiment.

Fig. 4 is a structure diagram of the residual module (CRBottleneck) of the first embodiment.

Fig. 5 is a structure diagram of the bottleneck residual structure (CRDCSP) of the first embodiment.

Fig. 6 is a system structure diagram of the second embodiment.

Detailed Description

The invention will be further described with reference to the drawings and examples.

Example 1

This embodiment discloses a concrete crack detection method based on dynamic coordinate convolution, which identifies concrete surface cracks and their categories and mainly comprises three steps: data set construction, model training and crack inference.

As shown in fig. 1, a method for detecting concrete cracks based on dynamic coordinate convolution includes:

Step S1: acquiring a concrete surface crack image set, and preprocessing the images to obtain a data set.

Firstly, a large number of concrete surface crack images are collected so that the shape, illumination, size and sharpness of the crack images are diverse; the image set is then preprocessed using the K-means algorithm and the positive and negative sample balance principle, specifically:

(1) The image set is clustered using the K-means method to generate prior frames.

Target detection networks (Faster R-CNN, SSD, YOLO v2 and v3, YOLO v5, etc.) use prior frames (anchor boxes), which are boxes of different sizes and aspect ratios preset on the image in advance.

The prior frames of the model in this embodiment are obtained by K-means clustering rather than manual design; therefore, before the subsequent work is carried out, prior frames suited to the concrete surface crack image set are obtained for the model using the K-means method.

In this embodiment, 3 prior frame sizes are clustered on the image set, 80×80×128, 40×40×256, and 20×20×512, respectively.
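The clustering in item (1) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the text only says that K-means is used, so the 1 - IoU distance on (width, height) pairs (a common choice for YOLO-style anchors), the function names `iou_wh` and `kmeans_prior_frames`, and the sample boxes are all assumptions.

```python
import random

def iou_wh(box, centroid):
    """IoU of two boxes given as (width, height), aligned at the same corner."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union

def kmeans_prior_frames(boxes, k, iters=100, seed=0):
    """Cluster (width, height) pairs into k prior frames, using 1 - IoU as distance."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    for _ in range(iters):
        # Assignment step: each box joins the centroid with the highest IoU.
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: iou_wh(box, centroids[i]))
            clusters[best].append(box)
        # Update step: move each centroid to the mean width/height of its cluster.
        new_centroids = []
        for i, members in enumerate(clusters):
            if not members:  # an empty cluster keeps its previous centroid
                new_centroids.append(centroids[i])
                continue
            w = sum(m[0] for m in members) / len(members)
            h = sum(m[1] for m in members) / len(members)
            new_centroids.append((w, h))
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    # Return the prior frames ordered from small to large area.
    return sorted(centroids, key=lambda c: c[0] * c[1])

# Hypothetical labelled crack boxes (width, height) in pixels.
boxes = [(12, 90), (14, 100), (10, 85), (60, 20), (70, 25), (65, 18),
         (200, 210), (190, 220), (210, 200)]
prior_frames = kmeans_prior_frames(boxes, k=3)
```

Sorting by area makes it easy to assign the resulting prior frames to the small, medium and large detection scales.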

(2) Data augmentation is carried out on the concrete surface crack image set based on the positive and negative sample balance principle.

The original concrete surface crack image set suffers from sample imbalance, that is, the positive and negative samples are unevenly distributed or differ greatly in number, which can degrade the training of the model. This embodiment therefore augments the image set based on the principle of positive and negative sample balance to adjust the proportion of positive and negative samples. There are currently three main methods:

(1) Adjusting the value of θ

And adjusting the theta value according to the positive and negative sample proportion of the image set.

(2) Oversampling

The class (minority class) with a smaller number of samples in the image set is oversampled and new samples are synthesized to alleviate class imbalance, for example, the classical oversampling algorithm SMOTE.

(3) Undersampling

Undersampling the class (majority class) with a large number of samples in the image set, and discarding some samples to relieve class imbalance.
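The oversampling and undersampling options can be illustrated with a short sketch. It uses plain random duplication and random discarding as simple stand-ins; it is not SMOTE, which synthesizes new samples by interpolation. The function names and the file names are hypothetical.

```python
import random

def random_oversample(minority, target_size, seed=0):
    """Duplicate randomly chosen minority samples until target_size is reached."""
    rng = random.Random(seed)
    missing = max(0, target_size - len(minority))
    extra = [rng.choice(minority) for _ in range(missing)]
    return list(minority) + extra

def random_undersample(majority, target_size, seed=0):
    """Keep a random subset of the majority class of size target_size."""
    rng = random.Random(seed)
    return rng.sample(majority, min(target_size, len(majority)))

# Hypothetical file names: 3 positive (crack) and 10 negative (no crack) images.
positives = [f"crack_{i}.jpg" for i in range(3)]
negatives = [f"clean_{i}.jpg" for i in range(10)]

balanced_up = random_oversample(positives, len(negatives))     # 10 positives
balanced_down = random_undersample(negatives, len(positives))  # 3 negatives
```

For images, duplicated minority samples would usually be passed through further augmentation (flips, brightness changes and so on) so that the copies are not identical; that step is omitted here.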

Step S2: training the constructed crack detection model on the data set using a mini-batch stochastic gradient descent optimization algorithm.

The following is a description from two viewpoints of construction and training, respectively:

Step S201: building the crack detection model

The crack detection model YOLOV5-CR constructed by the invention is based on a YOLOv5 model, a dynamic coordinate convolution method and a bottleneck residual structure based on deep learning are innovatively adopted, the crack characteristics are extracted with high precision and high efficiency, and the cracks on the concrete surface are identified and classified.

Fig. 2 is a model structure diagram of the crack detection model YOLOV5-CR. As shown in Fig. 2, YOLOV5-CR is divided into three parts, namely a backbone, a neck and a head. The backbone is responsible for extracting image features, the neck for feature fusion, and the head for decoding and outputting the prediction frame coordinates and classification information of a crack. The sizes of the prediction frames correspond to the three prior frame sizes obtained by clustering in step S1, abbreviated as large, medium and small crack prediction frames.

The crack detection model YOLOV5-CR is an improvement on the original YOLOv5 model. Specifically, in the backbone of the original YOLOv5 model, the C3 modules of the sixth and eighth layers are replaced by the bottleneck residual structure (CRDCSP) newly proposed by the invention; that is, the backbone of YOLOV5-CR comprises Conv, Conv, C3, Conv, C3, Conv, bottleneck residual structure (CRDCSP) and SPPF connected in sequence.

(1) Conv consists of a convolution, batch normalization and a SiLU activation layer. SiLU is the sigmoid-weighted linear unit, also known as the swish function; batch normalization helps prevent overfitting and accelerates convergence.

(2) SPPF is an improvement on SPP. SPP first halves the number of input channels through a standard convolution, then applies max pooling with kernel sizes of 5, 9 and 13; the results of the three max-pooling operations are concatenated with the unpooled data, so that the number of channels after concatenation is 2 times the original number.

SPPF (Spatial Pyramid Pooling-Fast) replaces the original 5×5, 9×9 and 13×13 max pooling with three 5×5 max-pooling layers: several small pooling kernels are cascaded in place of a single large pooling kernel in the SPP module. This further improves running speed while retaining the original function, namely fusing feature maps of different receptive fields and enriching the expressive power of the feature map.
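The claim that cascaded 5×5 pooling retains the function of the 5×5, 9×9 and 13×13 pooling can be checked numerically. The sketch below assumes stride 1 and "same" padding that never wins the maximum (equivalent to padding with negative infinity); `max_pool_same` and the test grid are illustrative names and data, not part of the patent.

```python
def max_pool_same(fmap, k):
    """k x k max pooling, stride 1; the window is clipped at the borders,
    which is equivalent to padding with -inf, so the output keeps its size."""
    h, w, r = len(fmap), len(fmap[0]), k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [fmap[y][x]
                      for y in range(max(0, i - r), min(h, i + r + 1))
                      for x in range(max(0, j - r), min(w, j + r + 1))]
            row.append(max(window))
        out.append(row)
    return out

# Deterministic 16 x 16 test grid with scattered values.
fmap = [[(i * 37 + j * 91) % 53 for j in range(16)] for i in range(16)]

# SPPF style: three cascaded 5 x 5 poolings.
p1 = max_pool_same(fmap, 5)
p2 = max_pool_same(p1, 5)
p3 = max_pool_same(p2, 5)

# SPP style: independent 5 x 5, 9 x 9 and 13 x 13 poolings on the same input.
spp5, spp9, spp13 = (max_pool_same(fmap, k) for k in (5, 9, 13))

cascade_matches_spp = (p1 == spp5) and (p2 == spp9) and (p3 == spp13)
```

Two stacked 5×5 windows cover a 9×9 neighbourhood and three cover 13×13, so the cascade yields the same three pooled maps while each stage only scans 25 positions per output pixel.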

(3) C3 is a CSP architecture comprising 3 standard convolution layers. It learns residual features, and its structure is divided into two branches: one stacks a number of Bottleneck modules, the other passes through a single basic convolution module, and the two branches are finally joined by a Concat operation.

(4) Bottleneck residual structure (CRDCSP), based on the structure of C3, makes the following improvements:

the Bottleneck in C3 is replaced with a designed residual module (CRBottleneck) according to the invention, wherein the residual module (CRBottleneck) is built on the basis of dynamic coordinate convolution (CRConv).

Specifically, in order to ensure that the model has high real-time performance and smaller parameters and calculation amount, and simultaneously obviously improve the recognition accuracy, a bottleneck residual structure (CRDCSP) is constructed based on the dynamic coordinate convolution (CRConv) method designed by the invention, and the description is respectively carried out according to the sequence of the dynamic coordinate convolution (CRConv), the residual module (CRBottleneck) and the bottleneck residual structure (CRDCSP).

Dynamic coordinate convolution (CRConv)

Combining the coordinate convolution and the dynamic convolution to obtain a dynamic coordinate convolution (CRConv), fig. 3 is a structural diagram of the dynamic coordinate convolution (CRConv), and as shown in fig. 3, the previously extracted image feature (i.e. feature map) x is input into the dynamic coordinate convolution (CRConv) to obtain a feature y with a dynamic weight coordinate, which specifically includes the following steps:

first, the feature map is input into a coordinate convolution to extract a coordinate feature.

Specifically, two coordinate channels, representing the x coordinate and the y coordinate respectively, are added to the feature map and spliced with the initial feature map through a Concat layer. The feature map then has translation dependency, which is expressed by the formula:

$$y_1 \leftarrow \sigma(\mathrm{BN}(w_1(x) + w_x + w_y))$$

where $y_1$ represents the coordinate features, $x$ the input feature map, $w_1$ the convolution kernel, $w_x$ and $w_y$ the coordinate convolution kernels, BN batch normalization, and $\sigma$ the ReLU activation function.

The channel dimension is then changed by a standard convolution of 1x1 or downsampled by a standard convolution of 3x3, outputting the coordinate features.
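The coordinate-channel step described above can be sketched in a few lines. The patent does not state how the coordinates are scaled, so normalizing them to [-1, 1] is an assumption here, and `add_coord_channels` is a hypothetical name; the subsequent convolution, batch normalization and activation are left out.

```python
def add_coord_channels(fmap):
    """fmap: list of C channels, each an H x W nested list.
    Returns C + 2 channels: the input plus an x-coordinate and a y-coordinate map."""
    h, w = len(fmap[0]), len(fmap[0][0])

    def norm(i, n):
        # Assumed scaling: map index 0..n-1 to [-1, 1]; a single-pixel axis maps to 0.
        return 2.0 * i / (n - 1) - 1.0 if n > 1 else 0.0

    x_channel = [[norm(j, w) for j in range(w)] for _ in range(h)]
    y_channel = [[norm(i, h) for _ in range(w)] for i in range(h)]
    # Concat along the channel dimension.
    return fmap + [x_channel, y_channel]

# One input channel of size 2 x 3.
features = [[[0.5, 0.1, 0.9],
             [0.3, 0.7, 0.2]]]
with_coords = add_coord_channels(features)
```

Because every pixel now carries its own position, a convolution applied afterwards can respond differently to the same pattern at different locations, which is the translation dependency mentioned above.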

And finally, sending the output coordinate features into dynamic convolution to extract dynamic features.

The basic idea of dynamic convolution is to adjust the convolution parameters adaptively according to the input feature map. A static convolution applies the same convolution kernel to every input feature map, whereas a dynamic convolution adapts to each feature map and processes it with more suitable convolution parameters.

In particular, dynamic convolution does not use a single convolution kernel at each layer; instead it dynamically aggregates multiple parallel convolution kernels according to an attention that adjusts the weight of each kernel depending on the input, thereby generating an adaptive dynamic convolution. The attention is computed by ROUTE, and the process is expressed as:

$$y_2 \leftarrow \mathrm{ROUTE}(y_1)$$

$$y_3 \leftarrow \sigma(\mathrm{BN}(w_3(\mathrm{COMBINE}(y_2))))$$

$$y_4 \leftarrow \mathrm{Add}(y_1, y_3)$$

$$\mathrm{out} \leftarrow \sigma(\mathrm{BN}(w_3(y_4)))$$

where $y_2$ represents the multi-scale features, $y_3$ the convolution features and $y_4$ the dynamic features; BN represents batch normalization, $\sigma$ the ReLU activation function, Concat channel dimension feature fusion, ROUTE the attention mechanism, COMBINE the weighting operation, and Add the addition operation.
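The kernel aggregation at the heart of dynamic convolution can be sketched as below. This is a simplified illustration under stated assumptions: the attention is taken to be global average pooling followed by a linear layer and a softmax, the parallel kernels are 1×1, and batch normalization, the activation and the Add step of the formulas are omitted. The names `route`, `combine` and `conv1x1` only loosely mirror ROUTE and COMBINE and are not the patent's definitions.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route(fmap, routing_weights):
    """Attention over K kernels: per-channel global average pool, linear layer, softmax."""
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]
    scores = [sum(w * p for w, p in zip(w_k, pooled)) for w_k in routing_weights]
    return softmax(scores)

def combine(kernels, attention):
    """Aggregate K parallel 1x1 kernels (each C_out x C_in) into one dynamic kernel."""
    c_out, c_in = len(kernels[0]), len(kernels[0][0])
    return [[sum(a * k[o][i] for a, k in zip(attention, kernels))
             for i in range(c_in)] for o in range(c_out)]

def conv1x1(fmap, kernel):
    """Apply a C_out x C_in kernel at every pixel of a C_in x H x W feature map."""
    h, w = len(fmap[0]), len(fmap[0][0])
    return [[[sum(k_o[c] * fmap[c][y][x] for c in range(len(fmap)))
              for x in range(w)] for y in range(h)] for k_o in kernel]

# Hypothetical toy sizes: C_in = 2, C_out = 1, K = 2 parallel kernels.
fmap = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
kernels = [[[1.0, 0.0]], [[3.0, 0.0]]]
routing_weights = [[0.0, 0.0], [0.0, 0.0]]  # zero scores give uniform attention

attention = route(fmap, routing_weights)     # [0.5, 0.5]
dynamic_kernel = combine(kernels, attention) # [[2.0, 0.0]]
out = conv1x1(fmap, dynamic_kernel)          # [[[2.0, 4.0], [6.0, 8.0]]]
```

Only one convolution is executed per input regardless of K, since the kernels are mixed before being applied; this is why the parameter-heavy part of dynamic convolution adds little inference cost.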

Residual module (CRBottleneck)

On the basis of dynamic coordinate convolution (CRConv), a residual module (CRBottleneck) is constructed, and FIG. 4 is a structural diagram of the residual module (CRBottleneck), as shown in FIG. 4, in the operation process of the residual module, firstly, hidden layer output is generated through a layer of Conv, then, the generated hidden layer output is used as input of the dynamic coordinate convolution (CRConv) to perform operation again, so that image features with coordinate information are obtained, the detail information of a feature map is enhanced, and loss of semantic information in the forward operation process is reduced.

Bottleneck residual structure (CRDCSP)

A bottleneck residual structure (CRDCSP) is built on the basis of the residual module (CRBottleneck); Fig. 5 is its structure diagram. As shown in Fig. 5, the input features are first processed by the Conv of the main branch and the residual module (CRBottleneck), and are then fused in the channel dimension with the residual term output by the Conv on the other branch to obtain the bottleneck residual image features. This design improves the depth and performance of the network while reducing the computational effort, and is expressed as:

$$\mathrm{out} \leftarrow w_3(\mathrm{Concat}(W(w_1(x)), w_2(x)))$$

where $w_1$, $w_2$ and $w_3$ represent the convolution kernels of the three Conv layers, $W$ the residual module (CRBottleneck), Concat channel dimension feature fusion, out the bottleneck residual image features, and $x$ the input feature map.
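The data flow of the formula can be traced with a toy sketch. Feature maps are reduced to flat lists of channel values, and `make_conv` and `toy_residual_module` are hypothetical stand-ins for Conv and CRBottleneck; only the wiring of the two branches follows the formula.

```python
def crdcsp(x, w1, w2, w3, residual_module):
    """out = w3(Concat(W(w1(x)), w2(x))), with lists of channel values as features."""
    main = residual_module(w1(x))  # main branch: Conv, then the residual module W
    shortcut = w2(x)               # second branch: a single Conv
    return w3(main + shortcut)     # list concatenation plays the role of Concat

def make_conv(c_out, scale):
    """Hypothetical stand-in for Conv: emits c_out channels, each the scaled input mean."""
    def conv(channels):
        mean = sum(channels) / len(channels)
        return [scale * mean] * c_out
    return conv

def toy_residual_module(channels):
    """Hypothetical stand-in for CRBottleneck: a shape-preserving map."""
    return [c + 1.0 for c in channels]

x = [1.0, 2.0, 3.0, 4.0]  # 4 input channels
out = crdcsp(x,
             w1=make_conv(2, 1.0),   # 4 -> 2 channels on the main branch
             w2=make_conv(2, 2.0),   # 4 -> 2 channels on the second branch
             w3=make_conv(4, 1.0),   # 2 + 2 -> 4 channels after Concat
             residual_module=toy_residual_module)
```

Each branch works on a reduced number of channels and the two halves are only merged by the last Conv, which is how CSP-style blocks keep the computational cost low.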

Based on the bottleneck residual image features, the neck performs further feature fusion, and the head automatically marks cracks in the image based on the fused features to obtain crack prediction frames. Based on the crack prediction frames, the type of each crack is predicted using a dense connection technique to obtain the crack classification information, where the crack types comprise horizontal, vertical, inclined, branched and map (alligator) cracks.

Step S202: training the crack detection model

In the model training stage, the data set is used and the mini-batch stochastic gradient descent optimization algorithm is adopted to update the model by gradient. To ensure the accuracy and stability of the model, the data set can be divided into a training set and a validation set according to a preset proportion; the validation set is used to estimate the generalization error, and the weight parameters corresponding to the minimum generalization error are saved.
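The training procedure can be illustrated on a deliberately small problem. The sketch below fits a one-variable linear model rather than a detection network, so the model, loss, learning rate, batch size and data are all assumptions; what it shows is the mini-batch update and the saving of the weights that give the lowest validation error.

```python
import random

def train_minibatch_sgd(train, val, lr=0.05, batch_size=4, epochs=200, seed=0):
    """Fit y = w * x + b by mini-batch SGD.
    Returns (validation error, w, b) for the epoch with the lowest validation error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    best = (float("inf"), w, b)
    data = list(train)
    for _ in range(epochs):
        rng.shuffle(data)  # new random mini-batches every epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradient of the mean squared error over the mini-batch.
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
        # Estimate the generalization error on the held-out validation set.
        val_error = sum((w * x + b - y) ** 2 for x, y in val) / len(val)
        if val_error < best[0]:
            best = (val_error, w, b)  # keep the best weights seen so far
    return best

# Hypothetical noise-free data from y = 2x + 1, split 16 / 4 into train / validation.
samples = [(i / 10, 2 * (i / 10) + 1) for i in range(20)]
val_set = samples[::5]
train_set = [s for s in samples if s not in val_set]

best_error, best_w, best_b = train_minibatch_sgd(train_set, val_set)
```

Selecting weights by validation error rather than by the last epoch guards against keeping a model that has started to overfit the training set.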

Step S3: inputting an image of the concrete surface crack to be detected into the trained crack detection model for inference to obtain a crack detection result.

In the crack detection model YOLOV5-CR, the coordinate convolution in the dynamic coordinate convolution module adds coordinate information, giving more accurate target positioning, while the dynamic convolution generates convolution kernels automatically according to the input features, improving the generalization capability and performance of the model. The bottleneck residual structure (CRDCSP) associates information between channels and improves the feature extraction capability of the network. Using dynamic coordinate convolution (CRConv) to acquire image features with coordinate information effectively reduces the computing power demanded of mobile devices, addresses the low efficiency and high false detection rate of manual inspection in complex environments, and greatly improves the recognition accuracy of the network model.

The crack detection model YOLOV5-CR offers higher accuracy, lower detection cost and stronger environmental adaptability, identifies cracks on concrete surfaces in real time, and facilitates the early detection and timely repair of building structure defects. The model can also judge the crack type and infer from it the degree of damage to the building structure, which buys time to prepare repair measures in advance, greatly reduces losses, helps prevent accidents, and provides strong support for the safety of building structures.

Example 2

This embodiment discloses a concrete crack detection system based on dynamic coordinate convolution.

As shown in Fig. 6, a concrete crack detection system based on dynamic coordinate convolution includes a data set construction module, a model training module and a crack inference module:

Example 3

An object of the present embodiment is to provide a computer-readable storage medium.

A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps in a method for concrete crack detection based on dynamic coordinate convolution as described in the first embodiment of the present disclosure.

Example 4

An object of the present embodiment is to provide an electronic apparatus.

An electronic device comprising a memory, a processor and a program stored on the memory and executable on the processor, wherein the processor implements the steps in a method for detecting concrete cracks based on dynamic coordinate convolution according to the first embodiment of the disclosure when executing the program.

The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. The concrete crack detection method based on dynamic coordinate convolution is characterized by comprising the following steps of:

2. The method for detecting concrete cracks based on dynamic coordinate convolution according to claim 1, wherein the preprocessing comprises:

3. The method for detecting concrete cracks based on dynamic coordinate convolution according to claim 1, wherein the crack detection model comprises a backbone, a neck and a head;

the backbone is used for extracting image features;

the neck is used for feature fusion;

4. The method for detecting concrete cracks based on dynamic coordinate convolution as claimed in claim 3, wherein the backbone consists of Conv, C3, SPPF and bottleneck residual structures, and the specific connection sequence is as follows: Conv, Conv, C3, Conv, C3, Conv, bottleneck residual structure, SPPF.

5. The method for detecting concrete cracks based on dynamic coordinate convolution according to claim 1, wherein the bottleneck residual structure comprises Conv and a residual module;

6. The method for detecting concrete cracks based on dynamic coordinate convolution according to claim 5, wherein the residual module comprises Conv and a dynamic coordinate convolution module;

7. The method for detecting concrete cracks based on dynamic coordinate convolution as defined in claim 6, wherein the dynamic coordinate convolution module comprises coordinate convolution, Conv and dynamic convolution which are connected in sequence;

8. The concrete crack detection system based on dynamic coordinate convolution is characterized by comprising a data set construction module, a model training module and a crack inference module:

9. An electronic device, comprising:

a memory for non-transitory storage of computer readable instructions; and

a processor for executing the computer-readable instructions,

wherein the computer readable instructions, when executed by the processor, perform the method of any of the preceding claims 1-7.

10. A storage medium, characterized by non-transitory storing computer-readable instructions, wherein the instructions of the method of any one of claims 1-7 are performed when the non-transitory computer-readable instructions are executed by a computer.