CN116309536A

CN116309536A - Pavement crack detection method and storage medium

Info

Publication number: CN116309536A
Application number: CN202310441866.5A
Authority: CN
Inventors: 曹霆; 胡劲元; 李军怀; 王怀军; 王宇航; 田程; 张欣荣
Original assignee: Xian University of Technology
Current assignee: Xian University of Technology
Priority date: 2023-04-23
Filing date: 2023-04-23
Publication date: 2023-06-23

Abstract

The invention discloses a pavement crack detection method and a storage medium, relating to the technical field of target detection. The method comprises the following steps: collecting an image to be detected; inputting the image to be detected into an MSF-Transformer model, and outputting a prediction frame and a category label; and detecting cracks in the image to be detected according to the prediction frame and the category label. The acquired two-dimensional images are input into the MSF algorithm model to obtain a feature map fused from feature maps of different sizes, which mitigates the insufficiently accurate capture of local texture features and global texture relations when detecting slender and tiny targets. In addition, a Transformer multi-layer encoder-decoder structure is combined with position coding, so that prior-knowledge constraints such as Anchors and the post-processing step of non-maximum suppression are omitted. End-to-end target detection is thereby realized, and the target detection algorithm is greatly simplified.

Description

Pavement crack detection method and storage medium

Technical Field

The invention relates to the technical field of target detection, in particular to a pavement crack detection method and a storage medium.

Background

Pavement crack detection has long been an important research topic in road traffic safety. In recent years, research on deep learning and target detection has driven the development of intelligent crack detection methods. As the basis for road health assessment and pavement maintenance, crack detection has become a research focus for roads, bridges, tunnels and similar structures, with important research value and broad application prospects.

Crack detection marks the cracks on the pavement with suitable detection frames and displays the confidence and category of each crack. Cracks fall into three categories: transverse cracks, longitudinal cracks and repaired cracks. Each detected crack is described by six values: the X and Y coordinates, the width and height of the detection frame, the confidence, and the detected class. With the advent of large-scale data sets, falling computer hardware costs and improved GPU parallel computing capability, deep learning has gradually come to dominate the fields of target detection and crack detection. Target detection based on the DETR network has been studied and used by many scholars. The model requires no prior knowledge or constraints such as Anchors and abandons the post-processing step of non-maximum suppression, so the whole network realizes end-to-end target detection and greatly simplifies the target detection algorithm.

The classical DETR model mainly comprises four parts: a CNN Backbone, a Transformer Encoder, a Transformer Decoder, and a final prediction layer FFN. In general, the network adopted in the Backbone part of DETR is well suited to extracting and processing features of large targets. In actual engineering, however, road crack images are affected by differences in acquisition equipment, acquisition distance and crack size, and by noise such as illumination and shadow. If the features of road cracks are extracted by the DETR model directly, the local texture features of the cracks are easily lost, which degrades both the training of the subsequent network and the crack detection results.

Disclosure of Invention

The invention provides a pavement crack detection method and a storage medium, which use an MSF algorithm and a Transformer model based on deep learning to perform target detection training and to detect cracks in an image to be detected, thereby greatly simplifying the target detection algorithm.

The invention provides a pavement crack detection method, which comprises the following steps:

collecting an image to be detected;

inputting the image to be detected into an MSF-Transformer model, and outputting an optimal prediction frame, a category label and a confidence coefficient;

detecting cracks in the image to be detected according to the optimal prediction frame, the category label and the confidence coefficient;

wherein inputting the image to be detected into the MSF-Transformer model and outputting the prediction frame, the category label and the confidence coefficient comprises the following steps:

performing multi-scale feature extraction on the image to be detected based on the MSF model to obtain a fusion feature map;

constructing position codes with the same dimension according to the fusion feature map;

encoding and decoding the fusion feature map and the position code based on a Transformer model to obtain a decoding result;

and predicting the decoding result based on the prediction layer FFN to obtain a prediction frame, a category label and a confidence coefficient.

Preferably, the image to be detected needs to be preprocessed before multi-scale feature extraction is performed on the image to be detected through the MSF model.

Preferably, the preprocessing comprises the following steps:

the size of the image to be detected is processed to be 200×200DPI;

graying the size-processed image to be detected, adding Gaussian noise, and then carrying out median filtering;

and labeling the image to be detected after median filtering by using a picture labeling tool according to three categories of transverse cracks, longitudinal cracks and repaired cracks.

Preferably, the multi-scale feature extraction is performed on the image to be detected based on the MSF model to obtain a fusion feature map, which comprises the following steps:

inputting the preprocessed image to be detected into a CBL module consisting of convolution, normalization and activation functions;

sequentially inputting the image to be detected that has passed through the CBL module into a plurality of convolution layers and residual structures to obtain a plurality of feature maps of different sizes;

upsampling the plurality of feature maps of different sizes into a plurality of feature maps of the same size;

and stacking and fusing the feature maps of the same size and applying the corresponding convolution to obtain the fusion feature map.

Preferably, the position coding is constructed by:

$$PE_{(pos,\,2i)} = \sin\left(\frac{pos}{10000^{2i/d_{model}}}\right), \qquad PE_{(pos,\,2i+1)} = \cos\left(\frac{pos}{10000^{2i/d_{model}}}\right)$$

where PE represents the position code, pos represents the position of the current pixel in the input feature map, $d_{model}$ represents the dimension of the pixel, and i indexes the positions within the code, where even positions use sin and odd positions use cos.

Preferably, the Transformer model comprises a multi-layer encoder and a multi-layer decoder, each encoder layer comprising a multi-head attention layer and a feed-forward connection layer, and each decoder layer comprising a masked multi-head attention layer, a multi-head attention layer and a feed-forward connection layer.

Preferably, the decoding result is predicted based on a prediction layer FFN to obtain an optimal prediction frame, which includes the following steps:

inputting the multi-layer decoding result into the parameter-shared prediction layer FFN to obtain a plurality of prediction frames;

constructing a prediction frame set from the plurality of prediction frames, and constructing a truth value set;

and performing bipartite graph matching with the Hungarian algorithm to match the prediction frame set with the truth value set, thereby selecting the optimal prediction frame from the plurality of prediction frames.

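Preferably, the Hungarian algorithm is as follows:

$$\hat{\sigma} = \underset{\sigma \in \mathfrak{S}_N}{\arg\min} \sum_{i=1}^{N} L_{match}\left(y_i, \hat{y}_{\sigma(i)}\right)$$

$$L_{match}\left(y_i, \hat{y}_{\sigma(i)}\right) = -\mathbb{1}_{\{c_i \neq \varnothing\}}\, \hat{p}_{\sigma(i)}(c_i) + \mathbb{1}_{\{c_i \neq \varnothing\}}\, L_{box}\left(b_i, \hat{b}_{\sigma(i)}\right)$$

where $\hat{\sigma}$ is the optimal allocation result, $\sigma(i)$ represents the index, and $L_{match}(y_i, \hat{y}_{\sigma(i)})$ is the pairwise matching cost between the pavement crack truth element $y_i$ and the prediction with index $\sigma(i)$. Here $c_i$ is the target class label, $b_i$ is the truth frame vector, $\hat{p}_{\sigma(i)}(c_i)$ is the predicted probability of $c_i$, $\hat{b}_{\sigma(i)}$ is the prediction frame, $y_i$ represents the i-th truth element, $\hat{y}_{\sigma(i)}$ represents the corresponding element of the prediction set, $L_{box}$ represents the frame loss, N represents the number of elements in the prediction set, and $\varnothing$ denotes the background class.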

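Preferably, the Transformer model is trained by a loss function, which is as follows:

$$L_{Hungarian}(y, \hat{y}) = \sum_{i=1}^{N} \left[ -\log \hat{p}_{\hat{\sigma}(i)}(c_i) + \mathbb{1}_{\{c_i \neq \varnothing\}}\, L_{box}\left(b_i, \hat{b}_{\hat{\sigma}(i)}\right) \right]$$

$$L_{box}\left(b_i, \hat{b}_{\sigma(i)}\right) = \lambda_{iou}\, L_{iou}\left(b_i, \hat{b}_{\sigma(i)}\right) + \lambda_{L1} \left\lVert b_i - \hat{b}_{\sigma(i)} \right\rVert_1$$

where y represents the truth value set of the detection objects, $\hat{y}$ represents the prediction frame set, $\log \hat{p}_{\hat{\sigma}(i)}(c_i)$ is the logarithmic probability of $c_i$, $L_{Hungarian}$ is the Hungarian loss, $L_{iou}$ is the generalized IoU loss, and $\lambda_{iou}$ and $\lambda_{L1}$ are hyperparameters.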

The invention also provides a computer-readable storage medium storing computer instructions for causing a computer to execute the pavement crack detection method.

Compared with the prior art, the invention has the beneficial effects that:

According to the invention, the acquired two-dimensional images are input into the MSF algorithm model to obtain a feature map fused from feature maps of different sizes, which mitigates the insufficiently accurate capture of local texture features and global texture relations when detecting slender and tiny targets. In addition, a Transformer multi-layer encoder-decoder structure is combined with position coding, so that prior-knowledge constraints such as Anchors and the post-processing step of non-maximum suppression are omitted. End-to-end target detection is thereby realized, and the target detection algorithm is greatly simplified.

Drawings

To illustrate the embodiments of the invention or the technical solutions in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention; a person skilled in the art may obtain other drawings from them without inventive effort.

FIG. 1 is a block flow diagram of a pavement crack detection method of the present invention;

FIG. 2 is a model training effect diagram of a pavement crack detection method of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of protection of the invention.

Referring to fig. 1, the invention provides a pavement crack detection method, which comprises the following steps:

Step 1: Acquire road surface images with an acquisition vehicle to obtain a plurality of images to be detected.

The plurality of images to be detected are preprocessed as follows:

(1) The plurality of images to be detected are resized to 200×200DPI.

(2) The resized images are converted to grayscale, Gaussian noise is added, and median filtering is then applied.

(3) The median-filtered images are labeled with the Labelimg picture labeling tool according to three categories: transverse cracks, longitudinal cracks and repaired cracks. The labeled images are then randomly divided into a training set and a verification set at a ratio of 0.85:0.15. The MSF-Transformer model is trained on the training set and verified on the verification set.
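The preprocessing and data split described above can be illustrated with a minimal standard-library sketch. It is not the implementation of the invention: the function names, the grayscale weights (the common ITU-R BT.601 luma weights), the 3×3 filter window, the noise level `sigma` and the fixed random seed are all assumptions chosen for illustration, and the resizing step is omitted.

```python
import random

def to_gray(rgb):
    """Convert an H x W image of (R, G, B) tuples to grayscale floats (assumed BT.601 weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]

def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise and clip the result to the 0..255 range."""
    return [[min(255.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in row] for row in img]

def median_filter3(img):
    """3x3 median filter (assumed window size); borders replicate the nearest edge pixel."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[min(h - 1, max(0, y + dy))][min(w - 1, max(0, x + dx))]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            row.append(sorted(window)[4])  # middle of the 9 sorted values
        out.append(row)
    return out

def split_dataset(items, train_ratio=0.85, seed=0):
    """Shuffle the labeled items and split them into training and verification subsets."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * train_ratio)
    return items[:n_train], items[n_train:]
```

For example, `split_dataset(range(100))` returns 85 training items and 15 verification items, matching the 0.85:0.15 ratio stated above.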

Step 2: Input the images to be detected into the trained MSF-Transformer model, and output an optimal prediction frame, a category label and a confidence coefficient. This step comprises the following:

Multi-scale feature extraction is performed on the image to be detected based on the MSF (Multi-scale fusion) model to obtain a fusion feature map.

(1) The preprocessed image to be detected is input to a CBL module consisting of convolution, normalization and activation functions (Conv, BatchNormalization and LeakyReLU), in preparation for feature extraction.

(2) The image that has passed through the CBL module is input into a plurality of convolution layers and residual structures to obtain feature maps at 1/8, 1/16 and 1/32 of the original image size, completing feature extraction at three scales.

(3) The feature maps of different sizes are unified into feature maps of the same size by an up-sampling method similar to that of the Feature Pyramid Network (FPN).

(4) The feature maps of the same size are stacked and fused, and the corresponding convolution is applied to obtain the fusion feature map.
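The up-sampling, stacking and convolution of steps (3) and (4) can be sketched as follows. This is a simplified illustration under stated assumptions, not the MSF model itself: feature maps are plain nested lists in channel × height × width order, up-sampling is nearest-neighbour, the "corresponding convolution" is taken to be a 1×1 convolution, and the names `upsample_nearest` and `fuse` are hypothetical.

```python
def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of a C x H x W nested list by an integer factor."""
    out = []
    for channel in fmap:
        h, w = len(channel), len(channel[0])
        out.append([[channel[y // factor][x // factor] for x in range(w * factor)]
                    for y in range(h * factor)])
    return out

def fuse(fmaps, weights):
    """Upsample every map to the largest resolution, stack channels, apply a 1x1 convolution.

    weights[o][k] is the (assumed) 1x1 convolution weight from stacked channel k
    to output channel o.
    """
    target_h = max(len(f[0]) for f in fmaps)
    stacked = []
    for f in fmaps:
        factor = target_h // len(f[0])
        stacked.extend(upsample_nearest(f, factor))
    h, w = len(stacked[0]), len(stacked[0][0])
    # Each output pixel is a weighted sum over the stacked channels at that pixel.
    return [[[sum(wk * stacked[k][y][x] for k, wk in enumerate(w_out)) for x in range(w)]
             for y in range(h)] for w_out in weights]
```

With a 32×32 input, the 1/8, 1/16 and 1/32 maps have spatial sizes 4×4, 2×2 and 1×1; `fuse` brings all three to 4×4 before mixing them.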

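Position codes with the same dimension are constructed according to the fusion feature map.

The position encoding is constructed by:

$$PE_{(pos,\,2i)} = \sin\left(\frac{pos}{10000^{2i/d_{model}}}\right), \qquad PE_{(pos,\,2i+1)} = \cos\left(\frac{pos}{10000^{2i/d_{model}}}\right)$$

where PE represents the position code, pos represents the position of the current pixel in the input feature map, $d_{model}$ represents the dimension of the pixel, and i indexes the positions within the code, where even positions use sin and odd positions use cos.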
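A minimal sketch of such a sinusoidal position code is given below, with sin at even positions and cos at odd positions of each code vector. It assumes the base 10000 of the standard Transformer formulation and a feature map flattened into a one-dimensional sequence of pixels; the function name `positional_encoding` is hypothetical.

```python
import math

def positional_encoding(num_positions, d_model):
    """Return a num_positions x d_model table of sinusoidal position codes."""
    pe = [[0.0] * d_model for _ in range(num_positions)]
    for pos in range(num_positions):
        for k in range(d_model):
            # Entries 2i and 2i+1 share the same frequency 1 / 10000^(2i / d_model).
            angle = pos / (10000 ** ((2 * (k // 2)) / d_model))
            pe[pos][k] = math.sin(angle) if k % 2 == 0 else math.cos(angle)
    return pe
```

Because the table has the same dimension as the fusion feature map, it can be added to the flattened feature map element by element before encoding.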

The fusion feature map and the position code are encoded and decoded based on the Transformer model to obtain a decoding result.

(1) The fusion feature maps and the position codes are input into a multi-layer encoder to obtain an encoding result. Each encoder layer consists of a multi-head attention layer and a feed-forward connection layer.

(2) The encoding result is input into a multi-layer decoder to obtain a decoding result. Each decoder layer consists of a masked multi-head attention layer, a multi-head attention layer and a feed-forward connection layer. Every decoder layer except the first takes the output of the previous decoder layer and the output of the multi-layer encoder as inputs.
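The attention layers named above are built on scaled dot-product attention. The sketch below shows a single attention head in plain Python, with an optional mask that lets each query attend only to keys at the same or earlier positions. It is an illustration rather than the model of the invention: the learned projection matrices, the splitting into multiple heads and the exact mask used by the decoder are not specified in the text and are left out or assumed here.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values, causal_mask=False):
    """Single-head scaled dot-product attention over lists of vectors."""
    d = len(keys[0])
    out = []
    for qi, q in enumerate(queries):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        if causal_mask:
            # Assumed mask: query qi may not attend to keys at later positions.
            scores = [s if ki <= qi else float("-inf") for ki, s in enumerate(scores)]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out
```

A multi-head layer runs several such heads in parallel on different learned projections of the same inputs and concatenates their outputs.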

The decoding result is predicted based on the prediction layer FFN to obtain an optimal prediction frame, a category label and a confidence coefficient.

(1) The output of each decoder layer is input to the parameter-shared prediction layer FFN for prediction, and the loss function is calculated on each output to realize deep supervision. The FFN of the present invention consists of three linear layers with a ReLU activation function and a hidden layer; it predicts the normalized center coordinates, height and width of the prediction frame, and the predicted class label is obtained through softmax activation.

(2) The above operation yields a prediction frame set of fixed size, which clearly exceeds the number of prediction frames actually required, so the optimal frames must be selected from a considerable number of prediction frames. The invention constructs a prediction frame set from the plurality of prediction frames and constructs a truth value set. The number of Ground Truth (true value) elements is expanded to equal the number of prediction frames by padding with an additional special class label that represents the background class, so that the predicted values and the true values form sets with the same number of elements. Bipartite graph matching is then performed with the Hungarian algorithm, which puts the elements of the prediction set and the truth set in one-to-one correspondence so as to minimize the matching loss:

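The Hungarian algorithm is as follows:

$$\hat{\sigma} = \underset{\sigma \in \mathfrak{S}_N}{\arg\min} \sum_{i=1}^{N} L_{match}\left(y_i, \hat{y}_{\sigma(i)}\right)$$

$$L_{match}\left(y_i, \hat{y}_{\sigma(i)}\right) = -\mathbb{1}_{\{c_i \neq \varnothing\}}\, \hat{p}_{\sigma(i)}(c_i) + \mathbb{1}_{\{c_i \neq \varnothing\}}\, L_{box}\left(b_i, \hat{b}_{\sigma(i)}\right)$$

where $\hat{\sigma}$ is the optimal allocation result, $\sigma(i)$ represents the index, and $L_{match}(y_i, \hat{y}_{\sigma(i)})$ is the pairwise matching cost between the pavement crack truth element $y_i$ and the prediction with index $\sigma(i)$. Here $c_i$ is the target class label, $b_i$ is the truth frame vector, $\hat{p}_{\sigma(i)}(c_i)$ is the predicted probability of $c_i$, $\hat{b}_{\sigma(i)}$ is the prediction frame, $y_i$ represents the i-th truth element, $\hat{y}_{\sigma(i)}$ represents the corresponding element of the prediction set, and $\varnothing$ denotes the background class.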

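(3) The loss function is calculated according to the obtained correspondence between the Ground Truth (true value) and the predicted target frames:

$$L_{Hungarian}(y, \hat{y}) = \sum_{i=1}^{N} \left[ -\log \hat{p}_{\hat{\sigma}(i)}(c_i) + \mathbb{1}_{\{c_i \neq \varnothing\}}\, L_{box}\left(b_i, \hat{b}_{\hat{\sigma}(i)}\right) \right]$$

$$L_{box}\left(b_i, \hat{b}_{\sigma(i)}\right) = \lambda_{iou}\, L_{iou}\left(b_i, \hat{b}_{\sigma(i)}\right) + \lambda_{L1} \left\lVert b_i - \hat{b}_{\sigma(i)} \right\rVert_1$$

where y represents the truth value set of the detection objects, $\hat{y}$ represents the prediction frame set, $\log \hat{p}_{\hat{\sigma}(i)}(c_i)$ is the logarithmic probability of $c_i$, $L_{Hungarian}$ is the Hungarian loss, $L_{iou}$ is the generalized IoU loss, and $\lambda_{iou}$ and $\lambda_{L1}$ are hyperparameters. The two frame loss terms are normalized by the number of objects in the batch.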
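The matching and loss computation of items (2) and (3) can be sketched on a toy example as follows. Several points are assumptions made for illustration: the optimal assignment is found by exhaustive search over permutations, which gives the same optimum as the Hungarian algorithm but is practical only for a handful of frames; frames are (cx, cy, w, h) vectors with positive area; the background class is index 3; the weights `lam_iou=2.0` and `lam_l1=5.0` are placeholder hyperparameter values; and the batch normalization of the frame loss is omitted.

```python
import math
from itertools import permutations

BACKGROUND = 3  # assumed index of the padded background class (crack classes: 0, 1, 2)

def to_corners(box):
    """Convert (cx, cy, w, h) to (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2

def giou(a, b):
    """Generalized IoU of two (cx, cy, w, h) frames with positive area."""
    ax1, ay1, ax2, ay2 = to_corners(a)
    bx1, by1, bx2, by2 = to_corners(b)
    inter = (max(0.0, min(ax2, bx2) - max(ax1, bx1))
             * max(0.0, min(ay2, by2) - max(ay1, by1)))
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    hull = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return inter / union - (hull - union) / hull

def box_loss(b, b_hat, lam_iou=2.0, lam_l1=5.0):
    """Frame loss: lam_iou * (1 - GIoU) + lam_l1 * L1 distance."""
    l1 = sum(abs(t - p) for t, p in zip(b, b_hat))
    return lam_iou * (1.0 - giou(b, b_hat)) + lam_l1 * l1

def pad_truths(truths, n):
    """Pad the truth set with background elements up to n entries."""
    return list(truths) + [(BACKGROUND, None)] * (n - len(truths))

def match_cost(truth, pred):
    """Pairwise matching cost: zero for background slots, else -p(c_i) + frame loss."""
    label, box = truth
    if label == BACKGROUND:
        return 0.0
    probs, pred_box = pred
    return -probs[label] + box_loss(box, pred_box)

def best_assignment(truths, preds):
    """Exhaustive search for the permutation with minimum total matching cost."""
    n = len(preds)
    padded = pad_truths(truths, n)
    return min(permutations(range(n)),
               key=lambda s: sum(match_cost(padded[i], preds[s[i]]) for i in range(n)))

def hungarian_loss(truths, preds, sigma):
    """Sum of -log p(c_i) over all slots plus the frame loss over non-background slots."""
    total = 0.0
    for i, (label, box) in enumerate(pad_truths(truths, len(preds))):
        probs, pred_box = preds[sigma[i]]
        total += -math.log(probs[label])
        if label != BACKGROUND:
            total += box_loss(box, pred_box)
    return total
```

Each truth is a `(label, frame)` pair and each prediction a `(class_probabilities, frame)` pair; `best_assignment` returns, for every truth slot, the index of the prediction assigned to it.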

Step 3: Judge the cracks in the image to be detected according to the optimal prediction frame, the category label and the confidence coefficient. The prediction frame marks each crack in the image, the category label classifies its type, and the confidence of the crack is displayed.
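How the prediction layer outputs could be turned into displayed detections is sketched below. The sketch assumes a frame head of three linear layers with ReLU between them, a sigmoid on the last layer to keep the frame values in [0, 1], a single linear layer followed by softmax for the class, a background class stored at the last index, and a confidence threshold of 0.5. None of these details beyond the three linear layers, ReLU and softmax is stated in the description, and all names are hypothetical.

```python
import math

# Crack classes in assumed index order; the index after the last name is background.
CLASS_NAMES = ["transverse crack", "longitudinal crack", "repaired crack"]

def linear(x, weight, bias):
    """y = W x + b for a single input vector."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]

def box_head(x, layers):
    """Linear layers with ReLU in between; sigmoid (assumed) normalizes (cx, cy, w, h)."""
    for weight, bias in layers[:-1]:
        x = [max(0.0, v) for v in linear(x, weight, bias)]
    weight, bias = layers[-1]
    return [1.0 / (1.0 + math.exp(-v)) for v in linear(x, weight, bias)]

def class_head(x, weight, bias):
    """Class probabilities over the crack classes plus background."""
    return softmax(linear(x, weight, bias))

def select_detections(predictions, threshold=0.5):
    """Keep non-background predictions whose top class probability reaches the threshold."""
    detections = []
    for probs, box in predictions:
        label = max(range(len(probs)), key=probs.__getitem__)
        if label < len(CLASS_NAMES) and probs[label] >= threshold:
            detections.append({"box": box, "label": CLASS_NAMES[label],
                               "confidence": probs[label]})
    return detections
```

Each returned dictionary carries the three items the step displays: the frame, the crack category and the confidence.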

Referring to FIG. 2, after the model has been trained for 500 epochs, the curves for the training set and the verification set both oscillate downward and the difference between them is small, which indicates that the model is well suited to the pavement crack detection task.

The embodiment of the invention also provides a computer-readable storage medium storing computer-executable instructions for executing the pavement crack detection method.

While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.

It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims

1. The pavement crack detection method is characterized by comprising the following steps of:

collecting an image to be detected;

judging cracks in the image to be detected according to the optimal prediction frame, the class label and the confidence coefficient;

2. The method for detecting pavement cracks according to claim 1, wherein the image to be detected is preprocessed before multi-scale feature extraction is performed on the image to be detected by using an MSF model.

3. The pavement crack detection method as set forth in claim 2, wherein the preprocessing comprises the following steps:

the size of the image to be detected is processed to be 200×200DPI;

4. The pavement crack detection method as set forth in claim 3, wherein the MSF model-based multi-scale feature extraction is performed on the image to be detected to obtain a fusion feature map, and the method comprises the following steps:

5. The pavement crack detection method as set forth in claim 4, wherein the position code is constructed by:

$$PE_{(pos,\,2i)} = \sin\left(\frac{pos}{10000^{2i/d_{model}}}\right), \qquad PE_{(pos,\,2i+1)} = \cos\left(\frac{pos}{10000^{2i/d_{model}}}\right)$$

where PE represents the position code, pos represents the position of the current pixel in the input feature map, $d_{model}$ represents the dimension of the pixel, and i indexes the positions within the code, where even positions use sin and odd positions use cos.

6. The pavement crack detection method of claim 5, wherein the Transformer model comprises a multi-layer encoder and a multi-layer decoder, each encoder layer comprising a multi-head attention layer and a feed-forward connection layer, and each decoder layer comprising a masked multi-head attention layer, a multi-head attention layer and a feed-forward connection layer.

7. The method for detecting a pavement crack according to claim 6, wherein the decoding result is predicted based on a prediction layer FFN to obtain an optimal prediction frame, comprising the steps of:

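8. The pavement crack detection method as set forth in claim 1, characterized in that the Hungarian algorithm is as follows:

$$\hat{\sigma} = \underset{\sigma \in \mathfrak{S}_N}{\arg\min} \sum_{i=1}^{N} L_{match}\left(y_i, \hat{y}_{\sigma(i)}\right)$$

$$L_{match}\left(y_i, \hat{y}_{\sigma(i)}\right) = -\mathbb{1}_{\{c_i \neq \varnothing\}}\, \hat{p}_{\sigma(i)}(c_i) + \mathbb{1}_{\{c_i \neq \varnothing\}}\, L_{box}\left(b_i, \hat{b}_{\sigma(i)}\right)$$

where $\hat{\sigma}$ is the optimal allocation result, $\sigma(i)$ represents the index, $\hat{p}_{\sigma(i)}(c_i)$ is the probability of $c_i$, $\hat{b}_{\sigma(i)}$ is a prediction frame, $y_i$ represents the i-th truth element, and $\hat{y}_{\sigma(i)}$ represents the corresponding element of the prediction set.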

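9. The method of claim 8, wherein the Transformer model is trained by a loss function, the loss function being as follows:

$$L_{Hungarian}(y, \hat{y}) = \sum_{i=1}^{N} \left[ -\log \hat{p}_{\hat{\sigma}(i)}(c_i) + \mathbb{1}_{\{c_i \neq \varnothing\}}\, L_{box}\left(b_i, \hat{b}_{\hat{\sigma}(i)}\right) \right]$$

where y represents the truth value set of the detection objects, $\hat{y}$ represents the prediction frame set, and $\log \hat{p}_{\hat{\sigma}(i)}(c_i)$ is the logarithmic probability of $c_i$.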

10. A computer-readable storage medium storing computer instructions for causing the computer to perform the pavement crack detection method of any one of claims 1-9.