CN114821316A - Three-dimensional ground penetrating radar crack disease identification method and system - Google Patents

Three-dimensional ground penetrating radar crack disease identification method and system

Info

Publication number
CN114821316A
Authority
CN
China
Prior art keywords
layer
coding
decoding
feature map
image
Prior art date
Legal status
Pending
Application number
CN202210436380.8A
Other languages
Chinese (zh)
Inventor
黄志勇
唐嘉明
李伟雄
陈搏
罗传熙
刘嘉俊
陈紫情
邓芳琴
Current Assignee
Guangzhou Xiaoning Institute Of Roadway Engineering Co ltd
Original Assignee
Guangzhou Xiaoning Institute Of Roadway Engineering Co ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Xiaoning Institute Of Roadway Engineering Co ltd filed Critical Guangzhou Xiaoning Institute Of Roadway Engineering Co ltd
Priority to CN202210436380.8A
Publication of CN114821316A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The invention provides a three-dimensional ground penetrating radar crack disease identification method and system, belonging to the field of road disease detection. The identification method comprises the following steps: acquiring a training sample set, where the training sample set comprises a plurality of road sample images and label information for each pixel in each road sample image; the label information is 0 or 1, where 0 indicates that the corresponding pixel is not a crack and 1 indicates that it is; constructing a crack Unet neural network based on a spatial attention mechanism, VGG16 and the Unet neural network; training the crack Unet neural network according to the training sample set to obtain a crack identification model; scanning a road to be identified with a three-dimensional ground penetrating radar to obtain an image of the road to be identified; and determining the cracks in the road to be identified from that image, based on the crack identification model. By classifying each pixel in the image, cracks in the road can be accurately identified.

Description

Three-dimensional ground penetrating radar crack disease identification method and system
Technical Field
The invention relates to the field of road disease detection, in particular to a method and a system for identifying a three-dimensional ground penetrating radar crack disease.
Background
Under the action of traffic loads, temperature, ultraviolet radiation and the like, road pavement structures inevitably develop damage such as cracks, subsidence and interlayer voids as their service time increases. Quickly, effectively and accurately finding the position and type of pavement structure diseases, determining their severity and predicting their development is of great significance for scientifically and reasonably determining maintenance timing and formulating maintenance schemes. Among the various types of pavement structure diseases, cracks are the most numerous, and three-dimensional ground penetrating radar is an effective means of detecting crack damage inside pavement structures. However, for interpreting crack damage in three-dimensional ground penetrating radar images there is currently no automatic identification means, and manual interpretation dominates. Manual interpretation of radar images has many problems: the interpretation process demands high expertise, and there are not enough qualified interpreters; interpretation is subjective, so different personnel interpreting the same radar image obtain different results; and manual interpretation is time-consuming, the workload is huge, and the efficiency is low. In addition, when identifying crack diseases in a single-channel three-dimensional ground penetrating radar image, auxiliary judgment information is lacking and misjudgments arise easily. Moreover, the radar data form a non-uniformly distributed three-dimensional lattice, which easily causes the displayed image to be deformed and inconsistent with reality, increasing the difficulty of identification.
Researchers have done much work on the automatic interpretation of radar images, proposing methods such as Canny edge detection, Hough transformation, SVM (Support Vector Machine), compressive sensing and convolutional neural networks. Canny edge detection can segment cracks precisely, but because edge detection considers only local gradient changes in the image, it produces a large number of false alarm signals and its identification precision is low. Hough transformation is an effective method for detecting and locating straight lines and analytic curves, and its precision is little affected by random noise and signal discontinuity, but because cracks have no fixed signal morphology, it is unsuitable for automatic identification of crack signals. SVMs need a large number of training samples for image feature selection and model parameter optimization, are mainly used for region classification, and cannot precisely segment crack disease boundaries. Classifiers based on convolutional neural networks, which have emerged in recent years, can identify similar features in unlabeled images by learning from correctly labeled ones. However, in crack disease identification their accuracy and misjudgment-rate indices are far worse than for other target objects, and the identification result is a rectangular region, so crack diseases are not segmented precisely.
Given the above problems, a new crack disease identification method is needed to improve the accuracy of crack identification.
Disclosure of Invention
The invention aims to provide a three-dimensional ground penetrating radar crack disease identification method and a system, which can improve the accuracy of road crack identification.
In order to achieve the purpose, the invention provides the following scheme:
a three-dimensional ground penetrating radar crack disease identification method comprises the following steps:
acquiring a training sample set; the training sample set comprises a plurality of road sample images and label information for each pixel in each road sample image; the label information is 0 or 1; 0 indicates that the corresponding pixel is not a crack, and 1 indicates that the corresponding pixel is a crack;
constructing a crack Unet neural network based on a spatial attention mechanism, VGG16 and the Unet neural network;
training the crack Unet neural network according to the training sample set to obtain a crack identification model;
scanning a road to be identified through a three-dimensional ground penetrating radar to obtain an image of the road to be identified;
and determining the cracks in the road to be identified from the road image to be identified, based on the crack identification model.
Optionally, the scanning, by the three-dimensional ground penetrating radar, of the road to be identified to obtain an image of the road to be identified specifically includes:
scanning a road to be identified by a three-dimensional ground penetrating radar through a multi-channel antenna array to obtain a radar image of each channel;
for the radar image of each channel, interpolating pixel points in the radar image by a bilinear interpolation method to obtain an interpolated radar image of the corresponding channel;
and splicing the interpolation radar images of all channels to obtain a road image to be identified.
Optionally, the crack Unet neural network comprises an encoding sub-network and a decoding sub-network; the encoding sub-network is connected with the decoding sub-network;
the training of the crack Unet neural network according to the training sample set to obtain a crack identification model specifically comprises the following steps:
standardizing the road sample images in the training sample set to obtain corresponding standardized images and label information of each pixel in each standardized image;
for each standardized image, coding the standardized image through the coding sub-network to obtain a coding feature map;
decoding the coding feature map through the decoding sub-network to obtain a decoding feature map;
classifying each pixel in the decoding feature map based on a softmax activation function, and determining a prediction result of each pixel in the decoding feature map; the prediction results include 0 and 1;
and determining a loss function according to the prediction result of each pixel in the decoding feature map and the label information of each pixel in the standardized image, and iteratively training the encoding sub-network and the decoding sub-network according to the loss function until the loss function converges, to obtain an optimal crack Unet neural network, where the optimal crack Unet neural network is the crack identification model.
Optionally, the encoding feature map includes a first residual map, a second residual map, a third residual map, a fourth residual map, and a final encoding feature map;
the coding sub-network comprises a first coding layer, a first maximum pooling layer, a second coding layer, a second maximum pooling layer, a third coding layer, a third maximum pooling layer, a fourth coding layer, a fourth maximum pooling layer and a fifth coding layer which are connected in sequence; the first coding layer, the second coding layer, the third coding layer, the fourth coding layer and the fifth coding layer are also connected with the decoding subnetwork;
the encoding the standardized image through the encoding sub-network to obtain an encoding feature map specifically includes:
coding the standardized image through a first coding layer to obtain a first coding feature map;
performing pooling operation on the first coding feature map through a first maximum pooling layer to obtain a first pooling feature map;
performing residual compression on the first coding feature map to obtain a first residual map;
coding the first pooling characteristic diagram through a second coding layer to obtain a second coding characteristic diagram;
performing pooling operation on the second coding feature map through a second maximum pooling layer to obtain a second pooling feature map;
performing residual compression on the second coding feature map to obtain a second residual map;
coding the second pooling characteristic diagram through a third coding layer to obtain a third coding characteristic diagram;
performing pooling operation on the third coding feature map through a third maximum pooling layer to obtain a third pooled feature map;
performing residual compression on the third coding feature map to obtain a third residual map;
coding the third pooling characteristic map through a fourth coding layer to obtain a fourth coding characteristic map;
performing pooling operation on the fourth coding feature map through a fourth maximum pooling layer to obtain a fourth pooling feature map;
performing residual compression on the fourth coding feature map to obtain a fourth residual map;
and coding the fourth pooling feature map through a fifth coding layer to obtain a final coding feature map.
Optionally, the decoding sub-network includes a first up-sampling layer, a first decoding layer, a second up-sampling layer, a second decoding layer, a third up-sampling layer, a third decoding layer, a fourth up-sampling layer, and a fourth decoding layer, which are connected in sequence; the first coding layer is in jump connection with the fourth decoding layer; the second coding layer is in jump connection with the third decoding layer; the third coding layer is in jump connection with the second decoding layer; the fourth coding layer is in jump connection with the first decoding layer; the fifth coding layer is connected with the first up-sampling layer;
the decoding the encoded feature map through the decoding subnetwork to obtain a decoded feature map specifically includes:
the coding feature map is up-sampled through a first up-sampling layer to obtain a first up-sampling feature map;
splicing the first up-sampling feature map and the fourth residual map to obtain a first spliced map;
decoding the first spliced graph through a first decoding layer to obtain a first decoding graph;
the first decoding image is up-sampled through a second up-sampling layer to obtain a second up-sampling feature image;
splicing the second up-sampling feature map and the third residual map to obtain a second spliced map;
decoding the second spliced graph through a second decoding layer to obtain a second decoding graph;
the second decoding image is up-sampled through a third up-sampling layer to obtain a third up-sampling feature image;
splicing the third up-sampling feature map and the second residual map to obtain a third spliced map;
decoding the third spliced graph through a third decoding layer to obtain a third decoding graph;
the third decoding image is up-sampled through a fourth up-sampling layer to obtain a fourth up-sampling feature image;
splicing the fourth up-sampling feature map and the first residual map to obtain a fourth spliced map;
and decoding the fourth splicing image through a fourth decoding layer to obtain a decoding characteristic image.
Optionally, the first coding layer includes a first convolution-feature selection module and a first parallel multidirectional attention mechanism module connected in sequence; the first parallel multidirectional attention mechanism module is further connected with the first max-pooling layer and the decoding subnetwork;
the encoding the normalized image through the first encoding layer to obtain a first encoding feature map specifically includes:
performing feature extraction on the standardized image through a first convolution-feature selection module to obtain a first feature map;
and establishing an incidence relation for the pixels in the first feature map through a first parallel multidirectional attention mechanism module to obtain a first coding feature map.
Optionally, the first convolution-feature selection module includes four two-dimensional convolution kernels and a channel attention module; the sizes of the four two-dimensional convolution kernels are 3 × 3, 2 × 2, 3 × 3 and 2 × 2, respectively, the latter two being dilated convolutions with a dilation rate of 2;
the extracting the features of the normalized image by the first convolution-feature selection module to obtain a first feature map specifically includes:
performing convolution operations on the standardized image through the four two-dimensional convolution kernels respectively to obtain four corresponding convolution feature maps;
splicing the four convolution characteristic graphs to obtain a spliced convolution characteristic graph;
and weighting the spliced convolution feature map through a channel attention module, and performing feature selection to obtain a first feature map.
Optionally, the first parallel multidirectional attention mechanism module comprises a 2 × 1 convolution kernel, a 1 × 2 convolution kernel, a 2 × 2 convolution kernel, a 1 × 1 convolution kernel;
the establishing, by the first parallel multidirectional attention mechanism module, an association relationship for the pixels in the first feature map to obtain a first encoded feature map specifically includes:
establishing an association relationship between each pixel in the first feature map and the pixel to its left through the 2 × 1 convolution kernel, obtaining a left-pixel association map;
establishing an association relationship between each pixel in the first feature map and the pixel below it through the 1 × 2 convolution kernel, obtaining a lower-pixel association map;
establishing an association relationship between each pixel in the first feature map and its four adjacent pixels on its two diagonals through the 2 × 2 convolution kernel and the 1 × 1 convolution kernel, obtaining a diagonal-pixel association map;
connecting the left pixel correlation diagram, the lower pixel correlation diagram and the diagonal pixel correlation diagram to obtain a connection pixel correlation diagram;
and determining a first associated feature map according to the connection pixel associated map and the first feature map.
In order to achieve the above purpose, the invention also provides the following scheme:
a three-dimensional ground penetrating radar crack disease recognition system comprises:
a sample acquisition unit, used for acquiring a training sample set; the training sample set comprises a plurality of road sample images and label information for each pixel in each road sample image; the label information is 0 or 1; 0 indicates that the corresponding pixel is not a crack, and 1 indicates that the corresponding pixel is a crack;
a network construction unit, used for constructing a crack Unet neural network based on a spatial attention mechanism, VGG16 and the Unet neural network;
a training unit, respectively connected with the sample acquisition unit and the network construction unit, used for training the crack Unet neural network according to the training sample set to obtain a crack identification model;
the scanning unit is used for scanning the road to be identified through the three-dimensional ground penetrating radar to obtain an image of the road to be identified;
and an identification unit, respectively connected with the training unit and the scanning unit, used for determining the cracks in the road to be identified from the road image to be identified, based on the crack identification model.
According to the specific embodiments provided by the invention, the invention discloses the following technical effects: a crack Unet neural network is constructed based on a spatial attention mechanism, VGG16 and the Unet neural network, and is trained according to a training sample set to obtain a crack identification model; in practical application, the crack identification model accurately identifies cracks in the road by classifying each pixel of the road image to be identified.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without inventive exercise.
FIG. 1 is a flow chart of a three-dimensional ground penetrating radar crack disease identification method of the invention;
FIG. 2 is a schematic structural diagram of the crack Unet neural network;
FIG. 3 is a schematic diagram of a residual compression module;
FIG. 4 is a schematic diagram of the structure of a multi-way convolution-feature selection module;
FIG. 5 is a schematic diagram of a parallel multidirectional attention mechanism module;
FIG. 6 is a schematic diagram of a three-dimensional ground penetrating radar full-section coverage detection;
FIG. 7 is a diagram of bilinear interpolation;
FIG. 8(a) is a graph illustrating a loss drop curve of a training set;
FIG. 8(b) is a graph showing a validation set loss decrease curve;
FIG. 9 is a schematic block structure diagram of the three-dimensional ground penetrating radar crack damage identification system of the present invention.
Description of the symbols:
the system comprises a sample acquisition unit-1, a network construction unit-2, a training unit-3, a scanning unit-4 and an identification unit-5.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide a three-dimensional ground penetrating radar crack disease identification method and system: a crack Unet neural network is constructed based on a spatial attention mechanism, VGG16 and the Unet neural network, and cracks in a road can be accurately identified by classifying each pixel in the image.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
As shown in fig. 1, the method for identifying a three-dimensional ground penetrating radar crack disease comprises the following steps:
s1: a training sample set is obtained. The training sample set comprises a plurality of road sample images and label information of each pixel in each road sample image. The tag information includes 0 and 1; 0 indicates that the corresponding pixel is not a crack, and 1 indicates that the corresponding pixel is a crack.
S2: constructing a crack Unet neural network based on a spatial attention mechanism, VGG16 and the Unet neural network.
S3: training the crack Unet neural network according to the training sample set to obtain a crack identification model.
S4: scanning the road to be identified through the three-dimensional ground penetrating radar to obtain an image of the road to be identified.
S5: determining the cracks in the road to be identified from the road image to be identified, based on the crack identification model.
Further, the crack Unet neural network comprises an encoding sub-network and a decoding sub-network; the encoding sub-network is connected with the decoding sub-network. Specifically, the crack Unet neural network (Crack Unet) used in the invention is a Unet semantic segmentation network improved on the basis of VGG16; the network structure is shown in fig. 2, where each gray rectangle is the feature map output by one step.
Step S3 specifically includes:
s31: standardizing the road sample images in the training sample set to obtain corresponding standardized images and the label information of each pixel in each standardized image. In order to give the trained model better generalization ability and improve training efficiency, the road sample images in the training sample set are standardized, centering each image by its mean; centered data better conform to the distribution law. Specifically, a road sample image is standardized using the following formula:
$$X' = \frac{X - \mu}{\sigma}$$
wherein, X' is a standardized image matrix, X is a road sample image matrix, mu is the mean value of the road sample image matrix, and sigma is the standard deviation of the road sample image matrix.
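As a minimal illustration of this step, the z-score standardization above can be written in a few lines of Python with NumPy; the function name is chosen here for illustration only:

```python
import numpy as np

def standardize(image: np.ndarray) -> np.ndarray:
    # X' = (X - mu) / sigma, computed over the whole image matrix
    mu = image.mean()
    sigma = image.std()
    return (image - mu) / sigma  # assumes sigma > 0, true for any non-constant image
```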
S32: and for each standardized image, coding the standardized image through the coding sub-network to obtain a coding feature map. Specifically, the encoding feature map includes a first residual map, a second residual map, a third residual map, a fourth residual map, and a final encoding feature map. The coding sub-network comprises a first coding layer, a first maximum pooling layer, a second coding layer, a second maximum pooling layer, a third coding layer, a third maximum pooling layer, a fourth coding layer, a fourth maximum pooling layer and a fifth coding layer which are connected in sequence; the first coding layer, the second coding layer, the third coding layer, the fourth coding layer and the fifth coding layer are further connected with the decoding subnetwork.
Step S32 specifically includes: and coding the standardized image through a first coding layer to obtain a first coding feature map. And performing pooling operation on the first coding feature map through a first maximum pooling layer to obtain a first pooling feature map. And performing residual compression on the first coding feature map to obtain a first residual map. In this embodiment, the picture size of the normalized image is 512 × 512, which facilitates down-sampling with a multiple of 2, and the size is made as close as possible to the original size, thereby preventing the loss of features due to the reduction of the picture.
And coding the first pooling characteristic diagram through a second coding layer to obtain a second coding characteristic diagram. And performing pooling operation on the second coding feature map through a second maximum pooling layer to obtain a second pooling feature map. And performing residual compression on the second coding feature map to obtain a second residual map.
And coding the second pooling characteristic diagram through a third coding layer to obtain a third coding characteristic diagram. And performing pooling operation on the third coding feature map through a third maximum pooling layer to obtain a third pooling feature map. And performing residual compression on the third coding feature map to obtain a third residual map.
And coding the third pooling feature map through a fourth coding layer to obtain a fourth coding feature map. And performing pooling operation on the fourth coding feature map through a fourth maximum pooling layer to obtain a fourth pooled feature map. And performing residual compression on the fourth coding feature map to obtain a fourth residual map.
And coding the fourth pooling feature map through a fifth coding layer to obtain a final coding feature map.
In this embodiment, an RS (Residual Squeeze Block) module is used to perform residual compression on each encoded feature map, compressing the feature channels and reducing the parameters of the network model. In the crack Unet neural network, each decoding layer is connected with the features of the corresponding encoding layer as a supplement; but the encoding layer, belonging to a shallower region, contains a large amount of coarse information compared with the decoding layer, and if all features are connected, the decoding layer must refine them again when supplementing information. It is therefore only necessary to supply the decoding layer with appropriate information, not to connect every feature into it. The RS module removes redundant channels and thereby reduces the parameter count; the structure of the residual compression module adopted by the invention is very simple, as shown in fig. 3. After entering the residual compression module, the feature map undergoes max pooling with a stride of 2, retaining the feature values with the largest response; unpooling then restores the resolution, so that the spatially most effective features are retained for the connection. Finally an ordinary 2 × 2 convolution changes the feature channels; the compression rate in this embodiment is 0.25, i.e. the number of channels is compressed to 1/4.
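A minimal Keras sketch of such a compression block is given below; since stock Keras has no max-unpooling layer, UpSampling2D is used as a simple stand-in for the unpooling step, so this is an approximation rather than the patented module itself:

```python
from tensorflow.keras import layers

def residual_squeeze_block(x, compression=0.25):
    # Keep the strongest spatial responses, restore resolution, then
    # compress channels with a 2x2 convolution (to 1/4 by default).
    channels = max(1, int(x.shape[-1] * compression))
    y = layers.MaxPooling2D(pool_size=2, strides=2, padding="same")(x)
    y = layers.UpSampling2D(size=2)(y)  # stand-in for true max-unpooling
    return layers.Conv2D(channels, kernel_size=2, padding="same")(y)
```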
In this embodiment, the first encoding layer includes a first convolution-feature selection module and a first parallel multi-direction attention mechanism module connected in sequence. The first parallel multi-direction attention mechanism module is also connected with the first max-pooling layer and the decoding subnetwork. Encoding the standardized image through a first encoding layer to obtain a first encoding feature map, which specifically comprises:
and performing feature extraction on the standardized image through a first convolution-feature selection module to obtain a first feature map. Specifically, the first convolution-feature selection module includes four two-dimensional convolution kernels and a SESEQUEZE-Excitation (channel attention) module. The kernel sizes of the four two-dimensional convolution kernels are 3 × 3, 2 × 2, and 2 × 2, respectively. And performing convolution operation on the standardized image through four two-dimensional convolution cores respectively to obtain four corresponding convolution characteristic graphs. And splicing the four convolution characteristic graphs to obtain a spliced convolution characteristic graph. And weighting the spliced convolution feature map through a channel attention module, and performing feature selection to obtain a first feature map.
In particular, the multi-path convolution-feature selection module can provide diverse features and, along the channel dimension, give higher weight to the effective ones. The MC-FS (Multi Convolution-Feature Selection Block) module consists of parallel convolutions, dilated convolutions and a channel attention module. Its structure is shown in fig. 4, where w, h and c represent the width, height and channel number of the input features, respectively. The input features first pass through four parallel convolutions, two of which are dilated convolutions with a dilation rate of 2 and kernel sizes of 3 × 3 and 2 × 2, so that a larger receptive field is obtained with few parameters; adding the standard 3 × 3 and 2 × 2 convolutions avoids the checkerboard effect of dilated convolutions and eliminates the problems of discontinuous extracted features and local feature loss. Thus, for one input feature map the MC-FS module generates four feature maps, the channel count of each being 1/4 of that of the output feature map, and connecting the four restores the output channel count. Finally a channel attention module is added to weight each channel, realizing feature selection and giving higher weight to effective channels. For the standardized image $U \in \mathbb{R}^{H \times W \times C}$, a squeeze operation, usually global pooling, compresses the feature shape to 1 × 1 × C, and then an excitation operation, usually an ordinary convolution or fully-connected layer, learns the weight of each channel. Finally the feature map U is weighted to obtain the first feature map U'.
Likewise, the shape of the features is unchanged before and after the channel attention. The combination of dilated and standard convolutions captures a large receptive field and a small one at the same time, avoiding the checkerboard effect while reducing the number of parameters. The four-path parallel convolution makes the combinations of extracted features more diverse, and the diverse feature weighting of the channel attention module makes the extracted features more accurate and effective.
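Under the description above, a minimal tf.keras sketch of the MC-FS block might look as follows; the branch activations and the SE reduction ratio are assumptions, not values stated in the patent:

```python
from tensorflow.keras import layers

def se_block(x, reduction=4):
    # Squeeze to 1x1xC, learn per-channel weights, rescale the input.
    c = int(x.shape[-1])
    s = layers.GlobalAveragePooling2D()(x)
    s = layers.Dense(c // reduction, activation="relu")(s)
    s = layers.Dense(c, activation="sigmoid")(s)
    s = layers.Reshape((1, 1, c))(s)
    return layers.Multiply()([x, s])

def mc_fs_block(x, out_channels):
    # Four parallel convolutions, each producing out_channels // 4 maps:
    # standard 3x3 and 2x2, plus dilated (rate 2) 3x3 and 2x2.
    b = out_channels // 4
    c1 = layers.Conv2D(b, 3, padding="same", activation="relu")(x)
    c2 = layers.Conv2D(b, 2, padding="same", activation="relu")(x)
    c3 = layers.Conv2D(b, 3, dilation_rate=2, padding="same", activation="relu")(x)
    c4 = layers.Conv2D(b, 2, dilation_rate=2, padding="same", activation="relu")(x)
    y = layers.Concatenate()([c1, c2, c3, c4])  # restores out_channels
    return se_block(y)                          # channel-wise feature selection
```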
An association relationship is established for the pixels in the first feature map through the first parallel multidirectional attention mechanism module, obtaining the first encoding feature map. Specifically, the first parallel multidirectional attention mechanism module includes a 2 × 1 convolution kernel, a 1 × 2 convolution kernel, a 2 × 2 convolution kernel and a 1 × 1 convolution kernel. The 2 × 1 convolution kernel establishes an association relationship between each pixel in the first feature map and the pixel to its left, obtaining a left-pixel association map. The 1 × 2 convolution kernel establishes an association relationship between each pixel in the first feature map and the pixel below it, obtaining a lower-pixel association map. The 2 × 2 convolution kernel and the 1 × 1 convolution kernel establish an association relationship between each pixel in the first feature map and its four adjacent pixels on its two diagonals, obtaining a diagonal-pixel association map. The left-pixel association map, the lower-pixel association map and the diagonal-pixel association map are connected to obtain a connected pixel association map. The first associated feature map is determined from the connected pixel association map and the first feature map.
The attention mechanism is a common lightweight module for improving network performance; its principle is to weight a specific dimension of a feature, with the weights learned by the neural network itself without manual intervention. The PMDA (Parallel Multi-Directional Attention) mechanism adopted by the present invention is a spatial attention mechanism; its principle is to decompose a 3 × 3 convolution kernel into 4 parts and structurally establish association relationships between a pixel and its adjacent pixels. Its structure is shown in fig. 5.
The PMDA can be divided into four paths. Viewed from top to bottom, for an input feature A of shape H × W × C, the first two paths use a 2 × 1 convolution kernel and a 1 × 2 convolution kernel respectively to associate each pixel of feature A (the first feature map) with the pixel to its left and the pixel below it; these two convolutions yield features B and C respectively. The third and fourth paths use a 2 × 2 convolution kernel with a dilation rate of 2 and a 1 × 1 convolution kernel; the two paths act in combination to associate each pixel with its four adjacent pixels on its two diagonals, so an Add operation is performed on the results of the two paths to obtain the fused feature D. Then a 2 × 2 convolution kernel is applied to F, the connection of features B, C and D, merging its channels to obtain feature G, and a sigmoid activation function converts the values of G into weights in the (0,1) interval. Finally G is multiplied with the original feature map A to complete the pixel weighting within the feature map, yielding A' (the first associated feature map). Since only 5 convolution kernels are used and they are small, this attention module adds very few parameters to the model and is easy to implement.
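A possible tf.keras sketch of this PMDA block follows; the per-branch channel counts, and mapping G back to the input channel count before the sigmoid so the element-wise multiplication lines up, are assumptions:

```python
from tensorflow.keras import layers

def pmda_block(a):
    # Four small kernels jointly cover the 3x3 neighbourhood of each pixel.
    c = int(a.shape[-1])
    b = layers.Conv2D(c, (2, 1), padding="same")(a)               # left neighbour
    c_ = layers.Conv2D(c, (1, 2), padding="same")(a)              # lower neighbour
    d1 = layers.Conv2D(c, 2, dilation_rate=2, padding="same")(a)  # diagonal pairs
    d2 = layers.Conv2D(c, 1, padding="same")(a)
    d = layers.Add()([d1, d2])                                    # fused feature D
    f = layers.Concatenate()([b, c_, d])                          # connection F
    g = layers.Conv2D(c, 2, padding="same", activation="sigmoid")(f)  # weights G in (0,1)
    return layers.Multiply()([a, g])                              # A' = A * G
```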
Because cracks are generally elongated and continuous, if one pixel is a crack pixel there is a high probability that a crack pixel exists among its neighbours; if a pixel is background, its neighbourhood is also very likely background. Establishing association relationships between pixels is therefore favourable for crack feature extraction. In the crack Unet neural network, a PMDA module is added to each layer of the encoding sub-network to improve the crack feature extraction capability of the encoder.
Further, the second coding layer comprises a second convolution-feature selection module and a second parallel multidirectional attention mechanism module which are connected in sequence. The second convolution-feature selection module is further coupled to the first maximum pooling layer, and the second parallel multi-directional attention mechanism module is further coupled to the second maximum pooling layer and the third decoding layer. And performing feature extraction on the first pooled feature map through a second convolution-feature selection module to obtain a second feature map. And establishing an incidence relation for the pixels in the second feature map through a second parallel multidirectional attention mechanism module to obtain a second coding feature map.
The third coding layer comprises a third convolution-feature selection module, a first two-dimensional convolution module and a third parallel multidirectional attention mechanism module which are sequentially connected. The third convolution-feature selection module is also coupled to the second maximum pooling layer. The third parallel multidirectional attention module is also connected with a third maximum pooling layer and a second decoding layer. And performing feature extraction on the second pooled feature map through a third convolution-feature selection module to obtain a third feature map. And performing feature extraction on the third feature map through a first two-dimensional convolution module to obtain a first two-dimensional feature map. And establishing an incidence relation for the pixels in the first two-dimensional feature map through a third parallel multidirectional attention mechanism module to obtain a third encoding feature map.
The fourth coding layer comprises a fourth convolution-feature selection module, a second two-dimensional convolution module and a fourth parallel multidirectional attention mechanism module which are sequentially connected. The fourth convolution-feature selection module is also coupled to the third maximum pooling layer. The fourth parallel multidirectional attention module is further connected with a fourth maximum pooling layer and the first decoding layer. And performing feature extraction on the third pooled feature map through a fourth convolution-feature selection module to obtain a fourth feature map. And performing feature extraction on the fourth feature map through a second two-dimensional convolution module to obtain a second two-dimensional feature map. And establishing an incidence relation for the pixels in the second two-dimensional feature map through a fourth parallel multidirectional attention mechanism module to obtain a fourth encoding feature map.
The fifth coding layer comprises a fifth convolution-feature selection module, a third two-dimensional convolution module and a fifth parallel multidirectional attention mechanism module which are sequentially connected. The fifth convolution-feature selection module is also coupled to the fourth maximum pooling layer. The fifth parallel multidirectional attention module is also connected with the first upsampling layer. And performing feature extraction on the fourth pooled feature map through a fifth convolution-feature selection module to obtain a fifth feature map. And performing feature extraction on the fifth feature map through a third two-dimensional convolution module to obtain a third two-dimensional feature map. And establishing an incidence relation for the pixels in the third two-dimensional feature map through a fifth parallel multidirectional attention mechanism module to obtain a final coding feature map.
The first two-dimensional convolution module, the second two-dimensional convolution module and the third two-dimensional convolution module are convolution kernels of 3 multiplied by 3. The processing procedures of the second convolution-feature selection module, the third convolution-feature selection module, the fourth convolution-feature selection module and the fifth convolution-feature selection module are the same as the processing procedure of the first convolution-feature selection module, and the processing procedures of the second parallel multidirectional attention mechanism module, the third parallel multidirectional attention mechanism module, the fourth parallel multidirectional attention mechanism module and the fifth parallel multidirectional attention mechanism module are the same as the processing procedure of the first parallel multidirectional attention mechanism module, and are not described herein again.
A multi-path convolution-feature selection module is placed first in each layer of the network to increase the number of channels and extract diverse features, which are then selected to ensure that the features fed into subsequent operations are effective, improving the encoding capability of the encoding sub-network. A 3 × 3 convolution kernel is chosen for further feature extraction because the MC-FS module, which includes dilated convolutions, already has a large receptive field; using a small kernel at this point reduces the overall parameters of the model. The parallel multidirectional attention mechanism module then establishes association relationships for the pixels of the features, weighting the crack pixels and strengthening the network's crack feature extraction. Max pooling is finally performed on the output features to reduce their resolution.
S33: and decoding the coding feature map through the decoding sub-network to obtain a decoding feature map. Specifically, the decoding sub-network comprises a first up-sampling layer, a first decoding layer, a second up-sampling layer, a second decoding layer, a third up-sampling layer, a third decoding layer, a fourth up-sampling layer and a fourth decoding layer which are connected in sequence; the first coding layer is in jump connection with the fourth decoding layer; the second coding layer is in jump connection with the third decoding layer; the third coding layer is in jump connection with the second decoding layer; the fourth coding layer is in jump connection with the first decoding layer; the fifth encoding layer is connected to the first upsampling layer.
Step S33 specifically includes: and upsampling the coding characteristic diagram through a first upsampling layer to obtain a first upsampling characteristic diagram. And splicing the first up-sampling feature map and the fourth residual map to obtain a first spliced map.
And decoding the first spliced graph through a first decoding layer to obtain a first decoding graph. And upsampling the first decoding image through a second upsampling layer to obtain a second upsampling characteristic image. And splicing the second up-sampling feature map and the third residual map to obtain a second spliced map.
And decoding the second spliced graph through a second decoding layer to obtain a second decoding graph. And upsampling the second decoding image through a third upsampling layer to obtain a third upsampling characteristic image. And splicing the third up-sampling feature map and the second residual map to obtain a third spliced map.
And decoding the third spliced graph through a third decoding layer to obtain a third decoding graph. And upsampling the third decoding image through a fourth upsampling layer to obtain a fourth upsampling characteristic image. And splicing the fourth up-sampling feature map and the first residual map to obtain a fourth spliced map.
And decoding the fourth splicing image through a fourth decoding layer to obtain a decoding characteristic image.
From the last layer of the encoding sub-network, processing enters the decoding sub-network. The output features of the last encoding layer are up-sampled, doubling the resolution, to form the features of the decoding sub-network, and are connected with the residual map output by the corresponding encoding layer; the encoding layer's features first enter the compression module for channel compression, which removes unnecessary shallow features and reduces the network size before the connection is made, preventing information loss, and feature extraction is then performed twice. After these steps have been executed four times, decoding is finished: the resolution of the decoded feature map is consistent with that of the original image and its channel count equals the number of categories, and softmax activation over the channels of each pixel then completes the pixel-level classification.
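Under the structure just described, one decoding step could be sketched as below, reusing the residual_squeeze_block sketch from above; the choice of two 3 × 3 refinement convolutions is an assumption:

```python
from tensorflow.keras import layers

def decode_step(x, skip, filters):
    # Double the resolution, attach the channel-compressed skip features,
    # then refine twice; repeated four times this rebuilds full resolution.
    x = layers.UpSampling2D(size=2)(x)
    x = layers.Concatenate()([x, residual_squeeze_block(skip)])
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x
```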
S34: classifying each pixel in the decoding feature map based on a softmax activation function, and determining a prediction result of each pixel in the decoding feature map; the prediction results include 0 and 1.
S35: and determining a loss function according to the prediction result of each pixel in the decoding characteristic diagram and the label information of each pixel in the standardized image, and performing iterative training on the coding sub-network and the decoding sub-network according to the loss function until the loss function is converged to obtain an optimal fracture Unet neural network, wherein the optimal fracture Unet neural network is a fracture identification model.
Since cracks are elongated features and crack pixels occupy only a small proportion of the image, the invention uses Dice Loss and Focal Loss together as the loss function:

$$Loss = \left(1 - \frac{2\sum_{i=1}^{N} p_i y_i}{\sum_{i=1}^{N} p_i + \sum_{i=1}^{N} y_i}\right) - \frac{1}{N}\sum_{i=1}^{N}(1 - p_t)^{\gamma}\log(p_t)$$

where Loss is the loss function value, N is the total number of pixels in the standardized image, $p_i$ is the prediction for the i-th pixel in the standardized image, $y_i$ is the label information of the i-th pixel, γ is a constant equal to 2, and $p_t$ is the probability the model assigns to the positive prediction:

$$p_t = \begin{cases} p_i, & y_i = 1 \\ 1 - p_i, & y_i = 0 \end{cases}$$
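A direct TensorFlow transcription of this loss, under the assumption that y_pred holds per-pixel crack probabilities, might read as follows; the eps smoothing constant is an implementation detail added here to avoid division by zero and log(0):

```python
import tensorflow as tf

def dice_focal_loss(y_true, y_pred, gamma=2.0, eps=1e-7):
    y_true = tf.cast(y_true, y_pred.dtype)
    # Dice term: 1 - 2*sum(p*y) / (sum(p) + sum(y))
    inter = tf.reduce_sum(y_true * y_pred)
    dice = 1.0 - 2.0 * inter / (tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) + eps)
    # Focal term: mean of -(1 - p_t)^gamma * log(p_t), with p_t as defined above
    p_t = tf.where(y_true > 0.5, y_pred, 1.0 - y_pred)
    focal = -tf.reduce_mean(tf.pow(1.0 - p_t, gamma) * tf.math.log(p_t + eps))
    return dice + focal
```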
further, step S4 specifically includes:
s41: scanning the road to be identified with the three-dimensional ground penetrating radar through a multi-channel antenna array to obtain a radar image for each channel. The multi-channel antenna array scans a swath 1.5 meters wide on each pass; by setting up multiple survey lines and splicing the passes with precise positioning data in post-processing, full-section coverage scanning of any road is achieved, as shown in fig. 6. During radar data identification processing, the three-dimensional radar data are converted into two-dimensional picture data by slicing: an XYZ coordinate system is established with the travel direction of the radar antenna as the x axis, the depth direction perpendicular to the ground as the z axis, and the direction perpendicular to the xz plane as the y axis, so that three mutually perpendicular slices can be obtained, namely the xy slice, the yz slice and the xz slice.
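For illustration, with a NumPy volume indexed (x, y, z) as defined above, the three slice families are plain array sections; the axis ordering here is an assumption:

```python
import numpy as np

def slices(volume: np.ndarray, xi: int, yi: int, zi: int):
    # volume indexed as (x: travel direction, y: transverse, z: depth)
    xy = volume[:, :, zi]  # horizontal slice at a fixed depth
    yz = volume[xi, :, :]  # vertical slice across the travel direction
    xz = volume[:, yi, :]  # vertical slice along the travel direction
    return xy, yz, xz
```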
S42: and for the radar image of each channel, interpolating pixel points in the radar image by adopting a bilinear interpolation method to obtain an interpolated radar image of the corresponding channel.
Specifically, the data matrix acquired by the three-dimensional radar has a transverse (y) interval of 0.071 m, a travel-direction (x) interval of 0.025 m and a depth-direction (z) interval of about 0.009 m. To make the data matrix uniform, interpolation is therefore needed in the transverse (y) and travel (x) directions so that their intervals approach that of the depth direction, making the spacing of the three-dimensional points 0.009 m in x, y and z alike; the interpolation calculation is shown in fig. 7. For any four adjacent measured data points P1(x₁, y₁, f₁), P2(x₁, y₂, f₂), P3(x₂, y₂, f₃), P4(x₂, y₁, f₄), where f₁, f₂, f₃, f₄ are the radar data values at the four points, a point Q(x, y, f) inserted anywhere in the rectangle with these four points as corners has its radar data value f determined by the following formula:

$$f = \frac{f_1(x_2 - x)(y_2 - y) + f_4(x - x_1)(y_2 - y) + f_2(x_2 - x)(y - y_1) + f_3(x - x_1)(y - y_1)}{(x_2 - x_1)(y_2 - y_1)}$$
the radar data value in any coordinate can be calculated based on the above formula. Crack signals are not significant in the YZ and XZ planes, and appear more significant in the XY plane, which is associated with crack lesions that are longer in length, but generally smaller in width and height. Therefore, the XY plane of the three-dimensional radar data is mainly intercepted to identify the crack diseases.
S43: and splicing the interpolation radar images of all channels to obtain a road image to be identified.
Because some areas may be missed during lane-by-lane scanning, black areas may be left in the picture, and each radar signal area separated by the black areas corresponds to one lane. And because obstacles may be encountered while driving, the radar image of each lane may be curved or partially overlapping; however, most of the effective radar information is unaffected and the cracks can still be segmented.
The method for acquiring the training sample set in step S1 is the same as the method for acquiring the road image to be identified, and is not repeated here. The data set used to train the model consists of ground penetrating radar xy-plane slices. After each survey is scanned, the survey pictures are spliced to obtain a complete radar picture of a road section; a large number of radar images of the road horizontal plane are obtained this way, from which a suitable number of crack images and disease-free images are selected to form a data set. A crack signal in a radar image has the same shape as the crack seen on the road surface, appearing darker or lighter with a large difference from its surroundings. The selection should ensure that the data set contains cracks of the various common shapes, including transverse cracks, longitudinal cracks and cross cracks.
The data set used in the invention contains 4915 pictures, each with a resolution of 604 × 604. After selection, the crack areas of the images were annotated with the Labelme tool. Because a radar image differs from an ordinary RGB image, more interference signals may exist, or the edge pixels of a crack may be impossible to determine accurately under the influence of different working conditions. The black regions in the middle are vacant areas left by the splicing and do not belong to cracks, so an annotation crossing a black region is divided into two parts. Finally, 3915 pictures are used as the training and validation sets, split 8:2 between them, where the validation set does not participate in training and is used to validate the result of each training round. The remaining pictures are used as the test set, which is divided into two parts: the first part is a crack test set of 500 pictures, each containing cracks, used to test the model's ability to segment cracks; the remaining 500 pictures form the crack-free test set, used to test whether the model is prone to misjudgment.
In order to prove the Crack identification precision of the Crack Unet provided by the invention, the effects of different algorithms are contrasted and explained below.
The training configuration used in the experiments is a computing platform with an NVIDIA 1080Ti graphics processor with 11 GB of video memory, running CentOS 7; the deep learning framework is Keras with TensorFlow-gpu as the backend. Random horizontal flipping, translation and scaling are used for data augmentation during training. Adam is chosen as the optimizer, the initial learning rate is set to 10⁻⁵, the batch size to 3 and the number of training rounds to 200, with automatic learning-rate reduction and early stopping: the validation-set loss is monitored, the learning rate is halved if it does not decrease for 6 rounds, and training is terminated if it does not decrease for 10 rounds. The training-set and validation-set loss curves are shown in figs. 8(a) and 8(b); owing to the automatic stopping strategy, the model converged at the 63rd round.
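This schedule maps directly onto standard Keras callbacks; a sketch under the assumption that the model, the prepared arrays and the dice_focal_loss function from above already exist:

```python
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping

# model, x_train, y_train, x_val, y_val and dice_focal_loss are assumed
# to be defined by the surrounding training code.
model.compile(optimizer=Adam(learning_rate=1e-5), loss=dice_focal_loss)
callbacks = [
    ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=6),  # halve LR after 6 stagnant rounds
    EarlyStopping(monitor="val_loss", patience=10),                 # stop after 10 stagnant rounds
]
model.fit(x_train, y_train, validation_data=(x_val, y_val),
          epochs=200, batch_size=3, callbacks=callbacks)
```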
The present invention uses MPA (Mean Pixel Accuracy), MIoU (Mean Intersection over Union), FLOPs (Floating-point operations), and FPS (Frames Per Second) as evaluation indexes for a model. The average pixel accuracy is the proportion of the classified correct number of all pixels to all pixels, and can be used for evaluating the accuracy of pixel-level classification. Each pixel in the segmentation result corresponds to one of the following four categories:
(1) True positive (TP): the model predicts a positive example and the label is also a positive example;
(2) False positive (FP): the model predicts a positive example but the label is a negative example;
(3) False negative (FN): the model predicts a negative example but the label is a positive example;
(4) True negative (TN): the model predicts a negative example and so is the label.
For one category in semantic segmentation, PA (Pixel Accuracy) is:
$$PA = \frac{TP}{TP + FP}$$
MPA is the average of PA over all classes; besides the target classes to be segmented, semantic segmentation has a default background class, so the segmentation classes of the invention are two: crack and background. IoU (Intersection over Union) is the intersection of the label region and the prediction region divided by their union, i.e. for the label region $T_a$ and the predicted region $C_a$, IoU is expressed as:
$$IoU = \frac{|T_a \cap C_a|}{|T_a \cup C_a|}$$
MIoU is then the average of IoU over all classes; IoU evaluates how much the segmented region coincides with the label. FLOPs represent the number of floating-point operations of a neural network and can be used to evaluate the computational complexity of a model; the FLOPs of a single convolutional layer are calculated as follows:
$$\mathrm{FLOPs} = (2 \times C_{in} \times K^2 - 1) \times H \times W \times C_{out}$$

where $C_{in}$ is the number of input channels, K is the size of the convolution kernel, H and W are respectively the height and width of the output feature map, and $C_{out}$ is the number of output channels.
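For reference, the per-class PA and IoU defined above can be computed from class masks as follows; MPA and MIoU are then their means over the crack and background classes (the PA form used here, TP/(TP+FP), matches the formula above):

```python
import numpy as np

def pa_iou(pred: np.ndarray, label: np.ndarray, cls: int):
    # Per-class pixel accuracy and IoU from integer class masks.
    p = pred == cls
    t = label == cls
    tp = np.logical_and(p, t).sum()
    fp = np.logical_and(p, ~t).sum()
    union = np.logical_or(p, t).sum()
    pa = tp / (tp + fp)
    iou = tp / union
    return pa, iou
```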
The present invention compares the semantic segmentation indices of the current mainstream models with Crack Unet on the test set; the related comparison experiments are carried out on the crack dataset, and the compared models include DeepLabv3, PSPNet and Unet. All test environments are identical to the training environment, and the comparison results are shown in Table 1:
TABLE 1 comparison of different model indices
(Table 1 is rendered as images in the original document; its cell values are not recoverable.)
It can be seen that the MPA and MIoU of the proposed Crack Unet are higher than those of the other models: the MPA exceeds those of DeepLabv3, PSPNet and Unet by 7.35%, 2.27% and 3.6% respectively, and the MIoU by 1.4%, 0.99% and 2.27% respectively, both clear improvements, and the FLOPs of the model are also the lowest. Although the FPS of Crack Unet is still low and its segmentation speed is slower, it nevertheless reaches the level of real-time detection. Since practical road-detection engineering places more emphasis on disease identification capability, the proposed method offers better segmentation capability than the current mainstream algorithms in the radar-image crack segmentation task.
In order to test the accuracy of the models under crack-free conditions, the present invention tests all models on the crack-free dataset and calculates the accuracy of each model under different false-alarm percentage thresholds, where the false-alarm percentage is the proportion of misclassified pixels among all pixels of one image. Three thresholds are used in the experiment: < 1%, < 0.5% and < 0.1%. The test results are shown in Table 2.
Table 2 Segmentation accuracy of each model under different false-alarm thresholds on the crack-free test set

Threshold   DeepLabv3   Unet      PSPNet    Crack Unet
< 0.1%      92.80%      53.80%    90.80%    88.80%
< 0.5%      98.20%      80.40%    95.80%    94.40%
< 1%        99.40%      93.40%    98.53%    97.20%
In the crack-free dataset experiments, the accuracy of Unet is the lowest at every threshold; although the 88.80% accuracy of Crack Unet is not the highest, it is comparable to DeepLabv3 and PSPNet. Combining the accuracy on both datasets, it can be seen that DeepLabv3 and Unet are not stable: DeepLabv3 performs poorly on the crack test set but best on the crack-free test set, so its segmentation capability is low; conversely, Unet performs well on the crack test set but produces more false alarms on the crack-free test set, so it has good segmentation capability but misclassifies easily and has low robustness. PSPNet is more stable and more accurate overall, but its accuracy on the crack test set is lower. Although the performance of Crack Unet on the crack-free test set is not optimal, its gap to DeepLabv3 is small.
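For reference, the false-alarm percentage and the threshold accuracy used in this test can be sketched as follows; the mask arrays are hypothetical placeholders:

```python
import numpy as np

def false_alarm_ratio(pred, label):
    # Share of misclassified pixels among all pixels of one image.
    return float(np.mean(pred != label))

def accuracy_under_threshold(preds, labels, threshold):
    # Fraction of crack-free images whose false-alarm ratio stays
    # below the threshold (0.01, 0.005 and 0.001 in Table 2).
    ok = [false_alarm_ratio(p, l) < threshold for p, l in zip(preds, labels)]
    return float(np.mean(ok))
```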
As shown in fig. 9, the three-dimensional ground penetrating radar crack disease identification system of the present invention includes: the device comprises a sample acquisition unit 1, a network construction unit 2, a training unit 3, a scanning unit 4 and a recognition unit 5.
Wherein, the sample acquiring unit 1 is used for acquiring a training sample set. The training sample set comprises a plurality of road sample images and label information of each pixel in each road sample image. The tag information includes 0 and 1; 0 indicates that the corresponding pixel is not a crack, and 1 indicates that the corresponding pixel is a crack.
The network construction unit 2 is used for constructing a split Unet neural network based on a spatial attention mechanism, VGG16 and the Unet neural network.
The training unit 3 is respectively connected with the sample acquisition unit 1 and the network construction unit 2, and the training unit 3 is used for training a fracture Unet neural network according to the training sample set to obtain a fracture identification model.
The scanning unit 4 is used for scanning the road to be identified through the three-dimensional ground penetrating radar to obtain the road image to be identified.
The recognition unit 5 is respectively connected with the training unit 3 and the scanning unit 4, and the recognition unit 5 is used for determining the cracks in the road to be identified based on the crack identification model and the road image to be identified.
Specifically, the scanning unit 4 includes: the device comprises a channel scanning module, an interpolation module and a splicing module.
The channel scanning module is used for scanning the road to be identified by adopting a three-dimensional ground penetrating radar through the multi-channel antenna array to obtain a radar image of each channel.
The interpolation module is connected with the channel scanning module and is used for interpolating pixel points in the radar images by a bilinear interpolation method aiming at the radar images of each channel to obtain the interpolation radar images of the corresponding channels.
The splicing module is connected with the interpolation module and used for splicing the interpolation radar images of all channels to obtain a road image to be identified.
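A minimal sketch of the interpolation and splicing steps performed by these modules is given below; OpenCV's bilinear resize stands in for the bilinear interpolation described, and the common target width and the side-by-side splicing direction are assumptions made for illustration:

```python
import cv2
import numpy as np

def build_road_image(channel_images, target_width):
    # Bilinearly interpolate each channel's radar image to a common
    # width, then splice the channels into one road image.
    resized = [
        cv2.resize(img, (target_width, img.shape[0]),
                   interpolation=cv2.INTER_LINEAR)   # bilinear interpolation
        for img in channel_images
    ]
    return np.concatenate(resized, axis=1)           # splice side by side
```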
Compared with the prior art, the three-dimensional ground penetrating radar crack disease identification system of the present invention has the same beneficial effects as the three-dimensional ground penetrating radar crack disease identification method described above, which are not repeated here.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present invention are described herein using specific examples, which are provided only to help understand the method and its core concept; meanwhile, a person skilled in the art may, following the idea of the present invention, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.

Claims (10)

1. A three-dimensional ground penetrating radar crack disease identification method is characterized by comprising the following steps:
acquiring a training sample set; the training sample set comprises a plurality of road sample images and label information of each pixel in each road sample image; the tag information includes 0 and 1; 0 indicates that the corresponding pixel is not a crack, and 1 indicates that the corresponding pixel is a crack;
constructing a split Unet neural network based on a spatial attention mechanism, VGG16 and the Unet neural network;
training the crack Unet neural network according to the training sample set to obtain a crack identification model;
scanning a road to be identified through a three-dimensional ground penetrating radar to obtain an image of the road to be identified;
and determining the cracks in the road to be identified based on the crack identification model and the road image to be identified.
2. The method for identifying the three-dimensional ground penetrating radar crack diseases according to claim 1, wherein the step of scanning the road to be identified through the three-dimensional ground penetrating radar to obtain an image of the road to be identified specifically comprises the following steps:
scanning a road to be identified by a three-dimensional ground penetrating radar through a multi-channel antenna array to obtain a radar image of each channel;
for the radar image of each channel, interpolating pixel points in the radar image by a bilinear interpolation method to obtain an interpolated radar image of the corresponding channel;
and splicing the interpolation radar images of all channels to obtain a road image to be identified.
3. The three-dimensional ground penetrating radar crack disease identification method according to claim 1, wherein the crack Unet neural network comprises an encoding sub-network and a decoding sub-network; the encoding sub-network is connected with the decoding sub-network;
the training of the crack Unet neural network according to the training sample set to obtain a crack identification model specifically comprises the following steps:
standardizing the road sample images in the training sample set to obtain corresponding standardized images and label information of each pixel in each standardized image;
for each standardized image, coding the standardized image through the coding sub-network to obtain a coding feature map;
decoding the coding feature map through the decoding sub-network to obtain a decoding feature map;
classifying each pixel in the decoding feature map based on a softmax activation function, and determining a prediction result of each pixel in the decoding feature map; the prediction results include 0 and 1;
and determining a loss function according to the prediction result of each pixel in the decoding feature map and the label information of each pixel in the standardized image, and iteratively training the encoding sub-network and the decoding sub-network according to the loss function until the loss function converges, to obtain an optimal crack Unet neural network, which is the crack identification model.
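As an illustrative sketch of the per-pixel classification and loss recited in this claim (Keras; the two-channel 1 × 1 convolution producing the class logits is an assumption, since the claim only names the softmax activation):

```python
import tensorflow as tf
from tensorflow.keras import layers

def pixel_classifier(decoded_feature_map):
    # Map the decoding feature map to two channels (background, crack)
    # and apply a per-pixel softmax, as recited in the claim.
    logits = layers.Conv2D(2, 1)(decoded_feature_map)
    return layers.Softmax(axis=-1)(logits)

# Per-pixel loss against the 0/1 label map; the encoding and decoding
# sub-networks are trained iteratively until this loss converges.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
```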
4. The three-dimensional ground penetrating radar crack disease identification method according to claim 3, wherein the coding feature map comprises a first residual map, a second residual map, a third residual map, a fourth residual map and a final coding feature map;
the coding sub-network comprises a first coding layer, a first maximum pooling layer, a second coding layer, a second maximum pooling layer, a third coding layer, a third maximum pooling layer, a fourth coding layer, a fourth maximum pooling layer and a fifth coding layer which are connected in sequence; the first coding layer, the second coding layer, the third coding layer, the fourth coding layer and the fifth coding layer are also connected with the decoding subnetwork;
the encoding the standardized image through the encoding sub-network to obtain an encoding feature map specifically includes:
coding the standardized image through a first coding layer to obtain a first coding feature map;
performing pooling operation on the first coding feature map through a first maximum pooling layer to obtain a first pooling feature map;
performing residual compression on the first coding feature map to obtain a first residual map;
coding the first pooling characteristic diagram through a second coding layer to obtain a second coding characteristic diagram;
performing pooling operation on the second coding feature map through a second maximum pooling layer to obtain a second pooling feature map;
performing residual compression on the second coding feature map to obtain a second residual map;
coding the second pooling characteristic diagram through a third coding layer to obtain a third coding characteristic diagram;
performing pooling operation on the third coding feature map through a third maximum pooling layer to obtain a third pooling feature map;
performing residual compression on the third coding feature map to obtain a third residual map;
coding the third pooling characteristic map through a fourth coding layer to obtain a fourth coding characteristic map;
performing pooling operation on the fourth coding feature map through a fourth maximum pooling layer to obtain a fourth pooling feature map;
performing residual compression on the fourth coding feature map to obtain a fourth residual map;
and coding the fourth pooling feature map through a fifth coding layer to obtain a final coding feature map.
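An illustrative sketch of this encoding sub-network follows; the double 3 × 3 convolutions of each coding layer, the channel counts and the 1 × 1 convolution standing in for residual compression are all assumptions made for illustration, since the claim does not restate those internals:

```python
from tensorflow.keras import layers

def encoder(x):
    # Four coding layer / max pooling stages, each also producing a
    # compressed residual map, followed by a fifth coding layer.
    residuals = []
    for f in (64, 128, 256, 512):                     # assumed channel counts
        x = layers.Conv2D(f, 3, padding="same", activation="relu")(x)
        x = layers.Conv2D(f, 3, padding="same", activation="relu")(x)
        residuals.append(layers.Conv2D(f, 1)(x))      # stand-in residual compression
        x = layers.MaxPooling2D(2)(x)                 # max pooling layer
    x = layers.Conv2D(1024, 3, padding="same", activation="relu")(x)
    return x, residuals                               # final coding feature map
```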
5. The three-dimensional ground penetrating radar crack disease identification method according to claim 4, wherein the decoding sub-network comprises a first up-sampling layer, a first decoding layer, a second up-sampling layer, a second decoding layer, a third up-sampling layer, a third decoding layer, a fourth up-sampling layer and a fourth decoding layer which are connected in sequence; the first coding layer is in jump connection with the fourth decoding layer; the second coding layer is in jump connection with the third decoding layer; the third coding layer is in jump connection with the second decoding layer; the fourth coding layer is in jump connection with the first decoding layer; the fifth coding layer is connected with the first up-sampling layer;
the decoding the encoded feature map through the decoding subnetwork to obtain a decoded feature map specifically includes:
the final coding feature map is up-sampled through the first up-sampling layer to obtain a first up-sampling feature map;
splicing the first up-sampling feature map and the fourth residual map to obtain a first spliced map;
decoding the first spliced graph through a first decoding layer to obtain a first decoding graph;
the first decoding image is up-sampled through a second up-sampling layer to obtain a second up-sampling feature image;
splicing the second up-sampling feature map and the third residual map to obtain a second spliced map;
decoding the second spliced graph through a second decoding layer to obtain a second decoding graph;
the second decoding image is up-sampled through a third up-sampling layer to obtain a third up-sampling feature image;
splicing the third up-sampling feature map and the second residual map to obtain a third spliced map;
decoding the third spliced graph through a third decoding layer to obtain a third decoding graph;
the third decoding image is up-sampled through a fourth up-sampling layer to obtain a fourth up-sampling feature image;
splicing the fourth up-sampling feature map and the first residual error map to obtain a fourth spliced map;
and decoding the fourth splicing image through a fourth decoding layer to obtain a decoding characteristic image.
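An illustrative sketch of this decoding sub-network follows; the decoding layers are modeled as double 3 × 3 convolutions and the channel counts are assumptions:

```python
from tensorflow.keras import layers

def decoder(x, residuals):
    # Four rounds of up-sampling, splicing with the matching residual
    # map (fourth residual first), and decoding.
    for f, res in zip((512, 256, 128, 64), reversed(residuals)):
        x = layers.UpSampling2D(2)(x)                 # up-sampling layer
        x = layers.Concatenate()([x, res])            # splice with residual map
        x = layers.Conv2D(f, 3, padding="same", activation="relu")(x)
        x = layers.Conv2D(f, 3, padding="same", activation="relu")(x)
    return x                                          # decoding feature map
```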
6. The three-dimensional ground penetrating radar crack disease identification method according to claim 4, wherein the first coding layer comprises a first convolution-feature selection module and a first parallel multi-direction attention mechanism module which are connected in sequence; the first parallel multidirectional attention mechanism module is further connected with the first max-pooling layer and the decoding subnetwork;
the encoding the normalized image through the first encoding layer to obtain a first encoding feature map specifically includes:
performing feature extraction on the standardized image through a first convolution-feature selection module to obtain a first feature map;
and establishing an association relation for the pixels in the first feature map through the first parallel multi-direction attention mechanism module to obtain a first coding feature map.
7. The three-dimensional ground penetrating radar crack disease identification method of claim 6, wherein the first convolution-feature selection module comprises four two-dimensional convolution kernels and a channel attention module; the sizes of the four two-dimensional convolution kernels are respectively 3 × 3, 2 × 2 and 2 × 2;
the extracting the features of the normalized image by the first convolution-feature selection module to obtain a first feature map specifically includes:
performing convolution operations on the standardized image through the four two-dimensional convolution kernels respectively to obtain four corresponding convolution feature maps;
splicing the four convolution characteristic graphs to obtain a spliced convolution characteristic graph;
and weighting the spliced convolution feature map through a channel attention module, and performing feature selection to obtain a first feature map.
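An illustrative sketch of this convolution-feature selection module follows; since the claim lists only three sizes for the four kernels, the kernel sizes used here are assumptions, and the channel attention module is modeled as a standard squeeze-and-excitation block, also an assumption:

```python
from tensorflow.keras import layers

def conv_feature_selection(x, filters, kernel_sizes=(3, 3, 2, 2)):
    # Four parallel two-dimensional convolutions whose outputs are
    # spliced, then weighted by a channel attention block.
    branches = [layers.Conv2D(filters, k, padding="same")(x)
                for k in kernel_sizes]
    cat = layers.Concatenate()(branches)              # spliced convolution feature map
    c = cat.shape[-1]
    w = layers.GlobalAveragePooling2D()(cat)          # squeeze
    w = layers.Dense(c // 4, activation="relu")(w)    # excite (reduction 4, assumed)
    w = layers.Dense(c, activation="sigmoid")(w)
    w = layers.Reshape((1, 1, c))(w)
    return layers.Multiply()([cat, w])                # channel-weighted features
```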
8. The three-dimensional ground penetrating radar crack disease identification method of claim 6, wherein the first parallel multi-direction attention mechanism module comprises a 2 × 1 convolution kernel, a 1 × 2 convolution kernel, a 2 × 2 convolution kernel and a 1 × 1 convolution kernel;
the establishing, by the first parallel multi-direction attention mechanism module, an association relation for the pixels in the first feature map to obtain a first coding feature map specifically comprises:
establishing an association relation between each pixel in the first feature map and the pixel on its left side through the 2 × 1 convolution kernel to obtain a left-pixel association map;
establishing an association relation between each pixel in the first feature map and the pixel below it through the 1 × 2 convolution kernel to obtain a lower-pixel association map;
establishing an association relation between each pixel in the first feature map and the four adjacent pixels on its two diagonals through the 2 × 2 convolution kernel and the 1 × 1 convolution kernel to obtain a diagonal-pixel association map;
connecting the left-pixel association map, the lower-pixel association map and the diagonal-pixel association map to obtain a connection pixel association map;
and determining the first coding feature map according to the connection pixel association map and the first feature map.
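An illustrative sketch of this parallel multi-direction attention mechanism module follows; the kernel shapes follow the claim's wording, while the sigmoid gating that combines the connection pixel association map with the first feature map is an assumption:

```python
from tensorflow.keras import layers

def parallel_directional_attention(x):
    # Directional kernels as listed in the claim; orientation follows
    # the claim's wording.
    c = x.shape[-1]
    left = layers.Conv2D(c, (2, 1), padding="same")(x)    # left-pixel association
    down = layers.Conv2D(c, (1, 2), padding="same")(x)    # lower-pixel association
    diag = layers.Conv2D(c, (2, 2), padding="same")(x)    # diagonal neighbours
    diag = layers.Conv2D(c, (1, 1), padding="same")(diag)
    rel = layers.Concatenate()([left, down, diag])        # connection association map
    rel = layers.Conv2D(c, 1, activation="sigmoid")(rel)  # per-pixel weights
    return layers.Multiply()([x, rel])                    # weighted first feature map
```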
9. A three-dimensional ground penetrating radar crack disease recognition system is characterized by comprising:
the device comprises a sample acquisition unit, a training sample set acquisition unit and a training sample acquisition unit, wherein the sample acquisition unit is used for acquiring the training sample set; the training sample set comprises a plurality of road sample images and label information of each pixel in each road sample image; the tag information includes 0 and 1; 0 indicates that the corresponding pixel is not a crack, and 1 indicates that the corresponding pixel is a crack;
the network construction unit is used for constructing a crack Unet neural network based on a space attention mechanism, VGG16 and the Unet neural network;
the training unit is respectively connected with the sample acquisition unit and the network construction unit and used for training a fracture Unet neural network according to the training sample set to obtain a fracture identification model;
the scanning unit is used for scanning the road to be identified through the three-dimensional ground penetrating radar to obtain an image of the road to be identified;
and the recognition unit is respectively connected with the training unit and the scanning unit and is used for determining the cracks in the road to be identified based on the crack identification model and the road image to be identified.
10. The three-dimensional ground penetrating radar crack damage identification system of claim 9, wherein the scanning unit comprises:
the channel scanning module is used for scanning the road to be identified by adopting a three-dimensional ground penetrating radar through the multi-channel antenna array to obtain a radar image of each channel;
the interpolation module is connected with the channel scanning module and is used for interpolating pixel points in the radar images by adopting a bilinear interpolation method aiming at the radar images of each channel to obtain interpolation radar images of the corresponding channels;
and the splicing module is connected with the interpolation module and used for splicing the interpolation radar images of all channels to obtain a road image to be identified.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210436380.8A CN114821316A (en) 2022-04-25 2022-04-25 Three-dimensional ground penetrating radar crack disease identification method and system

Publications (1)

Publication Number Publication Date
CN114821316A true CN114821316A (en) 2022-07-29

Family

ID=82506822

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210436380.8A Pending CN114821316A (en) 2022-04-25 2022-04-25 Three-dimensional ground penetrating radar crack disease identification method and system

Country Status (1)

Country Link
CN (1) CN114821316A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115830007A (en) * 2023-02-06 2023-03-21 山东省滨州公路工程有限公司 Pavement void detection method and system based on image recognition
CN116256720A (en) * 2023-05-09 2023-06-13 武汉大学 Underground target detection method and device based on three-dimensional ground penetrating radar and electronic equipment
CN116256720B (en) * 2023-05-09 2023-10-13 武汉大学 Underground target detection method and device based on three-dimensional ground penetrating radar and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination