CN111861978A - Bridge crack instance segmentation method based on Faster R-CNN - Google Patents
- Publication number
- CN111861978A CN111861978A CN202010473952.0A CN202010473952A CN111861978A CN 111861978 A CN111861978 A CN 111861978A CN 202010473952 A CN202010473952 A CN 202010473952A CN 111861978 A CN111861978 A CN 111861978A
- Authority
- CN
- China
- Prior art keywords
- crack
- mask
- bridge
- cnn
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06T7/0004: Industrial image inspection
- G06F18/2415: Classification techniques based on parametric or probabilistic models
- G06N3/045: Neural networks; combinations of networks
- G06N3/08: Neural network learning methods
- G06T7/11: Region-based segmentation
- G06T7/136: Segmentation involving thresholding
- G06T7/194: Foreground-background segmentation
- G06T2207/10004: Still image; photographic image
- G06T2207/20081: Training; learning
- G06T2207/20084: Artificial neural networks [ANN]
- G06T2207/30132: Masonry; concrete
- Y02P90/30: Computing systems specially adapted for manufacturing
Abstract
The invention belongs to the technical field of image target detection and specifically relates to a bridge crack instance segmentation method based on Faster R-CNN, comprising the following steps: step one, constructing a bridge crack data set; step two, labeling training samples; step three, building an improved Faster R-CNN bridge crack instance segmentation model; step four, training the instance segmentation model built in step three; step five, testing the instance segmentation model trained in step four; step six, actual detection. Compared with the prior art, the method is more robust: it not only obtains accurate classification and localization results for bridge cracks but also generates a high-quality bridge crack segmentation mask for evaluating the degree of bridge damage and formulating a corresponding maintenance scheme. In addition, the method can accurately detect multiple cracks in one image, so that, combined with image stitching technology, detection efficiency can be improved and the complete crack morphology obtained.
Description
Technical Field
The invention belongs to the technical field of image target detection and specifically relates to a bridge crack instance segmentation method based on Faster R-CNN.
Background
A bridge serves as an important carrier connecting two points separated by a large span and plays an important role in road transportation in China. However, under long-term exposure to sun and rain and under traffic load, internal stresses are transmitted along the bridge structure to its weak parts, where cracks readily form and develop on the structure surface. Surface cracks in different orientations damage the bridge structure to different degrees; if a crack extends perpendicular to the structure's bearing surface, it can easily lead to safety accidents.
Engineering practice and theoretical analysis show that most in-service bridges operate with cracks, and the potential hazards caused by bridge cracks are considerable. Once a relatively serious crack appears in a concrete bridge, outside air and harmful media can easily penetrate into the concrete and produce carbonates through chemical reactions, lowering the alkaline environment around the reinforcing steel; after the passivation film on the steel surface is destroyed, corrosion develops more readily. In addition, concrete carbonization aggravates shrinkage cracking, seriously endangering the safe use of the concrete bridge. Cracks are the most common defect in bridge structures: extremely fine cracks (narrower than 0.05 mm) generally have little influence on structural performance and can be tolerated; larger cracks continue to form and expand under load or under external physical and chemical factors, forming through-cracks and deep cracks that indirectly or even directly affect the service life and safety of the beam structure. If the crack width reaches 0.3 mm or more, the integrity of the structure is directly damaged, causing concrete carbonization, spalling of the protective layer and corrosion of the reinforcing steel; mechanical discontinuities form in the bridge, greatly reducing its bearing capacity, in severe cases even leading to collapse, and impairing the normal use of the structure.
Thus, if a crack can be discovered at the beginning of its occurrence, a safety-factor estimate can be made and the crack repaired before the hazard develops. The traditional detection method based on human vision is costly and inefficient, its accuracy is affected by subjective factors, and it increasingly fails to meet the detection requirements for bridge cracks. Prior art that detects cracks with the Faster R-CNN technique only marks cracks in the image with rectangular boxes (e.g., application No. 201910526241.2); it cannot directly extract the morphological characteristics of the cracks, i.e., the degree of crack damage cannot be assessed intuitively. In addition, the prior art basically detects a single (local) picture, so detection efficiency is low and the detected cracks are incomplete.
In view of the above, the present inventors conducted extensive experimental research and provide a bridge crack instance segmentation method based on Faster R-CNN to solve the above problems.
Disclosure of Invention
To solve the problems in the prior art, the invention provides a bridge crack instance segmentation method based on Faster R-CNN. By improving the Faster R-CNN model, the method not only locates cracks in the original image but also generates a high-quality bridge crack segmentation mask according to the true crack morphology. In addition, the method can accurately detect multiple cracks in one image, so that, combined with image stitching technology, detection efficiency can be improved and the complete crack morphology obtained.
The technical problem to be solved by the invention is addressed by the following technical scheme. The invention provides a bridge crack instance segmentation method based on Faster R-CNN, comprising the following steps:
step one, constructing a bridge crack data set
1) normalizing the acquired bridge crack images to a resolution of 256 × 256;
2) amplifying the number of normalized bridge crack image samples using geometric transformation, linear transformation and image filtering algorithms;
3) dividing the amplified bridge crack data into a training set, a test set and a validation set;
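The three augmentation families named in step one 2) can be sketched as follows. This is an illustrative NumPy sketch, not part of the patent text; the specific parameter choices (horizontal flip, 90° rotation, a 1.2× grayscale gain, a 3 × 3 mean filter) are assumptions.

```python
import numpy as np

def augment(img):
    """Minimal sketch of the three augmentation families the patent names.
    All parameter choices are illustrative, not taken from the patent."""
    out = []
    out.append(np.fliplr(img))                    # geometric transformation: flip
    out.append(np.rot90(img))                     # geometric transformation: rotation
    out.append(np.clip(1.2 * img + 10, 0, 255))   # linear (grayscale) transformation
    # image filtering: 3x3 mean filter via edge padding + windowed average
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    filtered = sum(padded[i:i + h, j:j + w]
                   for i in range(3) for j in range(3)) / 9.0
    out.append(filtered)
    return out

img = np.random.randint(0, 256, (256, 256)).astype(float)
samples = augment(img)   # one input image yields four augmented samples
```

Each source image thus contributes several augmented variants, which is how the patent's 2000 raw images can grow to the 24000 images mentioned in the embodiment.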
step two, marking training samples
Labeling the training set samples divided in the first step;
step three, building an improved Faster R-CNN bridge crack instance segmentation model
1) adding a crack mask branch;
2) replacing region-of-interest pooling with region-of-interest alignment;
3) adding a predicted crack mask intersection-over-union (IoU) scoring branch;
Step four, training the example segmentation model built in the step three
Training the bridge crack example segmentation model of the Faster R-CNN built in the third step by using the training samples marked in the second step;
step five, testing the example segmentation model trained in the step four
After the training is finished, testing the trained bridge crack example segmentation model of the fast R-CNN by using the test set sample in the step one, and verifying the robustness of the improved bridge crack example segmentation model of the fast R-CNN;
step six, actual detection
Inputting the bridge crack image to be identified into the bridge crack example segmentation model of the tested Faster R-CNN, judging whether the image is the bridge crack image, framing the position of the crack if the image is the bridge crack image, and generating a bridge crack segmentation mask.
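The control flow of step six can be sketched as below. `detect_cracks` is a hypothetical stand-in for the trained model, stubbed here so the sketch runs; it is not the patent's implementation.

```python
def detect_cracks(image):
    # Hypothetical stub for the trained improved Faster R-CNN model:
    # returns a list of (label, bounding_box, mask) detections.
    return [("crack", (10, 10, 120, 40), [[1, 0], [1, 1]])]

def inspect(image):
    """Step six: judge whether the image contains cracks; if so, return the
    framed crack positions and the generated segmentation masks."""
    detections = detect_cracks(image)
    return {
        "is_crack_image": bool(detections),
        "boxes": [box for _, box, _ in detections],
        "masks": [mask for _, _, mask in detections],
    }

result = inspect(None)  # a real caller would pass the image to be identified
```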
Further, the specific process of step three 1) is as follows:
the crack mask branch is a small fully convolutional network applied to each crack region of interest that predicts a bridge crack segmentation mask in a pixel-to-pixel manner; the positive regions selected by the region-of-interest classifier are used as the input of the crack mask branch network;
a corresponding soft mask represented by floating-point numbers is then generated; this branch is parallel to the crack classification branch and the bounding-box regression branch in the Faster R-CNN network.
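The soft mask above is floating-point. A common convention, assumed here since the patent does not state a threshold, is to binarize it at 0.5 to obtain the final segmentation mask:

```python
import numpy as np

# The crack mask branch emits a floating-point "soft" mask per region of
# interest; thresholding at 0.5 (an assumption, not stated in the patent)
# yields the final binary crack mask.
soft_mask = np.array([[0.1, 0.7],
                      [0.9, 0.4]])
binary_mask = (soft_mask >= 0.5).astype(np.uint8)
```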
Further, the specific process of step three 2) is as follows:
first, no quantization is performed on the region of interest or its spatial bins, avoiding misalignment between the extracted features and the input;
then, the exact values of the input features at four regularly sampled positions in each region-of-interest bin are computed by bilinear interpolation;
finally, the result is aggregated by taking the maximum or the average.
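The region-of-interest alignment computation just described can be sketched in NumPy. The two sample fractions (0.25 and 0.75) place four regular sample points per bin, matching the "four regular sampling positions" above; the rest is an illustrative sketch, not the patent's code.

```python
import numpy as np

def bilinear(feat, y, x):
    """Sample feat at a continuous (y, x) position with bilinear interpolation;
    no coordinate quantization, as region-of-interest alignment requires."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, feat.shape[0] - 1), min(x0 + 1, feat.shape[1] - 1)
    dy, dx = y - y0, x - x0
    return (feat[y0, x0] * (1 - dy) * (1 - dx) + feat[y0, x1] * (1 - dy) * dx +
            feat[y1, x0] * dy * (1 - dx) + feat[y1, x1] * dy * dx)

def roi_align_bin(feat, y_lo, x_lo, y_hi, x_hi, reduce=np.mean):
    """One RoI bin: four regular sample points, then max or average
    (the patent allows either aggregation)."""
    ys = [y_lo + (y_hi - y_lo) * f for f in (0.25, 0.75)]
    xs = [x_lo + (x_hi - x_lo) * f for f in (0.25, 0.75)]
    return reduce([bilinear(feat, y, x) for y in ys for x in xs])

feat = np.arange(16, dtype=float).reshape(4, 4)
v = roi_align_bin(feat, 0.0, 0.0, 2.0, 2.0)  # average over the 2x2 top-left bin
```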
Further, the specific process of step three 3) is as follows:
the predicted crack mask IoU branch describes the quality of the initial crack segmentation by the pixel-level intersection-over-union between the predicted crack mask and the actually labeled crack mask; the concatenation of the region-of-interest alignment features and the predicted crack mask is used as the input of this branch, and its output is the IoU between the predicted crack mask and the matched ground-truth mask;
for the concatenation, a max pooling layer with kernel size 2 and stride 2 is applied so that the predicted crack mask has the same spatial size as the region-of-interest features;
the predicted crack mask IoU branch consists of 4 convolutional layers and 3 fully connected layers: for the 4 convolutional layers, following the mask head, the kernel size and number of filters of all layers are set to 3 and 256 respectively; for the 3 fully connected layers, following the region-based convolutional neural network head, the outputs of the first two layers are set to 1024 and the output of the last layer is set to the number of classes.
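The shape bookkeeping of that concatenation can be sketched as follows. The 28 × 28 mask and 256-channel 14 × 14 RoI feature sizes are common Mask R-CNN conventions, assumed here rather than taken from the patent; only the kernel-2, stride-2 max pooling is stated above.

```python
import numpy as np

def max_pool_2x2(x):
    """Max pooling with kernel size 2 and stride 2 on a 2-D map."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Illustrative sizes (assumptions): a 28x28 predicted mask and a 256-channel
# 14x14 region-of-interest feature map.
pred_mask = np.random.rand(28, 28)
roi_feat = np.random.rand(256, 14, 14)

pooled = max_pool_2x2(pred_mask)          # 28x28 -> 14x14, matching the RoI features
head_in = np.concatenate([roi_feat, pooled[None]], axis=0)  # 257-channel IoU-head input
```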
Further, the overall loss function of the bridge crack instance segmentation model in step three is:
L = L_cls + L_box + L_mask (1)
where L_cls is the loss of the classification branch, L_box is the loss of bounding-box regression, and L_mask is the loss of the segmentation mask branch.
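Equation (1) is a plain sum of the three branch losses. A minimal numeric sketch, with L_mask computed as the average per-pixel binary cross-entropy described later in the description (the particular numbers are illustrative only):

```python
import numpy as np

def binary_cross_entropy(pred, target, eps=1e-7):
    """Average per-pixel binary cross-entropy, the form of L_mask used here."""
    pred = np.clip(pred, eps, 1 - eps)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))

l_cls, l_box = 0.30, 0.20                      # illustrative branch losses
pred = np.array([[0.9, 0.1], [0.8, 0.2]])      # predicted soft mask
target = np.array([[1.0, 0.0], [1.0, 0.0]])    # ground-truth binary mask
l_mask = binary_cross_entropy(pred, target)
total = l_cls + l_box + l_mask                 # equation (1): L = L_cls + L_box + L_mask
```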
Further, the training process of step four is specifically:
1) the training samples labeled in step two are fed into the improved Faster R-CNN bridge crack instance segmentation model built in step three and first processed by the backbone network, a ResNet-101 residual network used for preliminary extraction of crack features;
2) the preliminarily extracted crack features are input into a feature pyramid network for further processing, to enhance the ability to represent cracks at multiple scales;
3) region proposal network processing is then carried out; the region proposal network is a small neural network that processes the image in a sliding-window manner, finds regions of the image containing targets and generates target candidate boxes; a softmax function then judges whether each candidate box is foreground or background, and bounding-box regression corrects the anchor boxes, i.e., further refines the candidate boxes; finally, bounding boxes accurately containing cracks are selected from the region proposal network candidates, and their positions and sizes are fine-tuned to obtain the target candidate boxes of the detection result, while back-propagation is completed;
4) finally, region-of-interest alignment extracts features from each candidate box; the crack classification branch performs crack classification, the bounding-box regression branch generates crack bounding boxes, and the crack mask branch generates crack masks, completing the instance segmentation of the bridge cracks.
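The two per-anchor operations in step 3) above, softmax foreground/background scoring and bounding-box regression, can be sketched as follows. The (dx, dy, dw, dh) parameterization is the usual Faster R-CNN one, assumed here rather than quoted from the patent.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over foreground/background logits."""
    e = np.exp(z - z.max())
    return e / e.sum()

def apply_deltas(anchor, deltas):
    """Refine an anchor (cx, cy, w, h) with regression deltas (dx, dy, dw, dh),
    the standard Faster R-CNN box parameterization (an assumption here)."""
    cx, cy, w, h = anchor
    dx, dy, dw, dh = deltas
    return (cx + dx * w, cy + dy * h, w * np.exp(dw), h * np.exp(dh))

scores = softmax(np.array([2.0, 0.5]))   # [foreground, background] logits
refined = apply_deltas((50.0, 50.0, 20.0, 40.0),
                       (0.1, -0.05, 0.0, np.log(1.5)))
```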
Further, in step one the data set is divided into training set : test set : validation set at a ratio of 10:1:1.
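The 10:1:1 split can be sketched as a shuffle-and-slice over the sample list; the seed and function name are illustrative, not from the patent.

```python
import random

def split_dataset(samples, ratios=(10, 1, 1), seed=0):
    """Shuffle and split into train/test/validation at the patent's 10:1:1 ratio."""
    rng = random.Random(seed)
    samples = list(samples)
    rng.shuffle(samples)
    total = sum(ratios)
    n_train = len(samples) * ratios[0] // total
    n_test = len(samples) * ratios[1] // total
    return (samples[:n_train],
            samples[n_train:n_train + n_test],
            samples[n_train + n_test:])

# On the embodiment's 24000-image expanded data set, 10:1:1 gives 20000/2000/2000.
train, test, val = split_dataset(range(24000))
```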
Compared with the prior art, the invention has the following beneficial effects:
1. Regarding the bridge crack instance segmentation method based on Faster R-CNN of the invention: because bridge cracks are not simple in form, crack images have complex backgrounds and strong noise interference, so traditional digital image processing algorithms and shallow machine learning algorithms cannot detect cracks robustly. Crack detection methods based on deep learning learn deep visual structural features that improve bridge crack detection results, but a deep-learning target detection model (such as Faster R-CNN) only detects cracks in the original image and marks their positions with rectangular boxes; the true crack morphology cannot be obtained intuitively, so the degree of crack damage cannot be clearly evaluated and further processed. To overcome these defects, the invention improves the traditional Faster R-CNN model. By adding the crack mask branch, the model not only completes the classification and localization of bridge cracks but also generates a high-quality bridge crack segmentation mask, achieving the combined effect of a target detection network and semantic segmentation. Replacing region-of-interest pooling with region-of-interest alignment solves the pooling misalignment problem and accurately preserves exact spatial positions. Adding the predicted crack mask IoU score solves the problem that instance segmentation can obtain accurate box-level localization and high classification scores while the corresponding bridge crack mask remains inaccurate, so an accurate mask is finally generated. The invention thus achieves accurate bridge crack detection and is more robust than the prior art.
2. After a bridge crack image is detected, the degree of bridge damage can be evaluated from the generated high-quality crack segmentation mask, and the extracted crack can be further quantified, providing a reliable reference index for bridge maintenance and management so that defects are found and handled in time to avoid accidents. This is of great significance for ensuring safe operation and prolonging the service life of bridges: the damage degree of bridge cracks can be graded from the detection data (such as crack area or the corresponding crack width), so bridge defects can be found as early as possible, maintenance and reinforcement carried out promptly, problems treated in advance, bridge maintenance costs saved, and the overall economic benefit of the bridge during its operating period improved.
3. For the bridge crack instance segmentation method based on Faster R-CNN, repeated experiments verified that when multiple crack images are stitched into one multi-crack image using image stitching technology, the method still detects the cracks accurately, in a time comparable to that for a single crack, so detection efficiency can be greatly improved. In addition, because bridge crack images are acquired locally, after processing with image stitching the method detects and reveals a more complete crack picture, which helps a complete analysis of the actual crack condition.
Drawings
FIG. 1 is a flow chart of the steps of the method of the present invention;
FIG. 2 shows transformations of part of the bridge crack images used in the experiments of the present invention;
FIG. 3 shows instance segmentation labeling results for part of the bridge crack images used in the experiments of the present invention;
FIG. 4 is a schematic diagram of the operation of the region-of-interest alignment layer of the present invention;
FIG. 5 is a diagram of different design choices for the PCMIoU input of the present invention;
FIG. 6 is a diagram of the bridge crack instance segmentation model structure of the present invention;
FIG. 7 is a diagram of the feature pyramid network of the present invention;
FIG. 8 is a simplified diagram of the feature anchors of the present invention;
FIG. 9 is a comparison chart for the PCMIoU module of the present invention;
FIG. 10 compares crack detection results using different data sets in accordance with the present invention;
FIG. 11 shows the multi-crack detection results of the present invention;
FIG. 12 shows the stitched multi-crack detection results of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantageous effects of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit it.
Through extensive experimental demonstration, the invention provides a bridge crack instance segmentation method based on Faster R-CNN that solves the problem that traditional digital image processing algorithms and shallow machine learning algorithms cannot detect bridge cracks well; it not only obtains accurate bridge crack classification and localization results but also generates a high-quality bridge crack segmentation mask.
The present invention will be described in further detail with reference to the following examples and the accompanying drawings.
Embodiment: as shown in FIG. 1, the invention provides a bridge crack instance segmentation method based on Faster R-CNN, comprising the following steps:
step one, constructing a bridge crack data set
1) First, the 2000 acquired bridge crack images are normalized to 256 × 256 bridge crack images;
2) amplifying the number of the normalized bridge crack image samples by adopting a geometric transformation algorithm, a linear transformation algorithm and an image filtering algorithm;
Specifically, in order to ensure a balanced number of each type of bridge crack image in the data set, each type is processed: fractured, reticular, transverse, longitudinal and pit-groove bridge crack images, as well as crack-free bridge images. The bridge crack images expanded by digital image processing and a series of related algorithms (including geometric transformation, linear transformation, image filtering and the like) are shown in FIG. 2. Expanding the data set in this way does not affect the instance segmentation of the bridge crack images, and the final expanded bridge crack data set contains 24000 crack images;
3) Dividing the amplified bridge crack data into a training set, a testing set and a verification set;
in this embodiment, the expanded bridge crack data are divided into training set : test set : validation set at a ratio of 10:1:1, i.e., 20000 training samples, 2000 test samples and 2000 validation images.
Step two, marking training samples
The training set samples divided in step one are labeled. Each training image is labeled with the open-source image annotation tool LabelMe; during labeling, attention is focused on the cracks in the image, each crack in the image is labeled accurately, and the labels are named Bridgecrack1, Bridgecrack2, Bridgecrack3 and so on in sequence. Each labeled image generates a corresponding json file in which the label information is stored. For the image instance segmentation task, however, the required label is an image file; running the command labelme_json_to_dataset <filename>.json on each json file generates a folder containing 5 files, namely img.png (the original image), info.yaml, label.png, label_names.txt and label_viz.png, as shown in FIG. 3. In practice, because the training set is large, batch conversion is implemented in code, and the manually labeled image files are obtained from the folders in batch.
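The batch conversion mentioned above can be sketched as a loop that builds one labelme_json_to_dataset command per annotation file. This is a dry-run sketch (it only builds the command strings); actually running them requires LabelMe to be installed, and the directory layout here is illustrative.

```python
import tempfile
from pathlib import Path

def conversion_commands(json_dir):
    """Build one `labelme_json_to_dataset` command per json annotation, so the
    whole training set can be converted in a batch instead of by hand."""
    return [f"labelme_json_to_dataset {p}"
            for p in sorted(Path(json_dir).glob("*.json"))]

# Dry-run demonstration on a temporary directory with two dummy annotations.
tmp = Path(tempfile.mkdtemp())
for name in ("Bridgecrack1.json", "Bridgecrack2.json"):
    (tmp / name).write_text("{}")
cmds = conversion_commands(tmp)
```

A real batch run would pass each command to subprocess or the shell, then collect label.png from each generated folder.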
Step three, building an improved Faster R-CNN bridge crack instance segmentation model
1) Adding the crack mask branch
The Faster R-CNN model comprises two stages after the input image passes through the convolutional layers: the region proposal network is the first stage and outputs candidate object bounding boxes; the second stage is the same as the corresponding part of the Fast R-CNN model, which uses region-of-interest pooling to extract features from each candidate box and performs classification and bounding-box regression. Because the two stages share the same convolutional neural network, their input is the same feature map, which speeds up the network. The bridge crack instance segmentation model has the same first stage, namely a region proposal network; but in the second stage, in addition to outputting the crack label and the position of the crack bounding box, a binary mask is output for each crack region of interest. This branch that outputs the crack binary mask is called the crack mask branch in the present invention; it is parallel to the crack classification branch and the bounding-box regression branch.
The crack mask branch is a small fully convolutional network applied to each crack region of interest; it predicts the bridge crack segmentation mask in a pixel-to-pixel fashion. The positive regions selected by the region-of-interest classifier are taken as the input of the crack mask branch network, which then generates a corresponding soft mask represented by floating-point numbers, in parallel with the branches used for crack classification and bounding-box regression in the existing Faster R-CNN model.
When training the model, the invention defines a multi-task loss on each sampled region of interest; the total loss function of the Faster R-CNN model of the invention is:
L = L_cls + L_box + L_mask   (1)
where L_cls is the loss of the classification branch, L_box is the loss of the bounding-box regression, and L_mask is the loss of the segmentation mask branch. The classification loss and the bounding-box regression loss are defined as in Faster R-CNN. For each region of interest of the image, the crack mask branch has a K·m²-dimensional output, i.e. it generates K binary masks of resolution m × m, one mask per class, where K is the number of classes. When computing the loss, the invention applies a sigmoid to each pixel separately and defines L_mask as the average binary cross-entropy loss. For a region of interest associated with ground-truth class k, L_mask is defined only on the k-th mask; the outputs of the remaining masks contribute no loss.
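The per-pixel sigmoid and average binary cross-entropy described above can be illustrated with a minimal NumPy sketch; the function name and array shapes are assumptions for illustration only.

```python
import numpy as np

def crack_mask_loss(mask_logits, gt_mask, k):
    """L_mask for one region of interest whose ground-truth class is k.

    mask_logits: (K, m, m) raw outputs of the crack mask branch.
    gt_mask:     (m, m) binary ground-truth crack mask.
    Only the k-th mask is penalized; the other masks contribute no loss.
    """
    p = 1.0 / (1.0 + np.exp(-mask_logits[k]))  # per-pixel sigmoid
    eps = 1e-7                                 # numerical safety for log
    bce = -(gt_mask * np.log(p + eps) + (1.0 - gt_mask) * np.log(1.0 - p + eps))
    return bce.mean()                          # average binary cross-entropy
```

With all-zero logits every pixel predicts 0.5, so the loss is ln 2 regardless of the ground truth, which is a handy sanity check.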
This definition of L_mask allows the model to generate a binary mask for each crack individually, avoiding competition between classes. The invention relies on the crack classification branch to decide the class of the output crack mask, so that mask generation and class prediction do not interfere with each other. This is entirely different from applying a fully convolutional network with a per-pixel softmax and a multinomial cross-entropy loss, as in image semantic segmentation, because that formulation makes masks of different classes compete with one another. The invention therefore replaces the per-pixel softmax with a per-pixel sigmoid and the multinomial cross-entropy loss with a binary loss. Experiments verify that this formulation yields high-quality segmentation results.
A mask encodes the spatial layout of its input object. Thus, unlike class labels or box offsets, which are inevitably collapsed into short output vectors by fully connected layers, the spatial structure of a mask can be extracted naturally through the pixel-to-pixel correspondence provided by convolutions. Specifically, the invention uses a fully convolutional network to predict an m × m mask for each region of interest. This allows each layer in the segmentation mask branch to maintain the explicit m × m object spatial layout without collapsing it into a vector representation that lacks spatial dimensions. Unlike methods that use fully connected layers for mask prediction, the fully convolutional representation of the invention requires fewer parameters and is more accurate in experiments.
2) Replacing region of interest pooling with region of interest alignment
The bridge crack instance segmentation model extends the Faster R-CNN model, and correctly constructing the crack mask branch is crucial for finally obtaining accurate bridge crack instance segmentation results. However, the Faster R-CNN model was not designed to guarantee pixel-to-pixel alignment between its input and output, because it uses region-of-interest pooling (RoIPool), which performs coarse spatial quantization for feature extraction. Pixel-to-pixel prediction requires the region-of-interest features to be well aligned in order to faithfully preserve the explicit per-pixel spatial correspondence, which is critical for outputting a high-quality crack mask. To fix the misalignment introduced by region-of-interest pooling, so that the model can output a high-quality bridge crack mask, the invention adopts a simple quantization-free layer called region-of-interest alignment (RoIAlign), which faithfully preserves exact spatial locations.
Because the region-of-interest pooling layer is not aligned pixel-to-pixel, it has little effect on the final crack bounding-box candidates but a large effect on the accuracy of the output crack mask. To solve this problem, the invention adopts a region-of-interest alignment layer that performs no harsh quantization and avoids any misalignment between the extracted features and the input. First, no quantization is performed on the region-of-interest boundaries or spatial bins: the continuous coordinate x/16 is used, unlike the rounding used in region-of-interest pooling. Then, bilinear interpolation is used to compute the exact values of the input features at four regularly sampled positions in each region-of-interest bin. Finally, the result is aggregated by taking the maximum or the average. The detailed operation is shown in fig. 4, where the dashed grid represents the feature map, the solid lines represent a region of interest (with 2 × 2 bins in this embodiment), and the dots represent the 4 sampling points in each bin. Region-of-interest alignment computes the value of each sampling point by bilinear interpolation from the adjacent grid points of the feature map; no quantization is performed on any coordinate involved in the region-of-interest bins or sampling points. Notably, the result is insensitive to the exact sampling locations or the number of samples, as long as no quantization is performed.
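The bilinear sampling at the heart of region-of-interest alignment can be sketched as follows. This is a simplified single-cell NumPy illustration under assumed conventions (four sample points at the quarter positions of the bin), not the invention's implementation.

```python
import numpy as np

def bilinear(feat, y, x):
    """Bilinearly interpolate feature map `feat` at continuous point (y, x)."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1 = min(y0 + 1, feat.shape[0] - 1)
    x1 = min(x0 + 1, feat.shape[1] - 1)
    wy, wx = y - y0, x - x0
    return (feat[y0, x0] * (1 - wy) * (1 - wx) + feat[y0, x1] * (1 - wy) * wx
            + feat[y1, x0] * wy * (1 - wx) + feat[y1, x1] * wy * wx)

def roi_align_cell(feat, y_lo, x_lo, y_hi, x_hi, reduce=np.mean):
    """One RoIAlign output cell: 4 regular sample points, no quantization."""
    ys = [y_lo + (y_hi - y_lo) * f for f in (0.25, 0.75)]
    xs = [x_lo + (x_hi - x_lo) * f for f in (0.25, 0.75)]
    samples = [bilinear(feat, y, x) for y in ys for x in xs]
    return reduce(samples)  # max or average, as described in the text
```

Because the bin boundaries y_lo..y_hi and x_lo..x_hi stay continuous, no coordinate is ever rounded, which is exactly the property that distinguishes RoIAlign from RoIPool.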
3) Adding a branch to predict the crack mask intersection-over-union score
In instance segmentation studies, the classification confidence of an instance is mostly used directly as the quality score of its mask. However, mask quality, usually quantified as the intersection-over-union (IoU) between an instance mask and its ground-truth mask, is generally not well correlated with the classification score. As a result, an instance segmentation network may produce an accurate box-level localization and a high classification score while the corresponding mask is inaccurate. To solve this problem for bridge cracks, the invention adopts a network for Predicting the Crack Mask Intersection-over-Union (PCMIoU), which learns the quality of the predicted crack instance segmentation masks.
Unlike previous methods that aim at more accurate instance localization or segmentation masks, PCMIoU focuses on scoring the masks. To this end, the network learns a score for each crack mask instead of using its classification score. Inspired by the Average Precision (AP) metric of instance segmentation, which uses the pixel-level IoU between a predicted mask and its ground-truth mask to describe segmentation quality, the invention trains a network model that directly learns the crack mask IoU, and for ease of distinction calls the learned score the crack mask score. Once the predicted crack mask IoU is obtained at the testing stage, the crack mask score is re-evaluated by multiplying the predicted crack mask IoU by the classification score; the crack mask score is thus aware of both the crack semantic class and the completeness of the crack instance mask.
The invention defines S_mask as the score of a predicted crack mask. Ideally, S_mask equals the pixel-level IoU between the predicted crack mask and its matching ground-truth crack mask. The ideal S_mask should also be positive only for the ground-truth class and zero for the other classes, since a mask belongs to only one class. This requires the crack mask score to work well on two tasks: first, classifying the mask into the correct class; second, regressing the candidate mask's IoU with the foreground object. It is difficult to train both tasks with a single objective function. For simplicity, the invention decomposes the mask score learning task into crack mask classification and mask IoU regression, so that for all crack classes the score of each predicted crack mask is given by equation (2):
S_mask = S_cls · S_iou   (2)
where S_cls focuses on which class the prediction belongs to. Since this is exactly the goal of the classification task already performed in the regional convolutional neural network stage, the corresponding classification score can be used directly; S_iou is then the regressed crack mask IoU score.
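The test-time re-scoring of equation (2) amounts to a single multiplication per detection. A small hypothetical sketch (the dictionary keys are assumptions, not the invention's data structures):

```python
def rescore_detections(detections):
    """Re-score detections at test time: S_mask = S_cls * S_iou (equation 2).

    detections: list of dicts with a classification score 'cls_score' and the
    crack mask IoU 'pred_mask_iou' regressed by the PCMIoU branch.
    """
    for d in detections:
        d["mask_score"] = d["cls_score"] * d["pred_mask_iou"]
    # Rank by the mask-aware score instead of the raw classification score.
    return sorted(detections, key=lambda d: d["mask_score"], reverse=True)
```

Note how a confidently classified crack with a poor predicted mask IoU can now rank below a less confident detection whose mask is nearly complete.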
The goal of the PCMIoU network part is to regress the IoU between the predicted crack mask and its ground-truth crack mask. The invention concatenates the features of the region-of-interest alignment layer with the predicted crack mask as the input of the PCMIoU network part; the output is the IoU between the predicted crack mask and its matching ground-truth mask. For the concatenation, a max pooling layer with kernel size 2 and stride 2 is used so that the predicted crack mask has the same spatial size as the region-of-interest features. The invention regresses the mask IoU only for the ground-truth class rather than for all classes. The PCMIoU part consists of 4 convolutional layers and 3 fully connected layers: for the 4 convolutional layers, following the mask branch design, the kernel size and number of filters of all convolutional layers are set to 3 and 256 respectively; for the 3 fully connected layers, following the regional convolutional neural network design, the outputs of the first two fully connected layers are set to 1024 and the output of the final fully connected layer is set to the number of classes.
When training the PCMIoU network part, the invention uses the candidates of the region generation network as training samples. As with the training samples of the mask branch, a training sample is required to have an IoU greater than 0.5 between the prediction box and its matching ground-truth box. To generate the regression target of each training sample, the predicted mask of the target class is first obtained and binarized with a threshold of 0.5; the IoU between this binary mask and its matching ground-truth mask is then used as the crack mask IoU target. The invention regresses the crack mask IoU with an L2 loss with a loss weight of 1, integrates this branch into the constructed bridge crack instance segmentation model, and trains the whole network end to end.
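The regression-target construction just described (binarize the predicted soft mask at 0.5, then take the pixel-level IoU with the annotated mask, regressed under an L2 loss) can be sketched as follows; function names are illustrative assumptions.

```python
import numpy as np

def mask_iou_target(pred_mask_prob, gt_mask, thresh=0.5):
    """PCMIoU regression target: binarize the predicted soft mask at 0.5,
    then take its pixel-level IoU with the matching annotated mask."""
    pred = pred_mask_prob >= thresh
    gt = gt_mask.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union > 0 else 0.0

def l2_iou_loss(pred_iou, target_iou, weight=1.0):
    """L2 loss on the regressed mask IoU; loss weight 1 as in the text."""
    return weight * (pred_iou - target_iou) ** 2
```

A mask that half-overlaps its ground truth yields a target of 0.5, so the branch learns to report partial mask quality rather than a binary hit-or-miss.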
The design choices for the PCMIoU network module are shown in fig. 5; in each, a crack mask score map (28 × 28 × C) is predicted by fusing the crack mask branch output with the region-of-interest features.
The four designs are explained as follows:
(a) Only the target mask concatenated with the region-of-interest features: the score map of the target class is taken, max-pooled, and then concatenated with the region-of-interest features.
(b) Only the target mask multiplied by the region-of-interest features: the score map of the target class is taken, max-pooled, and then multiplied element-wise with the region-of-interest features.
(c) All masks concatenated with the region-of-interest features: the mask score maps of all C classes are max-pooled and concatenated with the region-of-interest features.
(d) Only the target mask concatenated with high-resolution region-of-interest features: the score map of the target class is taken and concatenated with 28 × 28 region-of-interest features.
The invention reports results using the COCO evaluation metric AP averaged over IoU thresholds, together with AP@0.5 and AP@0.75, where AP@0.5 (or AP@0.75) means that an IoU threshold of 0.5 (or 0.75) is used to decide whether a predicted bounding box or mask is positive during evaluation. The AP is evaluated using the crack mask IoU, and the results of the four designs are shown in Table 1 below:
Table 1. Results of different PCMIoU input design choices
A comparison of the results shows that the performance of the PCMIoU network module is robust to the different ways of fusing the crack mask prediction with the region-of-interest features: performance improves under every design. Since concatenating the target score map with the region-of-interest features gives the best results, the invention takes this as the default choice.
The finally constructed bridge crack example segmentation model structure is shown in fig. 6, and the model is divided into two stages, wherein the first stage processes an input image and generates a candidate frame, and the second stage classifies the candidate frame and generates an accurate boundary frame and a high-quality mask.
Step four, training the example segmentation model built in the step three
The specific steps are as follows: 1) The training samples labeled in step two are fed into the improved Faster R-CNN bridge crack instance segmentation model built in step three. The samples are first processed by the backbone network, a 101-layer residual network (ResNet-101), which is a standard convolutional neural network used as a feature extractor to preliminarily extract crack features;
the residual network is chosen because it adds skip connections to a standard feed-forward convolutional network that bypass some layers; each bypass produces a residual block, and the convolutional layers predict the residual of their input tensor. Plain deep feed-forward networks are hard to optimize, but the skip connections of the residual network, i.e. shortcuts summed with the convolutional layer outputs, let information propagate forward and backward more smoothly and alleviate the degradation problem of deep neural networks. The residual network adopted by the invention is ResNet-101, where 101 refers to the 101 weighted layers of the residual neural network, counting convolutional layers and fully connected layers but not pooling layers or batch normalization layers.
2) Inputting the preliminarily extracted crack feature result into a feature pyramid network for further processing so as to enhance the capability of representing cracks on multiple scales;
the invention introduces a feature pyramid network on the basis of the ResNet-101 network to further expand the model of the invention so as to be capable of representing the capability enhancement of the target on multiple scales. Compared with the standard feature extraction pyramid, the FPN of the present invention improves the feature extraction performance by adding a second pyramid, and it functions to select the high-level features from the first pyramid and pass them to the lower layer, as shown in fig. 7, in this process, the features of each level can be combined with the high-level and low-level features, and the FPN introduces extra complexity: in FPN the second pyramid has a feature map containing features at each level, rather than a single skeleton feature map in the standard skeleton (i.e., the highest level in the first pyramid), the features of which level is chosen is dynamically determined by the size of the object.
3) Next comes region generation network processing. The region generation network is a small neural network that scans the image with a sliding window, finds the regions containing targets, and generates target candidate boxes. A softmax function then judges whether each candidate box is foreground or background, and bounding-box regression corrects the anchor boxes, i.e. refines the candidate boxes. Finally, bounding boxes that accurately contain cracks are selected from the candidate boxes of the region generation network, and their positions and sizes are fine-tuned, as shown in fig. 8. If several candidate boxes overlap one another, the candidate box with the highest foreground score is kept and the rest are discarded using non-maximum suppression. This yields the target candidate boxes of the detection result and completes the backward propagation, finishing the first stage of the model.
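The non-maximum suppression step just described, keeping only the highest-scoring candidate box among mutually overlapping ones, can be sketched as follows (box format and the 0.7 default threshold are illustrative assumptions):

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, thresh=0.7):
    """Keep the highest-scoring box among mutually overlapping candidates."""
    order = np.argsort(scores)[::-1]  # candidates, best foreground score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Discard every remaining box that overlaps the kept box too much.
        order = np.array([j for j in rest if iou(boxes[i], boxes[j]) < thresh])
    return keep
```

The returned indices are the surviving candidate boxes, exactly the behavior attributed to non-maximum suppression in the text.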
4) Finally, region-of-interest alignment extracts features from each candidate box; the crack classification branch of the network performs crack classification, the bounding-box regression branch generates the crack bounding box, and the crack mask branch generates a high-quality crack mask, finally completing the instance segmentation of the bridge cracks.
Since a typical classifier can only handle a fixed input size, while the region-of-interest boxes have different sizes after the fine-tuning above, this problem must be solved before the output of the region generation network can be processed further. The invention solves it with region-of-interest alignment: a region of the feature map is cropped and then resized to the required resolution. Region-of-interest alignment implements cropping and resizing operations similar to those in image-editing software, differing only in implementation details. Concretely, the feature map is sampled at continuous points: bilinear interpolation is applied to compute the exact values of the input features at four regularly sampled positions in each region-of-interest bin, and the final result is obtained by taking the maximum or the average, so that exact spatial locations are faithfully preserved. After the region-of-interest alignment layer, the classifier, the bounding-box regressor and the crack mask branch classify the candidate boxes and generate bounding boxes and high-quality crack masks respectively, finally completing the instance segmentation of the bridge cracks.
Step five, testing the example segmentation model trained in the step four
After training is finished, the improved Faster R-CNN bridge crack instance segmentation model is tested with the test set samples from step one, verifying the robustness of the improved Faster R-CNN bridge crack instance segmentation model;
step six, actual detection
The bridge crack image to be identified is input into the tested Faster R-CNN bridge crack instance segmentation model, which judges whether the image is a bridge crack image; if so, it frames the positions of the cracks and generates segmentation masks marking the true regions of the crack shapes.
To verify the feasibility of the invention, the inventors designed the following three groups of comparative experiments, conducted from different perspectives:
The first group of experiments verifies the influence of the PCMIoU network module, which predicts the crack mask IoU, on the results of the bridge crack instance segmentation model. The results are shown in fig. 9: after the PCMIoU module is added to the model, the generated crack masks and localization boxes are more accurate. The effect on the generated crack masks is especially visible; they are closer to the annotated crack masks, continuous rather than broken, and consistent with the true shape of bridge cracks.
To further verify the crack detection effect of the bridge crack instance segmentation model, Precision, Recall, F1 Score and the classification score are used to quantitatively evaluate and analyze the detection results. The crack precision index Pre and the crack recall index Rec are computed by formulas (3) and (4):

Pre = TP / (TP + FP)   (3)
Rec = TP / (TP + FN)   (4)

where TP, FP and FN are the numbers of true positive, false positive and false negative crack detections, respectively.
The F1 score is the harmonic mean of precision and recall, computed by formula (5):

F1 = 2 · Pre · Rec / (Pre + Rec)   (5)
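A minimal sketch of these three metrics, assuming crack detections have already been matched into true positives (TP), false positives (FP) and false negatives (FN):

```python
def precision_recall_f1(tp, fp, fn):
    """Pre = TP/(TP+FP), Rec = TP/(TP+FN), F1 = 2*Pre*Rec/(Pre+Rec)."""
    pre = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * pre * rec / (pre + rec) if pre + rec else 0.0
    return pre, rec, f1
```

The guards against empty denominators handle the degenerate cases of no detections or no ground-truth cracks.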
after calculation, the evaluation results are shown in table 2 below:
Table 2. Effect of the PCMIoU network module on the results
The experimental data in table 2 show that after the PCMIoU network module is added, the precision, recall, F1 score and classification score of crack detection all improve significantly, indicating that the classification, localization-box generation and crack mask generation of the bridge crack instance segmentation network model are all greatly improved by the PCMIoU module.
The second group of experiments tests the influence of the bridge crack data set augmentation method on the accuracy of the bridge crack instance segmentation model of the invention. The detailed procedure is as follows. First, the 2000 collected original bridge crack images are used to train the bridge crack instance segmentation model built by the invention; before training, the data set is split in a fixed ratio, here into 1500 training images, 300 test images and 200 validation images. Second, the model is iteratively trained for the same number of iterations on 20000 training samples consisting of the originally selected bridge crack images mixed with the augmented bridge crack images. Third, both trained models are tested, with the results shown in fig. 10: the accuracy of the instance segmentation results after training on the augmented data set is greatly improved, and the quality of the output crack localization boxes and the generated crack masks is clearly better. It follows that, given a reasonably constructed bridge crack instance segmentation model, the quality of the final detection largely depends on the number of training samples.
The third group of experiments verifies the detection effect of the bridge crack instance segmentation model on multiple cracks; the result is shown in fig. 11. To improve bridge crack detection efficiency, image stitching (prior art) is used to combine several images to be detected into one large image, which is then detected by the method; the experimental result is shown in fig. 12. The crack detection remains accurate, and detecting the stitched image takes the same time as detecting a single unstitched image. Several bridge crack images can thus be detected in the time needed for one, so combining the bridge crack instance segmentation model with image stitching greatly accelerates bridge crack detection. In addition, since bridge crack images are captured locally, detection after image stitching reveals a more complete crack, which helps analyze the actual condition of the crack as a whole.
The invention provides a bridge crack instance segmentation method based on Faster R-CNN which, by improving the traditional Faster R-CNN technique, achieves accurate extraction and localization of bridge cracks: it not only frames the crack positions accurately on the original image but also generates high-quality segmentation masks following the true shapes of the cracks, combining the effects of target detection and semantic segmentation. Combined with image stitching, the complete crack condition can be obtained. The generated bridge crack masks can be used to evaluate the damage degree of the bridge and to further quantify the extracted crack information, thereby providing reliable reference data for bridge maintenance and management, so that defects are found and handled in time, potential safety hazards are eliminated, and the safe and stable operation of the bridge is guaranteed.
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.
Claims (7)
1. The bridge crack example segmentation method based on Faster R-CNN is characterized by comprising the following steps:
step one, constructing a bridge crack data set
1) Normalizing the acquired bridge crack images to bridge crack images with a resolution of 256 × 256;
2) amplifying the number of the normalized bridge crack image samples by adopting a geometric transformation algorithm, a linear transformation algorithm and an image filtering algorithm;
3) dividing the amplified bridge crack data into a training set, a testing set and a verification set;
step two, marking training samples
Labeling the training set samples divided in the first step;
step three, building a bridge crack example segmentation model of improved Faster R-CNN
1) Adding crack mask branches;
2) replacing region of interest pooling with region of interest alignment;
3) adding a branch to predict the crack mask intersection-over-union score;
Step four, training the example segmentation model built in the step three
Training the bridge crack example segmentation model of the Faster R-CNN built in the third step by using the training samples marked in the second step;
step five, testing the example segmentation model trained in the step four
After training is finished, the trained Faster R-CNN bridge crack instance segmentation model is tested with the test set samples from step one, verifying the robustness of the improved Faster R-CNN bridge crack instance segmentation model;
step six, actual detection
Inputting the bridge crack image to be identified into the bridge crack example segmentation model of the tested Faster R-CNN, judging whether the image is the bridge crack image, framing the position of the crack if the image is the bridge crack image, and generating a bridge crack segmentation mask.
2. The method for bridge fracture instance segmentation based on Faster R-CNN as claimed in claim 1, wherein: the specific process of step three 1) is as follows:
the crack mask branch is a small-sized full convolution network applied to each crack interested region, a bridge crack segmentation mask is predicted in a pixel-to-pixel mode, and a positive region selected by an interested region classifier is used as the input of the crack mask branch network;
a corresponding soft mask represented by floating-point numbers is then generated, in parallel with the crack classification branch and the bounding-box regression branch in the Faster R-CNN model.
3. The method for bridge fracture instance segmentation based on Faster R-CNN as claimed in claim 1, wherein: the specific process of step three, step 2) is as follows:
firstly, no quantization is performed on the region of interest or its spatial bins, avoiding misalignment between the extracted features and the input;
then, calculating the accurate values of the input characteristics of four regular sampling positions in each region of interest unit by adopting a bilinear interpolation method;
and finally, calculating a final result in a maximum value or average value mode.
4. The method for bridge fracture instance segmentation based on Faster R-CNN as claimed in claim 1, wherein: the specific process of step three, step 3) is as follows:
the predicted crack mask intersection-over-union branch model describes the initial crack segmentation quality using the pixel-level intersection-over-union between the predicted crack mask and the annotated crack mask; the features of the region-of-interest alignment layer concatenated with the predicted crack mask are used as the input of the predicted crack mask intersection-over-union model part, and the output is the intersection-over-union between the predicted crack mask and its matching annotated mask;
During connection, a maximum pooling layer with kernel size of 2 and stride of 2 is used so that the predicted crack mask has the same spatial size characteristics as the region of interest;
the predicted crack mask intersection-over-union branch model part consists of 4 convolutional layers and 3 fully connected layers; for the 4 convolutional layers, following the mask branch design, the kernel size and number of filters of all convolutional layers are set to 3 and 256 respectively; for the 3 fully connected layers, following the regional convolutional neural network design, the outputs of the first two fully connected layers are set to 1024 and the output of the final fully connected layer is set to the number of classes.
5. The method for bridge fracture instance segmentation based on Faster R-CNN as claimed in claim 1, wherein: the general loss function formula of the bridge crack example segmentation model in the third step is as follows:
L = L_cls + L_box + L_mask   (1)
where L_cls is the loss of the classification branch, L_box is the loss of the bounding-box regression, and L_mask is the loss of the segmentation mask branch.
6. The method for bridge fracture instance segmentation based on Faster R-CNN as claimed in claim 1, wherein: the training process of the fourth step is specifically as follows:
1) sending the training samples labeled in step two into the Faster R-CNN bridge crack instance segmentation model built in step three; the samples are first processed by the backbone network, a 101-layer residual network (ResNet-101), which preliminarily extracts crack features;
2) Inputting the preliminarily extracted crack feature result into a feature pyramid network for further processing so as to enhance the capability of representing cracks on multiple scales;
3) then, area generation network processing is carried out, wherein the area generation network processing is a small neural network, an image is processed in a sliding window mode, an area with a target in the image is found, a target candidate frame is generated, then whether the candidate frame is a foreground or a background is judged through a softmax function, an anchor frame is corrected through border frame regression, namely the candidate frame is further determined, finally, a border frame accurately containing a crack is selected according to the candidate frame of the area generation network, the position and the size of the border frame are finely adjusted, the target candidate frame of a detection result is obtained, and meanwhile backward transmission is completed;
4) and finally, aligning the interested regions to extract features from each candidate frame, performing crack classification by using crack classification branches of the network, generating a crack boundary frame by using boundary frame regression branches, generating a crack mask by using crack mask branches, and finally completing the example segmentation of the bridge crack.
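The anchor-correction step in 3) is conventionally implemented with the standard Faster R-CNN box-regression parameterization; a minimal sketch (the `(x1, y1, x2, y2)` tuple layout and function name are illustrative):

```python
import numpy as np

def apply_box_deltas(anchor, deltas):
    # Standard Faster R-CNN box-regression transform: refine an anchor
    # (x1, y1, x2, y2) with predicted deltas (dx, dy, dw, dh).
    x1, y1, x2, y2 = anchor
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h   # anchor center
    dx, dy, dw, dh = deltas
    cx, cy = cx + dx * w, cy + dy * h     # shift center by fractions of size
    w, h = w * np.exp(dw), h * np.exp(dh) # scale width/height exponentially
    return np.array([cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h])
```

With zero deltas the anchor is returned unchanged; a positive `dx` shifts the box right by `dx` times its width.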
7. The method for bridge crack instance segmentation based on Faster R-CNN as claimed in claim 1, wherein the data set in step one is divided into training set, test set and validation set in the ratio 10:1:1.
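A minimal sketch of such a 10:1:1 split (the seeded shuffle and the rounding scheme are implementation assumptions not stated in the claim):

```python
import random

def split_dataset(samples, seed=0):
    # Split crack images into train/test/validation in the 10:1:1 ratio.
    rng = random.Random(seed)
    samples = samples[:]          # copy so the caller's list is untouched
    rng.shuffle(samples)
    n = len(samples)
    n_train = round(n * 10 / 12)
    n_test = round(n * 1 / 12)
    train = samples[:n_train]
    test = samples[n_train:n_train + n_test]
    val = samples[n_train + n_test:]
    return train, test, val
```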
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010473952.0A CN111861978B (en) | 2020-05-29 | 2020-05-29 | Bridge crack example segmentation method based on Faster R-CNN |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111861978A true CN111861978A (en) | 2020-10-30 |
CN111861978B CN111861978B (en) | 2023-10-31 |
Family
ID=72985927
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010473952.0A Active CN111861978B (en) | 2020-05-29 | 2020-05-29 | Bridge crack example segmentation method based on Faster R-CNN |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111861978B (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018165753A1 (en) * | 2017-03-14 | 2018-09-20 | University Of Manitoba | Structure defect detection using machine learning algorithms |
CN108876780A (en) * | 2018-06-26 | 2018-11-23 | 陕西师范大学 | Bridge Crack image crack detection method under a kind of complex background |
Non-Patent Citations (2)
Title |
---|
孙朝云; 裴莉莉; 李伟; 郝雪丽; 陈瑶: "Pavement sealed crack detection method based on improved Faster R-CNN", Journal of South China University of Technology (Natural Science Edition), no. 02 * |
王纪武; 鱼鹏飞; 罗海保: "Railway bridge crack classification method based on improved Faster R-CNN + ZF model", Journal of Beijing Jiaotong University, no. 01 * |
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112686217A (en) * | 2020-11-02 | 2021-04-20 | 坝道工程医院(平舆) | Mask R-CNN-based detection method for disease pixel level of underground drainage pipeline |
CN112270370B (en) * | 2020-11-06 | 2023-06-02 | 北京环境特性研究所 | Vehicle apparent damage assessment method |
CN112270370A (en) * | 2020-11-06 | 2021-01-26 | 北京环境特性研究所 | Vehicle apparent damage assessment method |
CN112560895A (en) * | 2020-11-20 | 2021-03-26 | 陕西师范大学 | Bridge crack detection method based on improved PSPNet network |
CN112508029A (en) * | 2020-12-03 | 2021-03-16 | 苏州科本信息技术有限公司 | Instance segmentation method based on target box labeling |
CN112488119A (en) * | 2020-12-18 | 2021-03-12 | 山西省信息产业技术研究院有限公司 | Tunnel block falling or water seepage detection and measurement method based on double-depth learning model |
CN113052184A (en) * | 2021-03-12 | 2021-06-29 | 电子科技大学 | Target detection method based on two-stage local feature alignment |
CN113240623A (en) * | 2021-03-18 | 2021-08-10 | 中国公路工程咨询集团有限公司 | Pavement disease detection method and device |
CN113240623B (en) * | 2021-03-18 | 2023-11-07 | 中国公路工程咨询集团有限公司 | Pavement disease detection method and device |
CN113420794B (en) * | 2021-06-04 | 2022-04-22 | 中南民族大学 | Binaryzation Faster R-CNN citrus disease and pest identification method based on deep learning |
CN113255682A (en) * | 2021-06-04 | 2021-08-13 | 浙江智慧视频安防创新中心有限公司 | Target detection system, method, device, equipment and medium |
CN113420794A (en) * | 2021-06-04 | 2021-09-21 | 中南民族大学 | Binaryzation Faster R-CNN citrus disease and pest identification method based on deep learning |
CN113421235A (en) * | 2021-06-17 | 2021-09-21 | 中国电子科技集团公司第四十一研究所 | Cigarette positioning device and method based on deep learning |
CN113269188A (en) * | 2021-06-17 | 2021-08-17 | 华南农业大学 | General method for detecting mark points and pixel coordinates thereof |
CN113409267B (en) * | 2021-06-17 | 2023-04-18 | 西安热工研究院有限公司 | Pavement crack detection and segmentation method based on deep learning |
CN113409267A (en) * | 2021-06-17 | 2021-09-17 | 西安热工研究院有限公司 | Pavement crack detection and segmentation method based on deep learning |
CN113421235B (en) * | 2021-06-17 | 2023-06-20 | 中国电子科技集团公司第四十一研究所 | Cigarette positioning device and method based on deep learning |
CN113378974A (en) * | 2021-06-29 | 2021-09-10 | 北京百度网讯科技有限公司 | Method, apparatus, device and storage medium for outputting information |
CN113989632A (en) * | 2021-09-13 | 2022-01-28 | 西安电子科技大学 | Bridge detection method and device for remote sensing image, electronic equipment and storage medium |
CN114298145A (en) * | 2021-11-22 | 2022-04-08 | 三峡大学 | Permeable concrete pore intelligent identification and segmentation method based on deep learning |
CN115063680A (en) * | 2022-06-20 | 2022-09-16 | 东南大学 | Bridge disease identification method based on label and image synthesis technology |
CN115063680B (en) * | 2022-06-20 | 2024-05-17 | 东南大学 | Bridge disease identification method based on label and image synthesis technology |
CN118334591A (en) * | 2024-06-13 | 2024-07-12 | 成都理工大学 | Intelligent road collapse hidden danger identification method based on ground penetrating radar and deep learning |
CN118334591B (en) * | 2024-06-13 | 2024-08-06 | 成都理工大学 | Intelligent road collapse hidden danger identification method based on ground penetrating radar and deep learning |
Also Published As
Publication number | Publication date |
---|---|
CN111861978B (en) | 2023-10-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111861978A (en) | Bridge crack example segmentation method based on Faster R-CNN | |
Feng et al. | Deep active learning for civil infrastructure defect detection and classification | |
CN112348770A (en) | Bridge crack detection method based on multi-resolution convolution network | |
CN114049356B (en) | Method, device and system for detecting structure apparent crack | |
CN112330631B (en) | Railway wagon brake beam pillar rivet pin collar loss fault detection method | |
WO2024066035A1 (en) | Defect detection method and system based on battery surface image, and related device | |
CN113256601A (en) | Pavement disease detection method and system | |
CN116823800A (en) | Bridge concrete crack detection method based on deep learning under complex background | |
CN115995056A (en) | Automatic bridge disease identification method based on deep learning | |
CN113591866A (en) | Special job certificate detection method and system based on DB and CRNN | |
CN116385421A (en) | Photovoltaic panel detection method, unmanned aerial vehicle and computer readable storage medium | |
CN116823793A (en) | Device defect detection method, device, electronic device and readable storage medium | |
CN116524313A (en) | Defect detection method and related device based on deep learning and multi-mode image | |
CN117496124A (en) | Large-area photovoltaic panel detection and extraction method based on deep convolutional neural network | |
CN114387261A (en) | Automatic detection method suitable for railway steel bridge bolt diseases | |
CN112215301B (en) | Image straight line detection method based on convolutional neural network | |
CN117876861A (en) | Method for automatically classifying surface cracks of masonry based on improved MobileNet V3 and migration learning | |
CN112164040A (en) | Steel surface defect identification method based on semi-supervised deep learning algorithm | |
CN113947567B (en) | Defect detection method based on multitask learning | |
CN114782754A (en) | Concrete crack detection method and device based on lightweight convolutional neural network | |
CN115330743A (en) | Method for detecting defects based on double lights and corresponding system | |
Devereux et al. | Automated object detection for visual inspection of nuclear reactor cores | |
CN114998866A (en) | Traffic sign identification method based on improved YOLOv4 | |
Liu et al. | A multi-scale residual encoding network for concrete crack segmentation | |
CN118279567B (en) | Construction method of target detection model and road crack detection method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||