CN111986170A - Defect detection algorithm based on Mask R-CNN (deep neural network)

Info

Publication number: CN111986170A
Application number: CN202010821216.XA
Authority: CN
Inventors: 郭龙源; 段厚裕; 周武威; 欧先锋; 张国云; 吴健辉; 鲁敏; 滕书华
Original assignee: Hunan Visualtouring Information Technology Co ltd; Hunan Institute of Science and Technology
Current assignee: Hunan Visualtouring Information Technology Co ltd; Hunan Institute of Science and Technology
Priority date: 2020-08-14
Filing date: 2020-08-14
Publication date: 2020-11-24

Abstract

The invention discloses a defect detection algorithm based on the deep neural network Mask R-CNN, belonging to the technical field of defect detection. The algorithm comprises the following steps: extracting features with a ResNet50-based Feature Pyramid Network (FPN); extracting regions of interest (ROIs) of the defect regions with a Region Proposal Network (RPN) to obtain the corresponding anchor boxes; predicting the pixel classes inside each ROI with a fully convolutional network (FCN) to achieve defect segmentation; and finally predicting the class of each ROI and the coordinates of the corresponding anchor box through the fully connected layers of the network. For the magnetic tile surface defect detection scenario, the algorithm makes two improvements to the FPN in Mask R-CNN: a C1 module is added to the FPN and the pooling layer in the feature extraction layer of the C1 module is removed; and a CLAHE preprocessing module is added before the feature extraction layer of the FPN. Experimental results show that the algorithm has strong generalization capability and robustness and can accurately segment defects in magnetic tile images.

Description

Defect detection algorithm based on Mask R-CNN (deep neural network)

Technical Field

The invention relates to the technical field of defect detection, in particular to a defect detection algorithm based on a deep neural network Mask R-CNN.

Background

With the introduction of the 'Made in China 2025' and 'Industry 4.0' strategies, China's traditional manufacturing industry faces the major challenges of industrial transformation and upgrading. This has promoted the development and wide application of products such as industrial robots, high-precision numerically controlled machine tools and new energy vehicles, and has also placed higher performance requirements on motors. The surface quality of the magnetic tile directly affects the service performance of the motor. A magnetic tile surface defect detection technology with high detection efficiency, low cost and high reliability is therefore of great significance for motor production and also promotes the survival and development of enterprises.

Mask R-CNN is a convolutional neural network proposed by Kaiming He et al. on the basis of Faster R-CNN, and it performs instance segmentation. It not only detects targets effectively but also produces a high-quality segmentation of each target. The main idea is to add a branch to the original Faster R-CNN so as to segment the target. Mask R-CNN improves the feature extraction network by applying a Feature Pyramid Network (FPN), which mitigates the serious loss of semantic information in the feature extraction layers and greatly improves the segmentation accuracy for small target defects. To address unclear segmentation of defect contours, Mask R-CNN replaces the ROI pooling layer with an ROI alignment (RoIAlign) layer, which makes further use of the spatial information on the feature map through bilinear interpolation so that defect contours are predicted more accurately.

The magnetic tile image has the characteristics of uneven illumination, complex surface texture, low contrast and the like, and the defects in the magnetic tile image are difficult to accurately segment by using the traditional defect detection algorithm.

Disclosure of Invention

1. Technical problem to be solved

In view of the problems in the prior art, the invention aims to provide a defect detection algorithm based on the deep neural network Mask R-CNN for the magnetic tile surface defect detection scenario. The algorithm makes two improvements to the Feature Pyramid Network (FPN) in Mask R-CNN: on the one hand, a C1 module is added to the FPN and the pooling layer in the feature extraction layer of the C1 module is removed; on the other hand, a CLAHE preprocessing module is added before the feature extraction layer of the FPN. The results show that the algorithm has strong generalization capability and robustness and can accurately segment defects in magnetic tile images.

2. Technical scheme

In order to solve the above problems, the present invention adopts the following technical solutions.

A defect detection algorithm based on a deep neural network Mask R-CNN comprises the following steps:

S1, extracting features with a ResNet50-based Feature Pyramid Network (FPN);

S2, extracting regions of interest (ROIs) of the defect regions with a Region Proposal Network (RPN) to obtain the corresponding anchor boxes;

S3, predicting the pixel classes inside each ROI with a fully convolutional network (FCN) to achieve defect segmentation;

S4, finally, predicting the class of each ROI and the coordinates of the corresponding anchor box through the fully connected layers of the network.

Further, the feature layer used by each ROI is determined by the following formula:

$$ k = k_0 + \log_2\!\left(\frac{\sqrt{wh}}{224}\right) \tag{2} $$

Further, k₀ is a standard value and is set to 4.

Furthermore, a contrast-limited adaptive histogram equalization (CLAHE) preprocessing module is added before the feature extraction layer of Mask R-CNN.

Furthermore, during preprocessing the CLAHE module applies contrast limiting to every small region; by limiting the degree of contrast enhancement of the adaptive histogram equalization (AHE) algorithm, CLAHE overcomes AHE's problem of over-amplifying noise.

Further, the CLAHE preprocessing module amplifies the contrast around a given pixel value according to the slope of the transformation function, and this slope is proportional to the slope of the cumulative distribution function (CDF) of the neighborhood histogram. Specifically, the histogram is clipped at a predefined threshold before the CDF is computed, which limits the amplification; the clipping limit of the histogram depends on the size of the neighborhood and on the distribution of the histogram.

Further, the clipped excess is redistributed evenly into the unfilled part of the histogram, and the calculation formula is as follows:

Furthermore, the loss function Loss of Mask R-CNN consists of three parts:

$$ Loss = L_{cls} + L_{box} + L_{mask} \tag{1} $$

Furthermore, a C1 module is added to the FPN and the pooling layer in the C1 module is removed. The feature extraction layer structure of the C1 module comprises a bottom-up pathway (down-top), a top-down pathway (top-down) and lateral connections (lateralconn). The bottom-up pathway corresponds to the Residual Block structure in ResNet50, and the scale is halved at each layer. The top-down pathway up-samples the low-resolution feature map of the higher layer by a factor of 2. The lateral connection applies a 1x1 convolution to reduce the number of channels of C1 while keeping the feature map size unchanged; the result is then added directly to the feature map up-sampled from C2 by the top-down pathway and output. This operation is iterated until the feature map with the final resolution is generated.

Further, the defect detection algorithm is applied to magnetic tile defect detection.

3. Advantageous effects

Compared with the prior art, the invention has the advantages that:

The scheme is suited to the magnetic tile surface defect detection scenario. The algorithm makes two improvements to the Feature Pyramid Network (FPN) in Mask R-CNN: on the one hand, a C1 module is added to the FPN and the pooling layer in the feature extraction layer of the C1 module is removed; on the other hand, a CLAHE preprocessing module is added before the feature extraction layer of the FPN. The results show that the algorithm has strong generalization capability and robustness and can accurately segment defects in magnetic tile images.

Drawings

FIG. 1 is a flow chart of the algorithm of the present invention;

FIG. 2 is a diagram of the FPN architecture after the improvement of the present invention;

FIG. 3 is a block diagram of a feature extraction layer of the module C1 according to the present invention;

FIG. 4 is a schematic diagram of the process of CLAHE direct cropping reallocation according to the present invention;

FIG. 5 is a diagram illustrating the effects of the CLAHE pre-processing module according to the present invention;

FIG. 6 is a comparison of the detection results of different algorithms;

FIG. 7 is a comparison table of the results of Mask R-CNN and the improved algorithm of the present invention;

FIG. 8 is a table comparing the present invention with a conventional detection algorithm.

Detailed Description

The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention; it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments, and all other embodiments obtained by those skilled in the art without any inventive work are within the scope of the present invention.

In the description of the present invention, it should be noted that the terms "upper", "lower", "inner", "outer", "top/bottom", and the like indicate orientations or positional relationships based on those shown in the drawings, and are only for convenience of description and simplification of description, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and thus should not be construed as limiting the present invention. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "disposed," "sleeved/connected," "connected," and the like are to be construed broadly, e.g., "connected," which may be fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.

Example 1:

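Referring to figs. 1-2, a defect detection algorithm based on the deep neural network Mask R-CNN includes the following steps:

S1, extracting features with a ResNet50-based Feature Pyramid Network (FPN);

S2, extracting regions of interest (ROIs) of the defect regions with a Region Proposal Network (RPN) to obtain the corresponding anchor boxes;

S3, predicting the pixel classes inside each ROI with a fully convolutional network (FCN) to achieve defect segmentation;

S4, finally, predicting the class of each ROI and the coordinates of the corresponding anchor box through the fully connected layers of the network.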

The loss function Loss of Mask R-CNN consists of three parts:

$$ Loss = L_{cls} + L_{box} + L_{mask} \tag{1} $$

where L_cls is the classification loss, L_box is the bounding-box regression loss, and L_mask is the segmentation loss of the FCN branch.
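The three-term sum in equation (1) can be illustrated with a minimal pure-Python sketch for a single ROI. The text only states that the total loss is the sum of the three parts; the individual forms used here (log loss for L_cls, smooth L1 for L_box, average per-pixel binary cross-entropy for L_mask) follow the original Mask R-CNN paper and are assumptions as far as this document is concerned. All function names are hypothetical.

```python
import math

def classification_loss(probs, label):
    """L_cls: log loss of the true class, given predicted class probabilities."""
    return -math.log(probs[label])

def box_loss(pred_box, true_box):
    """L_box: smooth L1 loss summed over the four box-regression targets."""
    total = 0.0
    for p, t in zip(pred_box, true_box):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total

def mask_loss(pred_mask, true_mask):
    """L_mask: average binary cross-entropy over the pixels of the ROI mask."""
    terms = []
    for p_row, t_row in zip(pred_mask, true_mask):
        for p, t in zip(p_row, t_row):
            terms.append(-(t * math.log(p) + (1 - t) * math.log(1 - p)))
    return sum(terms) / len(terms)

def total_loss(probs, label, pred_box, true_box, pred_mask, true_mask):
    """Equation (1): Loss = L_cls + L_box + L_mask."""
    return (classification_loss(probs, label)
            + box_loss(pred_box, true_box)
            + mask_loss(pred_mask, true_mask))
```

In a real training run the three terms are averaged over all sampled ROIs in a batch; the sketch keeps a single ROI so that each term stays easy to inspect.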

The FPN groups the layers of the ResNet50 network that do not change the feature map size into one stage; that is, Conv2, Conv3, Conv4 and Conv5 in the ResNet50 structure correspond to stage2, stage3, stage4 and stage5 in the FPN. Lower and upper stages are fused to form the corresponding P2-P5 structures. The fused structures are convolved to eliminate the aliasing effect of the fusion, and P5 is additionally down-sampled by a factor of 0.5 to form P6. The whole FPN structure thus has five outputs, P2-P6, and the target sizes corresponding to the feature maps of the different outputs differ. P2-P6 are therefore used as the input of the RPN, and the feature layer corresponding to each ROI must be determined: a large ROI uses a higher feature layer, such as P4, and a small ROI uses a lower feature layer, such as P3. The feature layer used by each ROI is determined by the following formula:

$$ k = k_0 + \log_2\!\left(\frac{\sqrt{wh}}{224}\right) \tag{2} $$

In equation (2), 224 is the standard input size for ImageNet image classification; k₀ is a standard value set to 4 and represents the output of the P4 layer (an ROI of the original size uses the P4 layer); w and h are the width and height of the ROI region; and the value of k is rounded to an integer. For example, if the ROI is 112 × 112, then k = k₀ - 1 = 4 - 1 = 3, which means the ROI should use the P3 feature layer.
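The level-assignment rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the patented implementation: it assumes the rule has the form k = k₀ + log₂(√(wh)/224) with a canonical size of 224 (the value consistent with the 112 × 112 example giving k = 3), rounds down as in the original FPN paper, and clamps the result to P2-P5. The function name, the rounding direction and the clamping range are all hypothetical choices.

```python
import math

def roi_to_fpn_level(w, h, k0=4, canonical_size=224, k_min=2, k_max=5):
    """Return the pyramid level k whose feature map P_k should serve an ROI.

    w, h: ROI width and height in pixels of the input image.
    k0: level assigned to an ROI of the canonical size (4, i.e. P4).
    k_min, k_max: assumed clamping range; not specified in the text.
    """
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical_size))
    return max(k_min, min(k_max, k))

# An ROI half the canonical side length drops one level: 112 x 112 uses P3.
level = roi_to_fpn_level(112, 112)
```

Halving the ROI side length lowers k by one, so smaller defects are read from higher-resolution feature maps.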

Referring to fig. 1, although the FPN achieves a better prediction effect by fusing the features of low-level and high-level feature maps, magnetic tiles have complex texture features and low defect imaging contrast. A contrast-limited adaptive histogram equalization (CLAHE) preprocessing module is therefore added before the feature extraction layer of Mask R-CNN.

The CLAHE preprocessing module is implemented by applying contrast clipping to adaptive histogram equalization (AHE). Contrast clipping can also be applied to global histogram equalization, which yields contrast-limited histogram equalization. During preprocessing, the CLAHE module applies contrast limiting to every small region; by limiting the degree of contrast enhancement of the AHE algorithm, CLAHE overcomes AHE's problem of over-amplifying noise.

Referring to fig. 4, the CLAHE preprocessing module amplifies the contrast around a given pixel value according to the slope of the transformation function, and this slope is proportional to the slope of the cumulative distribution function (CDF) of the neighborhood histogram. Specifically, the histogram is clipped at a predefined threshold before the CDF is computed, which limits the amplification; the clipping limit of the histogram depends on the size of the neighborhood and on the distribution of the histogram.
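The link between the histogram, the CDF and the contrast gain can be made concrete with a small sketch. It builds the equalization mapping of one neighborhood from its histogram and measures the largest jump between consecutive output levels, which plays the role of the slope of the transformation function. The function names and the toy histogram are hypothetical, and the clipped excess is simply discarded here to keep the example short.

```python
def equalization_map(hist):
    """Map each gray level to an output level through the normalized CDF."""
    levels = len(hist)
    total = sum(hist)
    mapping, running = [], 0
    for count in hist:
        running += count
        mapping.append(round((levels - 1) * running / total))
    return mapping

def max_slope(mapping):
    """Largest jump between consecutive output levels (local contrast gain)."""
    return max(b - a for a, b in zip(mapping, mapping[1:]))

def clip(hist, limit):
    """Cap every bin at `limit`; the removed excess is ignored in this sketch."""
    return [min(count, limit) for count in hist]

# A neighborhood dominated by one gray level, as in a nearly uniform background.
hist = [1, 1, 20, 1, 1, 1, 1, 1]
plain = max_slope(equalization_map(hist))
limited = max_slope(equalization_map(clip(hist, 4)))
```

With the dominant bin clipped, the mapping rises less steeply at that gray level, which is how clipping before the CDF is computed bounds the amplification of noise in flat regions.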

The clipped excess is redistributed evenly into the unfilled part of the histogram, and the calculation formula is as follows:

In equation (3), N is the total number of pixels in each sub-block, L is the number of gray levels in each sub-block, α is the clipping coefficient (α = 75 in the present invention), u is the mean gray level of each sub-block, q is the mean square error of each sub-block, S_max is the maximum allowed slope, and β is the clipping upper limit of each sub-histogram. The value of β is determined jointly by S_max and α.
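Clipping and even redistribution can be sketched as follows. Equation (3) is not reproduced in the text, so the clip limit below uses a commonly cited form, β = (N / L)(1 + (α / 100)(S_max - 1)), purely as an assumption; it does not use the sub-block mean u or mean square error q that the patented formula involves. The function names and the value of S_max in the usage lines are hypothetical.

```python
def clip_limit(n_pixels, n_levels, alpha, s_max):
    """Assumed clip limit: beta = (N / L) * (1 + alpha / 100 * (S_max - 1))."""
    return (n_pixels / n_levels) * (1 + alpha / 100.0 * (s_max - 1))

def clip_and_redistribute(hist, beta):
    """Clip each bin at beta and spread the removed excess evenly over all bins.

    A single redistribution pass is used, so bins that were clipped end up
    slightly above beta; the total pixel count is preserved.
    """
    excess = sum(max(count - beta, 0) for count in hist)
    share = excess / len(hist)
    return [min(count, beta) + share for count in hist]

# One 8 x 8 sub-block (N = 64) quantized to L = 8 gray levels, alpha = 75.
hist = [2, 2, 40, 4, 4, 4, 4, 4]
beta = clip_limit(n_pixels=64, n_levels=8, alpha=75, s_max=2)
flattened = clip_and_redistribute(hist, beta)
```

Because the excess is returned to the histogram rather than dropped, the CDF still ends at the full pixel count, and only the steepness around the dominant gray level is reduced.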

The effect of the preprocessing module on a magnetic tile image is shown in fig. 5. In the right-hand image of fig. 5, the background brightness of the magnetic tile is clearly enhanced and, correspondingly, the defect features are also enhanced, which further improves the extraction capability of the Mask R-CNN feature extraction layer.

During feature extraction, the convolutional neural network gradually reduces the size of the feature map, so some small target defect information is lost and the detection of small defects in the predicted image is not ideal. To improve the detection rate of small target defects, a low-level high-resolution feature map carrying strong semantic information must be used. The FPN takes the features already computed at different scales by the convolutional layers as outputs and fuses the high-level low-resolution feature maps with the low-level high-resolution feature maps, which greatly improves small target defect detection performance without increasing the computation of the original model.

Because the images in the data set used are small, the invention adds a C1 module to the FPN so that more detailed features can be obtained from the low-level high-resolution feature map. Likewise, because the images are small and many pooling layers are not needed, the pooling layer in the C1 module is removed. The feature extraction layer structure of the C1 module comprises a bottom-up pathway (down-top), a top-down pathway (top-down) and lateral connections (lateralconn). The bottom-up pathway corresponds to the Residual Block structure in ResNet50, and the scale is halved at each layer. The top-down pathway up-samples the low-resolution feature map of the higher layer by a factor of 2. The lateral connection applies a 1x1 convolution to reduce the number of channels of C1 while keeping the feature map size unchanged; the result is then added directly to the feature map up-sampled from C2 by the top-down pathway and output. This operation is iterated until the feature map with the final resolution is generated.
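One fusion step of the kind described above (lateral 1x1 convolution, 2x up-sampling of the higher level, element-wise addition) can be sketched in pure Python on nested lists shaped [channel][row][column]. Nearest-neighbour up-sampling is an assumption taken from the original FPN design; the text only states that the higher-level map is up-sampled by a factor of 2. All names and the toy tensors are hypothetical.

```python
def conv1x1(feature, weights):
    """1x1 convolution: mixes channels at each pixel, spatial size unchanged.

    feature: [C_in][H][W]; weights: [C_out][C_in].
    """
    h, w = len(feature[0]), len(feature[0][0])
    return [[[sum(wk[c] * feature[c][y][x] for c in range(len(feature)))
              for x in range(w)] for y in range(h)] for wk in weights]

def upsample2x(feature):
    """Nearest-neighbour 2x up-sampling of every channel."""
    out = []
    for channel in feature:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(2)]
            rows.append(wide)
            rows.append(list(wide))
        out.append(rows)
    return out

def merge(lower, higher, lateral_weights):
    """Add the lateral projection of `lower` to the up-sampled `higher` map."""
    lateral = conv1x1(lower, lateral_weights)
    top = upsample2x(higher)
    return [[[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(c1, c2)]
            for c1, c2 in zip(lateral, top)]

# Toy case: a 2-channel 4x4 low-level map (C1) and a 1-channel 2x2 map from above.
c1 = [[[1.0] * 4 for _ in range(4)] for _ in range(2)]
from_c2 = [[[1.0, 2.0], [3.0, 4.0]]]
p1 = merge(c1, from_c2, lateral_weights=[[0.5, 0.5]])
```

The lateral weights reduce two input channels to one so that the channel counts match before the addition, mirroring the role of the 1x1 convolution in the text.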

To train the Mask R-CNN network, Labelme is used to create a corresponding label for each magnetic tile image; the label for notch defects of the magnetic tile in the original image is gap, and the label for crack defects is crack. To ensure the integrity of the data set, information such as the defect area and the bounding box coordinates also needs to be annotated. The experiment uses 1200 magnetic tile samples collected on site, of which 600 are defective samples and 600 are good samples; the 600 defective samples include defects such as notches and cracks, distributed evenly. First, the samples are randomly divided into a training set consisting of 400 defective samples and 400 good samples, and the remaining samples form the test set. Because the amount of data is small, the data set must be expanded. Six kinds of data augmentation are applied to the data set: vertical, horizontal and diagonal mirroring, and rotation by 90°, 180° and 270°. The training set is thereby augmented to 2400 defective samples and 2400 good samples, and the test set is augmented to a mixed set of 1200 defective samples and 1200 good samples, which serves as the new test set.
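The six augmentations can be sketched on an image stored as a list of rows. Several details are assumptions: "diagonal mirroring" is taken to mean reflection across the main diagonal (a transpose), rotations are taken clockwise, and the sketch returns six transformed copies per sample, which matches the reported growth from 400 to 2400 samples. For segmentation training the same transform would also have to be applied to each label mask. The function names are hypothetical.

```python
def flip_vertical(img):
    """Mirror top to bottom."""
    return [list(row) for row in img[::-1]]

def flip_horizontal(img):
    """Mirror left to right."""
    return [row[::-1] for row in img]

def flip_diagonal(img):
    """Mirror across the main diagonal (transpose)."""
    return [list(col) for col in zip(*img)]

def rotate90(img):
    """Rotate by 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

def augment(img):
    """Return the six augmented copies of one image."""
    r90 = rotate90(img)
    r180 = rotate90(r90)
    r270 = rotate90(r180)
    return [flip_vertical(img), flip_horizontal(img), flip_diagonal(img),
            r90, r180, r270]

sample = [[1, 2],
          [3, 4]]
copies = augment(sample)
```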

The following definitions are used: a defective magnetic tile detected as a normal one is called a missed detection, and a normal magnetic tile detected as a defective one is called a false detection (over-detection). To quantitatively evaluate the performance of a magnetic tile defect detection algorithm, three indexes are used: the accuracy, the false detection rate and the missed detection rate. Let R_CD denote the detection accuracy, R_FA the false detection rate, R_BA the missed detection rate, N_C the actual number of missed detections, N_A the number of qualified magnetic tiles, N_B the number of defective magnetic tiles, and N_D the actual number of false detections. R_CD, R_FA and R_BA can then be calculated by equations (5), (6) and (7).
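The three indexes can be computed as in the sketch below. Equations (5), (6) and (7) are not reproduced in the text, so the forms used here are assumptions chosen to match the symbol definitions: the false detection rate is taken relative to the qualified tiles, the missed detection rate relative to the defective tiles, and the accuracy as the share of all tiles that are neither missed nor falsely flagged. The function name and the counts in the usage lines are hypothetical and are not experimental results.

```python
def detection_metrics(n_a, n_b, n_c, n_d):
    """Return (R_CD, R_FA, R_BA) under the assumed definitions.

    n_a: number of qualified tiles      n_b: number of defective tiles
    n_c: number of missed detections    n_d: number of false detections
    """
    r_fa = n_d / n_a                                  # assumed form of the false detection rate
    r_ba = n_c / n_b                                  # assumed form of the missed detection rate
    r_cd = (n_a + n_b - n_c - n_d) / (n_a + n_b)      # assumed form of the accuracy
    return r_cd, r_fa, r_ba

# Hypothetical counts on a test set of 1200 good and 1200 defective samples.
r_cd, r_fa, r_ba = detection_metrics(n_a=1200, n_b=1200, n_c=12, n_d=24)
```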

As can be seen from Table 1, compared with the original Mask R-CNN, the Mask R-CNN improved for the magnetic tile surface defect scenario reduces both the missed detection rate and the false detection rate of magnetic tile defect detection, that is, it improves the accuracy.

For magnetic tile surface defect detection, the comparison between the algorithm of the invention and several commonly used detection algorithms is shown in figs. 6-7; compared with these algorithms, the detection performance of the algorithm of the invention is clearly improved.

Referring to fig. 8, the detection accuracy of the Fourier transform method is relatively high. However, the Fourier transform is computationally complex, and different filter templates must be designed for different defect types to improve the detection effect, so the method is cumbersome for industrial inspection and unsuitable for surface defect detection involving multiple defect types. For FCN, a small number of magnetic tiles have small defects, and these targets are lost after passing through the convolutional layers of FCN, so all of its evaluation indexes are poor. PSPNet improves the feature extraction structure on the basis of FCN, but its feature extraction for small defect targets is inferior to that of the FPN used in the invention, so it performs worse than Mask R-CNN on both the false detection rate and the missed detection rate. DeepLabv3 extracts features from the input image at different scales to obtain inputs of different resolutions, feeds the images of different scales into the CNN to obtain segmentation results of different resolutions, and finally fuses these results to obtain a segmentation result at the resolution of the original image. Although using images of different scales allows DeepLabv3 to handle feature extraction for small targets and low-contrast defects well, it still performs worse on both the false detection rate and the missed detection rate than Mask R-CNN, which first uses the RPN to propose candidate target regions and then segments within the candidate regions.

A comparison of the detection results of the different algorithms is shown in fig. 6. For high-contrast defects, the Fourier transform method detects well, but different filters are needed for different defect types, and this limited generalization capability restricts its application in magnetic tile inspection. FCN detects small targets and low-contrast defects poorly. PSPNet detects low-contrast defects well but cannot produce an effective defect mask for crack defects with blurred boundaries. DeepLabv3 improves somewhat on small targets but does not obtain the defect contours effectively. The algorithm of the invention handles defects with blurred edge contours and low contrast effectively, improves the detection of small target defects, has better generalization capability, and is better suited to detecting magnetic tile surface defects.

Experimental results show that the algorithm has strong generalization capability and robustness, and can be used for accurately segmenting the defects of the magnetic tile images.

The invention is suited to the magnetic tile surface defect detection scenario. The algorithm makes two improvements to the Feature Pyramid Network (FPN) in Mask R-CNN: on the one hand, a C1 module is added to the FPN and the pooling layer in the feature extraction layer of the C1 module is removed; on the other hand, a CLAHE preprocessing module is added before the feature extraction layer of the FPN. The results show that the algorithm has strong generalization capability and robustness and can accurately segment defects in magnetic tile images.

The above are merely preferred embodiments of the present invention, and the scope of protection of the invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art within the technical scope disclosed by the present invention, according to its technical solution and inventive concept, shall fall within the scope of protection of the present invention.

Claims

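1. A defect detection algorithm based on the deep neural network Mask R-CNN, characterized in that it comprises the following steps:

S1, extracting features with a ResNet50-based Feature Pyramid Network (FPN);

S2, extracting regions of interest (ROIs) of the defect regions with a Region Proposal Network (RPN) to obtain the corresponding anchor boxes;

S3, predicting the pixel classes inside each ROI with a fully convolutional network (FCN) to achieve defect segmentation;

S4, finally, predicting the class of each ROI and the coordinates of the corresponding anchor box through the fully connected layers of the network.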

2. The defect detection algorithm based on the deep neural network Mask R-CNN as claimed in claim 1, wherein the feature layer used by each ROI is determined by the following formula:

$$ k = k_0 + \log_2\!\left(\frac{\sqrt{wh}}{224}\right) \tag{2} $$

3. The defect detection algorithm based on the deep neural network Mask R-CNN as claimed in claim 2, wherein k₀ is a standard value and is set to 4.

4. The defect detection algorithm based on the deep neural network Mask R-CNN as claimed in claim 1, wherein a contrast-limited adaptive histogram equalization (CLAHE) preprocessing module is added before the feature extraction layer of Mask R-CNN.

5. The defect detection algorithm based on the deep neural network Mask R-CNN as claimed in claim 4, wherein during preprocessing the CLAHE module applies contrast limiting to every small region, and by limiting the degree of contrast enhancement of the adaptive histogram equalization (AHE) algorithm, CLAHE overcomes AHE's problem of over-amplifying noise.

6. The defect detection algorithm based on the deep neural network Mask R-CNN as claimed in claim 5, wherein the CLAHE preprocessing module amplifies the contrast around a given pixel value according to the slope of the transformation function, and this slope is proportional to the slope of the cumulative distribution function (CDF) of the neighborhood histogram; specifically, the histogram is clipped at a predefined threshold before the CDF is computed, which limits the amplification, and the clipping limit of the histogram depends on the size of the neighborhood and on the distribution of the histogram.

7. The defect detection algorithm based on the deep neural network Mask R-CNN as claimed in claim 6, wherein the clipped excess is redistributed evenly into the unfilled part of the histogram, and the calculation formula is as follows:

8. The defect detection algorithm based on the deep neural network Mask R-CNN as claimed in claim 1, wherein the loss function Loss of Mask R-CNN consists of three parts:

$$ Loss = L_{cls} + L_{box} + L_{mask} \tag{1} $$

9. The defect detection algorithm based on the deep neural network Mask R-CNN as claimed in claim 1, wherein a C1 module is added to the FPN and the pooling layer in the C1 module is removed; the feature extraction layer structure of the C1 module comprises a bottom-up pathway (down-top), a top-down pathway (top-down) and lateral connections (lateralconn); the bottom-up pathway corresponds to the Residual Block structure in ResNet50, and the scale is halved at each layer; the top-down pathway up-samples the low-resolution feature map of the higher layer by a factor of 2; the lateral connection applies a 1x1 convolution to reduce the number of channels of C1 while keeping the feature map size unchanged, and the result is then added directly to the feature map up-sampled from C2 by the top-down pathway and output; and this operation is iterated until the feature map with the final resolution is generated.

10. The defect detection algorithm based on the deep neural network Mask R-CNN as claimed in any one of claims 1-9, wherein the defect detection algorithm is applied to magnetic tile defect detection.