CN110992311B - Convolutional neural network flaw detection method based on feature fusion - Google Patents

Convolutional neural network flaw detection method based on feature fusion Download PDF

Info

Publication number
CN110992311B
CN110992311B
Authority
CN
China
Prior art keywords
flaw
layer
picture
feature
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911104107.XA
Other languages
Chinese (zh)
Other versions
CN110992311A (en)
Inventor
许玉格
钟铭
吴宗泽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201911104107.XA priority Critical patent/CN110992311B/en
Publication of CN110992311A publication Critical patent/CN110992311A/en
Application granted granted Critical
Publication of CN110992311B publication Critical patent/CN110992311B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0004Industrial image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Quality & Reliability (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a convolutional neural network flaw detection method based on feature fusion, which comprises the following steps: 1) preprocessing the data set; 2) resizing, padding and normalizing the pictures input into the model's convolutional network; 3) inputting the flaw pictures and the template pictures into a Resnet101 convolutional network for feature extraction, and constructing FPN networks for the flaw pictures and the template pictures respectively; 4) stacking the channels of the corresponding feature layers in the FPN networks of the flaw picture and the template picture, and fusing them by convolution; 5) extracting preliminary candidate regions from the fused feature layers and performing the ROI pooling operation; 6) cascading a plurality of ROI pooling layers and classification and regression layers to form a Cascade R-CNN network, which classifies and regresses the input candidate regions; 7) selecting an optimizer and training the model; 8) inputting the picture to be predicted into the trained model and outputting the flaw detection result. The invention can improve the flaw-classification accuracy and the mAP value in the flaw detection process.

Description

Convolutional neural network flaw detection method based on feature fusion
Technical Field
The invention relates to the technical field of flaw detection, in particular to a convolutional neural network flaw detection method based on feature fusion.
Background
In the industrial manufacturing industry, flaw detection on products is a key issue affecting product quality. Industrial manufacturing is a complex, multivariable process, and equipment failures or human interference during manufacturing and transportation easily leave various flaws on a product, affecting its quality. Flaw detection can thus be cast as a combined classification and localization problem within defect detection. Because of the many influencing factors, different flaws arise during production and transportation, and their shapes, sizes and numbers are irregular. As a result, product flaws are not only highly unbalanced in their number distribution but, given their varied sizes and shapes, also pose significant difficulties for flaw detection.
Traditional machine learning algorithms are usually based on image processing and pattern recognition techniques, completing flaw detection by extracting and analyzing the power spectral density of the product's surface texture features. Such detection schemes depend too heavily on prior knowledge, their detection accuracy is not high, and they cannot locate the positions of flaws in the product. Detection accuracy is a key index in real industrial application scenarios; in a flaw detection scenario, what matters most is the algorithm's recall: accurately identifying the samples that contain flaws and analyzing information such as the number, type and size of the flaws, which helps adjust the equipment in industrial production and improve the production process. Research should therefore focus on detection methods that improve the accuracy of flaw identification and information such as the position distribution of the flaws.
Disclosure of Invention
The invention aims to overcome the defects and shortcomings of the prior art, and provides a convolutional neural network flaw detection method based on feature fusion.
In order to achieve the above purpose, the technical scheme provided by the invention is as follows: a convolutional neural network flaw detection method based on feature fusion comprises the following steps:
1) Data preprocessing: the initial sample pictures X_defect are cropped and flipped to obtain the data sets X_defect_croped and X_defect_fliped, and the correspondingly cropped and flipped flaw label files are obtained at the same time; the preprocessing is applied only to the flaw pictures, and the template pictures are not preprocessed;
2) For the pictures input into the convolutional network, including the flaw pictures in X_defect_fliped and the template pictures, resize, padding and normalization operations are carried out; at the same time the flaw labels are scaled with the picture's scaling ratio; when the picture is padded, the flaw-label parameters are adjusted according to the padding; and when the picture is normalized, no specific processing is applied to the flaw labels;
3) The flaw pictures and the template pictures are input into a Resnet101 convolutional network for feature extraction; 4 specific layers among the extracted feature maps are selected for constructing a feature pyramid network (Feature Pyramid Network, FPN for short), and FPN networks are constructed for the flaw pictures and the template pictures respectively;
4) Superposing channels of corresponding feature layers in the FPN network of the flaw picture and the FPN network of the template picture, and then fusing in a convolution mode to obtain fused feature layers;
5) Preliminary candidate-region extraction is performed on the basis of the fused feature layers, including classification and regression operations, and a region-of-interest pooling (Region of Interest pooling, ROI pooling for short) operation is performed on the extracted candidate regions;
6) A plurality of ROI pooling layers and classification and regression layers are cascaded to form Cascade R-CNN, which further classifies and regresses the input candidate regions;
7) An optimizer is selected, its parameters and the number of iterations T are set, and the model is trained; during training, the model weights are updated once per batch until the iterations are completed, giving the final weight file;
8) And inputting the picture to be predicted into the trained model, wherein the output result is the flaw type, the category confidence coefficient and the position of the flaw frame in the input picture.
In step 2), the processing of the flaw label is specifically as follows: when the picture is resized, the flaw label is scaled by the same ratio, which keeps the relative position of the flaw in the picture unchanged; when the padding processing enlarges the picture, the label coordinates are transformed with the padding parameters (P_w, P_h), where P_w is the size added to the left and right sides of the picture during padding and P_h is the size added to the top and bottom of the picture during padding. The flaw label before transformation is denoted (x, y, d, h) and after transformation (x_new, y_new, d_new, h_new), where x and y are the coordinates of the upper-left corner of the flaw frame and d and h are its width and height. The coordinate transformation rule is:

x_new = x + P_w
y_new = y + P_h
d_new = d
h_new = h
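As an illustration only (not part of the claimed method), a minimal Python sketch of this label transformation, assuming a single uniform resize ratio and the padding offsets defined above; the function name is hypothetical:

def transform_label(x, y, d, h, scale, pad_w, pad_h):
    # Resize: the box scales with the picture, keeping its relative position
    x, y, d, h = x * scale, y * scale, d * scale, h * scale
    # Padding: only the top-left corner shifts; width and height are unchanged
    return x + pad_w, y + pad_h, d, h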
said step 3) comprises the steps of:
3.1) The preprocessed flaw picture and template picture are input into a Resnet101 network for feature extraction, and 4 specific feature layers are selected for constructing the FPN network;
3.2) The FPN network is constructed as follows: a 1×1 convolution is applied to each of the 4 selected feature layers, giving layers C2, C3, C4 and C5 with 256 channels whose feature-map sizes match those of the input feature layers;
for the C5 layer, a 3×3 convolution is applied directly, giving a P5 layer with the same size and channel number as the input feature layer;
for the C4 layer, a 3×3 convolution gives an output feature layer with the same size and channel number as C4, to which the P5 layer after 2× up-sampling is added, giving the P4 layer;
for the C3 layer, a 3×3 convolution gives an output feature layer with the same size and channel number as the input layer, to which the P4 layer after 2× up-sampling is added, giving the P3 layer;
for the C2 layer, a 3×3 convolution gives an output feature layer with the same size and channel number as C2, to which the P3 layer after 2× up-sampling is added, giving the P2 layer;
a 3×3 convolution is applied directly to the original feature layer corresponding to the C5 layer, giving a P6 layer with unchanged size and channel number;
3.3) The corresponding feature layers output by the FPN networks of the template picture and the flaw picture are denoted P_defect_i and P_template_i, i = {2,3,4,5,6}, where i is the layer number in the FPN network;
3.4) For the template and flaw feature maps output by the FPN networks, P_defect_i and P_template_i, i = {2,3,4,5}, the corresponding feature layers are taken and stacked along the channel axis, then fused with a 1×1 convolution; after fusion the size is unchanged and the number of channels is halved, giving the fused feature layer P_fused_i (a fusion sketch follows step 3.5);
3.5) P_defect_6 and P_template_6 are not fused, so that enough high-level semantic flaw feature information is retained for the later candidate-frame extraction; P_defect_6 is taken directly as P_fused_6.
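A minimal PyTorch sketch of the channel stacking and 1×1 fusion in steps 3.4) and 3.5), assuming 256-channel FPN levels; module and tensor names are illustrative:

import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        # The 1x1 convolution halves the stacked channels back to `channels`,
        # letting the network adapt the mixing ratio of the two feature maps
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, p_defect, p_template):
        concated = torch.cat([p_defect, p_template], dim=1)  # (N, 512, H, W)
        return self.fuse(concated)                           # (N, 256, H, W)

fusion = FeatureFusion()
p_fused_2 = fusion(torch.randn(1, 256, 200, 200), torch.randn(1, 256, 200, 200))
# P6 is not fused: the defect branch's P6 is passed through unchanged (step 3.5)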
In step 5), the method for extracting the candidate region and the corresponding classification and regression operations thereof comprise the following steps:
5.1) Because flaw shapes are extremely irregular, n×m flaw candidate frames are considered for each position in the feature map to ensure adaptability to the flaws, where n ∈ {1,2,3} is the number of size classes of the candidate frames and m ∈ {1,2,3} is the number of aspect-ratio classes; the values of n and m are determined by the actual distribution of flaws in the data set;
5.2) All generated candidate frames are divided into foreground and background according to the relation between their Intersection-over-Union (IOU) value with the real flaws in the picture's flaw label and a set IOU threshold; the IOU threshold ranges over [0.3, 0.8], and the specific value should be set experimentally;
5.3) The candidate frames are regressed to further fit their positions to the true flaw frames; the loss function of the regression is defined as

Loss = Σ_{i=1}^{N} ( t_i − ŵ^T φ5(P_i) )²

The optimization objective of the function is

w* = argmin_ŵ [ Σ_{i=1}^{N} ( t_i − ŵ^T φ5(P_i) )² + λ‖ŵ‖² ]

where φ5(P_i) is the feature vector formed from the feature map of the corresponding candidate frame, P_i is the feature map of the corresponding candidate frame, ŵ is the parameter to be learned, ŵ^T is its transpose, w* is the optimized parameter, t_i is the true flaw label, N is the number of flaws, and λ is the regularization coefficient;
5.4) The regressed pre-candidate frames are mapped back to the original image; pre-candidate frames that extend far beyond the boundary or whose area is below a threshold are eliminated; the remainder are sorted in descending order of the confidence output by the softmax function, and the first L pre-candidate frames are extracted, where L ranges over [1000, 5000];
5.5) Non-maximum suppression is applied to the extracted pre-candidate frames; the surviving candidate frames are sorted and the first V are output, where V ranges over [100, 1000] (see the sketch after this list);
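A minimal sketch of the filtering, sorting and non-maximum suppression of steps 5.4) and 5.5), using torchvision's nms; the (x1, y1, x2, y2) box format and all threshold values below are illustrative assumptions:

import torch
from torchvision.ops import nms

def select_proposals(boxes, scores, img_w, img_h, min_area=16.0,
                     L=2000, V=256, iou_thresh=0.7):
    # Clip boxes to the image and drop those whose area is below the threshold
    boxes = boxes.clamp(min=0)
    boxes[:, 2] = boxes[:, 2].clamp(max=img_w)
    boxes[:, 3] = boxes[:, 3].clamp(max=img_h)
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = areas >= min_area
    boxes, scores = boxes[keep], scores[keep]
    # Sort by softmax confidence in descending order, keep the first L
    order = scores.argsort(descending=True)[:L]
    boxes, scores = boxes[order], scores[order]
    # Non-maximum suppression, then keep the first V survivors
    keep = nms(boxes, scores, iou_thresh)[:V]
    return boxes[keep], scores[keep]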
in step 6), a Cascade R-CNN network is formed by cascading an ROI pooling layer and a classification regression layer, and the method comprises the following steps:
6.1) A new IOU threshold is set, larger than the IOU threshold used in the previous round; experiments show the threshold in this step is best set in [0.6, 0.8]. The input candidate frames are classified into foreground and background according to this threshold and regressed further, fitting their positions to the true flaws in the flaw labels, and the results are sorted and output after non-maximum suppression;
6.2) To further improve the precision of the flaw frames, a new IOU threshold is set; experiments show it is best set in [0.7, 0.8]. The input candidate frames are classified into foreground and background according to this threshold and regressed further, fitting their positions to the true flaws in the flaw labels; the classification result output after non-maximum suppression and sorting is the type of flaw contained in the picture, and the regression result is the position of the flaw contained in the picture.
In step 7), the optimization method selected for the model is stochastic gradient descent. The learning rate should be determined by the number of pictures trained per graphics card, image_num, and the number of graphics cards used, GPU_num; experiments show that setting the learning rate to 0.00125 × image_num × GPU_num gives good results. To ensure that gradient descent reaches the optimum while preventing over-fitting during training, the iteration number T should not be too large; experiments show T should be greater than 10 and less than 100. A minimal sketch of this optimizer setup follows.
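A minimal PyTorch sketch of the setup above; the image_num and gpu_num values are placeholders, the momentum of 0.9 is taken from the experiments in the detailed description, and a stand-in module replaces the detection model:

import torch

image_num, gpu_num = 1, 1                 # pictures per card, number of cards
lr = 0.00125 * image_num * gpu_num        # linear learning-rate scaling rule

model = torch.nn.Linear(10, 2)            # stands in for the detection model
optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)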
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. according to the invention, a network structure based on Cascade R-CNN+Resnet101+FPN is used as the basic framework of the model, and the flaw pictures are cropped and flipped before training, which enlarges the data sample size.
2. According to the invention, the Cascade R-CNN is adopted to extract the candidate region for multiple times, so that the optimization of the candidate region in stages is realized, and meanwhile, the over fitting phenomenon of the candidate region in the training process can be prevented because the IOU threshold value selected in the Cascade structure is gradually increased.
3. According to the invention, the template picture is utilized in the training process, and more semantic information in the flaw picture can be obtained by combining the template picture and the flaw picture to perform feature extraction.
4. According to the invention, the characteristics of the template picture and the characteristics of the flaw picture are selectively fused in the FPN network, the fused layer can enhance the expression capability of the characteristic picture under the same dimension, and the high semantic characteristics extracted by the original characteristic layer are reserved by the layer which is not fused, so that the identification of flaws is facilitated.
5. When the template picture and the flaw picture are fused, the channel superposition is performed in a 1X 1 convolution mode, so that the proportion of the template picture characteristic and the flaw picture characteristic in the fusion can be adjusted in a network self-adaptive mode.
6. The top-down structure of the feature pyramid network adopted by the invention continuously brings the features with strong semantic information of the high layer to the bottom layer, and most background candidate areas can be filtered out by combining the high-layer semantic information extracted from the template picture, and meanwhile, the detection capability of a small target is improved.
Drawings
FIG. 1 is a schematic block diagram of the method of the present invention. In the figure, conv1, conv2, conv3, conv4 and conv5 denote the picture feature-extraction networks in the model; 1×1conv denotes convolution layers with 1×1 kernels; C2, C3, C4 and C5 denote feature maps in the FPN network; 2×up denotes up-sampling with a sampling rate of 2; 3×3conv denotes convolution layers with 3×3 kernels; P2, P3, P4, P5 and P6 denote the feature layers after feature fusion in the FPN network; cls, C1, C2 and C3 denote classification networks; reg denotes regression networks; pool denotes pooling layers; H1, H2 and H3 denote convolutional network modules; and B1, B2 and B3 denote the flaw candidate boxes extracted at each stage of the cascade network.
FIG. 2 is a flow chart of an example of the implementation of the method of the present invention.
Detailed Description
For more clearly describing the objects, technical solutions and advantages of the embodiments of the present invention, the technical solutions of the embodiments of the present invention will be fully described below with reference to the accompanying drawings in the embodiments of the present invention. It should be noted that this embodiment is only a part of embodiments of the present invention, and not all embodiments. All other embodiments based on the embodiments of the invention, which a person of ordinary skill in the art would obtain without inventive faculty, are within the scope of the invention.
"2019 Guangdong industrial intellectual innovation major" held in the Ariyaku is used herein to provide cloth data sets as experimental data sets. Cloth picture data come from textile factories, wherein the data set contains 4351 flaw pictures, 68 corresponding template pictures, 15 flaws are contained in the 4351 pictures, and the category names of the flaws are as follows: stain, staggering, watermarking, hair, stitch mark, insect sticking, hole, fold, weaving defect, leakage, wax spot, color difference, net folding, and other 15 defects are unevenly distributed in each defect picture.
The resolution of the original flaw pictures and template pictures reaches up to 4096×1080. The graphics card used in the experiments is an NVIDIA 1080Ti with 11 GB of video memory, which cannot take full-resolution pictures as input, so the pictures must be preprocessed. First the pictures are cut uniformly in a 2×2 pattern, i.e., each cut picture has a resolution of 2048×905; then, to enlarge the training sample size, the cut pictures are flipped horizontally, vertically, and both horizontally and vertically.
When the model is evaluated, the flaw-picture recognition accuracy alone cannot comprehensively reflect its overall performance. The central idea of the average precision metric (Average Precision, abbreviated AP) is to measure the model's prediction precision on each category; mAP (mean Average Precision) is the mean of the APs, i.e., the average of the model's prediction precision over all categories. For the detection task, the mAP value and the accuracy Acc are used as the final evaluation indices of the model.
The defect detection method of the invention is called TFF-Cascade R-CNN (Template Features Fused Cascade R-CNN, a cascade convolutional neural network based on template feature fusion). It classifies and regresses defects in cloth pictures with Cascade R-CNN+Resnet101+FPN as the overall framework of the model. The specific implementation of TFF-Cascade R-CNN in this embodiment is shown in FIGS. 1 and 2 and comprises the following steps:
1) Preprocessing a picture in an original data set, including:
1.1) Each flaw picture x_defect_i (i = {1, 2, …, N}, N being the number of flaw pictures) is cut uniformly in a 2×2 pattern, yielding x_defect_i_1, x_defect_i_2, x_defect_i_3, x_defect_i_4; the flaw label y_defect_i corresponding to flaw picture x_defect_i is likewise cut into 4 parts y_defect_i_1, y_defect_i_2, y_defect_i_3, y_defect_i_4, each corresponding to x_defect_i_1, x_defect_i_2, x_defect_i_3, x_defect_i_4;
1.2) Each picture X_defect_croped_i (i = {1, 2, …, 4N}) in the data set X_defect_croped obtained after cutting is flipped horizontally, vertically, and both horizontally and vertically, yielding the corresponding X_defect_croped_i_H, X_defect_croped_i_V, X_defect_croped_i_HV; the flaw labels are processed in the same horizontal, vertical and combined directions, yielding the corresponding y_defect_croped_i_H, y_defect_croped_i_V, y_defect_croped_i_HV (a preprocessing sketch follows).
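A minimal Pillow sketch of the 2×2 cropping and flipping in steps 1.1) and 1.2); the corresponding label transformations are omitted and the function name is illustrative:

from PIL import Image

def augment(path):
    img = Image.open(path)
    w, h = img.size
    # Uniform 2x2 cutting: four crops of half width and half height
    crops = [img.crop((cx * w // 2, cy * h // 2,
                       (cx + 1) * w // 2, (cy + 1) * h // 2))
             for cy in range(2) for cx in range(2)]
    out = []
    for crop in crops:
        out.append(crop)                                   # cropped original
        out.append(crop.transpose(Image.FLIP_LEFT_RIGHT))  # horizontal flip
        out.append(crop.transpose(Image.FLIP_TOP_BOTTOM))  # vertical flip
        out.append(crop.transpose(Image.FLIP_LEFT_RIGHT)
                       .transpose(Image.FLIP_TOP_BOTTOM))  # both directions
    return out  # the flaw labels must be cut and flipped the same way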
2) The preprocessed flaw pictures and template pictures are input into the model. Before a picture enters the convolutional neural network for feature extraction, size-resetting and pixel-filling operations are applied to it, ensuring that all pictures input into the convolutional network have the same size, which facilitates model learning. The specific procedure of this step is as follows:
2.1) The input picture is resized, while maintaining the original aspect ratio, to the resolution closest to (2048, 905);
2.2) The pixel values of the input picture are normalized; the specific steps are as follows:
2.2.1) Pictures are randomly sampled from the training set with a sample capacity of 800; the sample set is denoted X_norm;
2.2.2) A 32×32 patch is randomly cut from each sampled picture, and its mean mean_i and standard deviation std_i over the R, G and B channels are computed, where i = {1, 2, …, N} and N is the number of sampled pictures;
2.2.3) The averages mean and std of these per-patch means and standard deviations over the R, G and B channels are computed for the sample set, and the pictures are normalized with them according to

X_norm = (X_original − mean) / adjusted_std

where X_original is the input image matrix, X_norm is the normalized image matrix, and adjusted_std is the adjusted standard deviation,

adjusted_std = max(std, 1/√N)

where N is the number of input pictures (a normalization sketch follows);
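A minimal NumPy sketch of this normalization; the max(std, 1/√N) form of adjusted_std follows the reconstruction above but should be read as an assumption, since the original equation images are not reproduced here:

import numpy as np

def estimate_stats(patches):
    # patches: (M, 32, 32, 3) array of 32x32 crops from the sampled pictures
    flat = patches.reshape(len(patches), -1, 3)      # (M, 1024, 3)
    means = flat.mean(axis=1)                        # per-picture channel means
    stds = flat.std(axis=1)                          # per-picture channel stds
    return means.mean(axis=0), stds.mean(axis=0)     # sample-set averages

def normalize(img, mean, std, n):
    # n: the picture count N in the adjusted_std formula of step 2.2.3
    adjusted_std = np.maximum(std, 1.0 / np.sqrt(n))
    return (img - mean) / adjusted_std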
2.3) Pictures whose resolution is smaller than (2048, 905) after the size reset undergo a pixel-filling operation: the picture is padded with a filling value of 0 until its height and width are both multiples of 32;
2.4) The input flaw labels are converted into a standard format, which facilitates extraction and use of the labels inside the model.
3) The preprocessed flaw pictures and template pictures are input into the Resnet101 convolution layers for feature extraction. For the first training run, an initial pre-trained model is loaded.
4) The method for extracting 4 output feature layers conv2, conv3, conv4 and conv5 in the ResNet101 convolution layer to construct the FPN network comprises the following specific construction steps:
4.1) After a convolution with kernel 1×1, padding 0 and stride 1, the feature-map sizes are kept unchanged and the channel numbers are uniformly changed to 256; the resulting feature layers are denoted C2, C3, C4 and C5;
4.2) After a convolution with kernel 3×3, padding 1 and stride 1, the C5 layer gives a P5 layer whose feature-map size and channel number are unchanged;
after a convolution with kernel 3×3, padding 1 and stride 1, the C4 layer is added to the P5 layer after 2× up-sampling, giving a P4 layer whose feature-map size and channel number are unchanged from C4;
after a convolution with kernel 3×3, padding 1 and stride 1, the C3 layer is added to the P4 layer after 2× up-sampling, giving a P3 layer whose feature-map size and channel number are unchanged from C3;
after a convolution with kernel 3×3, padding 1 and stride 1, the C2 layer is added to the P3 layer after 2× up-sampling, giving a P2 layer whose feature-map size and channel number are unchanged from C2.
The conv5 layer is directly convolved with kernel 3×3, padding 1 and stride 1 to obtain the P6 layer. At this point the required feature layers of the FPN network have been obtained (an FPN sketch follows).
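A minimal PyTorch sketch of the FPN construction in steps 4.1) and 4.2); the conv2-conv5 channel counts are assumed from a standard ResNet101, and the P6 branch is omitted for brevity:

import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFPN(nn.Module):
    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        # 1x1 lateral convolutions: unify conv2-conv5 to 256 channels (C2-C5)
        self.laterals = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels)
        # 3x3 convolutions with padding 1, stride 1, applied per level
        self.convs = nn.ModuleList(
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
            for _ in in_channels)

    def forward(self, feats):             # feats = (conv2, conv3, conv4, conv5)
        c = [lat(f) for lat, f in zip(self.laterals, feats)]
        p = [None] * 4
        p[3] = self.convs[3](c[3])        # P5 from C5
        for i in (2, 1, 0):               # P4, P3, P2: conv, then add 2x upsample
            p[i] = self.convs[i](c[i]) + F.interpolate(p[i + 1], scale_factor=2)
        return p                          # (P2, P3, P4, P5)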
5) The flaw-picture and template-picture feature layers conv_defect_i and conv_template_i, i = {2,3,4,5}, are input into the FPN to obtain P_defect_i and P_template_i, i = {2,3,4,5,6}, according to step 4), and P_defect and P_template are fused as follows:
5.1) For P_defect_i and P_template_i, i = {2,3,4,5}, the corresponding P_defect_i and P_template_i are stacked along the channel axis; the original feature maps each have 256 channels and the same size, so stacking yields a feature map P_concated_i of the same size with 512 channels;
5.2) The resulting P_concated_i undergoes a convolution operation with parameters: kernel 1×1, padding 0, stride 1, output channels 256; the feature layer after convolution fusion is P_fused_i, i = {2,3,4,5};
5.3) P_fused_6 undergoes no channel stacking or convolution fusion; P_defect_6 is taken directly as P_fused_6, preserving the proportion of flaw features in the subsequent candidate-region extraction.
6) Each P_fused_i is input into the RPN (Region Proposal Network) to generate ROI (Region of Interest) regions; the specific steps of this process are as follows:
6.1) Each layer of the feature layers P_fused corresponds to one RPN network, which generates candidate frames from the input feature layer P_fused_i as follows:
6.1.1) For each location in the feature map, 9 candidate boxes, of 3 scales and 3 aspect ratios, are considered based on experiments;
6.1.2) The candidate frames are divided into foreground and background according to the relation between their IOU value with the real flaws in the picture's flaw label and the set IOU threshold; in this step the IOU threshold is set to 0.5;
6.1.3) The generated candidate frames are regressed to further fit their positions to the true flaw frames; the loss function of the regression is defined as

Loss = Σ_{i=1}^{N} ( t_i − ŵ^T φ5(P_i) )²

The optimization objective of the function is

w* = argmin_ŵ [ Σ_{i=1}^{N} ( t_i − ŵ^T φ5(P_i) )² + λ‖ŵ‖² ]

where φ5(P_i) is the feature vector formed from the feature map of the corresponding candidate frame, P_i is the feature map of the corresponding candidate frame, ŵ is the parameter to be learned, ŵ^T is its transpose, w* is the optimized parameter, t_i is the true flaw label, N is the number of flaws, and λ is the regularization coefficient;
6.1.4) The regressed pre-candidate frames are mapped back to the original image; pre-candidate regions that extend far beyond the boundary or whose area is below a threshold are eliminated; the remainder are sorted in descending order of softmax confidence and the first L pre-candidate regions are extracted, where L is 2000;
6.1.5) NMS (Non-Maximum Suppression) is applied to the L pre-candidate regions; the results after NMS are sorted and the first V candidate regions are output, where V is 256;
6.2) Candidate regions are generated on the 5 feature layers; the generated candidate regions are merged and then output as the output of the whole RPN network;
6.3) In the ROI pooling layer, the input candidate regions are pooled: an m×n region projected from the original image is input and, assuming an output size of p×q satisfying p < m and q < n, the m×n region is divided into p×q parts and a maximum-value pooling operation is applied to each part (see the sketch below);
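A minimal sketch of this pooling using torchvision's roi_pool; the feature-map size, ROI coordinates and 7×7 output size are illustrative assumptions:

import torch
from torchvision.ops import roi_pool

feature_map = torch.randn(1, 256, 50, 50)            # one fused feature level
rois = torch.tensor([[0.0, 4.0, 4.0, 36.0, 28.0]])   # (batch_idx, x1, y1, x2, y2)
# Each ROI is divided into a 7x7 grid; each cell is max-pooled
pooled = roi_pool(feature_map, rois, output_size=(7, 7), spatial_scale=1.0)
print(pooled.shape)                                  # torch.Size([1, 256, 7, 7])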
7) The ROI candidate region is input into a Cascade R-CNN network, and classification and regression accuracy is further improved in an ROI pooling layer and classification and regression layer of the network, and the specific operation of the process is as follows:
7.1) A new IOU threshold, larger than the threshold used in the previous round, is set to 0.6 in this step; the input candidate frames are classified into foreground and background according to it and regressed further, fitting their positions to the true flaws in the flaw labels, and the results are sorted and output after NMS;
7.2) A new IOU threshold is set to 0.7; the input candidate frames are classified into foreground and background according to it and regressed further, fitting their positions to the true labels in the flaw labels; the classification result output after NMS and sorting is the type of flaw contained in the picture, and the regression result is the position of the flaw contained in the picture;
According to the above steps, the training model is constructed and experiments are carried out. The experimental environment is: CPU Intel(R) Xeon(R) E5-2650 v4, graphics card NVIDIA 1080Ti with 11 GB of video memory, built on PyTorch 1.1 under a Linux platform. The model is trained on a single graphics card with the optimizer set to SGD, a learning rate of 0.00125, momentum of 0.9, a batch size of 1 and 12 iterations. After training, testing is performed on the held-out test set, whose sample capacity is 500 with 450 flaw pictures and 50 flaw-free pictures. After testing, the accuracy Acc is 89.10% and the mAP is 41.80; by comparison, a model without FPN fusion in the network structure achieves an accuracy Acc of 81.14% and an mAP of 38.91, and a traditional machine-learning-based method reaches a classification accuracy Acc of only 79%. The method of the invention therefore improves on the defect detection problem.
In summary, the invention focuses on researching a convolutional neural network flaw detection algorithm based on feature fusion aiming at the flaw detection problem. According to the method, cascade R-CNN+Resnet101+FPN is used as an overall framework of an identification model, and a fused feature layer is constructed in an FPN network by combining acquired product template pictures, so that flaw classification and positioning are performed based on the fused feature layer. According to the method, on one hand, the classification accuracy of the defects of the product is improved, on the other hand, the mAP value of the model prediction result is improved, and the overall performance of the detection model is improved to a great extent, so that the method is worthy of popularization.
The above embodiment is a preferred embodiment of the present invention, but the embodiments of the present invention are not limited thereto; any other change, modification, substitution, combination or simplification that does not depart from the spirit and principle of the present invention should be an equivalent replacement and is included in the protection scope of the present invention.

Claims (4)

1. The convolutional neural network flaw detection method based on feature fusion is characterized by comprising the following steps of:
1) Preprocessing the data set: the pictures X_defect in the initial data set are cropped and flipped to obtain the data sets X_defect_croped and X_defect_fliped, and the correspondingly cropped and flipped flaw label files are obtained at the same time; the preprocessing is applied only to the flaw pictures, and the template pictures are not preprocessed;
2) For the pictures input into the model convolutional network, including the flaw pictures in X_defect_fliped and the template pictures, resize, padding and normalization operations are carried out; at the same time the flaw labels are scaled with the picture's scaling ratio; when the picture is padded, the flaw-label parameters are adjusted according to the padding; and when the picture is normalized, no specific processing is applied to the flaw labels;
3) The flaw pictures and the template pictures are input into a Resnet101 convolutional network for feature extraction; 4 specific layers among the extracted feature maps are selected for constructing the feature pyramid network FPN, and FPN networks are constructed for the flaw pictures and the template pictures respectively, comprising the following steps:
3.1) The preprocessed flaw pictures and template pictures are input into a Resnet101 network for feature extraction; the Resnet101 network comprises 4 stages, and the last feature layer of each stage is selected for constructing the FPN network;
3.2) The FPN network is constructed as follows: a 1×1 convolution is applied to each of the 4 selected feature layers, giving layers C2, C3, C4 and C5 with 256 channels whose feature-map sizes match those of the input feature layers;
for the C5 layer, a 3×3 convolution is applied directly, giving a P5 layer with the same size and channel number as the input feature layer;
for the C4 layer, a 3×3 convolution gives an output feature layer with the same size and channel number as C4, to which the P5 layer after 2× up-sampling is added, giving the P4 layer;
for the C3 layer, a 3×3 convolution gives an output feature layer with the same size and channel number as the input layer, to which the P4 layer after 2× up-sampling is added, giving the P3 layer;
for the C2 layer, a 3×3 convolution gives an output feature layer with the same size and channel number as C2, to which the P3 layer after 2× up-sampling is added, giving the P2 layer;
a 3×3 convolution is applied directly to the original feature layer corresponding to the C5 layer, giving a P6 layer with unchanged size and channel number;
3.3) The corresponding feature layers output by the FPN networks of the template picture and the flaw picture are denoted P_defect_i and P_template_i, i = {2,3,4,5,6}, where i is the layer number in the FPN network;
3.4) For the template and flaw features output by the FPN networks, P_defect_i and P_template_i, i = {2,3,4,5}, the feature layers P_defect_k and P_template_j, k = j ∈ {1,2,3,4,5}, corresponding to the flaw picture and the template picture are taken and stacked along the channel axis, then fused with a 1×1 convolution; after fusion the size is unchanged and the number of channels is halved, giving the fused feature layer P_fused_i, i = {2,3,4,5};
3.5) P_defect_6 and P_template_6 are not fused, so that the required amount of high-level semantic flaw feature information is retained for the later candidate-frame extraction; P_defect_6 is taken directly as P_fused_6;
4) Superposing channels of corresponding feature layers in the FPN network of the flaw picture and the FPN network of the template picture, and then fusing in a convolution mode to obtain fused feature layers;
5) Performing preliminary candidate region extraction based on the fused feature layers, wherein the extraction comprises classification and regression operations, and performing region-of-interest pooling operation, namely ROI pooling operation, on the extracted candidate region;
6) A plurality of ROI pooling layers and classification and regression layers are cascaded to form Cascade R-CNN, which further classifies and regresses the input candidate regions; the cascading of the ROI pooling layers and the classification and regression layers forms the Cascade R-CNN network, comprising the following steps:
6.1) A new IOU threshold is set; to guarantee improved flaw-frame precision after cascading, it is larger than the IOU threshold used in the previous round, with a value range of [0.6, 0.8]; the input pre-candidate frames are classified into foreground and background according to this threshold and regressed further, fitting their positions to the true flaws in the flaw labels, and the results are sorted and output after non-maximum suppression;
6.2) A new IOU threshold with a value range of [0.7, 0.8] is set; the input pre-candidate frames are classified into foreground and background according to this threshold and regressed further, fitting their positions to the true flaws in the flaw labels; the classification result output after non-maximum suppression and sorting is the type of flaw contained in the picture, and the regression result is the position of the flaw contained in the picture;
7) An optimizer is selected, its parameters and number of iterations are set, and the model is trained; during training, the model weights are updated once per batch until the iterations are completed, giving the final weight file;
8) And inputting the picture to be predicted into the trained model, wherein the output result is the flaw type, the category confidence coefficient and the position of the flaw frame in the input picture.
2. The convolutional neural network flaw detection method based on feature fusion of claim 1, wherein in step 2) the flaw label is processed as follows: when the picture is resized, the flaw label is scaled by the same ratio, keeping the relative position of the flaw in the picture unchanged; when the padding processing enlarges the picture, the label coordinates are transformed with the padding parameters (P_w, P_h), where P_w is the size added to the left and right sides of the picture during padding and P_h is the size added to the top and bottom of the picture during padding; the original flaw label is denoted (x, y, d, h) and the transformed flaw label (x_new, y_new, d_new, h_new), where x and y are the coordinates of the upper-left corner of the flaw frame and d and h are its width and height; the coordinate transformation rule is:
x_new = x + P_w
y_new = y + P_h
d_new = d
h_new = h.
3. the convolutional neural network flaw detection method based on feature fusion of claim 1, wherein the method comprises the following steps: in step 5), the method for extracting the candidate region and the corresponding classification and regression operations thereof comprise the following steps:
5.1) Because flaw shapes are extremely irregular, n×m candidate frames are considered for each position in the feature map to ensure adaptability to the flaws, where n ∈ {1,2,3} is the number of size classes of the flaw candidate frames and m ∈ {1,2,3} is the number of aspect-ratio classes; the values of n and m are determined by the actual distribution of flaws in the data set;
5.2) All generated candidate frames are divided into foreground and background according to the relation between their IOU value with the real flaws in the picture's flaw label and a set IOU threshold; the IOU threshold ranges over [0.3, 0.8], and the specific value should be set experimentally;
5.3) The candidate frames are regressed to further fit their positions to the true flaw frames; the loss function of the regression is defined as

Loss = Σ_{i=1}^{N} ( t_i − ŵ^T φ5(P_i) )²

The optimization objective of the function is

w* = argmin_ŵ [ Σ_{i=1}^{N} ( t_i − ŵ^T φ5(P_i) )² + λ‖ŵ‖² ]

where φ5(P_i) is the feature vector formed from the feature map of the corresponding candidate frame, P_i is the feature map of the corresponding candidate frame, ŵ is the parameter to be learned, ŵ^T is its transpose, w* is the optimized parameter, t_i is the true flaw label, N is the number of flaws, and λ is the regularization coefficient;
5.4) The regressed pre-candidate frames are mapped back to the original image; pre-candidate frames that extend far beyond the boundary or whose area is below a threshold are eliminated; the remainder are sorted in descending order of the confidence output by the softmax function, and the first L pre-candidate frames are extracted, where L ranges over [1000, 5000];
5.5) Non-maximum suppression is applied to the extracted pre-candidate frames; the surviving candidate frames are sorted and the first V are output, where V ranges over [100, 1000].
4. The convolutional neural network flaw detection method based on feature fusion of claim 1, wherein in step 7) the optimization method selected for the model is stochastic gradient descent; the learning rate should be determined by the number of pictures trained per graphics card, image_num, and the number of graphics cards used, GPU_num; experiments show that setting the learning rate to 0.00125 × image_num × GPU_num gives good results; to ensure that gradient descent reaches the optimum while preventing over-fitting during model training, the iteration number T should be greater than 10 and less than 100.
CN201911104107.XA 2019-11-13 2019-11-13 Convolutional neural network flaw detection method based on feature fusion Active CN110992311B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911104107.XA CN110992311B (en) 2019-11-13 2019-11-13 Convolutional neural network flaw detection method based on feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911104107.XA CN110992311B (en) 2019-11-13 2019-11-13 Convolutional neural network flaw detection method based on feature fusion

Publications (2)

Publication Number Publication Date
CN110992311A CN110992311A (en) 2020-04-10
CN110992311B true CN110992311B (en) 2023-04-28

Family

ID=70084136

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911104107.XA Active CN110992311B (en) 2019-11-13 2019-11-13 Convolutional neural network flaw detection method based on feature fusion

Country Status (1)

Country Link
CN (1) CN110992311B (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111523452B (en) * 2020-04-22 2023-08-25 北京百度网讯科技有限公司 Method and device for detecting human body position in image
CN112053317A (en) * 2020-04-26 2020-12-08 张辉 Workpiece surface defect detection method based on cascade neural network
CN111667476B (en) * 2020-06-09 2022-12-06 创新奇智(广州)科技有限公司 Cloth flaw detection method and device, electronic equipment and readable storage medium
CN111784644A (en) * 2020-06-11 2020-10-16 上海布眼人工智能科技有限公司 Printing defect detection method and system based on deep learning
CN112001448A (en) * 2020-08-26 2020-11-27 大连信维科技有限公司 Method for detecting small objects with regular shapes
CN112113978A (en) * 2020-09-22 2020-12-22 成都国铁电气设备有限公司 Vehicle-mounted tunnel defect online detection system and method based on deep learning
CN112053357A (en) * 2020-09-27 2020-12-08 同济大学 FPN-based steel surface flaw detection method
CN112270687A (en) * 2020-10-16 2021-01-26 鲸斛(上海)智能科技有限公司 Cloth flaw identification model training method and cloth flaw detection method
CN112150460B (en) * 2020-10-16 2024-03-15 上海智臻智能网络科技股份有限公司 Detection method, detection system, device and medium
CN112149693A (en) * 2020-10-16 2020-12-29 上海智臻智能网络科技股份有限公司 Training method of contour recognition model and detection method of target object
CN113516615B (en) * 2020-11-24 2024-03-01 阿里巴巴集团控股有限公司 Sample generation method, system, equipment and storage medium
CN112669300A (en) * 2020-12-31 2021-04-16 上海智臻智能网络科技股份有限公司 Defect detection method and device, computer equipment and storage medium
CN113065400A (en) * 2021-03-04 2021-07-02 国网河北省电力有限公司 Invoice seal detection method and device based on anchor-frame-free two-stage network
CN113240626B (en) * 2021-04-08 2023-07-11 西安电子科技大学 Glass cover plate concave-convex type flaw detection and classification method based on neural network
CN113420648B (en) * 2021-06-22 2023-05-05 深圳市华汉伟业科技有限公司 Target detection method and system with rotation adaptability
CN113627435A (en) * 2021-07-13 2021-11-09 南京大学 Method and system for detecting and identifying flaws of ceramic tiles
CN113628179B (en) * 2021-07-30 2023-11-24 厦门大学 PCB surface defect real-time detection method, device and readable medium
CN113610822B (en) * 2021-08-13 2022-09-09 湖南大学 Surface defect detection method based on multi-scale information fusion
CN113435425B (en) * 2021-08-26 2021-12-07 绵阳职业技术学院 Wild animal emergence and emergence detection method based on recursive multi-feature fusion
CN114170532A (en) * 2021-11-23 2022-03-11 北京航天自动控制研究所 Multi-target classification method and device based on difficult sample transfer learning
CN116977334B (en) * 2023-09-22 2023-12-12 山东东方智光网络通信有限公司 Optical cable surface flaw detection method and device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109064459A (en) * 2018-07-27 2018-12-21 江苏理工学院 A kind of Fabric Defect detection method based on deep learning

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014005123A1 (en) * 2012-06-28 2014-01-03 Pelican Imaging Corporation Systems and methods for detecting defective camera arrays, optic arrays, and sensors
JP6300529B2 (en) * 2014-01-08 2018-03-28 キヤノン株式会社 Imaging apparatus, control method therefor, program, and storage medium

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109064459A (en) * 2018-07-27 2018-12-21 江苏理工学院 A kind of Fabric Defect detection method based on deep learning

Also Published As

Publication number Publication date
CN110992311A (en) 2020-04-10

Similar Documents

Publication Publication Date Title
CN110992311B (en) Convolutional neural network flaw detection method based on feature fusion
CN111027547B (en) Automatic detection method for multi-scale polymorphic target in two-dimensional image
CN111260614B (en) Convolutional neural network cloth flaw detection method based on extreme learning machine
CN112967243B (en) Deep learning chip packaging crack defect detection method based on YOLO
Ferguson et al. Automatic localization of casting defects with convolutional neural networks
CN107145898B (en) Radiographic image classification method based on neural network
CN109446992A (en) Remote sensing image building extracting method and system, storage medium, electronic equipment based on deep learning
CN108830326B (en) Automatic segmentation method and device for MRI (magnetic resonance imaging) image
CN113240626B (en) Glass cover plate concave-convex type flaw detection and classification method based on neural network
CN108416266B (en) Method for rapidly identifying video behaviors by extracting moving object through optical flow
CN107944442A (en) Based on the object test equipment and method for improving convolutional neural networks
CN110084817B (en) Digital elevation model production method based on deep learning
US20210192271A1 (en) Method and Apparatus for Pose Planar Constraining on the Basis of Planar Feature Extraction
CN109711288A (en) Remote sensing ship detecting method based on feature pyramid and distance restraint FCN
CN106951840A (en) A kind of facial feature points detection method
CN111553200A (en) Image detection and identification method and device
CN113610822B (en) Surface defect detection method based on multi-scale information fusion
CN108614994A (en) A kind of Human Head Region Image Segment extracting method and device based on deep learning
CN111695633A (en) Low-illumination target detection method based on RPF-CAM
CN115841447A (en) Detection method for surface defects of magnetic shoe
CN111027538A (en) Container detection method based on instance segmentation model
CN114549507B (en) Improved Scaled-YOLOv fabric flaw detection method
CN114842201A (en) Sandstone aggregate image segmentation method based on improved Mask _ Rcnn
CN115829995A (en) Cloth flaw detection method and system based on pixel-level multi-scale feature fusion
CN115019181A (en) Remote sensing image rotating target detection method, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant