CN110992311B - Convolutional neural network flaw detection method based on feature fusion - Google Patents

Convolutional neural network flaw detection method based on feature fusion Download PDF

Info

Publication number
CN110992311B
CN110992311B
Authority
CN
China
Prior art keywords
flaw
layer
picture
feature
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911104107.XA
Other languages
Chinese (zh)
Other versions
CN110992311A (en)
Inventor
许玉格
钟铭
吴宗泽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201911104107.XA priority Critical patent/CN110992311B/en
Publication of CN110992311A publication Critical patent/CN110992311A/en
Application granted granted Critical
Publication of CN110992311B publication Critical patent/CN110992311B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0004Industrial image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Quality & Reliability (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a convolutional neural network flaw detection method based on feature fusion, which comprises the following steps: 1) preprocessing the data set; 2) resizing, padding and normalizing the pictures input into the model's convolutional network; 3) inputting the flaw pictures and the template pictures into a Resnet101 convolutional network for feature extraction, and constructing FPN networks for the flaw pictures and the template pictures respectively; 4) stacking the channels of the corresponding feature layers in the FPN networks of the flaw picture and the template picture, and fusing them by convolution; 5) extracting preliminary candidate regions from the fused feature layers and performing the ROI pooling operation; 6) cascading a plurality of ROI pooling layers and classification and regression layers to form a Cascade R-CNN network, which classifies and regresses the input candidate regions; 7) selecting an optimizer and training the model; 8) inputting the picture to be predicted into the trained model and outputting the flaw detection result. The invention can improve the flaw-classification accuracy and the mAP value in the flaw detection process.

Description

Convolutional neural network flaw detection method based on feature fusion
Technical Field
The invention relates to the technical field of flaw detection, in particular to a convolutional neural network flaw detection method based on feature fusion.
Background
In the industrial manufacturing industry, flaw detection on products is a key issue affecting product quality. Industrial manufacturing is a complex, multivariable process, and equipment failures or human interference during manufacturing and transportation easily leave various flaws on a product, affecting its quality. Flaw detection can thus be cast as a combined classification and localization problem within defect detection. Because of the many influencing factors, different flaws arise during production and transportation, and their shapes, sizes and numbers are irregular. As a result, product flaws are not only highly unbalanced in their number distribution but, given their varied sizes and shapes, also pose significant difficulties for flaw detection.
Traditional machine learning algorithms are usually based on image processing and pattern recognition techniques, completing flaw detection by extracting and analyzing the power spectral density of the product's surface texture features. Such detection schemes depend too heavily on prior knowledge, their detection accuracy is not high, and they cannot locate the positions of flaws in the product. Detection accuracy is a key index in real industrial application scenarios; in a flaw detection scenario, what matters most is the algorithm's recall: accurately identifying the samples that contain flaws and analyzing information such as the number, type and size of the flaws, which helps adjust the equipment in industrial production and improve the production process. Research should therefore focus on detection methods that improve the accuracy of flaw identification and information such as the position distribution of the flaws.
Disclosure of Invention
The invention aims to overcome the defects and shortcomings of the prior art, and provides a convolutional neural network flaw detection method based on feature fusion.
In order to achieve the above purpose, the technical scheme provided by the invention is as follows: a convolutional neural network flaw detection method based on feature fusion comprises the following steps:
1) Data preprocessing: the initial sample pictures X_defect are cropped and flipped to obtain the data sets X_defect_croped and X_defect_fliped, and the correspondingly cropped and flipped flaw label files are obtained at the same time; the preprocessing is applied only to the flaw pictures, and the template pictures are not preprocessed;
2) For the pictures input into the convolutional network, including the flaw pictures in X_defect_fliped and the template pictures, resize, padding and normalization operations are carried out; at the same time the flaw labels are scaled with the picture's scaling ratio; when the picture is padded, the flaw-label parameters are adjusted according to the padding; and when the picture is normalized, no specific processing is applied to the flaw labels;
3) The flaw pictures and the template pictures are input into a Resnet101 convolutional network for feature extraction; 4 specific layers among the extracted feature maps are selected for constructing a feature pyramid network (Feature Pyramid Network, FPN for short), and FPN networks are constructed for the flaw pictures and the template pictures respectively;
4) Superposing channels of corresponding feature layers in the FPN network of the flaw picture and the FPN network of the template picture, and then fusing in a convolution mode to obtain fused feature layers;
5) Preliminary candidate-region extraction is performed on the basis of the fused feature layers, including classification and regression operations, and a region-of-interest pooling (Region of Interest pooling, ROI pooling for short) operation is performed on the extracted candidate regions;
6) A plurality of ROI pooling layers and classification and regression layers are cascaded to form Cascade R-CNN, which further classifies and regresses the input candidate regions;
7) An optimizer is selected, its parameters and the number of iterations T are set, and the model is trained; during training, the model weights are updated once per batch until the iterations are completed, giving the final weight file;
8) And inputting the picture to be predicted into the trained model, wherein the output result is the flaw type, the category confidence coefficient and the position of the flaw frame in the input picture.
In step 2), the processing of the flaw label is specifically as follows: when the picture is resized, the flaw label is scaled by the same ratio, which keeps the relative position of the flaw in the picture unchanged; when the padding processing enlarges the picture, the label coordinates are transformed with the padding parameters (P_w, P_h), where P_w is the size added to the left and right sides of the picture during padding and P_h is the size added to the top and bottom of the picture during padding. The flaw label before transformation is denoted (x, y, d, h) and after transformation (x_new, y_new, d_new, h_new), where x and y are the coordinates of the upper-left corner of the flaw frame and d and h are its width and height. The coordinate transformation rule is:

x_new = x + P_w
y_new = y + P_h
d_new = d
h_new = h
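As an illustration only (not part of the claimed method), a minimal Python sketch of this label transformation, assuming a single uniform resize ratio and the padding offsets defined above; the function name is hypothetical:

def transform_label(x, y, d, h, scale, pad_w, pad_h):
    # Resize: the box scales with the picture, keeping its relative position
    x, y, d, h = x * scale, y * scale, d * scale, h * scale
    # Padding: only the top-left corner shifts; width and height are unchanged
    return x + pad_w, y + pad_h, d, h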
said step 3) comprises the steps of:
3.1) The preprocessed flaw picture and template picture are input into a Resnet101 network for feature extraction, and 4 specific feature layers are selected for constructing the FPN network;
3.2) The FPN network is constructed as follows: a 1×1 convolution is applied to each of the 4 selected feature layers, giving layers C2, C3, C4 and C5 with 256 channels whose feature-map sizes match those of the input feature layers;
for the C5 layer, a 3×3 convolution is applied directly, giving a P5 layer with the same size and channel number as the input feature layer;
for the C4 layer, a 3×3 convolution gives an output feature layer with the same size and channel number as C4, to which the P5 layer after 2× up-sampling is added, giving the P4 layer;
for the C3 layer, a 3×3 convolution gives an output feature layer with the same size and channel number as the input layer, to which the P4 layer after 2× up-sampling is added, giving the P3 layer;
for the C2 layer, a 3×3 convolution gives an output feature layer with the same size and channel number as C2, to which the P3 layer after 2× up-sampling is added, giving the P2 layer;
a 3×3 convolution is applied directly to the original feature layer corresponding to the C5 layer, giving a P6 layer with unchanged size and channel number;
3.3) The corresponding feature layers output by the FPN networks of the template picture and the flaw picture are denoted P_defect_i and P_template_i, i = {2,3,4,5,6}, where i is the layer number in the FPN network;
3.4) For the template and flaw feature maps output by the FPN networks, P_defect_i and P_template_i, i = {2,3,4,5}, the corresponding feature layers are taken and stacked along the channel axis, then fused with a 1×1 convolution; after fusion the size is unchanged and the number of channels is halved, giving the fused feature layer P_fused_i (a fusion sketch follows step 3.5);
3.5) P_defect_6 and P_template_6 are not fused, so that enough high-level semantic flaw feature information is retained for the later candidate-frame extraction; P_defect_6 is taken directly as P_fused_6.
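A minimal PyTorch sketch of the channel stacking and 1×1 fusion in steps 3.4) and 3.5), assuming 256-channel FPN levels; module and tensor names are illustrative:

import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        # The 1x1 convolution halves the stacked channels back to `channels`,
        # letting the network adapt the mixing ratio of the two feature maps
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, p_defect, p_template):
        concated = torch.cat([p_defect, p_template], dim=1)  # (N, 512, H, W)
        return self.fuse(concated)                           # (N, 256, H, W)

fusion = FeatureFusion()
p_fused_2 = fusion(torch.randn(1, 256, 200, 200), torch.randn(1, 256, 200, 200))
# P6 is not fused: the defect branch's P6 is passed through unchanged (step 3.5)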
In step 5), the method for extracting the candidate region and the corresponding classification and regression operations thereof comprise the following steps:
5.1) Because flaw shapes are extremely irregular, n×m flaw candidate frames are considered for each position in the feature map to ensure adaptability to the flaws, where n ∈ {1,2,3} is the number of size classes of the candidate frames and m ∈ {1,2,3} is the number of aspect-ratio classes; the values of n and m are determined by the actual distribution of flaws in the data set;
5.2) All generated candidate frames are divided into foreground and background according to the relation between their Intersection-over-Union (IOU) value with the real flaws in the picture's flaw label and a set IOU threshold; the IOU threshold ranges over [0.3, 0.8], and the specific value should be set experimentally;
5.3) The candidate frames are regressed to further fit their positions to the true flaw frames; the loss function of the regression is defined as

Loss = Σ_{i=1}^{N} ( t_i − ŵ^T φ5(P_i) )²

The optimization objective of the function is

w* = argmin_ŵ [ Σ_{i=1}^{N} ( t_i − ŵ^T φ5(P_i) )² + λ‖ŵ‖² ]

where φ5(P_i) is the feature vector formed from the feature map of the corresponding candidate frame, P_i is the feature map of the corresponding candidate frame, ŵ is the parameter to be learned, ŵ^T is its transpose, w* is the optimized parameter, t_i is the true flaw label, N is the number of flaws, and λ is the regularization coefficient;
5.4) The regressed pre-candidate frames are mapped back to the original image; pre-candidate frames that extend far beyond the boundary or whose area is below a threshold are eliminated; the remainder are sorted in descending order of the confidence output by the softmax function, and the first L pre-candidate frames are extracted, where L ranges over [1000, 5000];
5.5) Non-maximum suppression is applied to the extracted pre-candidate frames; the surviving candidate frames are sorted and the first V are output, where V ranges over [100, 1000] (see the sketch after this list);
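A minimal sketch of the filtering, sorting and non-maximum suppression of steps 5.4) and 5.5), using torchvision's nms; the (x1, y1, x2, y2) box format and all threshold values below are illustrative assumptions:

import torch
from torchvision.ops import nms

def select_proposals(boxes, scores, img_w, img_h, min_area=16.0,
                     L=2000, V=256, iou_thresh=0.7):
    # Clip boxes to the image and drop those whose area is below the threshold
    boxes = boxes.clamp(min=0)
    boxes[:, 2] = boxes[:, 2].clamp(max=img_w)
    boxes[:, 3] = boxes[:, 3].clamp(max=img_h)
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = areas >= min_area
    boxes, scores = boxes[keep], scores[keep]
    # Sort by softmax confidence in descending order, keep the first L
    order = scores.argsort(descending=True)[:L]
    boxes, scores = boxes[order], scores[order]
    # Non-maximum suppression, then keep the first V survivors
    keep = nms(boxes, scores, iou_thresh)[:V]
    return boxes[keep], scores[keep]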
in step 6), a Cascade R-CNN network is formed by cascading an ROI pooling layer and a classification regression layer, and the method comprises the following steps:
6.1) A new IOU threshold is set, larger than the IOU threshold used in the previous round; experiments show the threshold in this step is best set in [0.6, 0.8]. The input candidate frames are classified into foreground and background according to this threshold and regressed further, fitting their positions to the true flaws in the flaw labels, and the results are sorted and output after non-maximum suppression;
6.2) To further improve the precision of the flaw frames, a new IOU threshold is set; experiments show it is best set in [0.7, 0.8]. The input candidate frames are classified into foreground and background according to this threshold and regressed further, fitting their positions to the true flaws in the flaw labels; the classification result output after non-maximum suppression and sorting is the type of flaw contained in the picture, and the regression result is the position of the flaw contained in the picture.
In step 7), the optimization method selected for the model is stochastic gradient descent. The learning rate should be determined by the number of pictures trained per graphics card, image_num, and the number of graphics cards used, GPU_num; experiments show that setting the learning rate to 0.00125 × image_num × GPU_num gives good results. To ensure that gradient descent reaches the optimum while preventing over-fitting during training, the iteration number T should not be too large; experiments show T should be greater than 10 and less than 100. A minimal sketch of this optimizer setup follows.
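A minimal PyTorch sketch of the setup above; the image_num and gpu_num values are placeholders, the momentum of 0.9 is taken from the experiments in the detailed description, and a stand-in module replaces the detection model:

import torch

image_num, gpu_num = 1, 1                 # pictures per card, number of cards
lr = 0.00125 * image_num * gpu_num        # linear learning-rate scaling rule

model = torch.nn.Linear(10, 2)            # stands in for the detection model
optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)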
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. according to the invention, a network structure based on Cascade R-CNN+Resnet101+FPN is used as the basic framework of the model, and the flaw pictures are cropped and flipped before training, which enlarges the data sample size.
2. According to the invention, the Cascade R-CNN is adopted to extract the candidate region for multiple times, so that the optimization of the candidate region in stages is realized, and meanwhile, the over fitting phenomenon of the candidate region in the training process can be prevented because the IOU threshold value selected in the Cascade structure is gradually increased.
3. According to the invention, the template picture is utilized in the training process, and more semantic information in the flaw picture can be obtained by combining the template picture and the flaw picture to perform feature extraction.
4. According to the invention, the characteristics of the template picture and the characteristics of the flaw picture are selectively fused in the FPN network, the fused layer can enhance the expression capability of the characteristic picture under the same dimension, and the high semantic characteristics extracted by the original characteristic layer are reserved by the layer which is not fused, so that the identification of flaws is facilitated.
5. When the template picture and the flaw picture are fused, the channel superposition is performed in a 1X 1 convolution mode, so that the proportion of the template picture characteristic and the flaw picture characteristic in the fusion can be adjusted in a network self-adaptive mode.
6. The top-down structure of the feature pyramid network adopted by the invention continuously brings the features with strong semantic information of the high layer to the bottom layer, and most background candidate areas can be filtered out by combining the high-layer semantic information extracted from the template picture, and meanwhile, the detection capability of a small target is improved.
Drawings
FIG. 1 is a schematic block diagram of the method of the present invention. In the figure, conv1, conv2, conv3, conv4 and conv5 denote the picture feature-extraction networks in the model; 1×1conv denotes convolution layers with 1×1 kernels; C2, C3, C4 and C5 denote feature maps in the FPN network; 2×up denotes up-sampling with a sampling rate of 2; 3×3conv denotes convolution layers with 3×3 kernels; P2, P3, P4, P5 and P6 denote the feature layers after feature fusion in the FPN network; cls, C1, C2 and C3 denote classification networks; reg denotes regression networks; pool denotes pooling layers; H1, H2 and H3 denote convolutional network modules; and B1, B2 and B3 denote the flaw candidate boxes extracted at each stage of the cascade network.
FIG. 2 is a flow chart of an example of the implementation of the method of the present invention.
Detailed Description
For more clearly describing the objects, technical solutions and advantages of the embodiments of the present invention, the technical solutions of the embodiments of the present invention will be fully described below with reference to the accompanying drawings in the embodiments of the present invention. It should be noted that this embodiment is only a part of embodiments of the present invention, and not all embodiments. All other embodiments based on the embodiments of the invention, which a person of ordinary skill in the art would obtain without inventive faculty, are within the scope of the invention.
"2019 Guangdong industrial intellectual innovation major" held in the Ariyaku is used herein to provide cloth data sets as experimental data sets. Cloth picture data come from textile factories, wherein the data set contains 4351 flaw pictures, 68 corresponding template pictures, 15 flaws are contained in the 4351 pictures, and the category names of the flaws are as follows: stain, staggering, watermarking, hair, stitch mark, insect sticking, hole, fold, weaving defect, leakage, wax spot, color difference, net folding, and other 15 defects are unevenly distributed in each defect picture.
The resolution of the original flaw pictures and template pictures reaches up to 4096×1080. The graphics card used in the experiments is an NVIDIA 1080Ti with 11 GB of video memory, which cannot take full-resolution pictures as input, so the pictures must be preprocessed. First the pictures are cut uniformly in a 2×2 pattern, i.e., each cut picture has a resolution of 2048×905; then, to enlarge the training sample size, the cut pictures are flipped horizontally, vertically, and both horizontally and vertically.
When the model is evaluated, the flaw-picture recognition accuracy alone cannot comprehensively reflect its overall performance. The central idea of the average precision metric (Average Precision, abbreviated AP) is to measure the model's prediction precision on each category; mAP (mean Average Precision) is the mean of the APs, i.e., the average of the model's prediction precision over all categories. For the detection task, the mAP value and the accuracy Acc are used as the final evaluation indices of the model.
The defect detection method of the invention is called TFF-Cascade R-CNN (Template Features Fused Cascade R-CNN, a cascade convolutional neural network based on template feature fusion). It classifies and regresses defects in cloth pictures with Cascade R-CNN+Resnet101+FPN as the overall framework of the model. The specific implementation of TFF-Cascade R-CNN in this embodiment is shown in FIGS. 1 and 2 and comprises the following steps:
1) Preprocessing a picture in an original data set, including:
1.1) Each flaw picture x_defect_i (i = {1, 2, …, N}, N being the number of flaw pictures) is cut uniformly in a 2×2 pattern, yielding x_defect_i_1, x_defect_i_2, x_defect_i_3, x_defect_i_4; the flaw label y_defect_i corresponding to flaw picture x_defect_i is likewise cut into 4 parts y_defect_i_1, y_defect_i_2, y_defect_i_3, y_defect_i_4, each corresponding to x_defect_i_1, x_defect_i_2, x_defect_i_3, x_defect_i_4;
1.2) Each picture X_defect_croped_i (i = {1, 2, …, 4N}) in the data set X_defect_croped obtained after cutting is flipped horizontally, vertically, and both horizontally and vertically, yielding the corresponding X_defect_croped_i_H, X_defect_croped_i_V, X_defect_croped_i_HV; the flaw labels are processed in the same horizontal, vertical and combined directions, yielding the corresponding y_defect_croped_i_H, y_defect_croped_i_V, y_defect_croped_i_HV (a preprocessing sketch follows).
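A minimal Pillow sketch of the 2×2 cropping and flipping in steps 1.1) and 1.2); the corresponding label transformations are omitted and the function name is illustrative:

from PIL import Image

def augment(path):
    img = Image.open(path)
    w, h = img.size
    # Uniform 2x2 cutting: four crops of half width and half height
    crops = [img.crop((cx * w // 2, cy * h // 2,
                       (cx + 1) * w // 2, (cy + 1) * h // 2))
             for cy in range(2) for cx in range(2)]
    out = []
    for crop in crops:
        out.append(crop)                                   # cropped original
        out.append(crop.transpose(Image.FLIP_LEFT_RIGHT))  # horizontal flip
        out.append(crop.transpose(Image.FLIP_TOP_BOTTOM))  # vertical flip
        out.append(crop.transpose(Image.FLIP_LEFT_RIGHT)
                       .transpose(Image.FLIP_TOP_BOTTOM))  # both directions
    return out  # the flaw labels must be cut and flipped the same way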
2) The preprocessed flaw pictures and template pictures are input into the model. Before a picture enters the convolutional neural network for feature extraction, size-resetting and pixel-filling operations are applied to it, ensuring that all pictures input into the convolutional network have the same size, which facilitates model learning. The specific procedure of this step is as follows:
2.1) The input picture is resized, while maintaining the original aspect ratio, to the resolution closest to (2048, 905);
2.2) The pixel values of the input picture are normalized; the specific steps are as follows:
2.2.1) Pictures are randomly sampled from the training set with a sample capacity of 800; the sample set is denoted X_norm;
2.2.2) A 32×32 patch is randomly cut from each sampled picture, and its mean mean_i and standard deviation std_i over the R, G and B channels are computed, where i = {1, 2, …, N} and N is the number of sampled pictures;
2.2.3) The averages mean and std of these per-patch means and standard deviations over the R, G and B channels are computed for the sample set, and the pictures are normalized with them according to

X_norm = (X_original − mean) / adjusted_std

where X_original is the input image matrix, X_norm is the normalized image matrix, and adjusted_std is the adjusted standard deviation,

adjusted_std = max(std, 1/√N)

where N is the number of input pictures (a normalization sketch follows);
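A minimal NumPy sketch of this normalization; the max(std, 1/√N) form of adjusted_std follows the reconstruction above but should be read as an assumption, since the original equation images are not reproduced here:

import numpy as np

def estimate_stats(patches):
    # patches: (M, 32, 32, 3) array of 32x32 crops from the sampled pictures
    flat = patches.reshape(len(patches), -1, 3)      # (M, 1024, 3)
    means = flat.mean(axis=1)                        # per-picture channel means
    stds = flat.std(axis=1)                          # per-picture channel stds
    return means.mean(axis=0), stds.mean(axis=0)     # sample-set averages

def normalize(img, mean, std, n):
    # n: the picture count N in the adjusted_std formula of step 2.2.3
    adjusted_std = np.maximum(std, 1.0 / np.sqrt(n))
    return (img - mean) / adjusted_std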
2.3) Pictures whose resolution is smaller than (2048, 905) after the size reset undergo a pixel-filling operation: the picture is padded with a filling value of 0 until its height and width are both multiples of 32;
2.4) The input flaw labels are converted into a standard format, which facilitates extraction and use of the labels inside the model.
3) The preprocessed flaw pictures and template pictures are input into the Resnet101 convolution layers for feature extraction. For the first training run, an initial pre-trained model is loaded.
4) The method for extracting 4 output feature layers conv2, conv3, conv4 and conv5 in the ResNet101 convolution layer to construct the FPN network comprises the following specific construction steps:
4.1) After a convolution with kernel 1×1, padding 0 and stride 1, the feature-map sizes are kept unchanged and the channel numbers are uniformly changed to 256; the resulting feature layers are denoted C2, C3, C4 and C5;
4.2) After a convolution with kernel 3×3, padding 1 and stride 1, the C5 layer gives a P5 layer whose feature-map size and channel number are unchanged;
after a convolution with kernel 3×3, padding 1 and stride 1, the C4 layer is added to the P5 layer after 2× up-sampling, giving a P4 layer whose feature-map size and channel number are unchanged from C4;
after a convolution with kernel 3×3, padding 1 and stride 1, the C3 layer is added to the P4 layer after 2× up-sampling, giving a P3 layer whose feature-map size and channel number are unchanged from C3;
after a convolution with kernel 3×3, padding 1 and stride 1, the C2 layer is added to the P3 layer after 2× up-sampling, giving a P2 layer whose feature-map size and channel number are unchanged from C2.
The conv5 layer is directly convolved with kernel 3×3, padding 1 and stride 1 to obtain the P6 layer. At this point the required feature layers of the FPN network have been obtained (an FPN sketch follows).
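A minimal PyTorch sketch of the FPN construction in steps 4.1) and 4.2); the conv2-conv5 channel counts are assumed from a standard ResNet101, and the P6 branch is omitted for brevity:

import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFPN(nn.Module):
    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        # 1x1 lateral convolutions: unify conv2-conv5 to 256 channels (C2-C5)
        self.laterals = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels)
        # 3x3 convolutions with padding 1, stride 1, applied per level
        self.convs = nn.ModuleList(
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
            for _ in in_channels)

    def forward(self, feats):             # feats = (conv2, conv3, conv4, conv5)
        c = [lat(f) for lat, f in zip(self.laterals, feats)]
        p = [None] * 4
        p[3] = self.convs[3](c[3])        # P5 from C5
        for i in (2, 1, 0):               # P4, P3, P2: conv, then add 2x upsample
            p[i] = self.convs[i](c[i]) + F.interpolate(p[i + 1], scale_factor=2)
        return p                          # (P2, P3, P4, P5)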
5) The flaw-picture and template-picture feature layers conv_defect_i and conv_template_i, i = {2,3,4,5}, are input into the FPN to obtain P_defect_i and P_template_i, i = {2,3,4,5,6}, according to step 4), and P_defect and P_template are fused as follows:
5.1) For P_defect_i and P_template_i, i = {2,3,4,5}, the corresponding P_defect_i and P_template_i are stacked along the channel axis; the original feature maps each have 256 channels and the same size, so stacking yields a feature map P_concated_i of the same size with 512 channels;
5.2) The resulting P_concated_i undergoes a convolution operation with parameters: kernel 1×1, padding 0, stride 1, output channels 256; the feature layer after convolution fusion is P_fused_i, i = {2,3,4,5};
5.3) P_fused_6 undergoes no channel stacking or convolution fusion; P_defect_6 is taken directly as P_fused_6, preserving the proportion of flaw features in the subsequent candidate-region extraction.
6) Each P_fused_i is input into the RPN (Region Proposal Network) to generate ROI (Region of Interest) regions; the specific steps of this process are as follows:
6.1) Each layer of the feature layers P_fused corresponds to one RPN network, which generates candidate frames from the input feature layer P_fused_i as follows:
6.1.1) For each location in the feature map, 9 candidate boxes, of 3 scales and 3 aspect ratios, are considered based on experiments;
6.1.2) The candidate frames are divided into foreground and background according to the relation between their IOU value with the real flaws in the picture's flaw label and the set IOU threshold; in this step the IOU threshold is set to 0.5;
6.1.3) The generated candidate frames are regressed to further fit their positions to the true flaw frames; the loss function of the regression is defined as

Loss = Σ_{i=1}^{N} ( t_i − ŵ^T φ5(P_i) )²

The optimization objective of the function is

w* = argmin_ŵ [ Σ_{i=1}^{N} ( t_i − ŵ^T φ5(P_i) )² + λ‖ŵ‖² ]

where φ5(P_i) is the feature vector formed from the feature map of the corresponding candidate frame, P_i is the feature map of the corresponding candidate frame, ŵ is the parameter to be learned, ŵ^T is its transpose, w* is the optimized parameter, t_i is the true flaw label, N is the number of flaws, and λ is the regularization coefficient;
6.1.4) The regressed pre-candidate frames are mapped back to the original image; pre-candidate regions that extend far beyond the boundary or whose area is below a threshold are eliminated; the remainder are sorted in descending order of softmax confidence and the first L pre-candidate regions are extracted, where L is 2000;
6.1.5) NMS (Non-Maximum Suppression) is applied to the L pre-candidate regions; the results after NMS are sorted and the first V candidate regions are output, where V is 256;
6.2) Candidate regions are generated on the 5 feature layers; the generated candidate regions are merged and then output as the output of the whole RPN network;
6.3) In the ROI pooling layer, the input candidate regions are pooled: an m×n region projected from the original image is input and, assuming an output size of p×q satisfying p < m and q < n, the m×n region is divided into p×q parts and a maximum-value pooling operation is applied to each part (see the sketch below);
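A minimal sketch of this pooling using torchvision's roi_pool; the feature-map size, ROI coordinates and 7×7 output size are illustrative assumptions:

import torch
from torchvision.ops import roi_pool

feature_map = torch.randn(1, 256, 50, 50)            # one fused feature level
rois = torch.tensor([[0.0, 4.0, 4.0, 36.0, 28.0]])   # (batch_idx, x1, y1, x2, y2)
# Each ROI is divided into a 7x7 grid; each cell is max-pooled
pooled = roi_pool(feature_map, rois, output_size=(7, 7), spatial_scale=1.0)
print(pooled.shape)                                  # torch.Size([1, 256, 7, 7])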
7) The ROI candidate region is input into a Cascade R-CNN network, and classification and regression accuracy is further improved in an ROI pooling layer and classification and regression layer of the network, and the specific operation of the process is as follows:
7.1) A new IOU threshold, larger than the threshold used in the previous round, is set to 0.6 in this step; the input candidate frames are classified into foreground and background according to it and regressed further, fitting their positions to the true flaws in the flaw labels, and the results are sorted and output after NMS;
7.2) A new IOU threshold is set to 0.7; the input candidate frames are classified into foreground and background according to it and regressed further, fitting their positions to the true labels in the flaw labels; the classification result output after NMS and sorting is the type of flaw contained in the picture, and the regression result is the position of the flaw contained in the picture;
According to the above steps, the training model is constructed and experiments are carried out. The experimental environment is: CPU Intel(R) Xeon(R) E5-2650 v4, graphics card NVIDIA 1080Ti with 11 GB of video memory, built on PyTorch 1.1 under a Linux platform. The model is trained on a single graphics card with the optimizer set to SGD, a learning rate of 0.00125, momentum of 0.9, a batch size of 1 and 12 iterations. After training, testing is performed on the held-out test set, whose sample capacity is 500 with 450 flaw pictures and 50 flaw-free pictures. After testing, the accuracy Acc is 89.10% and the mAP is 41.80; by comparison, a model without FPN fusion in the network structure achieves an accuracy Acc of 81.14% and an mAP of 38.91, and a traditional machine-learning-based method reaches a classification accuracy Acc of only 79%. The method of the invention therefore improves on the defect detection problem.
In summary, the invention focuses on researching a convolutional neural network flaw detection algorithm based on feature fusion aiming at the flaw detection problem. According to the method, cascade R-CNN+Resnet101+FPN is used as an overall framework of an identification model, and a fused feature layer is constructed in an FPN network by combining acquired product template pictures, so that flaw classification and positioning are performed based on the fused feature layer. According to the method, on one hand, the classification accuracy of the defects of the product is improved, on the other hand, the mAP value of the model prediction result is improved, and the overall performance of the detection model is improved to a great extent, so that the method is worthy of popularization.
The above embodiment is a preferred embodiment of the present invention, but the embodiments of the present invention are not limited thereto; any other change, modification, substitution, combination or simplification that does not depart from the spirit and principle of the present invention should be an equivalent replacement and is included in the protection scope of the present invention.

Claims (4)

1. The convolutional neural network flaw detection method based on feature fusion is characterized by comprising the following steps of:
1) Preprocessing the data set: the pictures X_defect in the initial data set are cropped and flipped to obtain the data sets X_defect_croped and X_defect_fliped, and the correspondingly cropped and flipped flaw label files are obtained at the same time; the preprocessing is applied only to the flaw pictures, and the template pictures are not preprocessed;
2) For the pictures input into the model convolutional network, including the flaw pictures in X_defect_fliped and the template pictures, resize, padding and normalization operations are carried out; at the same time the flaw labels are scaled with the picture's scaling ratio; when the picture is padded, the flaw-label parameters are adjusted according to the padding; and when the picture is normalized, no specific processing is applied to the flaw labels;
3) The flaw pictures and the template pictures are input into a Resnet101 convolutional network for feature extraction; 4 specific layers among the extracted feature maps are selected for constructing the feature pyramid network FPN, and FPN networks are constructed for the flaw pictures and the template pictures respectively, comprising the following steps:
3.1) The preprocessed flaw pictures and template pictures are input into a Resnet101 network for feature extraction; the Resnet101 network comprises 4 stages, and the last feature layer of each stage is selected for constructing the FPN network;
3.2) The FPN network is constructed as follows: a 1×1 convolution is applied to each of the 4 selected feature layers, giving layers C2, C3, C4 and C5 with 256 channels whose feature-map sizes match those of the input feature layers;
for the C5 layer, a 3×3 convolution is applied directly, giving a P5 layer with the same size and channel number as the input feature layer;
for the C4 layer, a 3×3 convolution gives an output feature layer with the same size and channel number as C4, to which the P5 layer after 2× up-sampling is added, giving the P4 layer;
for the C3 layer, a 3×3 convolution gives an output feature layer with the same size and channel number as the input layer, to which the P4 layer after 2× up-sampling is added, giving the P3 layer;
for the C2 layer, a 3×3 convolution gives an output feature layer with the same size and channel number as C2, to which the P3 layer after 2× up-sampling is added, giving the P2 layer;
a 3×3 convolution is applied directly to the original feature layer corresponding to the C5 layer, giving a P6 layer with unchanged size and channel number;
3.3) The corresponding feature layers output by the FPN networks of the template picture and the flaw picture are denoted P_defect_i and P_template_i, i = {2,3,4,5,6}, where i is the layer number in the FPN network;
3.4) For the template and flaw features output by the FPN networks, P_defect_i and P_template_i, i = {2,3,4,5}, the feature layers P_defect_k and P_template_j, k = j ∈ {1,2,3,4,5}, corresponding to the flaw picture and the template picture are taken and stacked along the channel axis, then fused with a 1×1 convolution; after fusion the size is unchanged and the number of channels is halved, giving the fused feature layer P_fused_i, i = {2,3,4,5};
3.5) P_defect_6 and P_template_6 are not fused, so that the required amount of high-level semantic flaw feature information is retained for the later candidate-frame extraction; P_defect_6 is taken directly as P_fused_6;
4) Superposing channels of corresponding feature layers in the FPN network of the flaw picture and the FPN network of the template picture, and then fusing in a convolution mode to obtain fused feature layers;
5) Performing preliminary candidate region extraction based on the fused feature layers, wherein the extraction comprises classification and regression operations, and performing region-of-interest pooling operation, namely ROI pooling operation, on the extracted candidate region;
6) A plurality of ROI pooling layers and classification and regression layers are cascaded to form Cascade R-CNN, which further classifies and regresses the input candidate regions; the cascading of the ROI pooling layers and the classification and regression layers forms the Cascade R-CNN network, comprising the following steps:
6.1) A new IOU threshold is set; to guarantee improved flaw-frame precision after cascading, it is larger than the IOU threshold used in the previous round, with a value range of [0.6, 0.8]; the input pre-candidate frames are classified into foreground and background according to this threshold and regressed further, fitting their positions to the true flaws in the flaw labels, and the results are sorted and output after non-maximum suppression;
6.2) A new IOU threshold with a value range of [0.7, 0.8] is set; the input pre-candidate frames are classified into foreground and background according to this threshold and regressed further, fitting their positions to the true flaws in the flaw labels; the classification result output after non-maximum suppression and sorting is the type of flaw contained in the picture, and the regression result is the position of the flaw contained in the picture;
7) An optimizer is selected, its parameters and number of iterations are set, and the model is trained; during training, the model weights are updated once per batch until the iterations are completed, giving the final weight file;
8) And inputting the picture to be predicted into the trained model, wherein the output result is the flaw type, the category confidence coefficient and the position of the flaw frame in the input picture.
2. The convolutional neural network flaw detection method based on feature fusion of claim 1, wherein in step 2) the flaw label is processed as follows: when the picture is resized, the flaw label is scaled by the same ratio, keeping the relative position of the flaw in the picture unchanged; when the padding processing enlarges the picture, the label coordinates are transformed with the padding parameters (P_w, P_h), where P_w is the size added to the left and right sides of the picture during padding and P_h is the size added to the top and bottom of the picture during padding; the original flaw label is denoted (x, y, d, h) and the transformed flaw label (x_new, y_new, d_new, h_new), where x and y are the coordinates of the upper-left corner of the flaw frame and d and h are its width and height; the coordinate transformation rule is:
x_new = x + P_w
y_new = y + P_h
d_new = d
h_new = h.
3. the convolutional neural network flaw detection method based on feature fusion of claim 1, wherein the method comprises the following steps: in step 5), the method for extracting the candidate region and the corresponding classification and regression operations thereof comprise the following steps:
5.1) Because flaw shapes are extremely irregular, n×m candidate frames are considered for each position in the feature map to ensure adaptability to the flaws, where n ∈ {1,2,3} is the number of size classes of the flaw candidate frames and m ∈ {1,2,3} is the number of aspect-ratio classes; the values of n and m are determined by the actual distribution of flaws in the data set;
5.2) All generated candidate frames are divided into foreground and background according to the relation between their IOU value with the real flaws in the picture's flaw label and a set IOU threshold; the IOU threshold ranges over [0.3, 0.8], and the specific value should be set experimentally;
5.3) The candidate frames are regressed to further fit their positions to the true flaw frames; the loss function of the regression is defined as

Loss = Σ_{i=1}^{N} ( t_i − ŵ^T φ5(P_i) )²

The optimization objective of the function is

w* = argmin_ŵ [ Σ_{i=1}^{N} ( t_i − ŵ^T φ5(P_i) )² + λ‖ŵ‖² ]

where φ5(P_i) is the feature vector formed from the feature map of the corresponding candidate frame, P_i is the feature map of the corresponding candidate frame, ŵ is the parameter to be learned, ŵ^T is its transpose, w* is the optimized parameter, t_i is the true flaw label, N is the number of flaws, and λ is the regularization coefficient;
5.4) The regressed pre-candidate frames are mapped back to the original image; pre-candidate frames that extend far beyond the boundary or whose area is below a threshold are eliminated; the remainder are sorted in descending order of the confidence output by the softmax function, and the first L pre-candidate frames are extracted, where L ranges over [1000, 5000];
5.5) Non-maximum suppression is applied to the extracted pre-candidate frames; the surviving candidate frames are sorted and the first V are output, where V ranges over [100, 1000].
4. The convolutional neural network flaw detection method based on feature fusion of claim 1, wherein in step 7) the optimization method selected for the model is stochastic gradient descent; the learning rate should be determined by the number of pictures trained per graphics card, image_num, and the number of graphics cards used, GPU_num; experiments show that setting the learning rate to 0.00125 × image_num × GPU_num gives good results; to ensure that gradient descent reaches the optimum while preventing over-fitting during model training, the iteration number T should be greater than 10 and less than 100.
CN201911104107.XA 2019-11-13 2019-11-13 Convolutional neural network flaw detection method based on feature fusion Active CN110992311B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911104107.XA CN110992311B (en) 2019-11-13 2019-11-13 Convolutional neural network flaw detection method based on feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911104107.XA CN110992311B (en) 2019-11-13 2019-11-13 Convolutional neural network flaw detection method based on feature fusion

Publications (2)

Publication Number Publication Date
CN110992311A CN110992311A (en) 2020-04-10
CN110992311B true CN110992311B (en) 2023-04-28

Family

ID=70084136

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911104107.XA Active CN110992311B (en) 2019-11-13 2019-11-13 Convolutional neural network flaw detection method based on feature fusion

Country Status (1)

Country Link
CN (1) CN110992311B (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111523452B (en) * 2020-04-22 2023-08-25 北京百度网讯科技有限公司 Method and device for detecting human body position in image
CN112053317A (en) * 2020-04-26 2020-12-08 张辉 Workpiece surface defect detection method based on cascade neural network
CN111667476B (en) * 2020-06-09 2022-12-06 创新奇智(广州)科技有限公司 Cloth flaw detection method and device, electronic equipment and readable storage medium
CN111784644A (en) * 2020-06-11 2020-10-16 上海布眼人工智能科技有限公司 Printing defect detection method and system based on deep learning
CN112001448A (en) * 2020-08-26 2020-11-27 大连信维科技有限公司 Method for detecting small objects with regular shapes
CN112113978A (en) * 2020-09-22 2020-12-22 成都国铁电气设备有限公司 Vehicle-mounted tunnel defect online detection system and method based on deep learning
CN112053357A (en) * 2020-09-27 2020-12-08 同济大学 FPN-based steel surface flaw detection method
CN112270687A (en) * 2020-10-16 2021-01-26 鲸斛(上海)智能科技有限公司 Cloth flaw identification model training method and cloth flaw detection method
CN112150460B (en) * 2020-10-16 2024-03-15 上海智臻智能网络科技股份有限公司 Detection method, detection system, device and medium
CN112149693A (en) * 2020-10-16 2020-12-29 上海智臻智能网络科技股份有限公司 Training method of contour recognition model and detection method of target object
CN113516615B (en) * 2020-11-24 2024-03-01 阿里巴巴集团控股有限公司 Sample generation method, system, equipment and storage medium
CN112669300A (en) * 2020-12-31 2021-04-16 上海智臻智能网络科技股份有限公司 Defect detection method and device, computer equipment and storage medium
CN113065400A (en) * 2021-03-04 2021-07-02 国网河北省电力有限公司 Invoice seal detection method and device based on anchor-frame-free two-stage network
CN113240626B (en) * 2021-04-08 2023-07-11 西安电子科技大学 Glass cover plate concave-convex type flaw detection and classification method based on neural network
CN113420648B (en) * 2021-06-22 2023-05-05 深圳市华汉伟业科技有限公司 Target detection method and system with rotation adaptability
CN113627435A (en) * 2021-07-13 2021-11-09 南京大学 Method and system for detecting and identifying flaws of ceramic tiles
CN113628179B (en) * 2021-07-30 2023-11-24 厦门大学 PCB surface defect real-time detection method, device and readable medium
CN113610822B (en) * 2021-08-13 2022-09-09 湖南大学 Surface defect detection method based on multi-scale information fusion
CN113435425B (en) * 2021-08-26 2021-12-07 绵阳职业技术学院 Wild animal emergence and emergence detection method based on recursive multi-feature fusion
CN114170532A (en) * 2021-11-23 2022-03-11 北京航天自动控制研究所 Multi-target classification method and device based on difficult sample transfer learning
CN116977334B (en) * 2023-09-22 2023-12-12 山东东方智光网络通信有限公司 Optical cable surface flaw detection method and device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109064459A (en) * 2018-07-27 2018-12-21 江苏理工学院 A kind of Fabric Defect detection method based on deep learning

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014005123A1 (en) * 2012-06-28 2014-01-03 Pelican Imaging Corporation Systems and methods for detecting defective camera arrays, optic arrays, and sensors
JP6300529B2 (en) * 2014-01-08 2018-03-28 キヤノン株式会社 Imaging apparatus, control method therefor, program, and storage medium

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109064459A (en) * 2018-07-27 2018-12-21 江苏理工学院 A kind of Fabric Defect detection method based on deep learning

Also Published As

Publication number Publication date
CN110992311A (en) 2020-04-10

Similar Documents

Publication Publication Date Title
CN110992311B (en) Convolutional neural network flaw detection method based on feature fusion
CN111027547B (en) Automatic detection method for multi-scale polymorphic target in two-dimensional image
CN111260614B (en) Convolutional neural network cloth flaw detection method based on extreme learning machine
CN112967243B (en) Deep learning chip packaging crack defect detection method based on YOLO
Ferguson et al. Automatic localization of casting defects with convolutional neural networks
CN107145898B (en) Radiographic image classification method based on neural network
CN109446992A (en) Remote sensing image building extracting method and system, storage medium, electronic equipment based on deep learning
CN108830326B (en) Automatic segmentation method and device for MRI (magnetic resonance imaging) image
CN113240626B (en) Glass cover plate concave-convex type flaw detection and classification method based on neural network
CN108416266B (en) Method for rapidly identifying video behaviors by extracting moving object through optical flow
CN107944442A (en) Based on the object test equipment and method for improving convolutional neural networks
CN110084817B (en) Digital elevation model production method based on deep learning
US20210192271A1 (en) Method and Apparatus for Pose Planar Constraining on the Basis of Planar Feature Extraction
CN109711288A (en) Remote sensing ship detecting method based on feature pyramid and distance restraint FCN
CN106951840A (en) A kind of facial feature points detection method
CN111553200A (en) Image detection and identification method and device
CN113610822B (en) Surface defect detection method based on multi-scale information fusion
CN108614994A (en) A kind of Human Head Region Image Segment extracting method and device based on deep learning
CN111695633A (en) Low-illumination target detection method based on RPF-CAM
CN115841447A (en) Detection method for surface defects of magnetic shoe
CN111027538A (en) Container detection method based on instance segmentation model
CN114549507B (en) Improved Scaled-YOLOv fabric flaw detection method
CN114842201A (en) Sandstone aggregate image segmentation method based on improved Mask _ Rcnn
CN115829995A (en) Cloth flaw detection method and system based on pixel-level multi-scale feature fusion
CN115019181A (en) Remote sensing image rotating target detection method, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant