CN114332008A

CN114332008A - Unsupervised defect detection and positioning method based on multi-level feature reconstruction

Info

Publication number: CN114332008A
Application number: CN202111625694.4A
Authority: CN
Inventors: 陈平平; 毛焕; 陈锋; 林志坚
Original assignee: Fuzhou University
Current assignee: Fuzhou University
Priority date: 2021-12-28
Filing date: 2021-12-28
Publication date: 2022-04-12
Anticipated expiration: 2041-12-28

Abstract

The invention relates to an unsupervised defect detection and positioning method based on multi-level feature reconstruction. The method comprises the following steps: acquiring anomaly-free images of a product and anomalous images showing different types of defects; feeding the anomaly-free images into a feature extraction network to extract a multi-scale feature group; feeding the highest-dimensional features into a reconstruction network to reconstruct, layer by layer, a new feature group at the corresponding scales; constructing a loss function over the two feature groups to train the reconstruction network; and, in the testing stage, calculating an anomaly map and an anomaly score from the differences between the two feature groups, which are used to judge whether an image is anomalous and to locate the defect area. The method makes effective use of the differences in feature information between anomaly-free and anomalous images at different dimensions and can detect the defective areas of a product, thereby avoiding manual labeling and improving the efficiency of product quality inspection.

Description

Unsupervised defect detection and positioning method based on multi-level feature reconstruction

Technical Field

The invention relates to the field of computer vision, in particular to an unsupervised defect detection and positioning method based on multi-level feature reconstruction.

Background

Defect detection has long had wide and important applications in industrial production. Detecting whether a product is defective and locating its defect areas effectively safeguards product yield. In practice, defect samples are difficult to acquire, so they are very rare relative to normal samples; the types of defects are uncertain, and their sizes and positions are random, so the characteristics of defects cannot be predicted in advance. In addition, real industrial production scenarios place certain requirements on the real-time performance of the defect detection algorithm.

With the continuous development of deep learning in the field of object detection, defect detection, as a branch of that field, has also made great progress. In defect detection, usually only normal samples, or normal samples plus a small number of defect samples, are available as the training set, so supervised learning is impractical and unsupervised or weakly supervised learning is usually adopted. How to avoid the need for data labeling while still locating defect areas of different types is currently an important issue. To avoid data labeling, methods are generally built on feature extraction from normal samples: by reconstructing or learning these features, the model learns the feature distribution of normal samples, so that a certain deviation appears when it is faced with a defective sample, and this deviation serves as the basis for judging anomalies. To locate the defect area, an anomaly map of the same size as the input image is constructed, which intuitively reflects the network model's prediction of the defects in the image. However, directly reconstructing the normal image may suffer from overly strong generalization, so that defects are reconstructed as well; in addition, localization at the image-block level does not meet the requirement of locating defect areas of different scales.

Disclosure of Invention

The invention aims to provide an unsupervised defect detection and positioning method based on multi-level feature reconstruction, that is, an efficient defect detection framework that combines multi-level feature information at different scales, avoids the need to label training data, and locates defects of different types and sizes. The scheme reduces the cost of manual labeling, improves detection efficiency, is highly adaptable, and can be used in the defect detection stage for different products.

In order to achieve the purpose, the technical scheme of the invention is as follows: an unsupervised defect detection and positioning method based on multi-level feature reconstruction comprises the following steps:

Step S1, acquiring anomaly-free images and anomalous images of the target product to build a data set;

Step S2, constructing a feature extraction and reconstruction network based on multi-level features;

Step S3, inputting the training image data set into the network constructed in step S2 for training;

Step S4, inputting the test image data set into the model with optimal parameters for inference;

Step S5, obtaining a detection result by using an anomaly map based on multi-level feature differences.

In an embodiment of the present invention, the data set used in step S1 is the industrial anomaly detection data set MVTecAD, which contains more than 5,000 high-resolution images in 15 categories, comprising 10 object categories and 5 texture categories. The training image data set of each category contains only normal images, while the test image data set contains normal images and multiple types of defect images and provides annotations of the defect areas.

In an embodiment of the present invention, the defect area annotation provided by the data set is converted into a mask image by binarization, that is, pixels of the normal area and the background area are set to 0 and pixels of the defect area are set to 255; the input image and the mask image are then resized and cropped for network training and inference.

In an embodiment of the present invention, in step S2, a feature extraction and reconstruction network based on multi-level features is constructed, which is composed of a feature extraction module and a feature reconstruction module. The feature extraction module extracts 3 feature maps of different scales, the feature map of the smallest size is taken as the input of the feature reconstruction module, and the feature reconstruction module outputs 3 feature maps of the corresponding scales; the two groups of feature maps are used for subsequent training and inference.

In an embodiment of the present invention, the feature extraction module is based on the Wide ResNet-50-2 network architecture.
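The binarization and the resize-and-crop step can be illustrated with a minimal NumPy sketch. The function names, the 256-pixel resize and the 224-pixel center crop are assumptions chosen for illustration; the text does not specify the sizes. Nearest-neighbour resizing is used here so that the mask stays strictly binary, which is also an assumption.

```python
import numpy as np

def binarize_annotation(annotation: np.ndarray) -> np.ndarray:
    """Set defect pixels (non-zero) to 255 and normal/background pixels to 0."""
    return np.where(annotation > 0, 255, 0).astype(np.uint8)

def resize_nearest(image: np.ndarray, size: int) -> np.ndarray:
    """Resize an H x W array to size x size with nearest-neighbour sampling."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return image[rows][:, cols]

def center_crop(image: np.ndarray, size: int) -> np.ndarray:
    """Cut a size x size window from the middle of the array."""
    h, w = image.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return image[top:top + size, left:left + size]

def prepare_mask(annotation: np.ndarray, resize_to: int = 256, crop_to: int = 224) -> np.ndarray:
    """Binarize, then resize and center-crop (sizes are hypothetical defaults)."""
    mask = binarize_annotation(annotation)
    return center_crop(resize_nearest(mask, resize_to), crop_to)

# Synthetic annotation: a rectangular defect marked with the value 1.
annotation = np.zeros((1024, 1024), dtype=np.uint8)
annotation[400:600, 300:500] = 1
mask = prepare_mask(annotation)
print(mask.shape, np.unique(mask))
```

The input image would go through the same resize and crop so that image and mask stay aligned pixel for pixel.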

In an embodiment of the present invention, in step S3, a loss function is used to train and optimize the parameters of the network model: the parameters of the feature extraction module in the feature extraction and reconstruction network based on multi-level features remain unchanged, only the parameters of the feature reconstruction module participate in optimization, and training is performed under the PyTorch deep learning framework. The loss function is the multi-level feature reconstruction function L_f.

The expression of the multi-level feature reconstruction function L_f is as follows:

where f_n and f'_n denote the feature map extracted at the n-th layer and the corresponding reconstructed feature map, respectively; both feature maps are L2-normalized, and MSE denotes the mean squared error loss function.
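One plausible reading of this loss can be sketched in NumPy as follows. The sketch assumes that the per-level MSE terms are simply summed over the three levels and that L2 normalization is applied along the channel axis; both choices, along with the function names and the toy shapes, are assumptions rather than the stated formula.

```python
import numpy as np

def l2_normalize(feature: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale a (C, H, W) feature map to unit L2 norm along the channel axis."""
    norm = np.sqrt((feature ** 2).sum(axis=0, keepdims=True))
    return feature / np.maximum(norm, eps)

def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared error over all elements."""
    return float(np.mean((a - b) ** 2))

def multilevel_reconstruction_loss(features, reconstructions) -> float:
    """Sum of per-level MSE between normalized extracted and reconstructed maps."""
    return sum(
        mse(l2_normalize(f), l2_normalize(r))
        for f, r in zip(features, reconstructions)
    )

rng = np.random.default_rng(0)
# Three toy levels: channels grow while spatial size shrinks.
features = [rng.standard_normal((c, s, s)) for c, s in [(8, 16), (16, 8), (32, 4)]]
noisy = [f + 0.1 * rng.standard_normal(f.shape) for f in features]
print(multilevel_reconstruction_loss(features, noisy))
```

Because both maps are normalized first, rescaling a reconstruction by a positive constant leaves the loss unchanged; only the direction of each feature vector matters.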

In an embodiment of the present invention, in step S5, a group of residual feature maps is obtained by comparing the two groups of feature maps output by the feature extraction module and the feature reconstruction module; the residual feature maps of the different layers are upsampled, added together and Gaussian-filtered to obtain an anomaly map of the same size as the input image; the maximum value in the anomaly map is taken as the anomaly score; and a suitable threshold is selected to perform threshold segmentation on the anomaly map and the anomaly score, giving the defect detection and localization result.

In an embodiment of the present invention, the expression of the residual feature map of the n-th layer is as follows:

where f_n and f'_n denote the feature map extracted at the n-th layer and the corresponding reconstructed feature map, respectively; both feature maps are L2-normalized, and ||·||₂ denotes the L2 norm.

In an embodiment of the present invention, the expression of the anomaly map m is as follows:

where G₄ denotes a Gaussian filter with a variance of 4, U denotes an upsampling operation using bilinear interpolation, and w_n, h_n and c_n denote the width, height and number of channels of the residual feature map of the n-th layer, respectively.
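The pipeline of residual computation, bilinear upsampling, summation and Gaussian filtering can be sketched in NumPy as below. Several points are assumptions: the residual is taken as the squared L2 norm of the difference across channels, no extra weighting by w_n, h_n or c_n is applied, a variance of 4 is interpreted as a standard deviation of 2, and all function names and toy shapes are hypothetical.

```python
import numpy as np

def l2_normalize(feature, eps=1e-12):
    """Scale a (C, H, W) feature map to unit L2 norm along the channel axis."""
    norm = np.sqrt((feature ** 2).sum(axis=0, keepdims=True))
    return feature / np.maximum(norm, eps)

def upsample_bilinear(x, size):
    """Bilinearly resize a 2-D map to size x size (pixel centres aligned)."""
    h, w = x.shape
    ys = np.clip((np.arange(size) + 0.5) * h / size - 0.5, 0, h - 1)
    xs = np.clip((np.arange(size) + 0.5) * w / size - 0.5, 0, w - 1)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy, wx = (ys - y0)[:, None], (xs - x0)[None, :]
    top = x[y0][:, x0] * (1 - wx) + x[y0][:, x1] * wx
    bottom = x[y1][:, x0] * (1 - wx) + x[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

def gaussian_filter(x, sigma):
    """Separable Gaussian blur with reflect padding."""
    radius = int(4 * sigma + 0.5)
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(x, radius, mode="reflect")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")

def anomaly_map(features, reconstructions, image_size, sigma=2.0):
    """Upsample per-level residuals, add them, then smooth (sigma = sqrt(4))."""
    total = np.zeros((image_size, image_size))
    for f, r in zip(features, reconstructions):
        # Assumed residual: squared L2 norm of the difference across channels.
        residual = ((l2_normalize(f) - l2_normalize(r)) ** 2).sum(axis=0)
        total += upsample_bilinear(residual, image_size)
    return gaussian_filter(total, sigma)

rng = np.random.default_rng(0)
features = [rng.standard_normal((c, s, s)) for c, s in [(8, 16), (16, 8), (32, 4)]]
recon = [f.copy() for f in features]
recon[0][:, 4:8, 4:8] *= -1  # simulate a badly reconstructed patch
m = anomaly_map(features, recon, image_size=64)
score = float(m.max())  # anomaly score: maximum of the anomaly map
print(m.shape, score)
```

In this toy case the corrupted 4 x 4 patch of the 16 x 16 level maps to rows and columns 16 to 31 of the 64 x 64 anomaly map, which is where the response concentrates.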

Compared with the prior art, the invention has the following beneficial effects:

1. The invention provides an efficient multi-level feature extraction module and feature reconstruction module, which enhance the ability to locate anomalous defect areas.

2. The invention integrates the differences between the extracted features and the reconstructed features at different scales as its output, adapts well to the detection of defects of different classes, and improves defect detection performance.

3. The invention performs feature reconstruction from the features of the input image, which realizes unsupervised learning and reduces the cost of manual labeling.

4. The invention can also be applied to product defect detection in other scenarios.

Drawings

Fig. 1 is a structural flow chart of an embodiment of the present invention.

Fig. 2 is an example of the MVTecAD data set used in step S1 according to an embodiment of the present invention.

Fig. 3 is a structure diagram of the feature extraction and reconstruction network based on multi-level features constructed in step S2 according to an embodiment of the present invention.

Fig. 4 is a structure diagram of the feature reconstruction module constructed in step S2 according to an embodiment of the present invention.

Fig. 5 shows the detection results obtained in step S4 by inputting the test image data set into the model with optimal parameters for inference according to an embodiment of the present invention.

Detailed Description

The technical scheme of the invention is specifically explained below with reference to the accompanying drawings.

The unsupervised defect detection and positioning method based on multi-level feature reconstruction integrates the differences between multi-level features and their reconstructed features and detects and locates defect samples under unsupervised learning; at the same time, the network model used has a small computational overhead, which meets both the performance and the real-time requirements of defect detection scenarios.

The invention relates to an unsupervised defect detection and positioning method based on multi-level feature reconstruction, which comprises the following steps:

The following are specific embodiments of the present invention.

As shown in fig. 1, the present embodiment provides an unsupervised defect detection and location method based on multi-level feature reconstruction, which includes the following steps:

In this embodiment, step S1 is implemented as follows:

Taking the MVTecAD data set as an example, the labels of the defect samples are given by the defect area annotations provided by the data set; a binary mask image is generated from the annotation information, and scaling, cropping and standardization are then applied to the sample image and the mask image.

Step S21, constructing a feature extraction module based on the Wide ResNet-50-2 network, extracting 3 layers of feature maps of different scales, and taking the feature map with the smallest size and the largest number of channels as the input of the subsequent feature reconstruction module;

In this embodiment, step S21 is implemented as follows:

First, the first 3 convolutional layers (Conv2, Conv3 and Conv4) of the Wide ResNet-50-2 network are taken as the feature extraction module; after an image is input, 3 layers of feature maps of different scales {f₁, f₂, f₃} are output, and f₃ is used as the input of the feature reconstruction module.

Step S22, constructing a feature reconstruction module, which takes the high-dimensional feature map as input and reconstructs from high dimension to low dimension with three convolution layers; an upsampling operation adjusts the size of the feature map between layers, and the module outputs 3 layers of feature maps whose sizes correspond to those in the feature extraction module.

In this embodiment, step S22 includes the following steps:

Step S221, constructing a feature reconstruction module composed of three convolution layers (Conv4', Conv3', Conv2'). Each convolution layer contains 3 convolution operations with kernel sizes of 1, 3 and 1, each followed by a batch normalization operation and a LeakyReLU activation function; each convolution layer upsamples its output feature map by a factor of 2 using bilinear interpolation; and the output branch of each layer's feature map integrates information across channels with a convolution operation of kernel size 1.

Step S222, the feature f₃ output by the feature extraction module, which has the largest number of channels and the smallest size, is taken as the input of the feature reconstruction module, and each convolution layer of the feature reconstruction module outputs a reconstructed feature of the corresponding scale, giving the reconstructed feature group {f′₁, f′₂, f′₃}.
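A feature reconstruction module of this shape might be sketched in PyTorch as follows. The class names, the bottleneck width inside each layer and the default channel counts (256, 512, 1024, assumed here to match the Conv2 to Conv4 outputs of a Wide ResNet-50-2 backbone) are illustrative assumptions, not details given in the text.

```python
import torch
from torch import nn
import torch.nn.functional as F

def conv_bn_act(in_ch: int, out_ch: int, kernel_size: int) -> nn.Sequential:
    """Convolution followed by batch normalization and LeakyReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size, padding=kernel_size // 2, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(inplace=True),
    )

class ReconstructionLayer(nn.Module):
    """1x1 -> 3x3 -> 1x1 convolutions, a 1x1 output branch, and 2x upsampling."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        mid = out_ch // 2  # hypothetical bottleneck width
        self.body = nn.Sequential(
            conv_bn_act(in_ch, mid, 1),
            conv_bn_act(mid, mid, 3),
            conv_bn_act(mid, out_ch, 1),
        )
        self.head = nn.Conv2d(out_ch, out_ch, kernel_size=1)  # output branch

    def forward(self, x):
        y = self.body(x)
        up = F.interpolate(y, scale_factor=2, mode="bilinear", align_corners=False)
        return self.head(y), up  # reconstructed feature, input for the next layer

class FeatureReconstruction(nn.Module):
    """Rebuilds (f'1, f'2, f'3) from the deepest extracted feature f3."""

    def __init__(self, channels=(256, 512, 1024)):
        super().__init__()
        c1, c2, c3 = channels
        self.conv4 = ReconstructionLayer(c3, c3)
        self.conv3 = ReconstructionLayer(c3, c2)
        self.conv2 = ReconstructionLayer(c2, c1)

    def forward(self, f3):
        r3, x = self.conv4(f3)
        r2, x = self.conv3(x)
        r1, _ = self.conv2(x)  # the last upsampled tensor is not used
        return r1, r2, r3

# Small channel counts keep the demonstration fast.
model = FeatureReconstruction(channels=(8, 16, 32)).eval()
with torch.no_grad():
    r1, r2, r3 = model(torch.randn(2, 32, 14, 14))
print(r1.shape, r2.shape, r3.shape)
```

Each reconstructed map doubles in spatial size and shrinks in channel count relative to the previous one, mirroring the pyramid produced by the extraction module in reverse order.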

In this embodiment, step S3 includes the following steps:

Step S31, training and optimizing the parameters of the network model with a loss function and an SGD optimizer; the parameters of the feature extraction module in the network remain unchanged, the parameters of the feature reconstruction module participate in optimization, and training is performed under the PyTorch deep learning framework. The loss function is the multi-level feature reconstruction function L_f.

Step S32, in the training stage, 20% of the training data set is split off as a validation set on which the current model parameters are evaluated; the loss L_f of the model on the validation set is calculated, and the model from the training epoch with the minimum average loss is saved as the model with optimal parameters.
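The hold-out split and the selection of the best epoch can be sketched in plain Python. The random shuffling, the fixed seed, the function names and the loss values in the example are all assumptions made for illustration; the text only states the 20% proportion and the minimum-average-loss criterion.

```python
import random

def split_train_validation(samples, validation_fraction=0.2, seed=0):
    """Shuffle the samples and hold out a fraction as the validation set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_val = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]

def select_best_epoch(validation_losses_per_epoch):
    """Return (epoch index, mean loss) for the lowest mean validation loss."""
    means = [sum(losses) / len(losses) for losses in validation_losses_per_epoch]
    best = min(range(len(means)), key=means.__getitem__)
    return best, means[best]

train, val = split_train_validation(range(100))
# Made-up per-batch validation values of L_f for three epochs.
history = [[0.9, 1.1], [0.4, 0.6], [0.5, 0.7]]
best_epoch, best_loss = select_best_epoch(history)
print(len(train), len(val), best_epoch, best_loss)
```

In a real training loop the comparison would typically run after every epoch, and the model weights would be saved whenever the mean validation loss improves.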

Step S5, obtaining a detection result by using an anomaly map based on multi-level feature differences;

Step S51, using the feature maps and the reconstructed feature maps obtained by inference in step S4, calculating the difference between the feature maps of each layer to obtain the residual feature maps.

The expression of the residual feature map of the n-th layer is as follows:

Step S52, the residual feature map of each layer is upsampled by bilinear interpolation, and the upsampled residual feature maps of all layers are added together and passed through a Gaussian filter to obtain the final anomaly map m, whose expression is as follows:

where w_n, h_n and c_n denote the width, height and number of channels of the residual feature map of the n-th layer, respectively.

In this example, the step S52 is implemented as follows:

The maximum value in the anomaly map is selected as the anomaly score, and a suitable threshold is selected to perform threshold segmentation on the anomaly map and the anomaly score, giving the defect detection and localization result.

The above are preferred embodiments of the present invention; all changes made according to the technical scheme of the present invention whose functional effects do not exceed the scope of that technical scheme fall within the protection scope of the present invention.
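The thresholding step can be sketched in NumPy as below. Using two separate thresholds, one for the image-level score and one for the pixel-level map, is an assumption, as are the function name and the threshold values; the text only says that a suitable threshold is selected.

```python
import numpy as np

def detect_and_localise(anomaly_map, pixel_threshold, image_threshold):
    """Return the image-level decision, the anomaly score and a 0/255 defect mask."""
    score = float(anomaly_map.max())          # anomaly score: maximum of the map
    is_defective = score > image_threshold    # image-level detection
    defect_mask = np.where(anomaly_map > pixel_threshold, 255, 0).astype(np.uint8)
    return is_defective, score, defect_mask

# Toy 4 x 4 anomaly map with two pixels above the threshold.
toy_map = np.zeros((4, 4))
toy_map[1:3, 1:3] = [[0.2, 0.9], [0.6, 0.3]]
is_defective, toy_score, defect_mask = detect_and_localise(
    toy_map, pixel_threshold=0.5, image_threshold=0.5
)
print(is_defective, toy_score)
print(defect_mask)
```

The resulting mask uses the same 0 and 255 convention as the ground-truth masks, so the two can be compared pixel by pixel when evaluating localization.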

Claims

1. An unsupervised defect detection and positioning method based on multi-level feature reconstruction is characterized by comprising the following steps:

2. The unsupervised defect detection and positioning method based on multi-level feature reconstruction as claimed in claim 1, wherein the data set used in step S1 is the industrial anomaly detection data set MVTecAD, which contains more than 5,000 high-resolution images in 15 categories, comprising 10 object categories and 5 texture categories; the training image data set of each category contains only normal images, and the test image data set contains normal images and multiple types of defect images and provides annotations of the defect areas.

3. The unsupervised defect detection and positioning method based on multi-level feature reconstruction as claimed in claim 2, wherein the defect area annotation provided by the data set is converted into a mask image by binarization, that is, pixels of the normal area and the background area are set to 0 and pixels of the defect area are set to 255, and the input image and the mask image are resized and cropped for network training and inference.

4. The unsupervised defect detection and positioning method based on multi-level feature reconstruction as claimed in claim 1, wherein in step S2 a feature extraction and reconstruction network based on multi-level features is constructed, which is composed of a feature extraction module and a feature reconstruction module; the feature extraction module extracts 3 feature maps of different scales, the feature map of the smallest size is taken as the input of the feature reconstruction module, and the feature reconstruction module outputs 3 feature maps of the corresponding scales, the two groups of feature maps being used for subsequent training and inference.

5. The unsupervised defect detection and positioning method based on multi-level feature reconstruction as claimed in claim 4, wherein the feature extraction module is based on the Wide ResNet-50-2 network structure.

6. The unsupervised defect detection and positioning method based on multi-level feature reconstruction as claimed in claim 4, wherein in step S3 a loss function is used to train and optimize the parameters of the network model, the parameters of the feature extraction module in the feature extraction and reconstruction network based on multi-level features remain unchanged, the parameters of the feature reconstruction module participate in optimization, and training is performed under the PyTorch deep learning framework; the loss function is the multi-level feature reconstruction function L_f.

7. The unsupervised defect detection and positioning method based on multi-level feature reconstruction as claimed in claim 4, wherein in step S5 a group of residual feature maps is obtained by comparing the two groups of feature maps output by the feature extraction module and the feature reconstruction module; the residual feature maps of the different layers are upsampled, added together and Gaussian-filtered to obtain an anomaly map of the same size as the input image; the maximum value in the anomaly map is taken as the anomaly score; and a suitable threshold is selected to perform threshold segmentation on the anomaly map and the anomaly score, giving the defect detection and localization result.

8. The unsupervised defect detection and positioning method based on multi-level feature reconstruction as claimed in claim 7, wherein the expression of the residual feature map of the n-th layer is as follows:

9. The unsupervised defect detection and positioning method based on multi-level feature reconstruction as claimed in claim 8, wherein the expression of the anomaly map m is as follows:

where w_n, h_n and c_n denote the width, height and number of channels of the residual feature map of the n-th layer, respectively.