CN114862838A

CN114862838A - Unsupervised learning-based defect detection method and equipment

Info

Publication number: CN114862838A
Application number: CN202210623633.2A
Authority: CN
Inventors: 曾利宏; 李杰明; 杨洋; 黄涛; 吴创廷
Original assignee: Shenzhen Huahan Weiye Technology Co ltd
Current assignee: Shenzhen Huahan Weiye Technology Co ltd
Priority date: 2022-06-02
Filing date: 2022-06-02
Publication date: 2022-08-05

Abstract

The invention provides a defect detection method and device based on unsupervised learning, wherein the method comprises the following steps: acquiring a normal image of a product, and adding artificial noise to the normal image to generate an abnormal image matched with the normal image; training the defect detection network according to the normal image and the abnormal image matched with the normal image to obtain a trained defect detection network; and carrying out defect detection on the image to be detected of the product by adopting the trained defect detection network to obtain a defect detection result. The abnormal images are obtained by adding artificial noise to the normal images, so that positive and negative samples in a training set are balanced, manual marking is not needed, defect detection based on unsupervised learning is realized, and the defect detection effect can be improved.

Description

Unsupervised learning-based defect detection method and equipment

Technical Field

The embodiment of the invention relates to the technical field of image processing, in particular to a defect detection method and device based on unsupervised learning.

Background

In the production process of products, defective products with flaws are inevitably produced due to the limitation of process level or the influence of environmental factors, and the defect detection is an effective means for ensuring the yield of the products. The automatic defect detection by analyzing the image of the product has been widely applied in industrial production due to the advantages of low cost, high efficiency, strong real-time performance, high reliability and the like.

With the rapid development of deep learning in the field of target detection, defect detection as a branch of the field of target detection has also been rapidly developed. At present, a defect detection method based on deep learning usually adopts a supervised learning mode, namely, a normal image of a normal product is collected as a positive sample and added into a training set, a defect image of a defect product is collected as a negative sample and added into the training set, the samples in the training set are labeled manually, and a defect detection model is trained by adopting the training set based on the supervised learning mode. However, in the actual production process, the number of defective products is scarce compared with that of normal products, so that the difficulty in acquiring defect images is increased, the imbalance of positive and negative samples in a training set is caused, and finally, the defect detection effect based on supervised learning is poor.

Disclosure of Invention

The embodiment of the invention provides a defect detection method and device based on unsupervised learning, which are used for solving the problem of poor defect detection effect based on the existing supervised learning.

In a first aspect, an embodiment of the present invention provides a defect detection method based on unsupervised learning, including:

acquiring a normal image of a product, and adding artificial noise to the normal image to generate an abnormal image matched with the normal image;

training the defect detection network according to the normal image and the abnormal image matched with the normal image to obtain the trained defect detection network, wherein the defect detection network comprises a teacher network, a feature compression network, a student network and a projection network, the student network has the same network structure as the teacher network, and the training of the defect detection network comprises the following steps:

inputting the normal image and the abnormal image matched with the normal image into a defect detection network respectively;

extracting the features of the normal image through a teacher network to obtain an original feature map of the normal image; performing feature compression on the original feature map of the normal image through a feature compression network to obtain a compressed feature map of the normal image; performing feature reconstruction on the compressed feature map of the normal image through a student network to obtain a reconstructed feature map of the normal image; determining a first loss function according to the original feature map of the normal image and the reconstruction feature map of the normal image, wherein the first loss function is in negative correlation with the similarity between the original feature map of the normal image and the reconstruction feature map of the normal image;

extracting features of the abnormal image through the teacher network to obtain an original feature map of the abnormal image; performing feature compression on the original feature map of the abnormal image through the feature compression network to obtain a compressed feature map of the abnormal image; performing feature reconstruction on the compressed feature map of the abnormal image through the student network to obtain a reconstructed feature map of the abnormal image; performing projection dimensionality reduction on the original feature map and the reconstructed feature map of the abnormal image through the projection network to obtain an original feature vector and a reconstructed feature vector of the abnormal image, respectively; determining a second loss function according to the original feature vector and the reconstructed feature vector of the abnormal image, wherein the second loss function is positively correlated with the similarity between the original feature vector and the reconstructed feature vector of the abnormal image;

iteratively updating the network parameters of the feature compression network, the student network and the projection network with the objective of minimizing the sum of the first loss function and the second loss function, until an iteration termination condition is met, to obtain the trained defect detection network;

and carrying out defect detection on the image to be detected of the product by adopting the trained defect detection network to obtain a defect detection result.

In one embodiment, multi-scale feature extraction at different levels is performed on the normal image through the teacher network to obtain n layers (n ≥ 2) of original feature maps of the normal image; feature compression is performed on the n layers of original feature maps of the normal image through the feature compression network to obtain a compressed feature map of the normal image; and reverse-order, layer-by-layer feature reconstruction is performed on the compressed feature map of the normal image through the student network to obtain n layers of reconstructed feature maps of the normal image;

carrying out multi-scale feature extraction on the abnormal image at different levels through a teacher network to obtain n layers of original feature maps of the abnormal image; performing feature compression on n layers of original feature maps of the abnormal image through a feature compression network to obtain a compressed feature map of the abnormal image; performing reverse-sequence layer-by-layer feature reconstruction on the compressed feature map of the abnormal image through a student network to obtain n layers of reconstructed feature maps of the abnormal image; and respectively carrying out projection dimensionality reduction on the n layers of original feature maps of the abnormal images and the n layers of reconstructed feature maps of the abnormal images through a projection network to respectively obtain n layers of original feature vectors of the abnormal images and n layers of reconstructed feature vectors of the abnormal images.

In one embodiment, performing feature compression on the n layers of original feature maps of the normal image through the feature compression network to obtain the compressed feature map of the normal image includes: performing convolution, batch normalization and activation on each of the first n-1 layers of original feature maps of the normal image to convert it to the same feature resolution as the n-th layer original feature map of the normal image, thereby obtaining n layers of original feature maps of the normal image with uniform resolution; concatenating the n layers of uniform-resolution original feature maps of the normal image along the channel dimension to obtain a fused original feature map of the normal image; and sequentially performing spatial compression and channel compression on the fused original feature map of the normal image to obtain the compressed feature map of the normal image;

performing feature compression on the n layers of original feature maps of the abnormal image through the feature compression network to obtain the compressed feature map of the abnormal image includes: performing convolution, batch normalization and activation on each of the first n-1 layers of original feature maps of the abnormal image to convert it to the same feature resolution as the n-th layer original feature map of the abnormal image, thereby obtaining n layers of original feature maps of the abnormal image with uniform resolution; concatenating the n layers of uniform-resolution original feature maps of the abnormal image along the channel dimension to obtain a fused original feature map of the abnormal image; and sequentially performing spatial compression and channel compression on the fused original feature map of the abnormal image to obtain the compressed feature map of the abnormal image.
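The fusion and compression steps above can be sketched with NumPy. This is only an illustration under loud assumptions: average pooling stands in for the learned convolution, batch normalization and activation; a fixed 2 x 2 pooling stands in for spatial compression; and a random 1 x 1 channel mix stands in for channel compression. The names `avg_pool`, `compress` and `channel_weight` are hypothetical, and the feature maps are assumed square with sizes that divide evenly.

```python
import numpy as np

def avg_pool(x, k):
    """Non-overlapping k x k average pooling on a (C, H, W) array."""
    c, h, w = x.shape
    return x.reshape(c, h // k, k, w // k, k).mean(axis=(2, 4))

def compress(feature_maps, channel_weight):
    """Fuse n multi-scale maps and compress them (illustrative stand-ins only)."""
    target = feature_maps[-1].shape[1]            # resolution of the n-th layer
    # Stand-in for conv + batch norm + activation: pool each map to `target`.
    unified = [avg_pool(f, f.shape[1] // target) for f in feature_maps]
    fused = np.concatenate(unified, axis=0)       # channel concatenation
    spatial = avg_pool(fused, 2)                  # stand-in spatial compression
    # Stand-in channel compression: a 1x1 linear mix of channels.
    return np.einsum("dc,chw->dhw", channel_weight, spatial)

rng = np.random.default_rng(0)
maps = [rng.standard_normal((4, 16, 16)),         # layer 1: high resolution
        rng.standard_normal((8, 8, 8)),           # layer 2
        rng.standard_normal((16, 4, 4))]          # layer 3: low resolution
weight = rng.standard_normal((10, 28))            # 28 = 4 + 8 + 16 fused channels
beta = compress(maps, weight)
print(beta.shape)                                 # (10, 2, 2)
```

The point of the sketch is the data flow: all layers are brought to the resolution of the deepest layer before concatenation, so the compressed map is smaller than any single input map in spatial size.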

In one embodiment, the first loss function is determined according to the following expression:

where L_pull denotes the first loss function, α_k denotes the k-th layer original feature map of the normal image, γ_k denotes the k-th layer reconstructed feature map of the normal image, and v_k denotes the number of elements in (α_k)^T · γ_k.

In one embodiment, the second loss function is determined according to the following expression:

where L_push denotes the second loss function, ε_k denotes the k-th layer original feature vector of the abnormal image, and σ_k denotes the k-th layer reconstructed feature vector of the abnormal image.

In one embodiment, the method for detecting the defects of the image to be detected of the product by using the trained defect detection network comprises the following steps:

inputting the image to be detected into the trained defect detection network to obtain n layers of original feature maps and n layers of reconstructed feature maps of the image to be detected;

calculating the distance between the original feature map and the reconstructed feature map of each corresponding layer of the image to be detected to obtain n layers of anomaly prediction sub-maps;

up-sampling the n layers of anomaly prediction sub-maps to the resolution of the image to be detected and fusing them at corresponding pixel positions to obtain an anomaly prediction map with the same size as the image to be detected, wherein each element of the anomaly prediction map represents the probability that the corresponding pixel in the image to be detected is an abnormal pixel;

and determining the defect detection result according to the anomaly prediction map.
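The inference steps above can be sketched as follows. The source does not fix the distance measure, the up-sampling method or the fusion rule, so the choices here are assumptions: one minus cosine similarity along the channel axis as the distance, nearest-neighbour up-sampling, and summation as the fusion. All function names are hypothetical.

```python
import numpy as np

def layer_distance(a, g, eps=1e-8):
    """Per-position (1 - cosine similarity) between two (C, H, W) feature maps."""
    num = (a * g).sum(axis=0)
    den = np.linalg.norm(a, axis=0) * np.linalg.norm(g, axis=0) + eps
    return 1.0 - num / den

def upsample_nearest(m, size):
    """Nearest-neighbour up-sampling of an (H, W) map by integer factors."""
    ry, rx = size[0] // m.shape[0], size[1] // m.shape[1]
    return np.repeat(np.repeat(m, ry, axis=0), rx, axis=1)

def anomaly_map(originals, reconstructions, image_size):
    """Up-sample every layer's distance map and fuse them by summation."""
    maps = [upsample_nearest(layer_distance(a, g), image_size)
            for a, g in zip(originals, reconstructions)]
    return np.sum(maps, axis=0)

rng = np.random.default_rng(0)
originals = [rng.standard_normal((4, 16, 16)), rng.standard_normal((8, 8, 8))]
reconstructions = [o.copy() for o in originals]
reconstructions[0][:, 4:8, 4:8] *= -1.0   # simulate a poorly reconstructed region
amap = anomaly_map(originals, reconstructions, (32, 32))
y, x = np.unravel_index(amap.argmax(), amap.shape)
```

In this toy case the badly reconstructed block of the first layer maps to image rows and columns 8 to 15, which is where the fused map peaks. The summed distances are not calibrated probabilities; a normalization step would be needed for that interpretation.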

In one embodiment, determining a defect detection result from an anomaly prediction map includes:

determining the maximum value in the anomaly prediction map as the anomaly score of the image to be detected;

when the anomaly score is greater than or equal to a preset anomaly threshold, determining that the image to be detected is an abnormal image;

when the anomaly score is less than the preset anomaly threshold, determining that the image to be detected is a normal image;

and/or,

binarizing the anomaly prediction map according to a preset binarization threshold to obtain an anomaly segmentation map of the image to be detected;

and performing connectivity analysis on the anomaly segmentation map, and determining the defect detection result according to the connectivity result.
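Both decision rules above can be illustrated with a short sketch. The threshold values, the 4-connectivity choice and the function names `image_score` and `connected_regions` are assumptions made for illustration; the source specifies only "preset" thresholds and "connectivity analysis".

```python
from collections import deque
import numpy as np

def image_score(amap, threshold):
    """Image-level decision: the maximum of the map compared with a threshold."""
    score = float(amap.max())
    return score, score >= threshold

def connected_regions(binary):
    """Return the 4-connected regions of True pixels as lists of (y, x)."""
    h, w = binary.shape
    seen = np.zeros((h, w), dtype=bool)
    regions = []
    for y in range(h):
        for x in range(w):
            if not binary[y, x] or seen[y, x]:
                continue
            queue, pixels = deque([(y, x)]), []
            seen[y, x] = True
            while queue:                      # breadth-first flood fill
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and binary[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            regions.append(pixels)
    return regions

amap = np.zeros((8, 8))
amap[1:3, 1:3] = 0.9                          # first simulated defect (4 pixels)
amap[5:7, 4:7] = 0.7                          # second simulated defect (6 pixels)
score, is_abnormal = image_score(amap, threshold=0.5)
regions = connected_regions(amap >= 0.6)      # binarize, then group pixels
sizes = sorted(len(r) for r in regions)
```

Each region can then be filtered by area or shape before it is reported as a defect; that filtering rule is left open here, as it is in the source.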

In a second aspect, an embodiment of the present invention provides a defect detection apparatus based on unsupervised learning, including: at least one processor and memory;

the memory stores computer-executable instructions;

at least one processor executes computer-executable instructions stored in a memory, the processor being configured to perform the unsupervised learning-based defect detection method according to any of the first aspect.

In a third aspect, an embodiment of the present invention provides a computer-readable storage medium, where computer-executable instructions are stored in the computer-readable storage medium, and when the computer-executable instructions are executed by a processor, the method for detecting defects based on unsupervised learning according to any one of the first aspect is implemented.

According to the defect detection method and device based on unsupervised learning provided by the embodiments of the present invention, abnormal images matched with normal images are generated by adding artificial noise to the normal images, and the normal and abnormal images are paired one by one to construct a training data set without manual labeling. The defect detection network is trained in an unsupervised manner, and the balance of positive and negative samples in the training data set helps improve its detection performance. Feature reconstruction is performed on the compressed feature map that the feature compression network produces from the original feature map; compared with passing features directly between the teacher network and the student network through skip connections, as in knowledge distillation, this effectively prevents the feature representation capability for abnormal images from leaking to the student network. Training with the objective of minimizing the sum of the first loss function and the second loss function reduces the distance between the original feature map and the reconstructed feature map of a normal image, and increases the distance between the original feature vector and the reconstructed feature vector of the abnormal image paired with it.
The trained defect detection network thus maintains fine feature reconstruction for normal images while weakening feature reconstruction for abnormal images and enlarging the distance between their original and reconstructed feature vectors, so that using it to detect defects in the image to be detected reduces false detections and improves defect detection precision.

Drawings

FIG. 1 is a flowchart of a defect detection method based on unsupervised learning according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a training process of a normal image according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a training process of an abnormal image according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of defect segmentation according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a defect detection apparatus based on unsupervised learning according to an embodiment of the present invention.

Detailed Description

The present invention is described in further detail below with reference to specific embodiments and the accompanying drawings, in which like elements in different embodiments are given like reference numbers. In the following description, numerous details are set forth to provide a better understanding of the present application. However, those skilled in the art will readily recognize that some of these features may be omitted, or replaced with other elements, materials or methods, in different cases. In some instances, certain operations related to the present application are not shown or described in detail, to avoid obscuring the core of the present application with excessive description; a detailed description of these operations is unnecessary for those skilled in the art, who can fully understand them from the description in the specification and the general technical knowledge in the art.

Furthermore, the features, operations or characteristics described in the specification may be combined in any suitable manner to form various embodiments. The steps or actions in the method descriptions may also be reordered or adjusted in a manner apparent to one of ordinary skill in the art. Thus, the various sequences in the specification and drawings are only for the purpose of describing certain embodiments and are not intended to imply a required order, unless it is otherwise stated that a certain order must be followed.

The ordinal numbers used herein for components, such as "first" and "second", merely distinguish the objects described and do not carry any sequential or technical meaning. The terms "connected" and "coupled", as used in this application, include both direct and indirect connections (couplings) unless otherwise indicated.

Defect detection technology based on machine vision is widely applied in industrial production owing to advantages such as safe and reliable detection and a high degree of automation. However, in actual production, images of defective products are scarce compared with images of normal products, collecting a large amount of image data covering all defects is difficult, and labeling is costly. As a result, the positive and negative samples in the training set are imbalanced, the defect detection effect based on supervised learning is poor, the usable scenarios are narrow, and large-scale adoption is difficult. By contrast, normal images of normal products are easy to obtain. In the present application, abnormal images are therefore obtained by adding artificial noise to normal images, and defect detection is realized by feature reconstruction, so that defect detection based on unsupervised learning can be realized with only normal images. The method can adapt to numerous industrial production scenarios, enabling large-scale application of surface anomaly detection on images of industrial products and improved detection performance. A detailed description is given below by way of specific embodiments.

Referring to fig. 1, a defect detection method based on unsupervised learning according to an embodiment of the present invention may include:

S101, acquiring a normal image of a product, and adding artificial noise to the normal image to generate an abnormal image matched with the normal image.

The product in this embodiment may be any product in industrial production, for example, mobile phone screen glass, a battery, or an automobile housing. It can be understood that a defect detection network trained with images of a given product can be used to detect defects of that product.

The normal image is obtained by capturing images of normal products; because normal products are plentiful, normal images are easy to obtain. The artificial noise in this embodiment may be obtained from large-scale natural image data by random sampling. It should be noted that the artificial noise is not intended to simulate the appearance of one or more real defects in an industrial production environment, but rather to generate an appearance that lies outside the distribution of normal images. An abnormal image generated by adding artificial noise to a normal image has a data distribution different from that of the normal image.

A large number of normal images are collected, and a matched training sample is formed by adding artificial noise to form a training data set. The training samples in the training dataset are presented in pairs, i.e. normal images and their paired abnormal images are trained as a set. In the embodiment, the training data set can be constructed under the condition that only normal images exist, and positive and negative samples in the training data set appear in pairs without manual labeling, so that the cost of data acquisition is reduced, and the balanced training samples are beneficial to improving the defect detection performance of the defect detection network.
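The pairing step above can be sketched as follows. The source says only that the artificial noise may be randomly sampled from large-scale natural image data; the patch size, the blending rule and the name `make_abnormal` are assumptions made for illustration, and a random array stands in for a natural image.

```python
import numpy as np

def make_abnormal(normal, noise_source, rng, patch=16, blend=0.8):
    """Blend a randomly sampled noise patch into a copy of a normal image."""
    h, w = normal.shape[:2]
    nh, nw = noise_source.shape[:2]
    # Random patch location in the noise source and in the target image.
    sy = rng.integers(0, nh - patch + 1)
    sx = rng.integers(0, nw - patch + 1)
    ty = rng.integers(0, h - patch + 1)
    tx = rng.integers(0, w - patch + 1)
    abnormal = normal.astype(np.float32).copy()
    src = noise_source[sy:sy + patch, sx:sx + patch].astype(np.float32)
    region = abnormal[ty:ty + patch, tx:tx + patch]
    abnormal[ty:ty + patch, tx:tx + patch] = (1 - blend) * region + blend * src
    # The mask is a by-product of generation, not a manual label.
    mask = np.zeros((h, w), dtype=bool)
    mask[ty:ty + patch, tx:tx + patch] = True
    return abnormal, mask

rng = np.random.default_rng(0)
normal = np.full((64, 64), 0.5, dtype=np.float32)         # stand-in normal image
noise_source = rng.random((128, 128), dtype=np.float32)   # stand-in natural image
abnormal, mask = make_abnormal(normal, noise_source, rng)
pair = (normal, abnormal)                                  # one training sample
```

Because every abnormal image is derived from a normal one, the training set contains exactly as many negative samples as positive ones.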

S102, training the defect detection network according to the normal image and the abnormal image matched with the normal image to obtain the trained defect detection network.

The defect detection network (F) in this embodiment may be built based on a deep neural network, and may include a teacher network (T), a feature compression network (C), a student network (S), and a projection network (P). The teacher network is used for extracting features from the images, the feature compression network is used for compressing the features extracted by the teacher network, the student network is used for reconstructing the features compressed by the feature compression network, and the projection network is used for performing projection dimensionality reduction on the features of the abnormal images. The network structure of the student network is the same as that of the teacher network, the network parameters of the teacher network are kept unchanged in the training process, and the characteristics of the student network reconstruction and the characteristics extracted by the teacher network have the same dimensionality.

During training, a normal image and its matched abnormal image are input into the defect detection network as a group. For the normal image, the training objective is to reduce the distance between the extracted features and the reconstructed features; for the abnormal image, the training objective is to increase the difference between the extracted features and the reconstructed features. Because normal images and abnormal images have different distributions, the trained defect detection network can identify defects as deviations from the normality of normal images.

And respectively inputting the normal image and the abnormal image matched with the normal image into a defect detection network to train the normal image and the abnormal image. How to train the defect detection network will be described in detail below by a training process of normal images and a training process of abnormal images, respectively.

Features of the normal image are extracted through the teacher network to obtain an original feature map of the normal image; feature compression is performed on the original feature map through the feature compression network to obtain a compressed feature map of the normal image; feature reconstruction is performed on the compressed feature map through the student network to obtain a reconstructed feature map of the normal image; and a first loss function is determined according to the original feature map and the reconstructed feature map of the normal image, wherein the first loss function is negatively correlated with the similarity between the two. For normal images, the training objective is to reduce the distance between the original feature map and the reconstructed feature map. That is, under the control of the first loss function, the distance between the original feature map and the reconstructed feature map of the normal image is made as small as possible, so that the student network learns to recover the features of normal images. The smaller the distance between the original feature map and the reconstructed feature map of the normal image, the higher their similarity and the smaller the value of the first loss function, so training can be performed by minimizing the first loss function.
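A loss with the stated property (smaller when the original and reconstructed maps are more similar) can be sketched as below. The exact expression is not reproduced in this text, so the form used here, the mean of one minus the per-position cosine similarity summed over layers, is an assumption and not necessarily the patented formula; `cosine_map` and `pull_loss` are hypothetical names.

```python
import numpy as np

def cosine_map(a, b, eps=1e-8):
    """Cosine similarity along the channel axis of two (C, H, W) feature maps."""
    num = (a * b).sum(axis=0)
    den = np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=0) + eps
    return num / den

def pull_loss(originals, reconstructions):
    """Sum over layers of the mean (1 - cosine similarity) per spatial position."""
    return sum(float(np.mean(1.0 - cosine_map(a, g)))
               for a, g in zip(originals, reconstructions))

rng = np.random.default_rng(0)
alphas = [rng.standard_normal((4, 8, 8)), rng.standard_normal((8, 4, 4))]
# A perfect reconstruction gives a loss near zero.
perfect = pull_loss(alphas, [a.copy() for a in alphas])
# A perturbed reconstruction is less similar, so the loss is larger.
noisy = pull_loss(alphas, [a + rng.standard_normal(a.shape) for a in alphas])
```

Averaging over spatial positions plays the role of a normalization by the number of compared elements, so layers of different resolution contribute on a comparable scale.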

Features of the abnormal image are extracted through the teacher network to obtain an original feature map of the abnormal image; feature compression is performed on the original feature map through the feature compression network to obtain a compressed feature map of the abnormal image; feature reconstruction is performed on the compressed feature map through the student network to obtain a reconstructed feature map of the abnormal image; projection dimensionality reduction is performed on the original feature map and the reconstructed feature map of the abnormal image through the projection network to obtain an original feature vector and a reconstructed feature vector of the abnormal image, respectively; and a second loss function is determined according to the original feature vector and the reconstructed feature vector of the abnormal image, wherein the second loss function is positively correlated with the similarity between the two. It should be noted that not all pixels in the abnormal image are abnormal pixels; the abnormal pixels usually occupy only a partial region, so training cannot be performed in a point-to-point manner. Therefore, in this embodiment, projection dimensionality reduction is performed on the original feature map and the reconstructed feature map of the abnormal image through the projection network to obtain corresponding feature vector representations. The low-dimensional feature vector characterizes whether the image as a whole is abnormal, and under the control of the second loss function the difference between the original feature vector and the reconstructed feature vector of the abnormal image is made as large as possible. The larger the distance between the original feature vector and the reconstructed feature vector of the abnormal image, the lower their similarity and the smaller the value of the second loss function, so training can be performed by minimizing the second loss function.
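The projection and the second loss can be sketched as follows. The exact expression and the structure of the projection network are not reproduced in this text, so both are assumptions: global average pooling followed by a linear map stands in for the projection network, and the loss is taken to be the cosine similarity of the projected vectors summed over layers. The names `project` and `push_loss` are hypothetical.

```python
import numpy as np

def project(feature_map, weight):
    """Stand-in projection: global average pooling, then a linear map."""
    pooled = feature_map.mean(axis=(1, 2))    # (C,)
    return weight @ pooled                    # (D,)

def push_loss(orig_vecs, recon_vecs, eps=1e-8):
    """Sum over layers of the cosine similarity between paired vectors."""
    total = 0.0
    for e, s in zip(orig_vecs, recon_vecs):
        total += float(e @ s / (np.linalg.norm(e) * np.linalg.norm(s) + eps))
    return total

rng = np.random.default_rng(0)
feature = rng.standard_normal((8, 4, 4))      # one layer's feature map
weight = rng.standard_normal((4, 8))          # hypothetical projection weights
vec = project(feature, weight)
same = push_loss([vec], [vec])                # identical vectors: highest loss
opposite = push_loss([vec], [-vec])           # opposite vectors: lowest loss
```

Pooling before comparison is what removes the point-to-point constraint: a localized anomaly changes the whole-image vector without requiring every pixel to be treated as abnormal.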

The normal images and the abnormal images are paired one by one and input into the defect detection network for training, so that the network is jointly optimized. To improve the feature reconstruction capability of the defect detection network for normal images and reduce its feature reconstruction capability for abnormal images, the network parameters of the feature compression network, the student network and the projection network can be iteratively updated with the objective of minimizing the sum of the first loss function and the second loss function, until the iteration termination condition is met, to obtain the trained defect detection network. When the trained defect detection network is used for detection, the network parameters are kept unchanged.
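The joint objective described above can be written compactly. The symbols θ_C, θ_S and θ_P are notation introduced here for the parameters of the feature compression network, the student network and the projection network; L_pull and L_push denote the first and second loss functions. The parameters of the teacher network do not appear under the minimization because they are kept fixed.

```latex
\min_{\theta_C,\;\theta_S,\;\theta_P} \; L_{pull} + L_{push}
```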

The feature reconstruction in this embodiment is performed on the compressed feature map that the feature compression network produces from the original feature map. Compared with passing features directly between the teacher network and the student network through skip connections, as in knowledge distillation, this effectively prevents the feature representation capability for abnormal images from leaking to the student network, thereby preventing the generalization ability of the student network from being excessively enhanced. The trained defect detection network in this embodiment maintains fine feature reconstruction for normal images while weakening feature reconstruction for abnormal images and enlarging the distance between the original feature vector and the reconstructed feature vector of an abnormal image, which effectively reduces false detections on normal images and improves defect detection precision.

S103, defect detection is carried out on the image to be detected of the product by adopting the trained defect detection network, and a defect detection result is obtained.

In this embodiment, after the trained defect detection network is obtained, it may be used to perform defect detection on the image to be detected of the product. Specifically, the image to be detected can be input into the trained defect detection network, and the defect detection result can be determined according to the distance between the obtained original feature map and reconstructed feature map of the image to be detected. For example, the defect detection result may indicate whether the image to be detected as a whole is an abnormal image, determined from the distance between the original feature map and the reconstructed feature map; defects may also be accurately segmented according to the per-pixel distances calculated point to point between the original feature map and the reconstructed feature map.

According to the defect detection method based on unsupervised learning provided by this embodiment, abnormal images matched with normal images are generated by adding artificial noise to the normal images, and the normal and abnormal images are paired one by one to construct a training data set without manual labeling. The defect detection network is trained in an unsupervised manner, and the balance of positive and negative samples in the training data set helps improve its detection performance. Feature reconstruction is performed on the compressed feature map that the feature compression network produces from the original feature map; compared with passing features directly between the teacher network and the student network through skip connections, as in knowledge distillation, this effectively prevents the feature representation capability for abnormal images from leaking to the student network. Training with the objective of minimizing the sum of the first loss function and the second loss function reduces the distance between the original feature map and the reconstructed feature map of a normal image, and increases the distance between the original feature vector and the reconstructed feature vector of the abnormal image paired with it.
The trained defect detection network thus maintains fine feature reconstruction for normal images while weakening feature reconstruction for abnormal images and enlarging the distance between their original and reconstructed feature vectors, so that using it to detect defects in the image to be detected reduces false detections and improves defect detection precision.

On the basis of the above embodiment, and considering that product defects include both larger targets and smaller targets, the method provided by this embodiment can detect defects of different sizes and shapes as follows: multi-scale feature extraction at different levels is performed on the normal image through the teacher network to obtain n (n ≥ 2) layers of original feature maps of the normal image; feature compression is performed on the n layers of original feature maps of the normal image through the feature compression network to obtain a compressed feature map of the normal image; and reverse-order, layer-by-layer feature reconstruction is performed on the compressed feature map of the normal image through the student network to obtain n layers of reconstructed feature maps of the normal image.

Fig. 2 is a schematic diagram of the training process for a normal image according to an embodiment of the present invention. Fig. 2 takes n = 3 as an example. As shown in Fig. 2, a normal image x is input into the defect detection network. First, the teacher network T in the defect detection network F performs multi-scale feature extraction at different levels on the normal image x to obtain, in order from low level to high level, the first-layer original feature map α₁, the second-layer original feature map α₂ and the third-layer original feature map α₃. The higher the level, the lower the resolution of the feature map and the greater the number of channels. That is, pixels in higher-level feature maps have larger receptive fields and are better suited to large targets, while pixels in lower-level feature maps have smaller receptive fields and are better suited to small targets. α₁, α₂ and α₃ together constitute the multi-scale original feature maps of the normal image x. Next, the feature compression network C in the defect detection network F performs feature compression on the multi-scale original feature maps of the normal image x to obtain the compressed feature map β of the normal image x. Finally, the student network S in the defect detection network F performs reverse-order, layer-by-layer feature reconstruction on the compressed feature map β to obtain, in order from high level to low level, the third-layer reconstructed feature map γ₃, the second-layer reconstructed feature map γ₂ and the first-layer reconstructed feature map γ₁. Here α₁ and γ₁, α₂ and γ₂, and α₃ and γ₃ have correspondingly equal dimensions.

When multi-scale feature extraction at different levels is performed on the normal image, performing feature compression on the n layers of original feature maps of the normal image through the feature compression network to obtain the compressed feature map of the normal image may include: performing convolution, batch normalization and activation on each of the first n-1 layers of original feature maps of the normal image to convert it to the same feature resolution as the n-th layer original feature map, thereby obtaining n layers of original feature maps of uniform resolution for the normal image; concatenating these n layers along the channel dimension to obtain the fused original feature map of the normal image; and sequentially performing spatial compression and channel compression on the fused original feature map to obtain the compressed feature map of the normal image. Taking the 3 layers of original feature maps shown in Fig. 2 as an example, the feature compression network C may, using α₃ as the reference, apply convolution, batch normalization (BN) and ReLU activation to α₁ and α₂ respectively to reduce their feature resolution and increase their number of channels, converting α₁ and α₂ into original feature maps with the same feature resolution as α₃ and thus obtaining 3 layers of original feature maps of uniform resolution for the normal image x. These 3 layers are concatenated along the channel dimension to obtain the fused original feature map of the normal image x. The fused original feature map is then further compressed spatially, for example by using a convolutional layer to further reduce the feature resolution, after which a 1x1 convolution reduces the number of feature channels to complete the channel compression, finally yielding the compressed feature map β of the normal image x.
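As a rough illustration of the resolution-unification and channel-concatenation steps described above, the following Python sketch represents feature maps as nested lists indexed [channel][row][column]. Average pooling stands in for the learned convolution, BN and ReLU layers, so unlike those layers it does not increase the channel count. The function names `avg_pool` and `fuse_to_last_resolution` and the toy shapes are assumptions made for illustration and are not part of the described network; the subsequent spatial and channel compression is not modelled.

```python
def avg_pool(fmap, factor):
    """Downsample a [C][H][W] map by an integer factor with average pooling
    (an assumed stand-in for the learned strided convolution + BN + ReLU)."""
    pooled_channels = []
    for ch in fmap:
        h, w = len(ch), len(ch[0])
        pooled = [
            [
                sum(ch[i * factor + di][j * factor + dj]
                    for di in range(factor)
                    for dj in range(factor)) / factor ** 2
                for j in range(w // factor)
            ]
            for i in range(h // factor)
        ]
        pooled_channels.append(pooled)
    return pooled_channels


def fuse_to_last_resolution(fmaps):
    """Bring every map to the resolution of the last (highest-level) map,
    then concatenate all maps along the channel axis."""
    target_h = len(fmaps[-1][0])
    fused = []
    for fmap in fmaps:
        factor = len(fmap[0]) // target_h
        fused.extend(avg_pool(fmap, factor) if factor > 1 else fmap)
    return fused


# Toy maps: 1 channel at 4x4, 2 channels at 2x2, 3 channels at 1x1.
a1 = [[[1.0] * 4 for _ in range(4)]]
a2 = [[[2.0] * 2 for _ in range(2)] for _ in range(2)]
a3 = [[[3.0]] for _ in range(3)]
fused = fuse_to_last_resolution([a1, a2, a3])
print(len(fused), len(fused[0]), len(fused[0][0]))  # channels, height, width
```

In a real network the pooling would be replaced by trainable layers, and the fused map would then pass through the spatial and channel compression stages.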

When multi-scale feature extraction at different levels is performed on the normal image, the first loss function is determined according to the n layers of original feature maps of the normal image and the corresponding n layers of reconstructed feature maps. Minimizing the first loss function amounts to minimizing the distance between each layer's original feature map of the normal image and the corresponding reconstructed feature map. In an alternative embodiment, the first loss function may be determined according to the following expression:

where L_pull denotes the first loss function, α_k denotes the k-th layer original feature map of the normal image, γ_k denotes the k-th layer reconstructed feature map of the normal image, and v_k denotes the number of elements in (α_k)^T·γ_k. According to the above expression, the higher the similarity between α_k and γ_k, i.e. the smaller the distance between them, the smaller the value of the first loss function. Averaging the cosine losses over the layers allows the quality of the reconstructed features to be controlled effectively.
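The expression itself is not reproduced in this text. As a hedged sketch only, the following Python code shows one layer-averaged cosine loss that behaves as described: it decreases as each original feature map and its reconstruction become more similar. Flattening each layer into a single vector is a simplifying assumption, the per-element normalization by v_k is not modelled, and the names `flatten`, `cosine_similarity` and `pull_loss` are hypothetical.

```python
import math


def flatten(fmap):
    """Flatten a [C][H][W] nested-list feature map into one vector."""
    return [x for ch in fmap for row in ch for x in row]


def cosine_similarity(u, v):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(p * q for p, q in zip(u, v))
    norm = math.sqrt(sum(p * p for p in u)) * math.sqrt(sum(q * q for q in v))
    return dot / (norm + 1e-12)


def pull_loss(originals, reconstructions):
    """Average over layers of (1 - cosine similarity); assumed form."""
    per_layer = [
        1.0 - cosine_similarity(flatten(a), flatten(g))
        for a, g in zip(originals, reconstructions)
    ]
    return sum(per_layer) / len(per_layer)


alpha = [[[1.0, 0.0]]]        # one channel, 1x2 pixels
gamma_good = [[[2.0, 0.0]]]   # same direction as alpha
gamma_bad = [[[0.0, 1.0]]]    # orthogonal to alpha

loss_good = pull_loss([alpha], [gamma_good])
loss_mixed = pull_loss([alpha, alpha], [gamma_good, gamma_bad])
```

Because cosine similarity ignores magnitude, `gamma_good` incurs essentially no loss even though its values are scaled; a loss that should also penalize scale differences would need an additional term.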

For the abnormal image, on the basis of the above embodiment, the method provided by this embodiment may further perform multi-scale feature extraction at different levels on the abnormal image through the teacher network to obtain n layers of original feature maps of the abnormal image; perform feature compression on the n layers of original feature maps of the abnormal image through the feature compression network to obtain a compressed feature map of the abnormal image; perform reverse-order, layer-by-layer feature reconstruction on the compressed feature map of the abnormal image through the student network to obtain n layers of reconstructed feature maps of the abnormal image; and perform projection dimensionality reduction on the n layers of original feature maps and the n layers of reconstructed feature maps of the abnormal image through the projection network to obtain n layers of original feature vectors and n layers of reconstructed feature vectors of the abnormal image, respectively.

Fig. 3 is a schematic diagram of the training process for an abnormal image according to an embodiment of the present invention. Fig. 3 also takes n = 3 as an example. As shown in Fig. 3, the abnormal image x⁺ obtained by adding artificial noise to the normal image x is input into the defect detection network. First, the teacher network T in the defect detection network F performs multi-scale feature extraction at different levels on the abnormal image x⁺ to obtain, in order from low level to high level, the first-layer, second-layer and third-layer original feature maps of x⁺. The higher the level, the lower the resolution of the feature map and the greater the number of channels. That is, pixels in higher-level feature maps have larger receptive fields and are better suited to large targets, while pixels in lower-level feature maps have smaller receptive fields and are better suited to small targets. These three layers together constitute the multi-scale original feature maps of the abnormal image x⁺. Next, the feature compression network C in the defect detection network F performs feature compression on the multi-scale original feature maps of x⁺ to obtain the compressed feature map β⁺ of the abnormal image x⁺. Then the student network S in the defect detection network F performs reverse-order, layer-by-layer feature reconstruction on the compressed feature map β⁺ to obtain, in order from high level to low level, the third-layer, second-layer and first-layer reconstructed feature maps of x⁺. Finally, the projection network P performs projection dimensionality reduction on the multi-scale original feature maps and the multi-scale reconstructed feature maps of x⁺ respectively, to obtain the multi-scale original feature vectors ε₁, ε₂ and ε₃ and the multi-scale reconstructed feature vectors σ₁, σ₂ and σ₃ of the abnormal image x⁺.

When multi-scale feature extraction at different levels is performed on the abnormal image, performing feature compression on the n layers of original feature maps of the abnormal image through the feature compression network to obtain the compressed feature map of the abnormal image may include: performing convolution, batch normalization and activation on each of the first n-1 layers of original feature maps of the abnormal image to convert it to the same feature resolution as the n-th layer original feature map, thereby obtaining n layers of original feature maps of uniform resolution for the abnormal image; concatenating these n layers along the channel dimension to obtain the fused original feature map of the abnormal image; and sequentially performing spatial compression and channel compression on the fused original feature map to obtain the compressed feature map of the abnormal image. Taking the 3 layers of original feature maps shown in Fig. 3 as an example, the feature compression network C may, using the third-layer original feature map of x⁺ as the reference, apply convolution, batch normalization (BN) and ReLU activation to the first-layer and second-layer original feature maps of x⁺ respectively to reduce their feature resolution and increase their number of channels, converting them into original feature maps with the same feature resolution as the third-layer original feature map and thus obtaining 3 layers of original feature maps of uniform resolution for the abnormal image x⁺. These 3 layers are concatenated along the channel dimension to obtain the fused original feature map of the abnormal image x⁺. The fused original feature map is then further compressed spatially, for example by using a convolutional layer to further reduce the feature resolution, after which a 1x1 convolution reduces the number of feature channels to complete the channel compression, finally yielding the compressed feature map β⁺ of the abnormal image x⁺.

When multi-scale feature extraction at different levels is performed on the abnormal image, the second loss function is determined according to the n layers of original feature vectors of the abnormal image and the corresponding n layers of reconstructed feature vectors. Minimizing the second loss function amounts to maximizing the distance between each layer's original feature vector of the abnormal image and the corresponding reconstructed feature vector. In an alternative embodiment, the second loss function may be determined according to the following expression:

where L_push denotes the second loss function, ε_k denotes the k-th layer original feature vector of the abnormal image, and σ_k denotes the k-th layer reconstructed feature vector of the abnormal image. The greater the distance between ε_k and σ_k, the smaller the second loss function.

In an alternative embodiment, the overall loss function L_total of the defect detection network F may be constructed according to the following expression:

L_total = L_pull + L_push

That is, the defect detection network F is trained with the goal of minimizing the sum of the first loss function and the second loss function.
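The expression for the second loss function is not reproduced in this text. The following Python sketch is therefore only one assumed form that matches the stated behaviour: it uses the layer-averaged cosine similarity between original and reconstructed feature vectors, so the value falls as the vectors move apart, and it adds the two losses to form the overall objective. The names `push_loss` and `total_loss` and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(p * q for p, q in zip(u, v))
    norm = math.sqrt(sum(p * p for p in u)) * math.sqrt(sum(q * q for q in v))
    return dot / (norm + 1e-12)


def push_loss(orig_vectors, recon_vectors):
    """Assumed form: mean cosine similarity over layers, so the loss is
    positively correlated with similarity and shrinks as distance grows."""
    sims = [cosine_similarity(e, s) for e, s in zip(orig_vectors, recon_vectors)]
    return sum(sims) / len(sims)


def total_loss(pull_value, push_value):
    """Overall objective: the sum of the first and second loss values."""
    return pull_value + push_value


eps = [[1.0, 0.0], [0.0, 2.0]]             # original vectors, 2 layers
sigma_close = [[1.0, 0.1], [0.0, 1.0]]     # reconstructions near eps
sigma_far = [[-1.0, 0.0], [0.0, -2.0]]     # reconstructions opposite to eps

push_close = push_loss(eps, sigma_close)
push_far = push_loss(eps, sigma_far)
objective = total_loss(0.25, push_far)     # 0.25 is a made-up pull-loss value
```

With this form the optimizer is rewarded for making the reconstruction of an abnormal image diverge from the teacher's features, while the first loss keeps the reconstruction of normal images close.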

On the basis of the above embodiment, the defect detection method based on unsupervised learning provided by this embodiment further performs multi-scale feature extraction and reconstruction at different levels and combines the distances between the original features and the reconstructed features of every layer, so that defects of different sizes and shapes can be detected. The method is more sensitive to fine defects, which improves its ability to detect them.

On the basis of any of the foregoing embodiments, in the method provided by this embodiment, performing defect detection on the image to be detected of the product using the trained defect detection network to obtain the defect detection result may include:

S1031, inputting the image to be detected into the trained defect detection network to obtain n layers of original feature maps and n layers of reconstructed feature maps of the image to be detected.

After the training of the defect detection network F is completed, the projection network P does not participate in the detection process. Once the trained defect detection network is obtained, the acquired image to be detected is input into it to obtain the n layers of original feature maps α_k (k = 1, 2, …, n) and the n layers of reconstructed feature maps γ_k (k = 1, 2, …, n) of the image to be detected.

S1032, calculating the distance between the original feature map and the reconstructed feature map of each corresponding layer of the image to be detected to obtain n layers of anomaly prediction sub-maps.

The greater the distance between the original feature map and the reconstructed feature map, the greater the probability that the image to be detected is abnormal. In this embodiment, the distance between α_k and γ_k can be calculated to obtain the corresponding anomaly prediction sub-map M_k. In an alternative embodiment, M_k may be determined in a point-to-point manner according to the following expression:

The greater the distance between α_k and γ_k, the larger the values of the elements in M_k; a larger value indicates a higher probability that the corresponding element belongs to an abnormal pixel.
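The point-to-point expression is not reproduced in this text. As an assumed illustration, the Python sketch below computes a sub-map whose entry at each spatial position is one minus the cosine similarity between the channel vectors of the original and reconstructed feature maps at that position. The choice of cosine distance and the name `anomaly_submap` are assumptions; feature maps are nested lists indexed [channel][row][column].

```python
import math


def anomaly_submap(alpha, gamma):
    """Per-position distance between two [C][H][W] maps, returned as [H][W].
    Uses 1 - cosine similarity along the channel axis (an assumed choice)."""
    h, w = len(alpha[0]), len(alpha[0][0])
    submap = []
    for i in range(h):
        row = []
        for j in range(w):
            u = [ch[i][j] for ch in alpha]   # channel vector of the original
            v = [ch[i][j] for ch in gamma]   # channel vector of the reconstruction
            dot = sum(p * q for p, q in zip(u, v))
            norm = (math.sqrt(sum(p * p for p in u))
                    * math.sqrt(sum(q * q for q in v)))
            row.append(1.0 - dot / (norm + 1e-12))
        submap.append(row)
    return submap


# 2 channels, 1x2 pixels: pixel 0 is reconstructed well, pixel 1 is not.
alpha = [[[1.0, 1.0]], [[0.0, 0.0]]]
gamma = [[[2.0, 0.0]], [[0.0, 3.0]]]
m = anomaly_submap(alpha, gamma)
```

The sub-map keeps the spatial layout of the feature maps, which is what later allows it to be upsampled back to image resolution.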

S1033, upsampling the n layers of anomaly prediction sub-maps to the resolution of the image to be detected and fusing them at corresponding pixel positions to obtain an anomaly prediction map of the same size as the image to be detected, wherein the elements of the anomaly prediction map represent the probability that the corresponding pixels in the image to be detected are abnormal pixels.

It can be understood that the anomaly prediction sub-maps of the individual layers, determined from original and reconstructed feature maps of different scales, have different resolutions but still retain the position information of the defect. To locate defects in the image to be detected at the pixel level, in this embodiment the n layers of anomaly prediction sub-maps M_k can be upsampled to the resolution of the image to be detected and fused at corresponding pixel positions to obtain an anomaly prediction map A of the same size as the image to be detected. Specifically, the pixel-level anomaly prediction map A is calculated as follows:

A has the same size as the image to be detected. Each element of A represents the probability that the corresponding pixel in the image to be detected is an abnormal pixel; the larger the value, the higher that probability. Upsample denotes upsampling, whose sampling rate can be determined according to the ratio between the resolution of the image to be detected and that of M_k.
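The fusion expression is not reproduced in this text. The Python sketch below illustrates the step under two stated assumptions: nearest-neighbour interpolation is used for upsampling, and the upsampled sub-maps are fused by element-wise summation. Neither choice is confirmed by the description, and the names `upsample_nearest` and `fuse_submaps` are hypothetical.

```python
def upsample_nearest(submap, out_h, out_w):
    """Nearest-neighbour upsampling of an [H][W] grid to [out_h][out_w]."""
    in_h, in_w = len(submap), len(submap[0])
    return [
        [submap[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
        for i in range(out_h)
    ]


def fuse_submaps(submaps, out_h, out_w):
    """Upsample every sub-map to the image size and sum them position-wise
    (summation is an assumed fusion rule)."""
    fused = [[0.0] * out_w for _ in range(out_h)]
    for submap in submaps:
        up = upsample_nearest(submap, out_h, out_w)
        for i in range(out_h):
            for j in range(out_w):
                fused[i][j] += up[i][j]
    return fused


m1 = [[0.0, 0.0], [0.0, 0.8]]   # finer sub-map, defect in the lower-right
m2 = [[0.5]]                    # coarser sub-map covering the whole image
anomaly_map = fuse_submaps([m1, m2], 4, 4)
```

Other interpolation modes (for example bilinear) or other fusion rules (for example a product or a mean) would fit the same interface.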

S1034, determining the defect detection result according to the anomaly prediction map.

Defect classification or defect segmentation may be performed based on the anomaly prediction map.

In an alternative embodiment, the image to be detected can be classified quickly according to the maximum value in the anomaly prediction map. Specifically, the maximum value in the anomaly prediction map may be taken as the anomaly score of the image to be detected, i.e. Score = max(A). When the anomaly score is greater than or equal to a preset anomaly threshold, the image to be detected is determined to be an abnormal image; when the anomaly score is smaller than the preset anomaly threshold, the image to be detected is determined to be a normal image. Comparing the anomaly score with the preset anomaly threshold allows images to be detected to be classified quickly.
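The classification rule above is simple enough to state directly in code. The following Python sketch takes the anomaly prediction map as a nested list and applies the greater-than-or-equal comparison described; the function name `classify_image` and the threshold values are illustrative assumptions.

```python
def classify_image(anomaly_map, anomaly_threshold):
    """Return (score, label), where score is the maximum of the anomaly map
    and label is 'abnormal' when score >= anomaly_threshold."""
    score = max(max(row) for row in anomaly_map)
    label = "abnormal" if score >= anomaly_threshold else "normal"
    return score, label


A = [[0.1, 0.2],
     [0.7, 0.3]]
result_strict = classify_image(A, 0.8)   # threshold above the maximum
result_loose = classify_image(A, 0.7)    # threshold equal to the maximum
```

Because the comparison is inclusive, a score exactly equal to the threshold is treated as abnormal.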

In an alternative embodiment, pixel-level segmentation of the image to be detected can also be achieved according to the anomaly prediction map. Specifically, the anomaly prediction map can be binarized according to a preset binarization threshold to obtain an anomaly segmentation map of the image to be detected, connected-component analysis can be performed on the anomaly segmentation map, and the defect detection result can be determined according to the resulting connected regions. It can be understood that, by adjusting the binarization threshold, the anomaly segmentation map can be made to include all pixels that may be abnormal. Furthermore, to reduce the false detection rate, connected-component analysis can be performed on the anomaly segmentation map, and the defect detection result can be determined according to the area, confidence and so on of the connected regions. For example, isolated points in the anomaly segmentation map may be normal pixels mistakenly identified as abnormal, and the false detection rate can be reduced by filtering the connected regions by area. As shown in Fig. 4, inputting the image to be detected on the left into the defect detection network yields the pixel-level defect detection result on the right, achieving precise localization of the defect.
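The binarization and area-based filtering described above can be sketched as follows. This Python example thresholds the anomaly prediction map, finds connected regions with a breadth-first search, and keeps only regions whose pixel count reaches a minimum area. The use of 4-connectivity, the function name `segment_defects`, and the parameter names `bin_threshold` and `min_area` are assumptions for illustration; confidence-based filtering is not modelled.

```python
from collections import deque


def segment_defects(anomaly_map, bin_threshold, min_area):
    """Binarize the map, then return connected regions (lists of (row, col))
    whose area is at least min_area. Uses 4-connectivity (assumed)."""
    h, w = len(anomaly_map), len(anomaly_map[0])
    mask = [[anomaly_map[i][j] >= bin_threshold for j in range(w)]
            for i in range(h)]
    seen = [[False] * w for _ in range(h)]
    regions = []
    for i in range(h):
        for j in range(w):
            if not mask[i][j] or seen[i][j]:
                continue
            # Breadth-first search over one connected region.
            queue = deque([(i, j)])
            seen[i][j] = True
            region = []
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) >= min_area:
                regions.append(region)
    return regions


A = [
    [0.9, 0.1, 0.1, 0.1],   # isolated high value at (0, 0)
    [0.1, 0.1, 0.8, 0.9],
    [0.1, 0.1, 0.7, 0.1],
]
regions = segment_defects(A, 0.5, 2)
```

In this toy map the isolated pixel at (0, 0) exceeds the binarization threshold but is discarded by the area filter, while the three-pixel region is kept.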

On the basis of the above embodiment, the defect detection method based on unsupervised learning provided by this embodiment further calculates the distance between the original feature map and the reconstructed feature map of each layer of the image to be detected to obtain n layers of anomaly prediction sub-maps, upsamples them to the resolution of the image to be detected, and fuses them at corresponding pixel positions to obtain an anomaly prediction map of the same size as the image to be detected. Based on this anomaly prediction map, the image to be detected can be classified quickly, and pixel-level segmentation of the image to be detected can also be achieved.

An embodiment of the present invention further provides a defect detection device based on unsupervised learning. It is described here with reference to Fig. 5 by way of example only, which does not mean that the invention is limited thereto. Fig. 5 is a schematic structural diagram of a defect detection device based on unsupervised learning according to an embodiment of the present invention. As shown in Fig. 5, the unsupervised learning-based defect detection device 50 provided in this embodiment may include: a memory 501, a processor 502, and a bus 503. The bus 503 is used to connect these elements.

The memory 501 stores a computer program, and when the computer program is executed by the processor 502, the computer program can implement the technical solution of the defect detection method based on unsupervised learning provided by any of the above method embodiments.

The memory 501 and the processor 502 are electrically connected, directly or indirectly, to enable data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines, such as the bus 503. The memory 501 stores a computer program for implementing the unsupervised learning-based defect detection method, which includes at least one software functional module that can be stored in the memory 501 in the form of software or firmware, and the processor 502 executes various functional applications and data processing by running the software programs and modules stored in the memory 501.

It will be appreciated that the configuration of Fig. 5 is merely illustrative; the device may include more or fewer components than shown in Fig. 5, or have a different configuration from that shown in Fig. 5. The components shown in Fig. 5 may be implemented in hardware and/or software. For example, an image acquisition device for capturing the image to be detected may also be provided.

Reference is made herein to various exemplary embodiments. However, those skilled in the art will recognize that changes and modifications may be made to the exemplary embodiments without departing from the scope hereof. For example, the various operational steps, as well as the components used to perform the operational steps, may be implemented in differing ways depending upon the particular application or consideration of any number of cost functions associated with operation of the system (e.g., one or more steps may be deleted, modified or incorporated into other steps).

Additionally, as will be appreciated by one skilled in the art, the principles herein may be reflected in a computer program product on a computer readable storage medium, which is pre-loaded with computer readable program code. Any tangible, non-transitory computer-readable storage medium may be used, including magnetic storage devices (hard disks, floppy disks, etc.), optical storage devices (CD-ROMs, DVDs, Blu Ray disks, etc.), flash memory, and/or the like. These computer program instructions may be loaded onto a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions which execute on the computer or other programmable data processing apparatus create means for implementing the functions specified. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including means for implementing the function specified. The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified.

The present invention has been described in terms of specific examples, which are provided to aid understanding of the invention and are not intended to be limiting. For a person skilled in the art to which the invention pertains, several simple deductions, modifications or substitutions may be made according to the idea of the invention.

Claims

1. A defect detection method based on unsupervised learning is characterized by comprising the following steps:

training a defect detection network according to the normal image and the abnormal image matched with the normal image to obtain a trained defect detection network, wherein the defect detection network comprises a teacher network, a feature compression network, a student network and a projection network, the student network and the teacher network have the same network structure, and the training of the defect detection network comprises the following steps:

inputting the normal image and the abnormal image matched with the normal image into the defect detection network respectively;

performing feature extraction on the normal image through the teacher network to obtain an original feature map of the normal image; performing feature compression on the original feature map of the normal image through the feature compression network to obtain a compressed feature map of the normal image; performing feature reconstruction on the compressed feature map of the normal image through the student network to obtain a reconstructed feature map of the normal image; determining a first loss function according to the original feature map of the normal image and the reconstructed feature map of the normal image, wherein the first loss function is in negative correlation with the similarity between the original feature map of the normal image and the reconstructed feature map of the normal image;

performing feature extraction on the abnormal image through the teacher network to obtain an original feature map of the abnormal image; performing feature compression on the original feature map of the abnormal image through the feature compression network to obtain a compressed feature map of the abnormal image; performing feature reconstruction on the compressed feature map of the abnormal image through the student network to obtain a reconstructed feature map of the abnormal image; respectively carrying out projection dimensionality reduction on the original feature map of the abnormal image and the reconstructed feature map of the abnormal image through the projection network to respectively obtain an original feature vector of the abnormal image and a reconstructed feature vector of the abnormal image; determining a second loss function according to the original feature vector of the abnormal image and the reconstructed feature vector of the abnormal image, wherein the second loss function is positively correlated with the similarity between the original feature vector of the abnormal image and the reconstructed feature vector of the abnormal image;

performing iterative updating on the network parameters of the feature compression network, the student network and the projection network by taking the minimized sum of the first loss function and the second loss function as a target until an iteration termination condition is met to obtain a trained defect detection network;

2. The method of claim 1,

carrying out multi-scale feature extraction on the normal image at different levels through the teacher network to obtain n (n is more than or equal to 2) layers of original feature maps of the normal image; performing feature compression on the n layers of original feature maps of the normal image through the feature compression network to obtain a compressed feature map of the normal image; performing reverse-sequence layer-by-layer feature reconstruction on the compressed feature map of the normal image through the student network to obtain n layers of reconstructed feature maps of the normal image;

carrying out multi-scale feature extraction on the abnormal image at different levels through the teacher network to obtain n layers of original feature maps of the abnormal image; performing feature compression on the n layers of original feature maps of the abnormal image through the feature compression network to obtain a compressed feature map of the abnormal image; performing reverse-sequence layer-by-layer feature reconstruction on the compressed feature map of the abnormal image through the student network to obtain n layers of reconstructed feature maps of the abnormal image; and respectively carrying out projection dimensionality reduction on the n layers of original feature maps of the abnormal images and the n layers of reconstructed feature maps of the abnormal images through the projection network to respectively obtain n layers of original feature vectors of the abnormal images and n layers of reconstructed feature vectors of the abnormal images.

3. The method of claim 2,

the feature compression of the n-layer original feature map of the normal image by the feature compression network to obtain the compressed feature map of the normal image comprises: performing convolution, batch standardization and activation processing on each layer of original feature map in the first n-1 layers of original feature maps of the normal image respectively, and converting the original feature maps into original feature maps with the same feature resolution as the nth layer of original feature map of the normal image to obtain n layers of original feature maps with uniform resolution of the normal image; carrying out channel splicing on n layers of original feature maps with uniform resolution of the normal image on a channel to obtain a fusion original feature map of the normal image; sequentially performing space compression and channel compression on the fused original feature map of the normal image to obtain a compressed feature map of the normal image;

the feature compression of the n-layer original feature map of the abnormal image by the feature compression network to obtain the compressed feature map of the abnormal image comprises: performing convolution, batch standardization and activation processing on each layer of original feature map in the first n-1 layers of original feature maps of the abnormal image respectively, and converting the original feature maps into original feature maps with the same feature resolution as the nth layer of original feature map of the abnormal image to obtain n layers of original feature maps with uniform resolution of the abnormal image; performing channel splicing on n layers of original feature maps with uniform resolution of the abnormal image on a channel to obtain a fusion original feature map of the abnormal image; and sequentially performing space compression and channel compression on the fused original feature map of the abnormal image to obtain a compressed feature map of the abnormal image.

4. The method of claim 2, wherein the first loss function is determined according to the expression:

5. The method of claim 2, wherein the second loss function is determined according to the expression:

where L_push denotes the second loss function, ε_k denotes the k-th layer original feature vector of the abnormal image, and σ_k denotes the k-th layer reconstructed feature vector of the abnormal image.

6. The method according to any one of claims 2-5, wherein the performing defect inspection on the image to be inspected of the product using the trained defect inspection network to obtain a defect inspection result comprises:

inputting the image to be detected into the trained defect detection network to obtain n layers of original feature maps and n layers of reconstructed feature maps of the image to be detected;

7. The method of claim 6, wherein said determining a defect detection result from said anomaly prediction map comprises:

determining the maximum value in the abnormality prediction image as the abnormality score of the image to be detected;

when the abnormal score is larger than or equal to a preset abnormal threshold value, determining that the image to be detected is an abnormal image;

and when the abnormal score is smaller than a preset abnormal threshold value, determining that the image to be detected is a normal image.

8. The method of claim 6, wherein said determining a defect detection result from said anomaly prediction map comprises:

9. A defect detection apparatus based on unsupervised learning, comprising: at least one processor and memory;

the memory stores computer-executable instructions;

the at least one processor executing the memory-stored computer-executable instructions for performing the unsupervised learning-based defect detection method of any of claims 1-8.

10. A computer-readable storage medium, characterized in that the medium has stored thereon a program executable by a processor to implement the unsupervised learning-based defect detection method according to any one of claims 1 to 8.