CN114240892B - Knowledge distillation-based unsupervised industrial image anomaly detection method and system - Google Patents


Info

Publication number
CN114240892B
CN114240892B (application CN202111555291.7A)
Authority
CN
China
Prior art keywords
image
network
anomaly
knowledge distillation
industrial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111555291.7A
Other languages
Chinese (zh)
Other versions
CN114240892A (en)
Inventor
沈卫明 (SHEN Weiming)
曹云康 (CAO Yunkang)
宋亚楠 (SONG Yanan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN202111555291.7A priority Critical patent/CN114240892B/en
Publication of CN114240892A publication Critical patent/CN114240892A/en
Application granted granted Critical
Publication of CN114240892B publication Critical patent/CN114240892B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06T 7/0004 Industrial image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4023 Scaling of whole images or parts thereof, e.g. expanding or contracting based on decimating pixels or lines of pixels; based on inserting pixels or lines of pixels
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/70 Denoising; Smoothing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Pure & Applied Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computational Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a knowledge distillation-based unsupervised industrial image anomaly detection method and system, belonging to the technical field of industrial image processing. The method comprises a training stage and a testing stage and is built from multi-scale knowledge distillation and multi-scale anomaly fusion. The multi-scale knowledge distillation module comprises a teacher network and a student network; self-adaptive difficult-sample mining selects difficult samples, and the student network is optimized using the pixel and context similarity between the teacher and student features of those samples. In the training stage, knowledge is distilled from the teacher network to the student network using only normal industrial images, and the student network parameters are iteratively optimized so that the features the student network extracts from normal industrial product images are similar to those extracted by the teacher network. In the testing stage, both networks extract depth features of the test image, and the regression error between the features is used for image anomaly segmentation and detection. The method effectively improves the performance of unsupervised industrial image anomaly detection, reduces labor cost, and raises the automation and intelligence level of production-line quality inspection.

Description

Knowledge distillation-based unsupervised industrial image anomaly detection method and system
Technical Field
The invention belongs to the technical field of industrial image processing, and further relates to an unsupervised industrial image anomaly detection method and system based on knowledge distillation.
Background
Automatic inspection of industrial product quality is of great importance, and industrial image anomaly detection technology can improve both the quality-inspection performance and the production efficiency of a production line. In most industries at present, for example the steel, textile, and semiconductor industries, product quality inspection still relies mainly on manual experience; inspectors tire easily after long working hours, causing false detections and missed detections. Industrial image anomaly detection based on image processing can replace manual work and inspect industrial product quality effectively. The main flow of traditional image-processing-based industrial image anomaly detection comprises image acquisition, feature extraction, and anomaly scoring, where the feature extraction step depends heavily on expert knowledge and is difficult to generalize to different scenes. Deep-learning-based industrial image anomaly detection requires no hand-crafted priors, accelerates the deployment of product quality inspection, improves detection performance and speed, and is gradually becoming the mainstream method.
However, because anomalies occur infrequently on industrial production lines, the collected normal data far exceed the abnormal data, which makes unsupervised deep-learning anomaly detection methods significant.
Disclosure of Invention
Aiming at the above defects or improvement demands of the prior art, the invention provides a knowledge distillation-based unsupervised industrial image anomaly detection method, which detects industrial product image anomalies by exploiting the strong feature expression capability of deep learning. It requires no labeled anomaly data in the parameter optimization stage, effectively improves detection efficiency, reduces manufacturing cost, and adapts to different types of industrial product image anomalies.
In order to achieve the above purpose, the invention provides an unsupervised industrial image anomaly detection method based on knowledge distillation, which comprises the following steps:
Training phase:
Using normal industrial product images as training sample images, training on the sample images to generate an anomaly detection model, wherein the anomaly detection model comprises a teacher network and a student network, the teacher network is used for extracting first features of an input image, and the student network is used for extracting second features of the input image;
extracting difficult samples from the first features and the second features through self-adaptive difficult-sample mining;
utilizing the difficult samples to improve the objective function of the student network in a knowledge distillation manner, and training to generate the anomaly detection model;
Testing phase:
inputting the image to be detected into the trained anomaly detection model to obtain anomaly scores at different scales, and fusing the anomaly scores of all scales by multiplication to obtain the final anomaly score map, completing image anomaly detection.
The abnormality detection model can be divided into a multi-scale knowledge distillation module and a multi-scale abnormality fusion module, wherein the multi-scale knowledge distillation module comprises a teacher network and a student network.
A deep neural network trained on a large-scale natural image dataset extracts highly discriminative deep features and can serve as an effective industrial image feature extractor. Normal industrial image features extracted by a pre-trained deep neural network tend to be distributed on a compact manifold in the high-dimensional feature space, while abnormal industrial image features are distributed more dispersedly, far from the normal feature manifold. The distribution of the normal feature manifold M is modeled with a model G(θ). Theoretically, when G(θ) accurately describes the normal feature manifold, G(x; θ) can accurately determine whether test data x belongs to the manifold. In practice, however, a model rarely gives an exact classification result; it outputs a score that must be truncated with a manually chosen threshold:
G(x;θ)=y (2)
In general, the larger y is, the more likely the data is anomalous. Since training data are limited in practice, it is difficult to learn G(θ) directly to describe the normal feature manifold. Knowledge distillation can instead be used to learn the normal feature manifold implicitly, and the corresponding model G can be expressed as G(T, S), where T denotes the teacher network, a pre-trained network with high discriminability, and S denotes the student network, which has the same network structure as the teacher network. The corresponding anomaly discrimination model is:

y = ‖F_T(x) − F_S(x)‖₂² (3)

where F_T and F_S denote the feature extractors of the teacher network and the student network respectively; formula (3) essentially computes the difference between the features extracted by the teacher network and the student network. For formula (3) to satisfy the condition of formula (2), namely a small anomaly score for normal test data and a large anomaly score for abnormal test data, the features that T and S extract from normal data must be as similar as possible, and their features on abnormal data must differ as much as possible. Knowledge distillation on normal data makes the normal-data features extracted by T and S as similar as possible; meanwhile, S never learns the features that T extracts from abnormal data, so the feature difference on abnormal data is larger. The knowledge distillation step is described in detail below.
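The teacher-student criterion of formula (3), an anomaly score equal to the regression error between teacher and student features, can be sketched in a few lines of NumPy. This is an illustrative sketch only: in the patent, F_T and F_S are deep multi-scale convolutional networks, not the array stand-ins used here.

```python
import numpy as np

def anomaly_score(f_t: np.ndarray, f_s: np.ndarray) -> np.ndarray:
    """Per-position squared L2 distance between teacher and student features.

    f_t, f_s: (H, W, D) feature maps extracted from the same image by the
    teacher and student feature extractors. Returns an (H, W) anomaly score
    map: small where the student has learned to mimic the teacher (normal
    regions), large where it has not (anomalous regions).
    """
    return np.sum((f_t - f_s) ** 2, axis=-1)
```

For identical feature maps the score map is exactly zero, matching the intuition that a perfectly distilled student yields no regression error on normal data.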
Given an image I, the corresponding image features f_T, f_S are extracted by T and S:
fT=FT(I) (4)
fS=FS(I) (5)
Let f_T, f_S have resolution H×W and dimension D. f_T and f_S are normalized position-wise onto the unit hypersphere:

f̂_T(i, j) = f_T(i, j) / ‖f_T(i, j)‖₂ (6)
f̂_S(i, j) = f_S(i, j) / ‖f_S(i, j)‖₂ (7)
Considering the features of different scales extracted from the deep neural network, f_T^(l) and f_S^(l) denote the feature maps of the l-th layer.
In order to make the normal-data features extracted by T and S as similar as possible, the normalized features f̂_T, f̂_S can be optimized directly, defining the pixel-level loss as:

L_ps(f_T, f_S) = Σ_{l=1}^{L} 1/(H^(l)W^(l)) Σ_{i=1}^{H^(l)} Σ_{j=1}^{W^(l)} ‖f̂_T^(l)(i, j) − f̂_S^(l)(i, j)‖₂² (8)

where L denotes the total number of scales and H^(l)×W^(l) denotes the size of each scale. By optimizing the pixel-level loss on normal sample features, the normal-data features extracted by T and S can be made similar. However, in practical training the selected T and S usually have a large number of parameters, while the similarity between normal industrial images is extremely high, so it is difficult to distill enough knowledge and overfitting occurs during training: the features that T and S extract from the training-set normal data are very similar, yet large regression errors still appear in actual use, degrading anomaly detection performance. To alleviate the overfitting problem, the method further optimizes the training process with a context similarity loss and self-adaptive difficult-sample mining.
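The normalization and multi-scale pixel-level loss described above can be sketched as follows. This is a NumPy sketch under the assumption that each scale's features arrive as an (H_l, W_l, D_l) array; the small epsilon guard is an implementation convenience not stated in the patent.

```python
import numpy as np

def normalize(f: np.ndarray) -> np.ndarray:
    # Project each D-dimensional feature vector onto the unit hypersphere:
    # f_hat(i, j) = f(i, j) / ||f(i, j)||_2 (epsilon guards against zero vectors).
    return f / (np.linalg.norm(f, axis=-1, keepdims=True) + 1e-8)

def pixel_loss(feats_t, feats_s) -> float:
    """Multi-scale pixel similarity loss L_ps.

    feats_t, feats_s: lists of (H_l, W_l, D_l) teacher/student feature maps,
    one entry per scale l. The per-scale squared differences of the
    normalized features are averaged over the H_l x W_l positions and summed
    over scales.
    """
    loss = 0.0
    for f_t, f_s in zip(feats_t, feats_s):
        h, w, _ = f_t.shape
        diff = normalize(f_t) - normalize(f_s)
        loss += float(np.sum(diff ** 2)) / (h * w)
    return loss
```

Identical teacher and student features give a loss of exactly zero, and any mismatch makes the loss strictly positive.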
During knowledge distillation training, the loss function used determines how the student network learns the teacher network's knowledge. To alleviate the overfitting problem, the student network can additionally be constrained by the context similarity of its extracted features with respect to the teacher network. Intuitively, if the context similarity between the student network and the teacher network is high, the depth features they extract from the same input image should have the same pairwise similarity relationships. Since the depth features are normalized onto the unit hypersphere, their similarity relationship can be defined by cosine similarity:

Q(f)^(l) = reshape(f̂^(l)) ∈ R^{N×D} (9)
G(f_T)^(l) = Q(f_T)^(l) Q(f_T)^(l)ᵀ (10)
G(f_S)^(l) = Q(f_S)^(l) Q(f_S)^(l)ᵀ (11)

where Q(f) ∈ R^{N×D} denotes the result of flattening f̂^(l) into a matrix, D denotes the feature dimension, and N denotes the number of feature vectors. G(f) denotes the similarity computation: the value of G(f_T)^(l) (respectively G(f_S)^(l)) at coordinate (i, j) encodes the cosine similarity between the i-th vector and the j-th vector of Q(f_T)^(l) (respectively Q(f_S)^(l)). The context similarity loss penalizes the difference between the two similarity matrices:

L_cs(f_T, f_S) = Σ_{l=1}^{L} 1/(N^(l))² Σ_{i=1}^{N^(l)} Σ_{j=1}^{N^(l)} |G(f_T)^(l)(i, j) − G(f_S)^(l)(i, j)| (12)
where N^(l) = H^(l)×W^(l) denotes the number of vectors contained in Q_T^(l), Q_S^(l). Taking the context similarity into account, the final loss function can be defined as:
L(f_T, f_S) = L_ps(f_T, f_S) + λL_cs(f_T, f_S) (13)
where λ is the hyper-parameter weight that controls pixel similarity and context similarity loss.
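The Gram-matrix construction and the combined loss of formula (13) can be sketched for a single scale as below. This is a hedged sketch: the element-wise absolute difference used to compare the two similarity matrices and the value λ = 0.5 are assumptions for illustration, since the scraped text does not reproduce the exact formula.

```python
import numpy as np

def context_loss(f_t: np.ndarray, f_s: np.ndarray, eps: float = 1e-8) -> float:
    """Context similarity loss L_cs for one scale (a sketch).

    Flattens each (H, W, D) map into Q in R^{N x D} (N = H*W) of unit
    vectors, builds the cosine-similarity (Gram) matrices G = Q Q^T, and
    penalizes their element-wise difference averaged over the N^2 entries.
    """
    def gram(f):
        q = f.reshape(-1, f.shape[-1])
        q = q / (np.linalg.norm(q, axis=1, keepdims=True) + eps)
        return q @ q.T
    g_t, g_s = gram(f_t), gram(f_s)
    n = g_t.shape[0]
    return float(np.sum(np.abs(g_t - g_s))) / (n * n)

def total_loss(f_t: np.ndarray, f_s: np.ndarray, lam: float = 0.5) -> float:
    # L = L_ps + lambda * L_cs for a single scale; lam is an assumed value.
    h, w, _ = f_t.shape
    def norm(f):
        return f / (np.linalg.norm(f, axis=-1, keepdims=True) + 1e-8)
    l_ps = float(np.sum((norm(f_t) - norm(f_s)) ** 2)) / (h * w)
    return l_ps + lam * context_loss(f_t, f_s)
```

The context term is invariant to any rotation applied jointly to all of a network's feature vectors, which is why it constrains relational structure rather than individual pixels.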
During training, difficult samples should play a larger role, while simple samples do not need continual optimization. Existing methods use a fixed threshold to judge whether a sample is difficult, but a fixed threshold is hard to select and different tasks require different thresholds. The invention uses a self-adaptive difficult-sample mining strategy that dynamically mines difficult samples in real time; more specifically, a mining strategy that measures the difficulty of each sample in real time is provided, defined as follows:
m^(l,t) = αm^(l,t−1) + (1−α)(μ^(l,t) + βσ^(l,t)) (14)
where d^(l,t)(i, j) = ‖f̂_T^(l)(i, j) − f̂_S^(l)(i, j)‖₂² denotes the difference between the depth features extracted by the teacher network and the student network at feature-map position (i, j) in the t-th iteration, and μ^(l,t), σ^(l,t) are the mean and standard deviation of d^(l,t), so the difficulty of a sample can be judged dynamically from μ^(l,t) and σ^(l,t). Formula (14) uses an exponential moving average to suppress the extreme noise produced in each iteration, making the change of the dynamic threshold m^(l,t) smoother. α is a hyper-parameter controlling the smoothness of m^(l,t), and β is a hyper-parameter controlling how difficult the mined samples are: in general, the larger β is, the fewer samples are mined from the training batch and the more difficult they are. The difficult samples can be extracted by the following formula:

{(i_h, j_h)} = {(i, j) : d^(l,t)(i, j) > m^(l,t)} (15)
where (i_h, j_h) denotes the coordinates of the finally extracted difficult samples, whose corresponding teacher network features and student network features are denoted f̂_T^h(l), f̂_S^h(l). After mining, the loss function is computed using only the mined difficult samples, so that the difficult samples play a larger role in training and meaningless optimization of simple samples is avoided. The loss function after difficult-sample mining can be defined as:

L(f_T^h, f_S^h) = L_ps(f_T^h, f_S^h) + λL_cs(f_T^h, f_S^h) (16)
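The adaptive mining rule, an exponential-moving-average threshold over per-position teacher-student differences, can be sketched for one scale as follows. The class name and the default α, β values are illustrative choices, not values fixed by the patent.

```python
import numpy as np

class AdaptiveHardMining:
    """Self-adaptive difficult-sample mining for one scale (a sketch).

    Maintains a dynamic threshold m via an exponential moving average of
    mu + beta * sigma over the per-position teacher-student differences;
    positions whose difference exceeds the threshold are the difficult
    samples used to compute the loss.
    """
    def __init__(self, alpha: float = 0.9, beta: float = 1.0):
        self.alpha = alpha  # smoothness of the dynamic threshold m
        self.beta = beta    # larger beta -> fewer, more difficult samples
        self.m = None       # dynamic threshold m^(l,t)

    def mine(self, f_t: np.ndarray, f_s: np.ndarray) -> np.ndarray:
        d = np.sum((f_t - f_s) ** 2, axis=-1)     # per-position difference d^(l,t)
        target = d.mean() + self.beta * d.std()   # mu^(l,t) + beta * sigma^(l,t)
        if self.m is None:
            self.m = target                       # first iteration: no history yet
        else:
            self.m = self.alpha * self.m + (1 - self.alpha) * target
        return d > self.m                         # boolean mask of difficult positions
```

Because the threshold tracks the current batch statistics, the fraction of mined positions stays roughly stable even as the student improves, unlike a fixed threshold.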
After training, the student network extracts features similar to those the teacher network extracts from normal data, i.e., the student network implicitly describes the normal-data manifold captured by the teacher network. The anomaly score can then be calculated according to equation (3); specifically, the anomaly score at each scale is computed as:

AS^(l)(i, j) = ‖f̂_T^(l)(i, j) − f̂_S^(l)(i, j)‖₂² (17)
To obtain better performance, multiplication is used to fuse the anomaly scores of the individual scales:

AS(f) = Π_{l=1}^{L} Upsample(AS^(l)) (18)
where Upsample(·) is an up-sampling function that up-samples the anomaly score maps of different scales to the resolution of the input image by bilinear interpolation. AS(f) is the final anomaly score map, which provides a reference for subsequent decisions.
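The multi-scale fusion step can be sketched as below. One simplification is hedged explicitly: nearest-neighbour index replication stands in for the bilinear interpolation the patent specifies, to keep the sketch dependency-free.

```python
import numpy as np

def fuse_anomaly_maps(score_maps, out_hw):
    """Multi-scale anomaly fusion: upsample each scale's score map to the
    input resolution and combine by element-wise multiplication.

    score_maps: list of (H_l, W_l) per-scale anomaly score maps.
    out_hw: (H, W) resolution of the input image.
    Nearest-neighbour upsampling is used here as a stand-in for the
    bilinear interpolation described in the patent.
    """
    h, w = out_hw
    fused = np.ones((h, w))
    for m in score_maps:
        ys = np.arange(h) * m.shape[0] // h   # map output rows to source rows
        xs = np.arange(w) * m.shape[1] // w   # map output cols to source cols
        fused *= m[np.ix_(ys, xs)]            # upsample, then multiply in
    return fused
```

Multiplicative fusion keeps the final score high only where every scale agrees that a region is anomalous, which suppresses single-scale false alarms.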
In another aspect, the present invention provides an unsupervised industrial image anomaly detection system based on knowledge distillation, comprising: a computer readable storage medium and a processor;
The computer-readable storage medium is for storing executable instructions;
The processor is used for reading executable instructions stored in the computer readable storage medium and executing the method for detecting the unsupervised industrial image abnormality based on knowledge distillation.
The knowledge distillation-based unsupervised industrial image anomaly detection method establishes an end-to-end deep-learning anomaly detection model. It can solve the cold-start problem of a production line caused by insufficient anomaly samples in the early stage, improves the performance of unsupervised industrial image anomaly detection, effectively raises the image-analysis-based automation level of production-line quality inspection, and can serve subsequent supervised anomaly detection and anomaly diagnosis.
Drawings
FIG. 1 is a schematic diagram of an unsupervised industrial image anomaly detection method based on knowledge distillation.
Detailed Description
The present invention will be described in further detail with reference to the drawings and embodiments, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The invention provides an unsupervised industrial image anomaly detection method based on knowledge distillation, which is shown in fig. 1 and comprises the following steps:
Training phase:
Using normal industrial product images as training sample images, training on the sample images to generate an anomaly detection model, wherein the anomaly detection model comprises a teacher network and a student network, the teacher network is used for extracting first features of an input image, and the student network is used for extracting second features of the input image;
extracting difficult samples from the first features and the second features through self-adaptive difficult-sample mining;
utilizing the difficult samples to improve the objective function of the student network in a knowledge distillation manner, and training to generate the anomaly detection model;
Testing phase:
inputting the image to be detected into the trained anomaly detection model to obtain anomaly scores at different scales, and fusing the anomaly scores of all scales by multiplication to obtain the final anomaly score map, completing image anomaly detection.
As shown in fig. 1, the anomaly detection model is divided into a multi-scale knowledge distillation module, comprising the teacher network and the student network, and a multi-scale anomaly fusion module, which operate as described above. Only the difficult samples obtained by mining are used to calculate the loss, shown as AHSM (Adaptively Hard Sample Mining) in fig. 1.
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention; any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the present invention are intended to be included within its scope.

Claims (4)

1. An unsupervised industrial image anomaly detection method based on knowledge distillation, which is characterized by comprising the following steps:
Training phase:
Using normal industrial product images as training sample images, training on the sample images to generate an anomaly detection model, wherein the anomaly detection model comprises a teacher network and a student network, the teacher network is used for extracting first features of an input image, and the student network is used for extracting second features of the input image;
extracting difficult samples from the first features and the second features through self-adaptive difficult-sample mining;
utilizing the difficult samples to improve the objective function of the student network in a knowledge distillation manner, and training to generate the anomaly detection model; the objective function is:

L(f_T^h, f_S^h) = L_ps(f_T^h, f_S^h) + λL_cs(f_T^h, f_S^h)

where L_ps is the pixel similarity loss, L_cs is the context similarity loss, f_T^h, f_S^h denote the first features and second features corresponding to the difficult samples, and λ is the hyper-parameter weight controlling the pixel similarity and context similarity losses;
The pixel similarity loss is:

L_ps(f_T, f_S) = Σ_{l=1}^{L} 1/(H^(l)W^(l)) Σ_{i=1}^{H^(l)} Σ_{j=1}^{W^(l)} ‖f̂_T^(l)(i, j) − f̂_S^(l)(i, j)‖₂²

where L denotes the total number of scales and H^(l)×W^(l) denotes the size of each scale;
The context similarity loss is:

L_cs(f_T, f_S) = Σ_{l=1}^{L} 1/(N^(l))² Σ_{i=1}^{N^(l)} Σ_{j=1}^{N^(l)} |G(f_T)^(l)(i, j) − G(f_S)^(l)(i, j)|, with G(f)^(l) = Q(f)^(l) Q(f)^(l)ᵀ

where L denotes the total number of scales, H^(l)×W^(l) denotes the size of each scale, Q(f) ∈ R^{N×D} denotes the result of flattening f̂^(l) into a matrix, D denotes the feature dimension, N denotes the number of feature vectors, G(f) denotes the similarity computation, and the value of G(f_T)^(l) (respectively G(f_S)^(l)) at coordinate (i, j) encodes the cosine similarity between the i-th vector and the j-th vector of Q(f_T)^(l) (respectively Q(f_S)^(l));
Testing phase:

inputting an image to be detected into the trained anomaly detection model to obtain anomaly scores at different scales, and fusing the anomaly scores of all scales by multiplication to obtain a final anomaly score map, thereby completing image anomaly detection; the anomaly score is calculated as:

$$AS^{(l)}(i,j) = \left\|\hat{f}_T^{(l)}(i,j) - \hat{f}_S^{(l)}(i,j)\right\|_2^2$$

the anomaly scores of the various scales are fused using multiplication:

$$AS = \prod_{l=1}^{L} \mathrm{Upsample}\!\left(AS^{(l)}\right)$$

where $AS$ is the final anomaly score map, $\mathrm{Upsample}(\cdot)$ is an upsampling function that upsamples the anomaly score maps of different scales to the resolution of the input image by bilinear interpolation, and $L$ denotes the total number of scales.
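The multiplicative multi-scale fusion of the testing phase can be sketched as follows. This is a minimal NumPy illustration with hypothetical names: nearest-neighbour upsampling stands in for the bilinear interpolation named in the claim, to keep the sketch dependency-free.

```python
import numpy as np

def anomaly_maps(f_t, f_s):
    """Per-scale anomaly score AS^(l): pixel-wise squared L2 distance
    between teacher and student features, reduced over the channel axis."""
    return [np.sum((t - s) ** 2, axis=-1) for t, s in zip(f_t, f_s)]

def upsample(a, size):
    """Nearest-neighbour upsampling of a 2-D map to `size` (stand-in for
    the bilinear interpolation used in the claim)."""
    h, w = a.shape
    out_h, out_w = size
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return a[np.ix_(rows, cols)]

def fuse_scores(maps, size):
    """AS = prod_l Upsample(AS^(l)): multiplicative fusion, so only regions
    flagged as anomalous at every scale keep a high final score."""
    fused = np.ones(size)
    for m in maps:
        fused *= upsample(m, size)
    return fused
```

The multiplication acts as a conjunction across scales: a position scored near zero at any single scale is suppressed in the final map, which tends to reduce false positives compared with additive fusion.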
2. The method of claim 1, wherein the first features and the second features $f_T, f_S$ are expressed as:

$$f_T = F_T(I), \qquad f_S = F_S(I)$$

and normalized as:

$$\hat{f}_T^{(l)}(i,j) = \frac{f_T^{(l)}(i,j)}{\left\|f_T^{(l)}(i,j)\right\|_2}, \qquad \hat{f}_S^{(l)}(i,j) = \frac{f_S^{(l)}(i,j)}{\left\|f_S^{(l)}(i,j)\right\|_2}$$

where $F_T$ and $F_S$ respectively denote the feature extractors of the teacher network and the student network, and $I$ is the input image.
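The per-position L2 normalization of claim 2 can be sketched in a few lines of NumPy (the `eps` guard against zero-norm vectors is an implementation detail added here, not part of the claim):

```python
import numpy as np

def l2_normalize(f, eps=1e-8):
    """Normalize each spatial feature vector of an (H, W, D) map to unit
    L2 norm, so teacher and student features are compared on a common
    scale regardless of each network's activation magnitudes."""
    norm = np.linalg.norm(f, axis=-1, keepdims=True)
    return f / (norm + eps)
```

Normalizing before the distance computations means the losses and anomaly scores respond to the direction of the feature vectors rather than their magnitude.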
3. The method of claim 2, wherein the hard samples are obtained through a dynamic threshold:

$$d^{(l,t)}(i,j) = \left\|\hat{f}_T^{(l)}(i,j) - \hat{f}_S^{(l)}(i,j)\right\|_2^2$$

$$m^{(l,t)} = \alpha\, m^{(l,t-1)} + (1-\alpha)\left(\mu^{(l,t)} + \beta\, \sigma^{(l,t)}\right)$$

where $f_T^{(l)}, f_S^{(l)}$ denote the feature maps of the teacher network and the student network at the $l$-th layer; $d^{(l,t)}(i,j)$ denotes the depth feature difference at position $(i,j)$ of the feature maps extracted by the teacher network and the student network at the $t$-th iteration; $m^{(l,t)}$ is the dynamic threshold; $\mu^{(l,t)}$ and $\sigma^{(l,t)}$ are the mean and the standard deviation of $d^{(l,t)}$; $\alpha$ is a hyper-parameter controlling the smoothness of $m^{(l,t)}$; and $\beta$ is a hyper-parameter controlling the difficulty of the mined samples; positions whose difference $d^{(l,t)}(i,j)$ exceeds $m^{(l,t)}$ are taken as hard samples.
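The dynamic-threshold update of claim 3 is an exponential moving average over training iterations. A minimal NumPy sketch follows; the class name and the rule that positions exceeding the threshold form the hard mask are assumptions made for illustration:

```python
import numpy as np

class AdaptiveHardMiner:
    """Adaptive hard-sample mining via an EMA-smoothed threshold:
    m^(t) = alpha * m^(t-1) + (1 - alpha) * (mu^(t) + beta * sigma^(t))."""

    def __init__(self, alpha=0.9, beta=1.0):
        self.alpha = alpha   # smoothness of the threshold across iterations
        self.beta = beta     # difficulty of the mined samples
        self.m = None        # dynamic threshold, initialized on first call

    def mine(self, diff):
        """diff: (H, W) array of per-position teacher-student differences
        d^(l,t)(i,j) for one scale at the current iteration.
        Returns a boolean mask of hard positions and updates the threshold."""
        mu, sigma = diff.mean(), diff.std()
        target = mu + self.beta * sigma
        if self.m is None:
            self.m = target
        else:
            self.m = self.alpha * self.m + (1 - self.alpha) * target
        return diff > self.m
```

Raising `beta` keeps only the most poorly imitated positions, while `alpha` close to 1 makes the threshold change slowly, so a single noisy batch cannot swing which samples count as hard.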
4. An unsupervised industrial image anomaly detection system based on knowledge distillation, comprising: a computer readable storage medium and a processor;
The computer-readable storage medium is for storing executable instructions;
The processor is configured to read executable instructions stored in the computer-readable storage medium and execute the knowledge distillation-based unsupervised industrial image anomaly detection method of any one of claims 1 to 3.
CN202111555291.7A 2021-12-17 2021-12-17 Knowledge distillation-based unsupervised industrial image anomaly detection method and system Active CN114240892B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111555291.7A CN114240892B (en) 2021-12-17 2021-12-17 Knowledge distillation-based unsupervised industrial image anomaly detection method and system

Publications (2)

Publication Number Publication Date
CN114240892A CN114240892A (en) 2022-03-25
CN114240892B true CN114240892B (en) 2024-07-02

Family

ID=80758577

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111555291.7A Active CN114240892B (en) 2021-12-17 2021-12-17 Knowledge distillation-based unsupervised industrial image anomaly detection method and system

Country Status (1)

Country Link
CN (1) CN114240892B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114742799B (en) * 2022-04-18 2024-04-26 华中科技大学 Industrial scene unknown type defect segmentation method based on self-supervision heterogeneous network
CN114782694B (en) * 2022-06-21 2022-09-30 中国科学技术大学 Unsupervised anomaly detection method, system, device and storage medium
CN115239638A (en) * 2022-06-28 2022-10-25 厦门微图软件科技有限公司 Industrial defect detection method, device and equipment and readable storage medium
EP4322106A4 (en) 2022-06-30 2024-02-28 Contemporary Amperex Technology Co., Limited Defect detection method and apparatus
CN116028891B (en) * 2023-02-16 2023-07-14 之江实验室 Industrial anomaly detection model training method and device based on multi-model fusion
CN116630286B (en) * 2023-05-31 2024-02-13 博衍科技(珠海)有限公司 Method, device, equipment and storage medium for detecting and positioning image abnormality
CN116993694B (en) * 2023-08-02 2024-05-14 江苏济远医疗科技有限公司 Non-supervision hysteroscope image anomaly detection method based on depth feature filling

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021197223A1 (en) * 2020-11-13 2021-10-07 平安科技(深圳)有限公司 Model compression method, system, terminal, and storage medium
WO2021248868A1 (en) * 2020-09-02 2021-12-16 之江实验室 Knowledge distillation-based compression method for pre-trained language model, and platform


Similar Documents

Publication Publication Date Title
CN114240892B (en) Knowledge distillation-based unsupervised industrial image anomaly detection method and system
Zhao et al. Deep multi-scale convolutional transfer learning network: A novel method for intelligent fault diagnosis of rolling bearings under variable working conditions and domains
Zhao et al. A visual long-short-term memory based integrated CNN model for fabric defect image classification
CN113256066A (en) PCA-XGboost-IRF-based job shop real-time scheduling method
CN112819802B (en) Method for supervising and predicting blast furnace condition abnormality based on tuyere information deep learning
CN112964469B (en) Online fault diagnosis method for rolling bearing under variable load of transfer learning
CN110728694B (en) Long-time visual target tracking method based on continuous learning
CN109685765B (en) X-ray film pneumonia result prediction device based on convolutional neural network
CN111753918B (en) Gender bias-removed image recognition model based on countermeasure learning and application
Liang et al. Comparison detector for cervical cell/clumps detection in the limited data scenario
CN108765374B (en) Method for screening abnormal nuclear area in cervical smear image
Zhang et al. Imbalanced data based fault diagnosis of the chiller via integrating a new resampling technique with an improved ensemble extreme learning machine
Freytag et al. Labeling examples that matter: Relevance-based active learning with gaussian processes
CN112784872B (en) Cross-working condition fault diagnosis method based on open set joint transfer learning
CN112017204A (en) Tool state image classification method based on edge marker graph neural network
CN107194413A (en) A kind of differentiation type based on multi-feature fusion cascades the target matching method of display model
CN116051479A (en) Textile defect identification method integrating cross-domain migration and anomaly detection
CN117576079A (en) Industrial product surface abnormality detection method, device and system
CN113222149A (en) Model training method, device, equipment and storage medium
Nguyen et al. Physics-infused fuzzy generative adversarial network for robust failure prognosis
Li et al. Gadet: A geometry-aware x-ray prohibited items detector
CN113128564A (en) Typical target detection method and system based on deep learning under complex background
CN116383747A (en) Anomaly detection method for generating countermeasure network based on multi-time scale depth convolution
CN113469977B (en) Flaw detection device, method and storage medium based on distillation learning mechanism
Liang et al. Figure-ground image segmentation using genetic programming and feature selection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant