CN114240892A

CN114240892A - Unsupervised industrial image anomaly detection method and system based on knowledge distillation

Info

Publication number: CN114240892A
Application number: CN202111555291.7A
Authority: CN
Inventors: 沈卫明; 曹云康; 宋亚楠
Original assignee: Huazhong University of Science and Technology
Current assignee: Huazhong University of Science and Technology
Priority date: 2021-12-17
Filing date: 2021-12-17
Publication date: 2022-03-25
Anticipated expiration: 2041-12-17
Also published as: CN114240892B

Abstract

The invention discloses an unsupervised industrial image anomaly detection method and system based on knowledge distillation, and belongs to the technical field of industrial image processing. The method comprises two stages of training and testing, and consists of multi-scale knowledge distillation and multi-scale abnormity fusion, wherein the multi-scale knowledge distillation comprises a teacher network and a student network, difficult samples are dynamically mined by using self-adaptive difficult samples, and the student network is optimized by using the pixel and context similarity among the difficult samples. Only using the normal industrial image in the training stage to carry out knowledge distillation from a teacher network to a student network, and carrying out iterative optimization on student network parameters to enable the depth characteristics of the student network and the normal industrial product image extracted by the teacher network to be similar; in the testing stage, the depth features of the tested image are respectively extracted, and the regression error between the features can be used for image abnormity segmentation and detection. The method effectively improves the performance of unsupervised industrial image anomaly detection, reduces the labor cost, and improves the automation and intelligentization level of production line quality inspection.

Description

Unsupervised industrial image anomaly detection method and system based on knowledge distillation

Technical Field

The invention belongs to the technical field of industrial image processing, and further relates to an unsupervised industrial image anomaly detection method and system based on knowledge distillation.

Background

The automatic detection of the quality of industrial products is vital, and the product quality detection performance and the production efficiency of a production line can be improved by the industrial image anomaly detection technology. Currently most industries, for example: in the industries of steel, textile, semiconductors and the like, the detection of the product quality still mainly depends on manual experience, and detection personnel are easy to fatigue after working for a long time, so that false detection and missed detection are caused. The image processing-based industrial image anomaly detection technology is developed, and can replace manual work to effectively detect the quality of industrial products. The traditional industrial image anomaly detection based on image processing comprises the following main processes: the method comprises the steps of collecting images, extracting features and carrying out abnormal scoring, wherein the feature extraction part is highly dependent on expert knowledge and is difficult to generalize to different scenes. The industrial image anomaly detection method based on deep learning does not need manual prior, can accelerate the deployment of product quality inspection, improves the detection performance and the detection speed, and is gradually becoming a mainstream method.

However, because the frequency of occurrence of the abnormality in the industrial production line is low, the collected normal data is far greater than the abnormal data, and therefore the unsupervised deep learning abnormality detection method is significant.

Disclosure of Invention

Aiming at the defects or improvement requirements of the prior art, the invention provides an unsupervised industrial image anomaly detection method based on knowledge distillation, which aims to detect the image anomaly of an industrial product by utilizing the strong feature expression capability of deep learning, does not need to label anomaly data in the parameter optimization stage, effectively improves the detection efficiency, reduces the manufacturing cost and has adaptability to the image anomaly detection of different types of industrial products.

In order to achieve the purpose, the invention provides an unsupervised industrial image anomaly detection method based on knowledge distillation, which comprises the following steps:

a training stage:

taking a normal industrial product image as a training sample image, and training by using the sample image to generate an abnormality detection model, wherein the abnormality detection model comprises a teacher network and a student network, the teacher network is used for extracting a first feature of an input image, and the student network is used for extracting a second feature of the input image;

extracting hard samples from the first feature and the second feature through adaptive hard mining;

improving an objective function of a student network in a knowledge distillation mode by using the difficult sample, and training to generate an abnormal detection model;

and (3) a testing stage:

and inputting the image to be detected into the trained anomaly detection model to obtain anomaly scores of different scales, and fusing the anomaly scores of all scales by using multiplication to obtain a final anomaly score map so as to finish the image anomaly detection.

The anomaly detection model can be divided into a multi-scale knowledge distillation module and a multi-scale anomaly fusion module, and the multi-scale knowledge distillation module comprises a teacher network and a student network.

The depth features extracted by the deep neural network trained on the large-scale natural image data set have high discriminability and can be used as an effective industrial image feature extractor. The normal industrial image features extracted through the pre-training deep neural network are often distributed in a compact manifold of a high-dimensional feature space. The abnormal industrial image features are distributed more dispersedly and far away from the normal feature manifold. The distribution of the normal feature manifold is modeled using model G (θ):

where M represents the normal feature manifold. Theoretically, when G (theta) can accurately describe a normal characteristic manifold, G (x; theta) can accurately judge whether test data belongs to the manifold. However, the general model is difficult to accurately give the classification result, but gives the probability, and needs to manually give a threshold value for further truncation:

G(x；θ)＝y (2)

in general, a larger y indicates a higher probability that the data is anomalous. Because training data is limited and distribution is unknown during actual training, it is difficult to directly learn G (theta) to describe the normal feature manifold. Knowledge distillation can be used for implicit learning of normal characteristic manifold, and the corresponding model G can be represented as G (T, S), wherein T represents a teacher network and is a pre-training network with high discriminability, and S represents a student network with the same network structure as the teacher network. The corresponding abnormality discrimination model is:

wherein, F_TAnd F_SAnd (3) respectively representing the characteristic extractors of the teacher network and the student network, wherein the essence of the formula (3) is to calculate the characteristic difference obtained by extraction of the teacher network and the student network. In order for formula (3) to satisfy the condition of formula (2), that is: when the test data are normal data, the abnormal score is small; when the test data are abnormal data, the abnormal score is large, the normal data features extracted by the T and the S are required to be similar as much as possible, and the abnormal data features are different as much as possible. Through knowledge distillation on normal data, the characteristics of the normal data extracted by T and S are similar as much as possible; meanwhile, S does not learn the abnormal data characteristics extracted by T, and the abnormal data characteristics extracted by T are different greatly. The knowledge distillation procedure is specifically described below.

Given an image I, extracting corresponding image features f through S and T_T，f_S：

f_T＝F_T(I) (4)

f_S＝F_S(I) (5)

Suppose f_T，f_SHas a resolution of H × W and a dimension of D. To f_T，f_SAnd (3) carrying out normalization:

considering different scale characteristics extracted from deep neural network

The characteristic diagram of the l-th layer is shown.

In order to make the normal data characteristics extracted by T and S as similar as possible, the normal data characteristics can be directly optimized

Defining a pixel level penalty as:

wherein the content of the first and second substances,

indicates the total scale degree, H^(l)×W^(l)Representing the dimensions of the respective dimensions. By optimizing the pixel level loss of the normal sample features, the T and the S extracted normal data features can be similar. However, in the actual training process, the parameters of the selected T and S are usually large, and the similarity between normal industrial images is extremely high, so that sufficient knowledge is difficult to extract, and the overfitting problem occurs in the training process. That is, the normal data features of the training set extracted by the T and S are very similar, but a large regression error still exists in the actual use process, resulting in the reduction of the abnormal detection performance. To alleviate overfittingThe method further optimizes the training process by using context similarity loss and self-adaptive hard-case mining.

In the knowledge distillation training process, the loss function used determines how the student network learns the knowledge of the teacher network. To alleviate the overfitting problem, the contextual similarity of the student network and the teacher network extracted features may be constrained. Intuitively, if the context similarity between the student network and the teacher network is high, they should have the same similarity relationship between the extracted depth features of the same input image. Since the depth features have been normalized to a unit hypersphere, its similarity relationship can be defined by cosine similarity:

wherein Q (f) e R^N×DPresentation pair

A stretching operation is performed, where D represents the feature dimension and N represents the number of features. G (f) represents similarity calculation, G (f)_T)^(l)And G (f)_S)^(l)The value of the middle coordinate (i, j) encodes Q (f)_T)^(l)The ith vector and Q (f)_S)^(l)The similarity of the jth vector. The contextual similarity loss may be calculated by constraining a similarity matrix between the teacher network and the student network:

wherein N is^(l)＝H^(l)×W^(l)Represents Q_T ^(l)，Q_S ^(l)The number of vectors contained in (a). Considering contextual similarityThe final loss function may then be defined as:

L(f_T,f_S)＝L_ps(f_T,f_S)+λL_cs(f_T,f_S) (13)

wherein λ is a hyper-parametric weight that controls pixel similarity and context similarity loss.

During training, difficult samples should play a greater role, while simple samples do not require continuous optimization. The existing method adopts a fixed threshold value to judge whether any sample is a sample which is difficult to sample, but the fixed threshold value is difficult to select, and the threshold values required by different tasks are different. The invention uses a self-adaptive difficult case mining strategy to dynamically mine difficult cases in real time. More specifically, a difficult case mining strategy for measuring the difficulty degree of a sample in real time is provided. The difficult-to-mine strategy may be defined as follows:

m^(l,t)＝αm^(l,t-1)+(1-α)(μ^(l,t)+βσ^(l,t)) (14)

wherein the content of the first and second substances,

and (5) extracting the depth feature difference with the feature map position (i, j) by the teacher network and the student network in the t-th iteration. Mu.s^(l,t)，σ^(l,t)Is composed of

And thus may be based on μ^(l,t)，σ^(l,t)And dynamically judging the difficulty level of the sample. The formula (14) uses an exponential moving average method to avoid extreme value noise generated in each iteration process, so that the dynamic threshold value m^(l,t)The change is smoother. α is control m^(l,t)And beta is a hyper-parameter for controlling the difficulty degree of the mining obtained sample. Generally speaking, the larger β, the fewer the number of samples mined, and the greater the difficulty in the training batch. Difficult excavation can be extracted by the following formula:

wherein (i)_h,j_h) Representing the coordinates of the difficult sample obtained by final extraction, and representing the corresponding teacher network characteristic and student network characteristic as

After the hard samples are mined, the loss function is calculated by using the hard samples obtained by mining, so that the hard samples can play a greater role in the training process, and the meaningless optimization of simple samples is avoided. The hard mined penalty function may be defined as:

after training, the student network can extract features similar to the features of the normal data extracted by the teacher network, namely the student network implicitly describes the normal data manifold extracted by the teacher network. At this time, the anomaly score can be calculated according to equation (3). Specifically, the anomaly score is calculated as follows:

to obtain better performance, the anomaly scores of each scale are fused using multiplication:

the Upsample () is an upsampling function, and upsamples abnormal score maps of different scales to the resolution of the input image through quadratic linear interpolation. AS^(f)And providing a reference for subsequent decision-making for the final abnormal score map.

In another aspect, the present invention provides an unsupervised industrial image anomaly detection system based on knowledge distillation, comprising: a computer-readable storage medium and a processor;

the computer-readable storage medium is used for storing executable instructions;

the processor is used for reading executable instructions stored in the computer readable storage medium and executing the unsupervised industrial image anomaly detection method based on knowledge distillation.

According to the unsupervised industrial image anomaly detection method based on knowledge distillation, an end-to-end deep learning anomaly detection model is established, the problem of cold start of a production line due to insufficient abnormal samples in the early stage can be solved, the unsupervised industrial image anomaly detection performance is improved, the automation level of production line quality detection based on image analysis is effectively improved, and services can be provided for subsequent supervised anomaly detection and anomaly diagnosis.

Drawings

FIG. 1 is a schematic diagram of an unsupervised industrial image anomaly detection method based on knowledge distillation.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.

The invention provides an unsupervised industrial image anomaly detection method based on knowledge distillation, which comprises the following steps of:

a training stage:

and (3) a testing stage:

The anomaly detection model can be divided into a multi-scale knowledge distillation module and a multi-scale anomaly fusion module, and the multi-scale knowledge distillation module comprises a teacher network and a student network. The depth features extracted by the deep neural network trained on the large-scale natural image data set have high discriminability and can be used as an effective industrial image feature extractor. The normal industrial image features extracted through the pre-training deep neural network are often distributed in a compact manifold of a high-dimensional feature space. The abnormal industrial image features are distributed more dispersedly and far away from the normal feature manifold. The distribution of the normal feature manifold is modeled using model G (θ):

G(x；θ)＝y (2)

f_T＝F_T(I) (4)

f_S＝F_S(I) (5)

considering different scale characteristics extracted from deep neural network

The characteristic diagram of the l-th layer is shown.

Defining a pixel level penalty as:

wherein the content of the first and second substances,

indicates the total scale degree, H^(l)×W^(l)Representing the dimensions of the respective dimensions. By optimizing the pixel level loss of the normal sample features, the T and the S extracted normal data features can be similar. However, in the actual training process, the parameters of the selected T and S are usually large, and the similarity between normal industrial images is extremely high, so that sufficient knowledge is difficult to extract, and the overfitting problem occurs in the training process. I.e. training set correction obtained by T and S extractionThe characteristics of constant data are very similar, but a larger regression error still exists in the actual use process, so that the abnormal detection performance is reduced. To alleviate the overfitting problem, the method further optimizes the training process by using context similarity loss and adaptive hard case mining.

wherein Q (f) e R^N×DPresentation pair

wherein N is^(l)＝H^(l)×W^(l)Represents Q_T ^(l)，Q_S ^(l)The number of vectors contained in (a). Considering the context similarity, the final penalty function can be defined as:

L(f_T,f_S)＝L_ps(f_T,f_S)+λL_cs(f_T,f_S) (13)

During training, difficult samples should play a greater role, while simple samples do not require continuous optimization. The existing method adopts a fixed threshold value to judge whether any sample is a sample which is difficult to sample, but the fixed threshold value is difficult to select, and the threshold values required by different tasks are different. In the invention, a self-adaptive difficult case mining strategy is used for dynamically mining difficult cases in real time, and only the difficult case samples obtained by mining are used for loss calculation, as shown in AHSM (adaptive Hard Sample mining) in FIG. 1. More specifically, a difficult case mining strategy for measuring the difficulty degree of a sample in real time is provided. The difficult-to-mine strategy may be defined as follows:

m^(l,t)＝αm^(l,t-1)+(1-α)(μ^(l,t)+βσ^(l,t)) (14)

wherein the content of the first and second substances,

extracting feature maps from the teacher network and the student network for the tth iterationThe depth feature difference with position (i, j). Mu.s^(l,t)，σ^(l,t)Is composed of

It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall be included in the scope of the present invention.

Claims

1. An unsupervised industrial image anomaly detection method based on knowledge distillation is characterized by comprising the following steps:

a training stage:

and (3) a testing stage:

2. The method of claim 1, wherein the first and second features f_T，f_SExpressed as:

f_T＝F_T(I)

f_S＝F_S(I)

normalization:

wherein, F_TAnd F_SRespectively representing the feature extractors of the teacher network and the student network, and I is an input image.

3. The method of claim 2, wherein the difficult case sample is represented as:

m^(l,t)＝αm^(l,t-1)+(1-α)(μ^(l,t)+βσ^(l,t))

wherein the content of the first and second substances,

a feature diagram representing the teacher network and the student network at layer l,

in the t-th iteration, the depth feature difference with the feature map position (i, j) is extracted by the teacher network and the student network, and m is^(l,t)To dynamic threshold, μ^(l,t)、σ^(l,t)Is composed of

A is control m^(l,t)And beta is a hyper-parameter for controlling the difficulty degree of the mining obtained sample.

4. The method of claim 3, wherein the objective function is:

wherein L is_psFor loss of pixel similarity, L_csIn order for the context to be lost in similarity,

the first feature and the second feature corresponding to the sample difficult to sample are shown, and lambda is a super-parameter weight for controlling pixel similarity loss and context similarity loss.

5. The method of claim 4, wherein the pixel similarity penalty is:

wherein the content of the first and second substances,

indicates the total scale degree, H^(l)×W^(l)Representing the dimensions of the respective dimensions.

6. The method of claim 4, wherein the contextual similarity measure is lost as:

wherein the content of the first and second substances,

indicates the total scale degree, H^(l)×W^(l)Denotes the size of the respective scale, Q (f) e R^N×DPresentation pair

Performing a stretching operation, wherein D represents feature dimension, N represents feature number, G (f) represents similarity calculation, and G (f)_T)^(l)And G (f)_S)^(l)The value of the middle coordinate (i, j) encodes Q (f)_T)^(l)The ith vector and Q (f)_S)^(l)The similarity of the jth vector.

7. The method of claim 3, wherein the anomaly score is calculated as follows:

the anomaly scores for each scale are fused using multiplication:

wherein, AS^(f)For the final abnormal value-dividing map, upscale () is an up-sampling function, the abnormal value-dividing maps with different scales are up-sampled to the resolution of the input image through quadratic linear interpolation,

indicating the total gauge number.

8. An unsupervised industrial image anomaly detection system based on knowledge distillation, comprising: a computer-readable storage medium and a processor;

the processor is used for reading executable instructions stored in the computer readable storage medium and executing the unsupervised industrial image anomaly detection method based on knowledge distillation, as claimed in any one of claims 1 to 7.