CN117372720B - Unsupervised anomaly detection method based on multi-feature cross mask repair - Google Patents


Info

Publication number
CN117372720B
CN117372720B CN202311319685.1A
Authority
CN
China
Prior art keywords
image
module
feature map
map
deft
Prior art date
Legal status
Active
Application number
CN202311319685.1A
Other languages
Chinese (zh)
Other versions
CN117372720A (en)
Inventor
王珺璞
徐贵力
刘若鹏
贺军崴
董文德
Current Assignee
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN202311319685.1A priority Critical patent/CN117372720B/en
Publication of CN117372720A publication Critical patent/CN117372720A/en
Application granted granted Critical
Publication of CN117372720B publication Critical patent/CN117372720B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/52 Scale-space analysis, e.g. wavelet analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0499 Feedforward networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an unsupervised anomaly detection method based on multi-feature cross mask restoration, relating to the technical field of unsupervised detection. The method combines the two existing families of unsupervised anomaly detection, feature learning and deep reconstruction: it performs the reconstruction task on extracted multi-scale feature maps, which yields more semantic information for distinguishing normal samples from abnormal samples, and it uses sub-mask maps in a cross-mask manner to convert the feature reconstruction problem into a feature restoration problem, which effectively prevents abnormal regions from being reconstructed. The method achieves good detection accuracy.

Description

Unsupervised anomaly detection method based on multi-feature cross mask repair
Technical Field
The application relates to the technical field of unsupervised detection, in particular to an unsupervised anomaly detection method based on multi-feature cross mask repair.
Background
Anomaly detection in industrial production plays an important role in ensuring product quality and safety and in improving production efficiency. Although supervised deep learning approaches achieve good detection results, they require a large number of annotated defect images to support model training. In most practical industrial scenarios, however, normal images are easy to obtain, while collecting and labeling every type of defect image is impractical. Unsupervised deep learning methods need only normal samples for training, yet can judge whether a test image contains defects and further localize the abnormal regions, so they have been widely studied.
Currently, unsupervised anomaly detection methods fall into two main categories. (1) Feature-based methods first use a deep network (a pre-trained or self-supervised learning model) to extract discriminative features of normal images. The feature distribution of these normal images is then modeled with a statistical method such as a clustering algorithm or a Gaussian model. At inference time, image features that deviate from the learned distribution are regarded as anomalous, realizing anomaly detection. Since this class of methods detects anomalies in a feature space that carries semantic representations, it generally yields better results. However, feature-based methods lack interpretability and cannot directly localize abnormal regions in an image. Moreover, a manually chosen feature-distribution assumption is difficult to adapt to all anomalies, which further limits the applicability of feature-based approaches to practical industrial quality inspection. (2) Reconstruction-based methods typically use a reconstruction model (e.g., an autoencoder or a generative adversarial network) to model latent representations of normal data, from which the original data is then reconstructed. Because the trained model has only learned knowledge of normal samples, it can reconstruct the non-defective regions of an abnormal sample but not the defective regions. The difference between the input and the reconstruction result is computed with a distance metric to generate an anomaly map, in which regions with larger difference values are the abnormal regions.
However, reconstruction-based unsupervised anomaly detection cannot effectively learn or exploit the semantic features of the image, so detection results in complex environments are poor; in addition, the generalization ability of deep reconstruction models is too strong, and in practical applications such a model may reconstruct the abnormal region as well, causing missed detections.
Disclosure of Invention
Aiming at the problems and the technical requirements, the application provides an unsupervised anomaly detection method based on multi-feature cross mask repair, which has the following technical scheme:
an unsupervised anomaly detection method based on multi-feature cross mask repair, the unsupervised anomaly detection method comprising:
performing multi-scale feature extraction on the input image to obtain a multi-scale feature map of the input image;
Processing the multi-scale feature map by using a plurality of sub-mask maps respectively to obtain a plurality of blocked feature maps, wherein the size of each sub-mask map is the same as that of the multi-scale feature map, and blocking areas of each sub-mask map are mutually disjoint;
Performing image restoration on each occluded feature map by using a Transformer-based repair network to obtain a corresponding repair map, and combining the repaired regions of the repair maps to obtain a reconstructed feature map;
and comparing the reconstructed feature map with the multi-scale feature map to obtain an abnormality detection result of the input image.
The further technical scheme is that the method for generating the plurality of sub-mask patterns comprises the following steps:
uniformly meshing an all-zero mask map with the same image size as the multi-scale feature map;
Randomly reserving the zero-valued pixel points in a number of grid areas of the all-zero mask map and setting the pixel values in the remaining grid areas to 1 constructs one sub-mask map; a plurality of complementary sub-mask maps are constructed in the same way, such that each sub-mask map retains the pixel points of a different set of grid areas of the all-zero mask map, and the pixel points of each grid area of the all-zero mask map are retained in exactly one sub-mask map.
The further technical scheme is that the processing of the multi-scale feature map by using each sub-mask map to obtain the corresponding blocked feature map comprises the following steps:
And copying the sub-mask map along the channel dimension of the multi-scale feature map, and then multiplying the sub-mask map with the multi-scale feature map pixel by pixel to obtain a corresponding blocked feature map.
The repair network comprises a head convolution module, a first DefT module, a first downsampling module, a second DefT module, a second downsampling module, a third DefT module, a first upsampling module, a fourth DefT module, a second upsampling module, a fifth DefT module and a tail convolution module which are sequentially connected from input to output; the output characteristics of the second DefT module and the output characteristics of the fourth DefT module are subjected to characteristic splicing and then enter a second up-sampling module for up-sampling, and the output characteristics of the first DefT module and the output characteristics of the fifth DefT module are subjected to characteristic splicing and then enter a tail convolution module for convolution processing;
Each DefT module includes two DefT units connected in sequence, and each DefT unit is constructed based on a local perception module LPB, a cascaded pooled self-attention module LMPS, and a convolutional feed forward network CFFN.
In each DefT unit, the local perception module LPB processes the image x1 input to the DefT unit to obtain image x2; x2 is layer-normalized and input to the cascaded pooling self-attention module LMPS to obtain image x3; x2 and x3 are residual-connected to obtain image x4; x4 is layer-normalized and input to the convolutional feed-forward network CFFN to obtain image x5; and x4 and x5 are residual-connected as the output of the DefT unit.
The further technical scheme is that the image output by the local perception module LPB is x2 = x1 + conv3×3(x1), where conv3×3 is a 3×3 convolution with padding 1;
the image output by the cascaded pooling self-attention module LMPS is x3 = Softmax(q·kᵀ/√d)·v, where Softmax() is the softmax function, q, k and v are the query, key and value, respectively, derived from image x2, and d is the number of channels of k, used to balance the scale;
the image output by the convolutional feed-forward network CFFN is x5 = conv1×1(conv3×3(conv1×1(x4))), where conv1×1 is a 1×1 convolution.
The further technical scheme is that deriving q, k and v from image x2 comprises:
determining the parameter combination (q, k, v) = (x2·Wq, x′2·Wk, x′2·Wv), where Wq, Wk and Wv are learnable parameters of the linear mappings; x′2 = cat(img2seq[avgpoolj(x2) | j ∈ P]), where avgpoolj(x2) denotes average pooling of image x2 at pooling scale j, P is the set of pooling scales, img2seq denotes vectorizing the image, and cat denotes concatenation along the channel dimension.
The further technical scheme is that performing multi-scale feature extraction on the input image comprises the following steps:
inputting an input image into a convolutional neural network, and outputting a plurality of feature images with different image sizes through different convolutional layers;
and after adjusting the feature images to the same image size, connecting the feature images in series along the channel dimension, and extracting to obtain the multi-scale feature images of the input image.
The further technical scheme is that the reconstructed feature map obtained by combining the repaired regions of the repair maps is F̂(I) = Σi (1 − Mi) ⊙ F̂(I)i, summing over i = 1, …, n, where F̂(I)i is the repair map obtained by performing image repair on the occluded feature map F(I)i, and F(I)i is obtained by processing the multi-scale feature map F(I) with the sub-mask map Mi.
The further technical scheme is that comparing the reconstructed feature map with the multi-scale feature map to obtain an abnormality detection result of the input image comprises:
computing the squared-error map S = (F(I) − F̂(I))² between the multi-scale feature map F(I) and the reconstructed feature map F̂(I), and obtaining the anomaly detection result of the input image from it.
The beneficial technical effects of the application are as follows:
The application discloses an unsupervised anomaly detection method based on multi-feature cross mask restoration. Performing the reconstruction task on the extracted multi-scale feature map yields more semantic information that helps distinguish normal samples from abnormal samples; using sub-mask maps in a cross-mask manner converts the feature reconstruction problem into a feature restoration problem, which effectively prevents abnormal regions from being reconstructed; a Transformer-based repair network restores the occluded feature-map regions with high quality to obtain a reconstructed feature map; and comparing the original multi-scale feature map with the reconstructed feature map then detects abnormal regions accurately and quickly. The method combines the two existing families of unsupervised anomaly detection, feature learning and deep reconstruction, and achieves good detection accuracy.
Drawings
FIG. 1 is a method flow diagram of an unsupervised anomaly detection method of one embodiment of the present application.
FIG. 2 is a process flow diagram of an unsupervised anomaly detection method according to one embodiment of the present application.
FIG. 3 is a schematic diagram of processing a multi-scale feature map using a sub-mask map in one embodiment of the application.
Fig. 4 is a network structure diagram of a repair network in one embodiment of the application.
Fig. 5 is a block diagram of each DefT unit in one embodiment of the application.
Detailed Description
The following describes the embodiments of the present application further with reference to the drawings.
The application discloses an unsupervised anomaly detection method based on multi-feature cross mask repair, please refer to a flow chart shown in fig. 1 and an information flow diagram shown in fig. 2, the method comprises the following steps:
Step 1, multi-scale feature extraction is carried out on an input image to obtain a multi-scale feature map F(I) of the input image, where the image size of F(I) is W×H×C, W being the width, H the height and C the number of channels of the multi-scale feature map.
In one embodiment, an input image is input into a convolutional neural network, multiple feature maps with different image sizes are output through different convolutional layers, then the multiple feature maps are adjusted to the same image size W×H, and the multiple feature maps are connected in series along a channel dimension to extract a multi-scale feature map of the input image. The convolutional neural network used in this step may employ an existing network structure, such as VGG16.
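Assuming a PyTorch implementation, step 1 can be sketched as follows; the three-stage backbone, the channel counts and the common output size are illustrative stand-ins for the convolutional layers of a pretrained network such as VGG16, not the patent's exact configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleExtractor(nn.Module):
    """Sketch of step 1: take feature maps from several conv stages,
    resize them all to a common W x H, and concatenate along channels.
    The three-stage backbone below is a stand-in for a pretrained
    network such as VGG16 (the layer choice is an assumption)."""
    def __init__(self, out_size=(64, 64)):
        super().__init__()
        self.out_size = out_size
        self.stage1 = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.MaxPool2d(2), nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        self.stage3 = nn.Sequential(nn.MaxPool2d(2), nn.Conv2d(64, 128, 3, padding=1), nn.ReLU())

    def forward(self, x):
        f1 = self.stage1(x)   # full resolution
        f2 = self.stage2(f1)  # 1/2 resolution
        f3 = self.stage3(f2)  # 1/4 resolution
        # Resize every stage to the same spatial size, then concat on channels.
        feats = [F.interpolate(f, size=self.out_size, mode="bilinear", align_corners=False)
                 for f in (f1, f2, f3)]
        return torch.cat(feats, dim=1)  # channels: 32 + 64 + 128 = 224

img = torch.randn(1, 3, 256, 256)
fmap = MultiScaleExtractor()(img)
print(fmap.shape)  # torch.Size([1, 224, 64, 64])
```

The concatenated map plays the role of F(I) with C = 224 in this sketch.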
Step 2, the multi-scale feature map is processed with the plurality of sub-mask maps to obtain a plurality of occluded feature maps.
The size of each sub-mask map used in this step is the same as that of the multi-scale feature map, and the occluded areas of the sub-mask maps are mutually disjoint. One method of constructing the sub-mask maps, illustrated in fig. 3, is as follows: uniformly partition an all-zero mask map with the same image size W×H as the multi-scale feature map into grid areas; randomly reserve the zero-valued pixel points in a number of grid areas and set the pixel values in the remaining grid areas to 1, constructing one sub-mask map; and construct a plurality of complementary sub-mask maps in the same way, such that each sub-mask map retains the pixel points of a different set of grid areas and the pixel points of each grid area are retained in exactly one sub-mask map.
Processing the multi-scale feature map F(I) with each sub-mask map to obtain the corresponding occluded feature map comprises: copying the sub-mask map along the channel dimension of the multi-scale feature map and then multiplying it with the multi-scale feature map pixel by pixel. After the n sub-mask maps M1, M2, …, Mn are used to process the multi-scale feature map F(I), n occluded feature maps F(I)1, F(I)2, …, F(I)n with different occluded areas are obtained. Referring to fig. 3, taking n = 3 as an example, the pixel value in the black grid areas of each sub-mask map is 0, so these areas are occluded after the pixel-by-pixel multiplication, while the pixel value in the white grid areas is 1, so the original pixel values are unchanged.
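The cross-mask construction and occlusion described above can be sketched as follows; the grid size, the number of sub-masks and the random assignment of cells to masks are illustrative assumptions:

```python
import torch

def make_cross_masks(w, h, n, grid=8, seed=0):
    """Sketch of step 2's mask generation: partition a W x H map into
    grid cells and assign each cell to exactly one of n sub-masks.
    A cell's pixels are 0 (occluded) in its assigned sub-mask and 1
    everywhere else, so the occluded regions of the n sub-masks are
    disjoint and together cover the whole map."""
    g = torch.Generator().manual_seed(seed)
    # Randomly assign every grid cell to one of the n sub-masks.
    cell_owner = torch.randint(0, n, (h // grid, w // grid), generator=g)
    owner = cell_owner.repeat_interleave(grid, 0).repeat_interleave(grid, 1)
    return [(owner != i).float() for i in range(n)]  # 0 where cell belongs to mask i

def occlude(fmap, mask):
    """Broadcast the W x H mask over batch and channel dims and multiply."""
    return fmap * mask.unsqueeze(0).unsqueeze(0)

masks = make_cross_masks(64, 64, n=3)
# Occluded regions are disjoint and jointly cover the map:
total_zeros = sum((m == 0).sum().item() for m in masks)
print(total_zeros)  # 4096, i.e. 64*64: every pixel occluded exactly once
fmap = torch.randn(1, 224, 64, 64)
blocked = [occlude(fmap, m) for m in masks]
```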
Step 3, image restoration is performed on each occluded feature map by using a Transformer-based repair network to obtain a corresponding repair map.
Referring to the network structure diagram of the repair network shown in fig. 4, the repair network includes, from input to output, a head convolution module, a first DefT module, a first downsampling module, a second DefT module, a second downsampling module, a third DefT module, a first upsampling module, a fourth DefT module, a second upsampling module, a fifth DefT module, and a tail convolution module, which are sequentially connected. The output characteristics of the second DefT module and the output characteristics of the fourth DefT module are subjected to characteristic splicing and then enter a second up-sampling module for up-sampling, and the output characteristics of the first DefT module and the output characteristics of the fifth DefT module are subjected to characteristic splicing and then enter a tail convolution module for convolution processing.
The occluded feature map input to the repair network is first processed by a head convolution module with a 3×3 convolution to reduce its dimension: the W×H×C occluded feature map becomes W×H×C0 with C0 < C. The network then contains five DefT stages, each comprising two DefT units for feature conversion and learning. The back ends of the first two stages each contain a downsampling module that halves the image size and doubles the channels, and the front ends of the last two stages each contain an upsampling module that restores the feature-map size and channel count; both the upsampling and downsampling modules are implemented by a 3×3 convolution operation. Finally, the size and channel count of the feature map are restored to those of the input by a tail convolution module containing a 3×3 convolution.
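The U-shaped topology of fig. 4 can be sketched as below. This is a sketch under stated assumptions: the DefT modules are replaced by simple convolutional stubs so the skeleton runs, and strided/transposed 3×3 convolutions are assumed for the sampling modules (the text only specifies that both use a 3×3 convolution).

```python
import torch
import torch.nn as nn

class DefTStub(nn.Module):
    """Placeholder for a DefT module (two DefT units); here just two
    3x3 conv layers so the topology sketch runs end to end."""
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1), nn.GELU(),
            nn.Conv2d(c, c, 3, padding=1), nn.GELU())
    def forward(self, x):
        return self.body(x)

class RepairNet(nn.Module):
    """Sketch of the repair network of fig. 4: head conv reduces C -> C0,
    two down stages double channels and halve size, two up stages mirror
    them, with skip connections DefT2 -> second upsampler and
    DefT1 -> tail convolution."""
    def __init__(self, c=224, c0=32):
        super().__init__()
        self.head = nn.Conv2d(c, c0, 3, padding=1)
        self.deft1 = DefTStub(c0)
        self.down1 = nn.Conv2d(c0, 2 * c0, 3, stride=2, padding=1)
        self.deft2 = DefTStub(2 * c0)
        self.down2 = nn.Conv2d(2 * c0, 4 * c0, 3, stride=2, padding=1)
        self.deft3 = DefTStub(4 * c0)
        self.up1 = nn.ConvTranspose2d(4 * c0, 2 * c0, 3, stride=2, padding=1, output_padding=1)
        self.deft4 = DefTStub(2 * c0)
        self.up2 = nn.ConvTranspose2d(4 * c0, c0, 3, stride=2, padding=1, output_padding=1)
        self.deft5 = DefTStub(c0)
        self.tail = nn.Conv2d(2 * c0, c, 3, padding=1)

    def forward(self, x):
        d1 = self.deft1(self.head(x))
        d2 = self.deft2(self.down1(d1))
        d3 = self.deft3(self.down2(d2))
        d4 = self.deft4(self.up1(d3))
        d5 = self.deft5(self.up2(torch.cat([d2, d4], dim=1)))  # skip: DefT2 + DefT4
        return self.tail(torch.cat([d1, d5], dim=1))           # skip: DefT1 + DefT5

x = torch.randn(1, 224, 64, 64)
print(RepairNet()(x).shape)  # torch.Size([1, 224, 64, 64])
```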
Each DefT module includes two DefT units connected in sequence, and each DefT unit is constructed from a local perception module LPB, a cascaded pooling self-attention module LMPS and a convolutional feed-forward network CFFN. Referring to the network structure diagram of each DefT unit shown in fig. 5: in each DefT unit, the local perception module LPB processes the image x1 input to the DefT unit to obtain image x2; x2 passes through layer normalization LN and is input to the cascaded pooling self-attention module LMPS to obtain image x3; x2 and x3 are residual-connected to obtain image x4; x4 passes through layer normalization LN and is input to the convolutional feed-forward network CFFN to obtain image x5; and x4 and x5 are residual-connected to give the output of the DefT unit. The processing performed by each module is as follows:
(1) Local perception module LPB
The local perception module LPB outputs image x2 = x1 + conv3×3(x1), where conv3×3 is a 3×3 convolution with padding 1.
(2) Cascading pooling self-attention module LMPS
The cascaded pooling self-attention module LMPS outputs image x3 = Softmax(q·kᵀ/√d)·v,
where Softmax() is the softmax function; q, k and v are the query, key and value, respectively, derived from image x2; and d is the number of channels of k, used to balance the scale. The parameter combination is (q, k, v) = (x2·Wq, x′2·Wk, x′2·Wv), where Wq, Wk and Wv are learnable parameters of the linear mappings; x′2 = cat(img2seq[avgpoolj(x2) | j ∈ P]), where avgpoolj(x2) denotes average pooling of image x2 at pooling scale j, P is the set of pooling scales (in one embodiment P = {2, 3, 4, 5}), img2seq denotes vectorizing the image, and cat denotes concatenation along the channel dimension.
(3) Convolutional feed forward network CFFN
The convolutional feed-forward network CFFN outputs image x5 = conv1×1(conv3×3(conv1×1(x4))), where the two conv1×1 operations are 1×1 convolutions that expand and contract the feature dimension, respectively.
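A DefT unit built from the three modules above might look as follows. Several details are assumptions: "pooling scale j" is interpreted as average pooling with kernel and stride j, GroupNorm over all channels stands in for layer normalization on feature maps, and the GELU activations and CFFN expansion ratio are illustrative.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class LMPS(nn.Module):
    """Cascaded pooling self-attention: queries come from every pixel of
    x2, while keys/values come from x2 average-pooled at several scales
    and concatenated, which shrinks the attention cost.  Interpreting
    'pooling scale j' as AvgPool2d(kernel=stride=j) is an assumption."""
    def __init__(self, c, pools=(2, 3, 4, 5)):
        super().__init__()
        self.pools = pools
        self.wq = nn.Linear(c, c)
        self.wk = nn.Linear(c, c)
        self.wv = nn.Linear(c, c)

    def forward(self, x):                           # x: (B, C, H, W)
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)          # img2seq: (B, HW, C)
        pooled = [F.avg_pool2d(x, j).flatten(2).transpose(1, 2) for j in self.pools]
        x2p = torch.cat(pooled, dim=1)              # cat of pooled sequences
        q, k, v = self.wq(seq), self.wk(x2p), self.wv(x2p)
        attn = torch.softmax(q @ k.transpose(1, 2) / math.sqrt(c), dim=-1)
        return (attn @ v).transpose(1, 2).reshape(b, c, h, w)

class DefTUnit(nn.Module):
    """One DefT unit (fig. 5): x2 = x1 + conv3x3(x1) (LPB);
    x3 = LMPS(LN(x2)); x4 = x2 + x3; x5 = CFFN(LN(x4)); out = x4 + x5."""
    def __init__(self, c, expand=4):
        super().__init__()
        self.lpb = nn.Conv2d(c, c, 3, padding=1)
        self.ln1 = nn.GroupNorm(1, c)   # stands in for layer normalization
        self.attn = LMPS(c)
        self.ln2 = nn.GroupNorm(1, c)
        # CFFN: 1x1 conv expands, 3x3 conv mixes locally, 1x1 conv contracts.
        self.cffn = nn.Sequential(
            nn.Conv2d(c, expand * c, 1), nn.GELU(),
            nn.Conv2d(expand * c, expand * c, 3, padding=1), nn.GELU(),
            nn.Conv2d(expand * c, c, 1))

    def forward(self, x1):
        x2 = x1 + self.lpb(x1)
        x4 = x2 + self.attn(self.ln1(x2))
        return x4 + self.cffn(self.ln2(x4))

x = torch.randn(1, 32, 20, 20)
print(DefTUnit(32)(x).shape)  # torch.Size([1, 32, 20, 20])
```

With P = {2, 3, 4, 5} the keys/values for a 20×20 map shrink from 400 tokens to 100 + 36 + 25 + 16 = 177 tokens, which is the point of the cascaded pooling.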
Step 4, the repaired regions of the repair maps are combined to obtain the reconstructed feature map F̂(I), whose image size is the same W×H×C as that of the multi-scale feature map F(I).
Each occluded feature map F(I)i (i = 1, …, n) is input to the repair network, which repairs its missing regions with high quality to yield the corresponding repair map F̂(I)i; repairing F(I)1, F(I)2, …, F(I)n in this way yields F̂(I)1, F̂(I)2, …, F̂(I)n.
The repaired regions of the repair maps are then combined into the reconstructed feature map F̂(I) = Σi (1 − Mi) ⊙ F̂(I)i, summing over i = 1, …, n, where F̂(I)i is the repair map obtained by performing image repair on the occluded feature map F(I)i, and F(I)i is obtained by processing the multi-scale feature map F(I) with the sub-mask map Mi.
Step 5, the reconstructed feature map is compared with the multi-scale feature map to obtain the anomaly detection result of the input image. This comprises computing the squared-error map S = (F(I) − F̂(I))² between the multi-scale feature map F(I) and the reconstructed feature map F̂(I); regions with large error values are the abnormal regions, which gives the anomaly detection result of the input image.
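Steps 4 and 5 can be sketched as follows; averaging the squared error over channels to produce a single-channel anomaly map is an assumption (the text only specifies a squared-error map):

```python
import torch

def reconstruct_and_score(fmap, masks, repairs):
    """Sketch of steps 4-5: stitch the reconstructed feature map from the
    repaired (previously occluded, mask == 0) region of each repair map,
    then score anomalies with the per-pixel squared error against the
    original multi-scale feature map."""
    recon = torch.zeros_like(fmap)
    for m, r in zip(masks, repairs):
        hole = (1.0 - m).unsqueeze(0).unsqueeze(0)  # 1 where sub-mask i occluded
        recon = recon + hole * r
    err = (fmap - recon) ** 2                       # squared-error map S
    return recon, err.mean(dim=1)                   # (B, H, W) anomaly map

# Toy check: with perfect repairs the anomaly map is zero.
fmap = torch.randn(1, 8, 16, 16)
m0 = torch.ones(16, 16); m0[:, :8] = 0   # occludes the left half
m1 = torch.ones(16, 16); m1[:, 8:] = 0   # occludes the right half
recon, amap = reconstruct_and_score(fmap, [m0, m1], [fmap, fmap])
print(torch.allclose(recon, fmap), amap.max().item())  # True 0.0
```

Because the occluded regions are disjoint and cover the map, every pixel of the reconstruction comes from a repair map that never saw that pixel, which is what prevents an anomaly from being copied through.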
The above is only a preferred embodiment of the present application, and the present application is not limited to the above examples. It is to be understood that other modifications and variations which may be directly derived or contemplated by those skilled in the art without departing from the spirit and concepts of the present application are deemed to be included within the scope of the present application.

Claims (9)

1. An unsupervised anomaly detection method based on multi-feature cross mask repair, which is characterized by comprising the following steps:
Performing multi-scale feature extraction on an input image to obtain a multi-scale feature map of the input image;
Processing the multi-scale feature map by using a plurality of sub-mask maps respectively to obtain a plurality of blocked feature maps, wherein the size of each sub-mask map is the same as that of the multi-scale feature map, and blocking areas of each sub-mask map are mutually disjoint;
Performing image restoration on each occluded feature map by using a Transformer-based repair network to obtain a corresponding repair map, and combining the repaired regions of the repair maps to obtain a reconstructed feature map;
comparing the reconstructed feature map with the multi-scale feature map to obtain an abnormality detection result of the input image;
The repair network comprises a head convolution module, a first DefT module, a first downsampling module, a second DefT module, a second downsampling module, a third DefT module, a first upsampling module, a fourth DefT module, a second upsampling module, a fifth DefT module and a tail convolution module which are sequentially connected from input to output; the output characteristics of the second DefT module and the output characteristics of the fourth DefT module are subjected to characteristic splicing and then enter the second upsampling module for upsampling, and the output characteristics of the first DefT module and the output characteristics of the fifth DefT module are subjected to characteristic splicing and then enter the tail convolution module for convolution processing; each DefT module includes two DefT units connected in sequence, and each DefT unit is constructed based on a local perception module LPB, a cascaded pooled self-attention module LMPS, and a convolutional feed forward network CFFN.
2. The method of unsupervised anomaly detection of claim 1, wherein the method of generating a plurality of sub-mask patterns comprises:
Uniformly meshing an all-zero mask map with the same image size as the multi-scale feature map;
Randomly reserving the zero-valued pixel points in a number of grid areas of the all-zero mask map and setting the pixel values in the remaining grid areas to 1 constructs one sub-mask map; a plurality of complementary sub-mask maps are constructed in the same way, such that each sub-mask map retains the pixel points of a different set of grid areas of the all-zero mask map, and the pixel points of each grid area of the all-zero mask map are retained in exactly one sub-mask map.
3. The method of claim 1, wherein processing the multi-scale feature map with each sub-mask map to obtain a corresponding occluded feature map comprises:
And copying the sub-mask map along the channel dimension of the multi-scale feature map, and then multiplying the sub-mask map with the multi-scale feature map pixel by pixel to obtain a corresponding blocked feature map.
4. The method of claim 1, wherein in each DefT unit, the local perception module LPB processes the image x1 input to the DefT unit to obtain image x2; x2 is layer-normalized and input to the cascaded pooling self-attention module LMPS to obtain image x3; images x2 and x3 are residual-connected to obtain image x4; x4 is layer-normalized and input to the convolutional feed-forward network CFFN to obtain image x5; and images x4 and x5 are residual-connected as the output of the DefT unit.
5. The method for unsupervised anomaly detection according to claim 4, wherein:
the local perception module LPB outputs image x2 = x1 + conv3×3(x1), where conv3×3 is a 3×3 convolution with padding 1;
the cascaded pooling self-attention module LMPS outputs image x3 = Softmax(q·kᵀ/√d)·v, where Softmax() is the softmax function, q, k and v are the query, key and value, respectively, derived from image x2, and d is the number of channels of k, used to balance the scale;
the convolutional feed-forward network CFFN outputs image x5 = conv1×1(conv3×3(conv1×1(x4))), where conv1×1 is a 1×1 convolution.
6. The unsupervised anomaly detection method of claim 5, wherein deriving q, k and v from image x2 comprises:
determining the parameter combination (q, k, v) = (x2·Wq, x′2·Wk, x′2·Wv), where Wq, Wk and Wv are learnable parameters of the linear mappings, x′2 = cat(img2seq[avgpoolj(x2) | j ∈ P]), avgpoolj(x2) denotes average pooling of image x2 at pooling scale j, P is the set of pooling scales, img2seq denotes vectorizing the image, and cat denotes concatenation along the channel dimension.
7. The method of claim 1, wherein the performing multi-scale feature extraction on the input image to obtain a multi-scale feature map of the input image comprises:
Inputting the input image into a convolutional neural network, and outputting a plurality of characteristic images with different image sizes through different convolutional layers;
and after adjusting the sizes of the multiple feature images to the same image size, connecting the feature images in series along the channel dimension, and extracting to obtain the multi-scale feature image of the input image.
8. The method for unsupervised anomaly detection according to claim 1, wherein the reconstructed feature map obtained by combining the repaired regions of the repair maps is F̂(I) = Σi (1 − Mi) ⊙ F̂(I)i, summing over i = 1, …, n, where F̂(I)i is the repair map obtained by performing image repair on the occluded feature map F(I)i, and F(I)i is obtained by processing the multi-scale feature map F(I) with the sub-mask map Mi.
9. The method of claim 1, wherein comparing the reconstructed feature map to the multi-scale feature map to obtain an anomaly detection result for the input image comprises:
computing the squared-error map S = (F(I) − F̂(I))² between the multi-scale feature map F(I) and the reconstructed feature map F̂(I), and obtaining the anomaly detection result of the input image from it.
CN202311319685.1A 2023-10-12 2023-10-12 Unsupervised anomaly detection method based on multi-feature cross mask repair Active CN117372720B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311319685.1A CN117372720B (en) 2023-10-12 2023-10-12 Unsupervised anomaly detection method based on multi-feature cross mask repair


Publications (2)

Publication Number Publication Date
CN117372720A CN117372720A (en) 2024-01-09
CN117372720B true CN117372720B (en) 2024-04-26

Family

ID=89390398

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311319685.1A Active CN117372720B (en) 2023-10-12 2023-10-12 Unsupervised anomaly detection method based on multi-feature cross mask repair

Country Status (1)

Country Link
CN (1) CN117372720B (en)

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112581463A (en) * 2020-12-25 2021-03-30 北京百度网讯科技有限公司 Image defect detection method and device, electronic equipment, storage medium and product
CN113297804A (en) * 2021-06-24 2021-08-24 上海交通大学 Anomaly detection method and system based on U-Transformer multi-level feature reconstruction
CN113762187A (en) * 2021-09-14 2021-12-07 中国人民解放军海军航空大学 Multi-source remote sensing image fusion semantic segmentation method and system
CN114445292A (en) * 2022-01-18 2022-05-06 中国农业大学 Multi-stage progressive underwater image enhancement method
WO2022121031A1 (en) * 2020-12-10 2022-06-16 广州广电运通金融电子股份有限公司 Finger vein image restoration method based on partial convolution and mask updating
CN114862803A (en) * 2022-05-16 2022-08-05 南京信息工程大学 Industrial image anomaly detection method based on fast Fourier convolution
CN114998671A (en) * 2022-04-28 2022-09-02 上海人工智能创新中心 Visual feature learning device based on convolution mask, acquisition device and storage medium
CN115018790A (en) * 2022-06-07 2022-09-06 华北电力大学 Workpiece surface defect detection method based on anomaly detection
CN115100599A (en) * 2022-07-01 2022-09-23 湖南工商大学 Mask transform-based semi-supervised crowd scene abnormality detection method
CN115222650A (en) * 2022-04-22 2022-10-21 西安和硕物流科技有限公司 Mixed industrial part defect detection algorithm
CN115482382A (en) * 2022-09-17 2022-12-16 北京工业大学 Image semantic segmentation method based on Transformer architecture
CN115731400A (en) * 2022-11-29 2023-03-03 河南大学 X-ray image foreign matter detection method based on self-supervision learning
CN116612351A (en) * 2023-05-24 2023-08-18 西南交通大学 Urban rail vehicle bottom anomaly detection method based on multi-scale mask feature self-encoder
CN116612283A (en) * 2023-06-02 2023-08-18 湖南大学 Image semantic segmentation method based on large convolution kernel backbone network
CN116739950A (en) * 2023-04-28 2023-09-12 广州市广播电视台 Image restoration method and device, terminal equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20210073561A (en) * 2018-10-15 2021-06-18 쓰리엠 이노베이티브 프로퍼티즈 컴파니 Automated inspection of sheet parts of any shape from manufacturing film
WO2021191908A1 (en) * 2020-03-25 2021-09-30 Yissum Research Development Company Of The Hebrew University Of Jerusalem Ltd. Deep learning-based anomaly detection in images


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Anomaly Detection of Underground Transmission-Line through Multiscale Mask DCNN and Image Strengthening; Min-Gwan Kim et al.; Mathematics; 2023-07-17; Vol. 11, No. 14; pp. 1-25 *
Research on data quality improvement and fault identification methods for transformer condition monitoring; Liu Hang; China Doctoral Dissertations Full-text Database, Engineering Science & Technology II; 2022-02-15; No. 2; pp. C042-64 *
Anomaly detection algorithm for ballastless track beds based on image inpainting; Jiang Wan et al.; Laser & Optoelectronics Progress; 2023-10-11; pp. 1-21 *
Research on image steganography localization based on pixel-wise probability prediction; Chen Sheng et al.; Computer Engineering and Applications; 2022; Vol. 58, No. 12; pp. 85-93 *

Also Published As

Publication number Publication date
CN117372720A (en) 2024-01-09

Similar Documents

Publication Publication Date Title
CN112200244B (en) Intelligent detection method for anomaly of aerospace engine based on hierarchical countermeasure training
CN111325236B (en) Ultrasonic image classification method based on convolutional neural network
CN111815601A (en) Texture image surface defect detection method based on depth convolution self-encoder
CN110648310B (en) Weak supervision casting defect identification method based on attention mechanism
CN112329588B (en) Pipeline fault detection method based on Faster R-CNN
CN112232229B (en) Fine water body extraction method based on U-net neural network
CN112488025B (en) Double-temporal remote sensing image semantic change detection method based on multi-modal feature fusion
CN111401358B (en) Instrument dial correction method based on neural network
CN114332008A (en) Unsupervised defect detection and positioning method based on multi-level feature reconstruction
CN111652039A (en) Hyperspectral remote sensing ground object classification method based on residual error network and feature fusion module
CN114565594A (en) Image anomaly detection method based on soft mask contrast loss
CN116964592A (en) Generating real data for training an artificial neural network
US20220222794A1 (en) Image anomaly detection method based on zero-shot learning
CN115239672A (en) Defect detection method and device, equipment and storage medium
CN114219762A (en) Defect detection method based on image restoration
CN114022586A (en) Defect image generation method based on countermeasure generation network
CN117372720B (en) Unsupervised anomaly detection method based on multi-feature cross mask repair
CN112802011A (en) Fan blade defect detection method based on VGG-BLS
CN117315365A (en) Camshaft surface damage detecting system based on visual analysis
CN114980723B (en) Fault prediction method and system for suction nozzle of cross-working-condition chip mounter
CN115100451B (en) Data expansion method for monitoring oil leakage of hydraulic pump
CN116563250A (en) Recovery type self-supervision defect detection method, device and storage medium
CN113642662B (en) Classification detection method and device based on lightweight classification model
CN114612434B (en) Corrugated pipe surface defect detection method and system
CN115909077A (en) Hyperspectral image change detection method based on unsupervised spectrum unmixing neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant