CN114022475A

CN114022475A - Image anomaly detection and anomaly positioning method and system based on self-supervision mask

Info

Publication number: CN114022475A
Application number: CN202111397389.4A
Authority: CN
Inventors: 王延峰; 黄潮钦; 徐勤伟; 张娅
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2021-11-23
Filing date: 2021-11-23
Publication date: 2022-02-08
Anticipated expiration: 2041-11-23
Also published as: CN114022475B

Abstract

The invention provides an image anomaly detection and anomaly localization method and system based on a self-supervised mask, relating to the technical field of computer vision and image processing. The method comprises the following steps: random mask generation, mask initialization, image feature extraction, image reconstruction, reconstructed image alignment, mask updating, mask update termination decision, and anomaly evaluation. By introducing self-supervised mask training, the invention improves the anomaly localization capability of the anomaly detection algorithm and thereby achieves better performance on both anomaly detection and anomaly localization tasks.

Description

Image anomaly detection and anomaly positioning method and system based on self-supervision mask

Technical Field

The invention relates to the technical field of computer vision and image processing, and in particular to an unsupervised image anomaly detection and anomaly localization method and system based on a self-supervised mask.

Background

Currently, deep learning techniques based on deep neural networks have achieved significant success in object classification tasks, and such data-driven approaches typically require large amounts of labeled data for training. However, in anomaly detection tasks the variety of possible anomalies cannot be exhaustively enumerated, so collecting enough anomalous data for model training is prohibitively costly. In this setting, the anomaly detection task usually provides only normal data for model training, and the anomaly detection method must be able to detect anomalies without having been trained on any anomalous data.

Image anomaly detection solutions based on image reconstruction train an image reconstruction model on data of the normal category and assume that the model cannot reconstruct abnormal data well. In the anomaly detection stage, the model's limited ability to reconstruct abnormal data leads to larger image reconstruction errors, so the reconstruction error can be used as the detection index for anomaly detection. However, in practical applications such as medical diagnosis and industrial defect detection, the anomaly often appears in only a small portion of the pixels of the image; the above method can only judge whether an anomaly exists in the image as a whole and cannot accurately localize the abnormal region. Anomaly localization is very important for improving both the detection performance and the interpretability of an anomaly detection algorithm, yet this task is often ignored by existing anomaly detection algorithms.

The invention patent with publication number CN110866908B discloses an image processing method, apparatus, server and storage medium, comprising: acquiring an image to be detected, and performing down-sampling anomaly classification processing on it to obtain an anomaly category prediction label and a target feature map; performing preliminary anomaly localization processing based on the anomaly category prediction label and the target feature map to obtain an initial localization image corresponding to the image to be detected; performing up-sampling anomaly localization processing on the initial localization image to obtain a target localization image corresponding to the image to be detected; and outputting the anomaly category prediction label and the target localization image.

Disclosure of Invention

In view of the defects in the prior art, the invention provides an image anomaly detection and anomaly localization method and system based on a self-supervised mask.

The scheme of the image anomaly detection and anomaly localization method and system based on the self-supervised mask is as follows:

in a first aspect, a method for image anomaly detection and anomaly localization based on a self-supervised mask is provided, the method comprising:

a mask random generation step: randomly generating a mask whose size matches that of the model training image, and applying the mask to the image to remove the information of a partial region of the image;

a mask initialization step: generating multi-scale initialization masks for the image to be tested, and applying each mask to the image to be tested to generate multi-scale images to be tested with partial image region information removed;

an image feature extraction step: extracting high-dimensional features of the image obtained in the mask random generation step or the mask initialization step by using a deep convolutional neural network;

an image reconstruction step: reconstructing the image from its high-dimensional features by using a deep convolutional neural network to obtain a reconstructed training image or a reconstructed test image;

a reconstructed image alignment step: realizing self-supervised learning by applying an image reconstruction loss function to the reconstructed training image and the model training image;

a mask updating step: updating the multi-scale masks according to the reconstructed test image by using a mask updating algorithm, so that the masks concentrate more on the abnormal part of the image;

a mask update termination decision step: judging whether the updated mask obtained in the mask updating step is consistent with the mask before the update; if so, entering the anomaly evaluation step; if not, applying the updated mask to the image to be tested and entering the image feature extraction step again;

an anomaly evaluation step: realizing image anomaly evaluation by using an anomaly evaluation function according to the result of the mask update termination decision step.

Preferably, the mask random generation step includes:

using images for model training as input, decomposing each input image into $\frac{H}{k} \times \frac{W}{k}$ grids, where $H$ and $W$ are the height and width of the image and $k$ controls the size of the grid;

each grid is a square of $k \times k$ pixels and serves as the basic unit of the mask;

$k$ is sampled from the set $K = \{k_i\}_{i=1}^{N_k}$, where $N_k$ is the cardinality of the set and $k_i$ denotes the $i$-th grid size;

each grid is randomly selected to be masked or retained, and the resulting mask matrix is denoted $M$.
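The random mask generation described above can be sketched in NumPy as follows. This is a minimal illustration, not the patented implementation: the function name `random_grid_mask`, the per-grid masking probability `mask_ratio` (the text only says each grid is randomly selected), and the convention that 1 marks a masked pixel are all assumptions.

```python
import numpy as np

def random_grid_mask(height, width, grid_sizes=(4, 8, 16, 32),
                     mask_ratio=0.5, rng=None):
    """Return (k, M): a sampled grid size and a binary H x W mask.

    Assumed convention: 1 = masked pixel, 0 = retained pixel.
    `mask_ratio` is a hypothetical per-grid masking probability.
    """
    rng = np.random.default_rng() if rng is None else rng
    k = int(rng.choice(grid_sizes))          # sample the grid size k from K
    if height % k or width % k:
        raise ValueError("image size must be divisible by the grid size")
    # One independent mask/retain decision per grid cell: (H/k) x (W/k).
    cells = rng.random((height // k, width // k)) < mask_ratio
    # Expand every cell decision to a k x k block of pixels.
    mask = np.kron(cells.astype(np.uint8), np.ones((k, k), dtype=np.uint8))
    return k, mask
```

Calling `random_grid_mask(256, 256)` once per image and per training pass yields a fresh mask each time, which matches the idea of dynamically generated masks.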

Preferably, the mask initialization step includes: given an image to be tested, the mask is initialized from a set of multi-scale masks;

the set consists of eight checkerboard-like matrices of different scales, where the grid size $k \in K$;

for each grid size $k$, the set includes a pair of complementary masks that together cover all pixels in the image.
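A short sketch of how such complementary checkerboard masks could be built is given below. The helper names are hypothetical, and the grid sizes {4, 8, 16, 32} are the ones the description later reports for its implementation; with four sizes and one complementary pair per size, the set contains eight masks.

```python
import numpy as np

def checkerboard_pair(height, width, k):
    """Return two complementary H x W checkerboard masks with k x k cells."""
    rows = np.arange(height) // k            # grid row index of every pixel row
    cols = np.arange(width) // k             # grid column index of every pixel column
    board = ((rows[:, None] + cols[None, :]) % 2).astype(np.uint8)
    return board, 1 - board                  # the pair covers every pixel exactly once

def init_masks(height, width, grid_sizes=(4, 8, 16, 32)):
    """Collect one complementary pair per grid size."""
    masks = []
    for k in grid_sizes:
        masks.extend(checkerboard_pair(height, width, k))
    return masks
```

Because each pair sums to an all-ones matrix, every pixel is masked in exactly one member of the pair, so no region of the test image escapes inspection.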

Preferably, the image feature extraction step includes: taking the image obtained in the mask random generation step or the mask initialization step as input, and extracting high-dimensional feature information of the image by using a deep convolutional neural network, wherein the image feature extraction network consists of multiple layers of convolution and down-sampling operations.

Preferably, the image reconstruction step includes: taking the high-dimensional feature information obtained in the image feature extraction step as input, and reconstructing the image by using a deep convolutional neural network model to obtain a reconstructed image, wherein the image reconstruction network consists of multiple layers of convolution and up-sampling operations;

if the input is a model training image, a reconstructed training image is output, and if the input is an image to be tested, a reconstructed test image is output.

Preferably, the reconstructed image alignment step specifically includes:

for the model training image, comparing the reconstructed training image obtained in the image reconstruction step with the model training image, and calculating the following loss functions:

(1) mean square loss function:

$$\mathcal{L}_{2}(I, \hat{I}) = \lVert I - \hat{I} \rVert_2^2$$

where $\lVert \cdot \rVert_2$ denotes the two-norm;

(2) gradient magnitude similarity loss function:

$$\mathcal{L}_{G}(I, \hat{I}) = \sum_{c} \frac{1}{N} \sum_{i,j} \left( \mathbf{1} - \mathrm{GMS}(I^c, \hat{I}^c) \right)_{(i,j)}$$

where $\mathbf{1}$ denotes the all-ones matrix; $I$ and $\hat{I}$ denote the model training image and the reconstructed training image respectively; $I^c$ and $\hat{I}^c$ denote the $c$-th color channel of the model training image and of the reconstructed training image respectively; $\mathrm{GMS}(\cdot, \cdot)$ denotes the gradient magnitude similarity function of the model training image and the reconstructed training image; $(i, j)$ denotes the two-dimensional coordinates of the image; $N$ denotes the dimension of the matrix;

and the gradient magnitude similarity matrix for channel $c$ is:

$$\mathrm{GMS}(I^c, \hat{I}^c) = \frac{2\, g(I^c)\, g(\hat{I}^c) + a}{g(I^c)^2 + g(\hat{I}^c)^2 + a}, \qquad g(I^c) = \sqrt{(I^c * h_x)^2 + (I^c * h_y)^2}$$

where $a$ denotes a constant, and $h_x$ and $h_y$ are the Prewitt filters in the x and y dimensions;

(3) structural similarity index (SSIM) loss function:

$$\mathcal{L}_{S}(I, \hat{I}) = \frac{1}{N} \sum_{i,j} \left( 1 - \mathrm{SSIM}(I, \hat{I})_{(i,j)} \right)$$

where $\mathrm{SSIM}(I, \hat{I})_{(i,j)}$ denotes the structural similarity index function centered on the image two-dimensional coordinates $(i, j)$.
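To make the mean square and gradient magnitude similarity losses concrete, the sketch below computes both with plain NumPy. It follows the standard gradient magnitude similarity definition with Prewitt filters; the constant `a = 0.0026`, the edge padding, the averaging over channels, and all function names are assumptions chosen for illustration rather than values taken from the patent.

```python
import numpy as np

# 3x3 Prewitt filters for the x and y directions (normalised by 1/3).
PREWITT_X = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]], dtype=np.float64) / 3.0
PREWITT_Y = PREWITT_X.T

def filter3x3(channel, kernel):
    """Correlate a 2-D array with a 3x3 kernel, edge-padded to keep the size."""
    h, w = channel.shape
    padded = np.pad(channel, 1, mode="edge")
    out = np.zeros((h, w), dtype=np.float64)
    for di in range(3):
        for dj in range(3):
            out += kernel[di, dj] * padded[di:di + h, dj:dj + w]
    return out

def gradient_magnitude(channel):
    gx = filter3x3(channel, PREWITT_X)
    gy = filter3x3(channel, PREWITT_Y)
    return np.sqrt(gx ** 2 + gy ** 2)

def gms_map(ref, rec, a=0.0026):
    """Per-pixel gradient magnitude similarity of two 2-D channels."""
    g_ref, g_rec = gradient_magnitude(ref), gradient_magnitude(rec)
    return (2 * g_ref * g_rec + a) / (g_ref ** 2 + g_rec ** 2 + a)

def gms_loss(image, recon, a=0.0026):
    """Mean of (1 - GMS) over pixels, averaged over channels (H x W x C input)."""
    per_channel = [np.mean(1.0 - gms_map(image[..., c], recon[..., c], a))
                   for c in range(image.shape[-1])]
    return float(np.mean(per_channel))

def mse_loss(image, recon):
    """Mean squared error between image and reconstruction."""
    return float(np.mean((image - recon) ** 2))
```

A perfect reconstruction gives a GMS map of all ones and therefore a loss of zero; blurred or missing edges lower the similarity and raise the loss.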

Preferably, the mask updating step includes:

in each update iteration, a region with a small reconstruction error is regarded as a normal region and is removed from the mask in the next iteration, so that the mask is updated according to the reconstruction error;

given the grid size $k$, the image is divided into grids of $k \times k$ pixels, and the mask is updated in units of these grids, which makes the algorithm more stable and reduces the number of update iterations;

for each grid, the average reconstruction error is calculated and the mask is updated according to a threshold, retaining in the mask the parts whose reconstruction error is higher than the threshold.
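One grid-wise mask update can be sketched as below. The text does not say how the threshold is chosen, so `threshold` is simply a parameter here; the function name and the convention that 1 marks a masked pixel are likewise assumptions.

```python
import numpy as np

def update_mask(mask, error_map, k, threshold):
    """Keep a k x k cell masked only if it is currently masked and its
    mean reconstruction error exceeds `threshold`.

    mask:      H x W binary array, 1 = masked (assumed convention)
    error_map: H x W per-pixel reconstruction error
    """
    h, w = mask.shape
    # Average the per-pixel error inside every k x k cell.
    cell_err = error_map.reshape(h // k, k, w // k, k).mean(axis=(1, 3))
    # A cell counts as masked if any of its pixels is masked.
    cell_on = mask.reshape(h // k, k, w // k, k).max(axis=(1, 3)) > 0
    keep = cell_on & (cell_err > threshold)
    # Expand the cell decisions back to pixel resolution.
    return np.kron(keep.astype(mask.dtype), np.ones((k, k), dtype=mask.dtype))
```

Working on whole cells rather than single pixels means one noisy pixel cannot flip the mask, which is the stability benefit the text attributes to grid-wise updates.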

Preferably, the mask update termination decision step includes:

when most of the area covered by the mask is an abnormal area, stopping the mask update and obtaining the final mask;

after this process ends, the mask is expected to cover only the abnormal part of the image, and the final mask and the reconstructed image are used as the input of the anomaly evaluation step;

if the mask keeps changing in the mask updating step, the anomaly evaluation step is not entered; instead, the image feature extraction step is entered again, until the mask output by the mask updating step remains unchanged.
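The termination rule amounts to a fixed-point loop: reconstruct, update the mask, and stop once the mask no longer changes. The sketch below illustrates that control flow under several assumptions: `reconstruct` is a hypothetical callable standing in for the trained feature extraction and reconstruction networks, 1 marks a masked pixel, masked pixels are zeroed in the network input, and `max_iters` is a safety cap not mentioned in the text.

```python
import numpy as np

def refine_mask(image, mask, k, reconstruct, threshold, max_iters=50):
    """Iterate reconstruction and grid-wise mask updates until the mask is stable.

    image: H x W x C float array; mask: H x W binary array (1 = masked).
    Returns the final mask and the last reconstruction.
    """
    h, w = mask.shape
    recon = image
    for _ in range(max_iters):
        masked_input = image * (1 - mask)[..., None]   # remove masked regions
        recon = reconstruct(masked_input)
        error = ((image - recon) ** 2).mean(axis=-1)   # per-pixel error, H x W
        cell_err = error.reshape(h // k, k, w // k, k).mean(axis=(1, 3))
        cell_on = mask.reshape(h // k, k, w // k, k).max(axis=(1, 3)) > 0
        keep = (cell_on & (cell_err > threshold)).astype(mask.dtype)
        new_mask = np.kron(keep, np.ones((k, k), dtype=mask.dtype))
        if np.array_equal(new_mask, mask):             # mask unchanged: terminate
            break
        mask = new_mask                                # otherwise iterate again
    return mask, recon
```

Since cells can only be removed from the mask and never added, the mask shrinks monotonically and the loop is guaranteed to reach an unchanged mask.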

Preferably, the anomaly evaluation step includes:

for the image to be tested, comparing the final reconstructed image obtained in the mask update termination decision step with the image to be tested, and calculating an anomaly evaluation function based on the $L_2$ distance between the test image and the reconstructed image.

In a second aspect, an image anomaly detection and anomaly localization system based on a self-supervised mask is provided, the system comprising:

a mask random generation module: randomly generating a mask whose size matches that of the model training image, and applying the mask to the image to remove the information of a partial region of the image;

a mask initialization module: generating multi-scale initialization masks for the image to be tested, and applying each mask to the image to be tested to generate multi-scale images to be tested with partial image region information removed;

an image feature extraction module: extracting high-dimensional features of the image obtained by the mask random generation module or the mask initialization module by using a deep convolutional neural network;

an image reconstruction module: reconstructing the image from its high-dimensional features by using a deep convolutional neural network to obtain a reconstructed training image or a reconstructed test image;

a reconstructed image alignment module: realizing self-supervised learning by applying an image reconstruction loss function to the reconstructed training image and the model training image;

a mask updating module: updating the multi-scale masks according to the reconstructed test image by using a mask updating algorithm, so that the masks concentrate more on the abnormal part of the image;

a mask update termination decision module: judging whether the updated mask obtained by the mask updating module is consistent with the mask before the update; if so, entering the anomaly evaluation module; if not, applying the updated mask to the image to be tested and entering the image feature extraction module again;

an anomaly evaluation module: realizing image anomaly evaluation by using an anomaly evaluation function according to the result of the mask update termination decision module.

Compared with the prior art, the invention has the following beneficial effects:

1. through self-supervised mask training, the invention extends the image reconstruction task used for image anomaly detection to both image anomaly detection and anomaly localization;

2. in practical applications such as medical diagnosis and industrial defect detection, the anomaly often appears in only a small portion of the pixels of an image, and anomaly detection methods based on the image reconstruction task alone cannot accurately localize the abnormal region; by introducing self-supervised mask training, the invention improves the anomaly localization capability and the interpretability of the anomaly detection algorithm, and thereby achieves better performance on anomaly detection and anomaly localization tasks.

Drawings

Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:

FIG. 1 is a flow chart of the method of the present invention;

FIG. 2 is a schematic diagram of the system in the embodiment.

Detailed Description

The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the invention, and all such changes and modifications fall within the scope of the present invention.

The embodiment of the invention provides an image anomaly detection and anomaly localization method based on a self-supervised mask. As shown in FIG. 1, the method specifically comprises the following steps:

A mask random generation step: in this step, images for model training are used as input, and each input image is decomposed into $\frac{H}{k} \times \frac{W}{k}$ grids, where $H$ and $W$ are the height and width of the image and $k$ controls the size of the grid; each grid is a square of $k \times k$ pixels and serves as the basic unit of the mask; $k$ is sampled from the set $K = \{k_i\}_{i=1}^{N_k}$, where $N_k$ is the cardinality of the set and $k_i$ denotes the $i$-th grid size; in our implementation, we use $K = \{4, 8, 16, 32\}$, because it covers a wide range of anomaly scales. To expand the mask exploration space, a random mask is dynamically generated for each image during each training phase. Each grid is then randomly selected for masking or retention, and the resulting mask matrix is denoted $M$. In this way, a set of random masks of different sizes and shapes can be generated. With this way of generating random masks, each image is augmented into a set of different training triples, each consisting of the input image $I$, the generated mask $M$, and the masked model input image, which is obtained by an element-wise product in the spatial domain (the mask needs to be replicated along the channel dimension).
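The element-wise product with a mask replicated along the channel dimension can be illustrated as follows. The text does not state whether the ones of the mask mark retained or removed pixels, so this sketch takes an explicit `keep` array (1 = pixel retained); if the mask instead marks removed pixels, `1 - mask` would be passed. The function name is hypothetical.

```python
import numpy as np

def apply_mask(image, keep):
    """Zero out the removed pixels of an H x W x C image.

    keep: H x W binary array, 1 = pixel retained, 0 = pixel removed
          (an assumed convention for this sketch).
    """
    # Replicate the 2-D mask along the channel axis so it matches the image.
    keep_3d = np.repeat(keep[:, :, None], image.shape[2], axis=2)
    return image * keep_3d                   # element-wise product in the spatial domain
```

NumPy broadcasting (`image * keep[:, :, None]`) gives the same result without the explicit copy; the `np.repeat` form is used here only to mirror the wording of the text.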

A mask initialization step: in this step, given an image to be tested, the mask is initialized from a set of multi-scale masks. The set consists of eight checkerboard-like matrices of different scales, where the grid size $k \in K$; for each grid size $k$, it includes a pair of complementary masks that together cover all pixels in the image, thereby avoiding missing any possible abnormal region.

An image feature extraction step: for the image obtained in the mask random generation step or the mask initialization step, the high-dimensional features of the image are extracted by using a deep convolutional neural network, wherein the image feature extraction network consists of multiple layers of convolution and down-sampling operations.

An image reconstruction step: specifically, the high-dimensional feature information obtained in the image feature extraction step is used as input, and the image is reconstructed by using a deep convolutional neural network model to obtain a reconstructed image, wherein the image reconstruction network consists of multiple layers of convolution and up-sampling operations; if the input is a model training image, a reconstructed training image is output, and if the input is an image to be tested, a reconstructed test image is output.

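A reconstructed image alignment step: self-supervised learning is realized by applying an image reconstruction loss function to the reconstructed training image and the model training image;

specifically, for the model training image, the reconstructed training image obtained in the image reconstruction step is compared with the model training image, and the following loss functions are calculated:

(1) mean square loss function:

$$\mathcal{L}_{2}(I, \hat{I}) = \lVert I - \hat{I} \rVert_2^2$$

where $\lVert \cdot \rVert_2$ denotes the two-norm;

(2) gradient magnitude similarity loss function:

$$\mathcal{L}_{G}(I, \hat{I}) = \sum_{c} \frac{1}{N} \sum_{i,j} \left( \mathbf{1} - \mathrm{GMS}(I^c, \hat{I}^c) \right)_{(i,j)}$$

where $\mathbf{1}$ denotes the all-ones matrix; $I$ and $\hat{I}$ denote the model training image and the reconstructed training image respectively; $I^c$ and $\hat{I}^c$ denote their $c$-th color channels; $\mathrm{GMS}(\cdot, \cdot)$ denotes the gradient magnitude similarity function of the model training image and the reconstructed training image; $(i, j)$ denotes the two-dimensional coordinates of the image; $N$ denotes the dimension of the matrix;

and the gradient magnitude similarity matrix for channel $c$ is:

$$\mathrm{GMS}(I^c, \hat{I}^c) = \frac{2\, g(I^c)\, g(\hat{I}^c) + a}{g(I^c)^2 + g(\hat{I}^c)^2 + a}, \qquad g(I^c) = \sqrt{(I^c * h_x)^2 + (I^c * h_y)^2}$$

where $a$ denotes a constant, and $h_x$ and $h_y$ are the Prewitt filters in the x and y dimensions;

(3) structural similarity index (SSIM) loss function:

$$\mathcal{L}_{S}(I, \hat{I}) = \frac{1}{N} \sum_{i,j} \left( 1 - \mathrm{SSIM}(I, \hat{I})_{(i,j)} \right)$$

where $\mathrm{SSIM}(I, \hat{I})_{(i,j)}$ denotes the structural similarity index function centered on the image two-dimensional coordinates $(i, j)$.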
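For the structural similarity loss, a simplified single-channel sketch is shown below. It uses the standard SSIM formula with a uniform local window instead of the more common Gaussian window; the window size `win`, the stabilising constants `c1` and `c2` (the usual defaults for images scaled to [0, 1]), and the function names are assumptions for illustration, not parameters taken from the patent.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def ssim_map(x, y, win=7, c1=0.01 ** 2, c2=0.03 ** 2):
    """Local SSIM of two 2-D float arrays in [0, 1], one value per window centre.

    The output has shape (H - win + 1, W - win + 1) because only windows
    lying fully inside the image are evaluated.
    """
    xw = sliding_window_view(x, (win, win))
    yw = sliding_window_view(y, (win, win))
    mu_x, mu_y = xw.mean(axis=(-1, -2)), yw.mean(axis=(-1, -2))
    var_x, var_y = xw.var(axis=(-1, -2)), yw.var(axis=(-1, -2))
    cov = (xw * yw).mean(axis=(-1, -2)) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den

def ssim_loss(x, y, **kwargs):
    """Mean of (1 - SSIM) over all window centres."""
    return float(np.mean(1.0 - ssim_map(x, y, **kwargs)))
```

For color images the loss could be evaluated per channel and averaged, in the same way as the gradient magnitude similarity loss.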

A mask updating step: the purpose of the mask update is to remove the regions of the mask that are likely to correspond to normal regions of the image, so that the image reconstruction network focuses more on the remaining abnormal regions. In each update iteration, a region with a small reconstruction error is regarded as a normal region and is removed from the mask in the next iteration, so that the mask is updated according to the reconstruction error; given the grid size $k$, the image is divided into grids of $k \times k$ pixels, and the mask is updated in units of these grids, which makes the algorithm more stable and reduces the number of update iterations; then, for each grid, the average reconstruction error is calculated and the mask is updated according to a threshold, retaining in the mask the parts whose reconstruction error is above the threshold.

A mask update termination decision step: in particular, when most of the area covered by the mask is an abnormal area, providing more image information cannot significantly reduce the reconstruction error of the abnormal area. In this case, the overall reconstruction error will not be significantly reduced and the corresponding mask will remain unchanged. At this point the mask update should be terminated and the final mask is obtained; when this process ends, the mask is expected to cover only the abnormal part of the image, and the final mask and the reconstructed image are taken as input to the anomaly evaluation step. If the mask keeps changing in the mask updating step, the anomaly evaluation step is not entered; instead, the image feature extraction step is entered again, until the mask output by the mask updating step remains unchanged.

An anomaly evaluation step: image anomaly evaluation is realized by using an anomaly evaluation function according to the result of the mask update termination decision step;

specifically, for the image to be tested, the final reconstructed image obtained in the mask update termination decision step is compared with the image to be tested, and an anomaly evaluation function is calculated based on the $L_2$ distance between the test image and the reconstructed image.
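The exact anomaly evaluation function is not reproduced in this text; the only recoverable detail is that it relies on the L₂ distance between the test image and its reconstruction. The sketch below is therefore an assumption-laden illustration: it computes a per-pixel L₂ distance map for localization and, as one hypothetical aggregation, takes the maximum of that map as an image-level score.

```python
import numpy as np

def anomaly_map(image, recon):
    """Per-pixel L2 distance across channels between test image and reconstruction."""
    return np.sqrt(((image - recon) ** 2).sum(axis=-1))

def anomaly_score(image, recon):
    """Image-level score: the largest per-pixel distance (an assumed aggregation)."""
    return float(anomaly_map(image, recon).max())
```

High values in the map point to pixels the network could not reconstruct, which is what makes a reconstruction error usable for localization as well as detection.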

Next, the present invention will be described in more detail.

The invention provides an image anomaly detection and anomaly localization method based on a self-supervised mask; FIG. 1 is a flow chart of an embodiment of the method. The method randomly generates, for each image used for model training, a mask of the same size as the image and applies the mask to the image to remove the information of a partial region; generates multi-scale initialization masks for the image to be tested and applies each of them to that image, producing multi-scale test images with partial region information removed; extracts high-dimensional features of the input image by using a deep convolutional neural network; reconstructs the image from these high-dimensional features by using a deep convolutional neural network to obtain a reconstructed training image or a reconstructed test image, and realizes self-supervised learning by using an image reconstruction loss function; updates the multi-scale masks according to the reconstructed test image by using a mask updating algorithm, so that the masks concentrate more on the abnormal part of the image; makes the mask update termination decision by judging whether the mask is consistent with the mask before the update; and realizes image anomaly evaluation by using an anomaly evaluation function.

Through self-supervised mask training, the invention extends the image reconstruction task used for image anomaly detection to both image anomaly detection and anomaly localization. In practical applications such as medical diagnosis and industrial defect detection, anomalies often appear in only a small portion of the pixels of an image, and anomaly detection methods based on the image reconstruction task alone cannot accurately localize the abnormal region. By introducing self-supervised mask training, the anomaly localization capability and the interpretability of the anomaly detection algorithm are improved, so that better performance is obtained on anomaly detection and anomaly localization tasks.

Specifically, with reference to FIG. 1, the method comprises the following steps:

a mask random generation step: randomly generating a mask whose size matches that of an image for model training, and applying the mask to the image to remove the information of a partial region of the image;

a mask initialization step: generating multi-scale initialization masks for the image to be tested, and applying each mask to the image to be tested to generate multi-scale images to be tested with partial image region information removed;

an image feature extraction step: extracting high-dimensional features of the image obtained in the mask random generation step or the mask initialization step by using a deep convolutional neural network;

an image reconstruction step: reconstructing the image from the high-dimensional features obtained in the image feature extraction step by using a deep convolutional neural network to obtain a reconstructed training image or a reconstructed test image;

a reconstructed image alignment step: realizing self-supervised learning by applying an image reconstruction loss function to the reconstructed training image obtained in the image reconstruction step and the model training image;

a mask updating step: updating the multi-scale masks according to the reconstructed test image obtained in the image reconstruction step by using a mask updating algorithm, so that the masks concentrate more on the abnormal part of the image;

a mask update termination decision step: judging whether the updated mask is consistent with the mask before the update; if so, entering the anomaly evaluation step; if not, applying the updated mask to the image to be tested and entering the image feature extraction step again;

an anomaly evaluation step: realizing image anomaly evaluation by using an anomaly evaluation function according to the result of the mask update termination decision step.

In the embodiment of the invention, in the mask random generation step: a mask whose size matches that of the image is randomly generated for each image used for model training, and the mask is applied to the image to remove the information of a partial region of the image.

In the mask initialization step: multi-scale initialization masks are generated for the image to be tested, and each initialization mask is applied to the image to be tested to generate multi-scale images to be tested with partial image region information removed.

In the image feature extraction step: the high-dimensional features of the image obtained in the mask random generation step or the mask initialization step are extracted by using a deep convolutional neural network.

In the image reconstruction step: the image is reconstructed from the high-dimensional features obtained in the image feature extraction step by using a deep convolutional neural network, to obtain a reconstructed training image or a reconstructed test image.

In the reconstructed image alignment step: self-supervised learning is realized by applying an image reconstruction loss function to the reconstructed training image obtained in the image reconstruction step and the model training image.

In the mask updating step: the multi-scale masks are updated according to the reconstructed test image obtained in the image reconstruction step by using a mask updating algorithm, so that the masks concentrate more on the abnormal part of the image.

In the mask update termination decision step: whether the updated mask obtained in the mask updating step is consistent with the mask before the update is judged; if so, the anomaly evaluation step is entered; if not, the updated mask is applied to the image to be tested and the image feature extraction step is entered again.

In the anomaly evaluation step: image anomaly evaluation is realized by using an anomaly evaluation function according to the result of the mask update termination decision step.

The invention also provides an image anomaly detection and anomaly localization system based on the self-supervised mask, which is described as follows:

Specifically, the network framework of a training system consisting of a mask random generation module, an image feature extraction module, an image reconstruction module, a reconstructed image alignment module, a mask update termination decision module and an anomaly evaluation module is shown in FIG. 2, and the whole system framework can be trained end to end.

In the system framework of the embodiment shown in FIG. 2, images for model training are used as input, and each input image is decomposed into $\frac{H}{k} \times \frac{W}{k}$ grids, where $H$ and $W$ are the height and width of the image and $k$ controls the size of the grid. Each grid is a square of $k \times k$ pixels and serves as the basic unit of the mask. In particular, the size $k$ is sampled from the set $K = \{k_i\}_{i=1}^{N_k}$, where $N_k$ is the cardinality of the set. In our implementation, we use $K = \{4, 8, 16, 32\}$, because it covers a wide range of anomaly scales. To expand the mask exploration space, a random mask is dynamically generated for each image during each training phase. Each grid is then randomly selected for masking or retention, and the resulting mask matrix is denoted $M$. In this way, a set of random masks of different sizes and shapes can be generated. With this way of generating random masks, each image is augmented into a set of different training triples, each consisting of the input image $I$, the generated mask $M$, and the masked model input image, which is obtained by an element-wise product in the spatial domain (the mask needs to be replicated along the channel dimension).

In the system framework of the embodiment shown in FIG. 2, given an image to be tested, the mask is initialized from a set of multi-scale masks. Initially, the set consists of eight checkerboard-like matrices of different scales, where the grid size $k \in K$. For each grid size $k$, it includes a pair of complementary masks that together cover all pixels in the image, thereby avoiding missing any possible abnormal region.

In the system framework of the embodiment shown in FIG. 2, the image obtained by the mask random generation module or the mask initialization module is used as input, and a deep convolutional neural network is used to extract high-dimensional feature information of the image; the image feature extraction network consists of multiple layers of convolution and down-sampling operations.

In the system framework of the embodiment shown in FIG. 2, the high-dimensional feature information obtained by the image feature extraction module is used as input, and the image is reconstructed by using a deep convolutional neural network model to obtain a reconstructed image; the image reconstruction network consists of multiple layers of convolution and up-sampling operations. If the input is a model training image, a reconstructed training image is output, and if the input is an image to be tested, a reconstructed test image is output.

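In the system framework of the embodiment shown in FIG. 2, for the model training image, the reconstructed training image obtained by the image reconstruction module is compared with the model training image, and the following loss functions are calculated:

(1) mean square loss function:

$$\mathcal{L}_{2}(I, \hat{I}) = \lVert I - \hat{I} \rVert_2^2$$

where $\lVert \cdot \rVert_2$ denotes the two-norm;

(2) gradient magnitude similarity loss function:

$$\mathcal{L}_{G}(I, \hat{I}) = \sum_{c} \frac{1}{N} \sum_{i,j} \left( \mathbf{1} - \mathrm{GMS}(I^c, \hat{I}^c) \right)_{(i,j)}$$

where $\mathbf{1}$ denotes the all-ones matrix; $I$ and $\hat{I}$ denote the model training image and the reconstructed training image respectively; $I^c$ and $\hat{I}^c$ denote their $c$-th color channels; $\mathrm{GMS}(\cdot, \cdot)$ denotes the gradient magnitude similarity function of the model training image and the reconstructed training image; $(i, j)$ denotes the two-dimensional coordinates of the image; $N$ denotes the dimension of the matrix;

and the gradient magnitude similarity matrix for channel $c$ is:

$$\mathrm{GMS}(I^c, \hat{I}^c) = \frac{2\, g(I^c)\, g(\hat{I}^c) + a}{g(I^c)^2 + g(\hat{I}^c)^2 + a}, \qquad g(I^c) = \sqrt{(I^c * h_x)^2 + (I^c * h_y)^2}$$

where $a$ denotes a constant, and $h_x$ and $h_y$ are the Prewitt filters in the x and y dimensions;

(3) structural similarity index (SSIM) loss function:

$$\mathcal{L}_{S}(I, \hat{I}) = \frac{1}{N} \sum_{i,j} \left( 1 - \mathrm{SSIM}(I, \hat{I})_{(i,j)} \right)$$

where $\mathrm{SSIM}(I, \hat{I})_{(i,j)}$ denotes the structural similarity index function centered on the image two-dimensional coordinates $(i, j)$.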

In the system framework of the embodiment shown in FIG. 2, the purpose of the mask update is to remove the mask regions that are likely to correspond to normal regions of the image, so that the image reconstruction network focuses more on the remaining abnormal regions. In each update iteration, the mask is updated according to the reconstruction error: a region with a small reconstruction error is regarded as a normal region and is removed from the mask in the next iteration. The mask is updated in units of grids of $k \times k$ pixels, which makes the algorithm more stable and reduces the number of update iterations. Thus, given a grid size $k$, the image is divided into grids of $k \times k$ pixels. Then, for each grid, the average reconstruction error over it is calculated and the mask is updated according to a threshold, retaining in the mask the parts whose reconstruction error is above the threshold.

Referring to FIG. 2, when most of the area covered by the mask is an abnormal area, providing more image information does not significantly reduce the reconstruction error of the abnormal area. In this case, the overall reconstruction error will not be significantly reduced and the corresponding mask will remain unchanged. The mask update should be terminated at this point, and the final mask is obtained. When this process ends, the mask is expected to cover only the abnormal portion of the image, and the final mask and the reconstructed image are taken as input to the anomaly evaluation module. If the mask keeps changing in the mask updating module, the anomaly evaluation module is not entered; instead, the image feature extraction module is entered again, until the mask output by the mask updating module remains unchanged.

In the system framework of the embodiment shown in fig. 2, for an image to be tested, the final reconstructed image obtained from the mask update termination decision module is compared with the image to be tested in order to calculate the following anomaly score function:

wherein,

represents the L₂ distance between the test image and the reconstructed image.
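As an illustration of an L₂-based score, the sketch below computes a per-pixel squared L₂ distance across channels as an anomaly localization map. The scoring equation is not reproduced in this text, so the image-level aggregation used here (the maximum over pixels) is purely an assumption, as are the function names.

```python
import numpy as np


def anomaly_map(image, recon):
    """Per-pixel squared L2 distance across channels; inputs have shape (H, W, C)."""
    return np.sum((image - recon) ** 2, axis=-1)


def anomaly_score(image, recon):
    """Image-level score, assumed here to be the largest per-pixel distance."""
    return float(anomaly_map(image, recon).max())
```

Pixels with a large value in the map are candidates for the abnormal region, and thresholding the image-level score yields a detection decision.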

In summary, the embodiment of the invention provides an image anomaly detection and anomaly positioning method and system based on a self-supervised mask. Through the training of the self-supervised mask, the image reconstruction task used for image anomaly detection is extended to cover both image anomaly detection and anomaly positioning. In practical applications such as medical diagnosis and industrial defect detection, anomalies often appear in only a small portion of the pixels of an image, and anomaly detection methods based on the image reconstruction task alone cannot accurately locate the abnormal region. Introducing the training of the self-supervised mask improves both the anomaly positioning capability and the interpretability of the anomaly detection algorithm, thereby achieving better performance on the anomaly detection and anomaly positioning tasks.

Those skilled in the art will appreciate that, in addition to implementing the system and its various devices, modules, units provided by the present invention as pure computer readable program code, the system and its various devices, modules, units provided by the present invention can be fully implemented by logically programming method steps in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system and various devices, modules and units thereof provided by the invention can be regarded as a hardware component, and the devices, modules and units included in the system for realizing various functions can also be regarded as structures in the hardware component; means, modules, units for performing the various functions may also be regarded as structures within both software modules and hardware components for performing the method.

The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims

1. An image anomaly detection and anomaly positioning method based on a self-supervised mask, characterized by comprising the following steps:

2. The method for image anomaly detection and anomaly localization based on a self-supervised mask according to claim 1, wherein the mask random generation step comprises:

using images for model training as input, decomposing each input image into

k from the set

3. The method for image anomaly detection and anomaly localization based on a self-supervised mask according to claim 1, wherein the mask initialization step comprises: given an image to be tested, as an initialization, from a set of multi-scale masks

4. The method for image anomaly detection and anomaly localization based on a self-supervised mask according to claim 1, wherein the image feature extraction step comprises: taking the image obtained in the mask random generation step or the mask initialization step as input, and extracting high-dimensional feature information of the image using a deep convolutional neural network, wherein the image feature extraction network consists of multiple layers of convolution and down-sampling operations.

5. The method for image anomaly detection and anomaly localization based on a self-supervised mask according to claim 1, wherein the image reconstruction step comprises: taking the high-dimensional feature information obtained in the image feature extraction step as input, and reconstructing the image using a deep convolutional neural network model to obtain a reconstructed image, wherein the image attribute recovery network consists of multiple layers of convolution and up-sampling operations;

6. The image anomaly detection and anomaly positioning method based on the self-supervision mask according to claim 1, characterized in that the reconstructed image alignment step specifically comprises the following steps:

(1) mean square loss function:

wherein,

represents the 2-norm;

(2) gradient magnitude similarity loss function:

wherein 1 represents an all-ones matrix;

i and

I^c and

i, j represent the two-dimensional coordinates of the image;

representing the dimension of the matrix;

and the gradient magnitude similarity loss matrix for channel c:

wherein a represents a constant;

h_x and h_y are the Prewitt filters in the x and y dimensions;

(3) structural similarity index loss function:

wherein,

7. The method for image anomaly detection and anomaly localization based on a self-supervised mask according to claim 1, wherein the mask updating step comprises:

8. The method of claim 1, wherein the mask update termination decision step comprises:

9. The method of claim 1, wherein the anomaly assessment step comprises:

wherein,

represents the L₂ distance between the test image and the reconstructed image.

10. An image anomaly detection and anomaly localization system based on a self-supervised mask, comprising: