CN114022475A - Image anomaly detection and anomaly positioning method and system based on self-supervision mask - Google Patents


Info

Publication number
CN114022475A
Authority
CN
China
Prior art keywords
image
mask
reconstructed
updating
anomaly
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111397389.4A
Other languages
Chinese (zh)
Other versions
CN114022475B (en
Inventor
王延峰
黄潮钦
徐勤伟
张娅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN202111397389.4A priority Critical patent/CN114022475B/en
Publication of CN114022475A publication Critical patent/CN114022475A/en
Application granted granted Critical
Publication of CN114022475B publication Critical patent/CN114022475B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an image anomaly detection and anomaly positioning method and system based on a self-supervised mask, relating to the technical field of computer vision and image processing. The method comprises the following steps: mask random generation, mask initialization, image feature extraction, image reconstruction, reconstructed image alignment, mask updating, mask update termination decision, and anomaly evaluation. By introducing self-supervised mask training, the invention improves the anomaly positioning capability of the anomaly detection algorithm and thereby obtains better performance on both the anomaly detection and the anomaly positioning tasks.

Description

Image anomaly detection and anomaly positioning method and system based on self-supervision mask
Technical Field
The invention relates to the technical field of computer vision and image processing, and in particular to an unsupervised image anomaly detection and anomaly positioning method and system based on a self-supervised mask.
Background
Deep learning techniques based on deep neural networks have achieved significant success in object classification tasks, and such data-driven approaches typically require large amounts of labeled data for training. In anomaly detection, however, the possible types of anomalies cannot be exhaustively enumerated, so collecting enough anomalous data for model training is prohibitively costly. The anomaly detection task therefore usually provides only normal data for model training, and requires the method to detect anomalies without ever having been trained on anomalous data.
Image anomaly detection solutions based on image reconstruction train an image reconstruction model on normal data only, and assume that the model cannot reconstruct anomalous data well. In the detection stage, the model's limited ability to reconstruct anomalous data leads to larger reconstruction errors, so the reconstruction error can be used as the detection index. In practical applications such as medical diagnosis and industrial defect detection, however, the anomaly often appears in only a small portion of the image pixels; the above methods can only judge whether an anomaly exists in the image as a whole and cannot accurately locate the anomalous region. Anomaly positioning is important both for improving detection performance and for the interpretability of the detection algorithm, yet existing anomaly detection algorithms often neglect it.
An invention patent with publication number CN110866908B discloses an image processing method, apparatus, server and storage medium, comprising: acquiring an image to be detected, and performing down-sampling abnormal classification processing on the image to be detected to obtain an abnormal class prediction label and a target characteristic diagram; performing primary abnormal positioning processing based on the abnormal category prediction label and the target characteristic image to obtain an initial positioning image corresponding to the image to be detected; carrying out up-sampling abnormal positioning processing on the initial positioning image to obtain a target positioning image corresponding to the image to be detected; and outputting the abnormal category prediction label and the target positioning image.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides an image anomaly detection and anomaly positioning method and system based on a self-supervised mask.
According to the image anomaly detection and anomaly positioning method and system based on the self-supervised mask, the scheme is as follows:
in a first aspect, an image anomaly detection and anomaly positioning method based on a self-supervised mask is provided, the method comprising:
a mask random generation step: randomly generating a mask whose size matches that of the model training image, and applying the mask to the image to remove the information of a partial region of the image;
a mask initialization step: generating multi-scale initialization masks for the image to be tested, and applying each mask to the image to be tested to generate multi-scale test images with the information of partial regions removed;
an image feature extraction step: extracting high-dimensional features from the image obtained in the mask random generation step or the mask initialization step by using a deep convolutional neural network;
an image reconstruction step: reconstructing the image from its high-dimensional features by using a deep convolutional neural network to obtain a reconstructed training image or a reconstructed test image;
a reconstructed image alignment step: realizing self-supervised learning by applying an image reconstruction loss function to the reconstructed training image and the model training image;
a mask updating step: updating the multi-scale masks according to the reconstructed test image by using a mask updating algorithm, so that the mask concentrates on the anomalous part of the image;
a mask update termination decision step: judging whether the updated mask obtained in the mask updating step is identical to the mask before updating; if so, entering the anomaly evaluation step; if not, applying the updated mask to the image to be tested and entering the image feature extraction step again;
an anomaly evaluation step: according to the result of the mask update termination decision step, evaluating the image anomaly by using an anomaly evaluation function.
Preferably, the mask random generating step includes:
using the images for model training as input, decomposing each input image into $\frac{H}{k} \times \frac{W}{k}$ grids, wherein $H$ and $W$ are the height and width of the image and $k$ controls the size of the grid;
each grid is composed of a square of $k \times k$ pixels and is set as the basic unit of the mask;
$k$ is sampled from the set $K = \{k_i\}_{i=1}^{N_k}$, wherein $N_k$ is the cardinality of the set and $k_i$ represents the $i$-th grid size;
each grid is randomly selected for masking or retention, and the resulting mask matrix is denoted $M$.
Preferably, the mask initializing step includes: given an image to be tested, a set of multi-scale masks is generated as the initialization; the set is composed of eight checkerboard-like matrices of different scales, wherein the grid size $k \in K$;
for each grid size $k$, the set includes a pair of complementary masks that together cover all pixels in the image.
Preferably, the image feature extraction step includes: taking the image obtained in the mask random generation step or the mask initialization step as input, and extracting high-dimensional feature information of the image by using a deep convolutional neural network, wherein the image feature extraction network consists of multiple layers of convolution and down-sampling operations.
Preferably, the image reconstructing step includes: taking the high-dimensional feature information obtained in the image feature extraction step as input, and reconstructing the image by using a deep convolutional neural network model to obtain a reconstructed image, wherein the image attribute recovery network consists of multiple layers of convolution and up-sampling operations;
and if the input is the model training image, outputting a reconstructed training image, and if the input is the image to be tested, outputting a reconstructed testing image.
Preferably, the step of aligning the reconstructed images specifically includes:
for the model training image, comparing the reconstructed training image obtained in the image reconstruction step with the model training image, and respectively calculating the following loss functions:
(1) mean square loss function:
$$L_{2}(I, \hat{I}) = \| I - \hat{I} \|_2^2$$
wherein $\| \cdot \|_2$ represents the two-norm;
(2) gradient magnitude similarity loss function:
$$L_{GMS}(I, \hat{I}) = \frac{1}{3} \sum_{c=1}^{3} L_{GMS}(I_c, \hat{I}_c)$$
$$L_{GMS}(I_c, \hat{I}_c) = \frac{1}{N} \sum_{i,j} \left( \mathbf{1} - GMS(I_c, \hat{I}_c) \right)_{(i,j)}$$
wherein $\mathbf{1}$ represents an all-ones matrix;
$I$ and $\hat{I}$ respectively represent the model training image and the reconstructed training image;
$I_c$ and $\hat{I}_c$ respectively represent the $c$-th color channel of the model training image and of the reconstructed training image;
$GMS(I_c, \hat{I}_c)$ represents the gradient magnitude similarity function of the model training image and the reconstructed training image;
$i, j$ represent the two-dimensional coordinates of the image;
$N$ represents the dimension of the matrix;
and the gradient magnitude similarity matrix for channel $c$ is:
$$GMS(I_c, \hat{I}_c) = \frac{2\, g(I_c)\, g(\hat{I}_c) + a}{g(I_c)^2 + g(\hat{I}_c)^2 + a}$$
$$g(I_c) = \sqrt{(I_c * h_x)^2 + (I_c * h_y)^2}$$
wherein $a$ represents a constant;
$h_x$ and $h_y$ are the Prewitt filters in the $x$ and $y$ dimensions, and $*$ denotes convolution;
(3) structural similarity index loss function:
$$L_{SSIM}(I, \hat{I}) = \frac{1}{N} \sum_{i,j} \left( 1 - SSIM(I, \hat{I})_{(i,j)} \right)$$
wherein $SSIM(I, \hat{I})_{(i,j)}$ represents the structural similarity index function centered on the image two-dimensional coordinates $i, j$.
Preferably, the mask updating step includes:
in each iteration updating, a region with small reconstruction error is regarded as a normal region and is removed from the mask in the next iteration, so that the mask is updated by the reconstruction error;
given the grid size k, the image is divided into k × k grids, and the mask is updated by taking the k × k grids as a unit, so that the algorithm is more stable, and the iterative update times are reduced;
for each grid, the average reconstruction error is calculated, the mask is updated according to a threshold value, and the parts of the mask where the reconstruction error is higher than the threshold value are reserved.
Preferably, the mask update termination decision step includes:
when most of the area covered by the mask is an abnormal area, stopping updating the mask and obtaining a final mask;
after the process is finished, the expected mask only covers the abnormal part of the image, and the final mask and the reconstructed image are used as the input of the abnormality evaluation step;
and if the mask is continuously changed in the mask updating step, not entering an abnormal evaluation step, but re-entering the image feature extraction step until the output mask of the mask updating step is kept unchanged.
Preferably, the anomaly evaluation step includes:
for the image to be tested, comparing the final reconstructed image obtained in the mask update termination decision step with the image to be tested, thereby calculating the following anomaly evaluation function:
[equation image not reproduced: anomaly evaluation function]
wherein the function is computed from the $L_2$ distance between the test image and the reconstructed image.
In a second aspect, an image anomaly detection and anomaly positioning system based on a self-supervised mask is provided, the system comprising:
a mask random generation module: randomly generating a mask whose size matches that of the model training image, and applying the mask to the image to remove the information of a partial region of the image;
a mask initialization module: generating multi-scale initialization masks for the image to be tested, and applying each mask to the image to be tested to generate multi-scale test images with the information of partial regions removed;
an image feature extraction module: extracting high-dimensional features from the image obtained by the mask random generation module or the mask initialization module by using a deep convolutional neural network;
an image reconstruction module: reconstructing the image from its high-dimensional features by using a deep convolutional neural network to obtain a reconstructed training image or a reconstructed test image;
a reconstructed image alignment module: realizing self-supervised learning by applying an image reconstruction loss function to the reconstructed training image and the model training image;
a mask updating module: updating the multi-scale masks according to the reconstructed test image by using a mask updating algorithm, so that the mask concentrates on the anomalous part of the image;
a mask update termination decision module: judging whether the updated mask obtained by the mask updating module is identical to the mask before updating; if so, entering the anomaly evaluation module; if not, applying the updated mask to the image to be tested and entering the image feature extraction module again;
an anomaly evaluation module: according to the result of the mask update termination decision module, evaluating the image anomaly by using an anomaly evaluation function.
Compared with the prior art, the invention has the following beneficial effects:
1. the invention expands the image reconstruction task for image anomaly detection to the image anomaly detection and anomaly positioning field through the training of the self-supervision mask;
2. in practical applications such as medical diagnosis and industrial defect detection, the anomaly often appears in only a small portion of the image pixels, and anomaly detection methods based on the image reconstruction task cannot accurately locate the anomalous region; by introducing self-supervised mask training, the invention improves the anomaly positioning capability and the interpretability of the anomaly detection algorithm.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of the system in the embodiment.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the invention, all of which fall within the scope of the present invention.
The embodiment of the invention provides an image anomaly detection and anomaly positioning method based on a self-supervised mask. As shown in FIG. 1, the method specifically comprises the following steps:
a mask random generation step: randomly generating a mask with the size consistent with that of the model training image, and applying the mask on the image to remove information of a partial region of the image;
in this step, images for model training are used as input, and each input image is decomposed into $\frac{H}{k} \times \frac{W}{k}$ grids, wherein $H$ and $W$ are the height and width of the image and $k$ controls the size of the grid; each grid is composed of a square of $k \times k$ pixels and is set as the basic unit of the mask; $k$ is sampled from the set $K = \{k_i\}_{i=1}^{N_k}$, wherein $N_k$ is the cardinality of the set and $k_i$ represents the $i$-th grid size. In our implementation, we use $K = \{4, 8, 16, 32\}$ because it covers a wide range of anomaly scales. To expand the mask exploration space, a random mask is dynamically generated for each image during each training phase. Each grid is then randomly selected for masking or retention, and the resulting mask matrix is denoted $M$. In this way, a set of random masks of different sizes and shapes can be generated. With this way of generating random masks, each image is augmented into a set of different training triples $(I, M, \tilde{I})$, where $I$ is the input image, $M$ is the resulting mask, and $\tilde{I} = I \odot M$ is the generated model input image; $\odot$ denotes the element-wise product in the spatial domain (the mask needs to be copied along the channel dimension).
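The random mask generation described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the patented implementation: the function names `random_grid_mask` and `make_training_triple`, the 0.5 keep probability, and the convention that the mask holds 1 for retained pixels and 0 for masked pixels are all hypothetical choices made here.

```python
import numpy as np

def random_grid_mask(height, width, k, keep_prob=0.5, rng=None):
    """Return a (height, width) mask: 1 = pixel retained, 0 = pixel masked.

    The image is split into (height / k) x (width / k) cells of k x k pixels,
    and each cell is kept or masked at random (keep_prob is an assumption).
    """
    if height % k or width % k:
        raise ValueError("k must divide both height and width")
    rng = np.random.default_rng() if rng is None else rng
    # One random keep/mask decision per grid cell.
    cells = (rng.random((height // k, width // k)) < keep_prob).astype(np.float32)
    # Expand every cell decision to a k x k block of pixels.
    return np.kron(cells, np.ones((k, k), dtype=np.float32))

def make_training_triple(image, scales=(4, 8, 16, 32), rng=None):
    """Build one (I, M, I * M) triple for an (H, W, C) image."""
    rng = np.random.default_rng() if rng is None else rng
    k = int(rng.choice(scales))                      # sample the grid size from K
    mask = random_grid_mask(image.shape[0], image.shape[1], k, rng=rng)
    masked_image = image * mask[..., None]           # copy the mask along channels
    return image, mask, masked_image
```

Calling `make_training_triple` once per image per epoch would give a fresh mask each time, which matches the idea of dynamically generating masks during training.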
Mask initialization step: generating multi-scale initialization masks according to the tested images, and respectively applying the masks to the images to be tested to generate multi-scale images to be tested with partial image area information removed;
in this step, given the image to be tested, a set of multi-scale masks is generated as the initialization. The set is composed of eight checkerboard-like matrices of different scales, wherein the grid size $k \in K$; for each grid size $k$, the set includes a pair of complementary masks that together cover all pixels in the image, thereby avoiding missing any possible anomalous area.
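As a rough illustration of this initialization, the sketch below builds one complementary checkerboard pair per grid size. The function names and the 1 = retained, 0 = masked convention are assumptions for the example; with the four grid sizes 4, 8, 16 and 32 it yields the eight masks mentioned in the text.

```python
import numpy as np

def checkerboard_pair(height, width, k):
    """Return two complementary checkerboard masks with k x k cells.

    Convention (an assumption): 1 = pixel retained, 0 = pixel masked.
    """
    rows = np.arange(height) // k                    # cell row index of each pixel
    cols = np.arange(width) // k                     # cell column index of each pixel
    board = ((rows[:, None] + cols[None, :]) % 2).astype(np.float32)
    return board, 1.0 - board

def init_multiscale_masks(height, width, scales=(4, 8, 16, 32)):
    """Return the list of initial masks, two per grid size."""
    masks = []
    for k in scales:
        masks.extend(checkerboard_pair(height, width, k))
    return masks
```

Because the two masks of a pair are complements, every pixel is masked in exactly one of them, so no region of the test image escapes inspection at any scale.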
An image feature extraction step: and for the image obtained in the mask random generation step or the mask initialization step, extracting the high-dimensional features of the image by using a deep convolution neural network, wherein the image feature extraction network consists of a plurality of layers of convolution and down-sampling operations.
An image reconstruction step: reconstructing the image from its high-dimensional features by using a deep convolutional neural network to obtain a reconstructed training image or a reconstructed test image;
specifically, the high-dimensional feature information obtained in the image feature extraction step is used as input, and the image is reconstructed by using a deep convolutional neural network model to obtain a reconstructed image, wherein the image attribute recovery network consists of multiple layers of convolution and up-sampling operations; if the input is the model training image, a reconstructed training image is output, and if the input is the image to be tested, a reconstructed test image is output.
A reconstructed image alignment step: according to the reconstructed training image and the model training image, self-supervised learning is achieved by using an image reconstruction loss function;
specifically, for the model training image, comparing the reconstructed training image obtained in the image reconstruction step with the model training image, thereby respectively calculating the following loss functions:
(1) mean square loss function:
$$L_{2}(I, \hat{I}) = \| I - \hat{I} \|_2^2$$
wherein $\| \cdot \|_2$ represents the two-norm;
(2) gradient magnitude similarity loss function:
$$L_{GMS}(I, \hat{I}) = \frac{1}{3} \sum_{c=1}^{3} L_{GMS}(I_c, \hat{I}_c)$$
$$L_{GMS}(I_c, \hat{I}_c) = \frac{1}{N} \sum_{i,j} \left( \mathbf{1} - GMS(I_c, \hat{I}_c) \right)_{(i,j)}$$
wherein $\mathbf{1}$ represents an all-ones matrix;
$I$ and $\hat{I}$ respectively represent the model training image and the reconstructed training image;
$I_c$ and $\hat{I}_c$ respectively represent the $c$-th color channel of the model training image and of the reconstructed training image;
$GMS(I_c, \hat{I}_c)$ represents the gradient magnitude similarity function of the model training image and the reconstructed training image;
$i, j$ represent the two-dimensional coordinates of the image;
$N$ represents the dimension of the matrix;
and the gradient magnitude similarity matrix for channel $c$ is:
$$GMS(I_c, \hat{I}_c) = \frac{2\, g(I_c)\, g(\hat{I}_c) + a}{g(I_c)^2 + g(\hat{I}_c)^2 + a}$$
$$g(I_c) = \sqrt{(I_c * h_x)^2 + (I_c * h_y)^2}$$
wherein $a$ represents a constant;
$h_x$ and $h_y$ are the Prewitt filters in the $x$ and $y$ dimensions, and $*$ denotes convolution.
(3) Structural similarity index loss function:
$$L_{SSIM}(I, \hat{I}) = \frac{1}{N} \sum_{i,j} \left( 1 - SSIM(I, \hat{I})_{(i,j)} \right)$$
wherein $SSIM(I, \hat{I})_{(i,j)}$ represents the structural similarity index function centered on the image two-dimensional coordinates $i, j$.
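To make the first two losses concrete, here is a minimal NumPy sketch of a mean square loss and a gradient magnitude similarity loss built on Prewitt filters. Several details are assumptions rather than facts from the source: the filter normalization by 1/3, the edge padding, the constant `a = 0.0026` (a value commonly used for images scaled to [0, 1]), averaging instead of summing in the mean square loss, and all function names. The structural similarity loss is omitted for brevity.

```python
import numpy as np

# Prewitt filters in the x and y dimensions (normalization is an assumption).
PREWITT_X = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]], dtype=np.float64) / 3.0
PREWITT_Y = PREWITT_X.T

def filter3x3(channel, kernel):
    """Correlate a 2-D array with a 3x3 kernel; edge padding keeps the size."""
    padded = np.pad(channel, 1, mode="edge")
    h, w = channel.shape
    out = np.zeros((h, w), dtype=np.float64)
    for di in range(3):
        for dj in range(3):
            out += kernel[di, dj] * padded[di:di + h, dj:dj + w]
    return out

def gradient_magnitude(channel):
    """Gradient magnitude from the two Prewitt responses."""
    gx = filter3x3(channel, PREWITT_X)
    gy = filter3x3(channel, PREWITT_Y)
    return np.sqrt(gx ** 2 + gy ** 2)

def gms_map(channel, recon_channel, a=0.0026):
    """Per-pixel gradient magnitude similarity of one channel, in (0, 1]."""
    g1 = gradient_magnitude(channel)
    g2 = gradient_magnitude(recon_channel)
    return (2.0 * g1 * g2 + a) / (g1 ** 2 + g2 ** 2 + a)

def mse_loss(image, recon):
    """Squared error averaged over all elements (a scaled squared two-norm)."""
    return float(np.mean((image - recon) ** 2))

def gms_loss(image, recon, a=0.0026):
    """Mean of (1 - GMS) over pixels, then averaged over the color channels."""
    per_channel = [
        np.mean(1.0 - gms_map(image[..., c], recon[..., c], a))
        for c in range(image.shape[-1])
    ]
    return float(np.mean(per_channel))
```

Both losses are zero for a perfect reconstruction; the gradient-based term penalizes differences in local structure that a plain pixel-wise error can underweight.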
A mask updating step: updating the multi-scale mask according to the reconstructed test image by using a mask updating algorithm, so that the mask is more concentrated on the abnormal part of the image;
The purpose of the mask update is to remove from the mask the regions that likely correspond to normal regions of the image, so that the image reconstruction network focuses on the remaining anomalous regions. In each iteration, a region with a small reconstruction error is regarded as a normal region and is removed from the mask in the next iteration, so that the mask is updated according to the reconstruction error. Given the grid size k, the image is divided into grids of k × k pixels and the mask is updated grid by grid, which makes the algorithm more stable and reduces the number of iterative updates. For each grid, the average reconstruction error is calculated and the mask is updated according to a threshold: the grids whose reconstruction error is above the threshold remain in the mask.
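The grid-wise update can be sketched as follows. This is an illustrative reading of the paragraph above, with several assumptions: the mask stores 1 for retained pixels and 0 for masked pixels, the mask is constant within each k × k cell, the per-pixel error is the squared difference averaged over channels, and `update_mask` and `threshold` are hypothetical names.

```python
import numpy as np

def update_mask(keep_mask, image, recon, k, threshold):
    """Return the updated (H, W) mask: 1 = retained, 0 = masked.

    A cell stays masked only if it was masked before and its average
    reconstruction error exceeds the threshold; all other cells are retained.
    """
    h, w = keep_mask.shape
    # Per-pixel reconstruction error, averaged over the channel axis.
    err = np.mean((image - recon) ** 2, axis=-1)
    # Average error and current masking state of every k x k cell.
    cell_err = err.reshape(h // k, k, w // k, k).mean(axis=(1, 3))
    cell_masked = keep_mask.reshape(h // k, k, w // k, k).min(axis=(1, 3)) == 0
    still_masked = cell_masked & (cell_err > threshold)
    new_cells = np.where(still_masked, 0.0, 1.0).astype(keep_mask.dtype)
    # Expand the per-cell decision back to pixel resolution.
    return np.kron(new_cells, np.ones((k, k), dtype=keep_mask.dtype))
```

Working per cell rather than per pixel means a single noisy pixel cannot flip the mask, which is one way to read the stability claim in the text.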
A mask updating termination decision step: judging whether the mask is consistent with the mask before updating according to the updated mask obtained in the mask updating step, if so, entering an abnormal evaluation step, and if not, acting the updated mask on the image to be tested, and entering the image feature extraction step again;
Specifically, when most of the area covered by the mask is anomalous, providing more image information cannot significantly reduce the reconstruction error of the anomalous area. In this case the overall reconstruction error is not significantly reduced, and the corresponding mask remains unchanged. The mask updating is then terminated and the final mask is obtained. When the procedure finishes, the mask is expected to cover only the anomalous part of the image, and the final mask and the reconstructed image are taken as input to the anomaly evaluation step. If the mask is still changing in the mask updating step, the method does not enter the anomaly evaluation step but re-enters the image feature extraction step, until the mask output by the mask updating step remains unchanged.
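The termination rule amounts to a fixed-point loop: reconstruct, update the mask, and stop once the mask no longer changes. The sketch below shows only that control flow. The `reconstruct` and `update` callables are hypothetical stand-ins for the trained network and the mask updating algorithm, and `max_iters` is a safety cap added here, not something the source specifies.

```python
import numpy as np

def refine_mask(image, keep_mask, reconstruct, update, max_iters=50):
    """Iterate reconstruction and mask updating until the mask is stable.

    image:       (H, W, C) test image
    keep_mask:   (H, W) initial mask, 1 = retained, 0 = masked (assumed convention)
    reconstruct: callable mapping a masked image to a reconstructed image
    update:      callable (mask, image, recon) -> new mask
    Returns the final mask and the reconstruction obtained with it.
    """
    for _ in range(max_iters):
        recon = reconstruct(image * keep_mask[..., None])
        new_mask = update(keep_mask, image, recon)
        if np.array_equal(new_mask, keep_mask):
            # Mask unchanged: terminate and hand over to anomaly evaluation.
            return keep_mask, recon
        keep_mask = new_mask
    # Iteration cap reached: reconstruct once more with the latest mask.
    return keep_mask, reconstruct(image * keep_mask[..., None])
```

In the multi-scale setting this loop would presumably be run once per initial mask, with the results combined in the evaluation step.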
An anomaly evaluation step: according to the result of the mask update termination decision step, evaluating the image anomaly by using an anomaly evaluation function;
specifically, for the image to be tested, the final reconstructed image obtained in the mask update termination decision step is compared with the image to be tested, thereby calculating the following anomaly evaluation function:
[equation image not reproduced: anomaly evaluation function]
wherein the function is computed from the $L_2$ distance between the test image and the reconstructed image.
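Since the exact evaluation function is not recoverable from this text, the following is only one plausible sketch built on the stated ingredient, the $L_2$ distance between the test image and its reconstruction. Taking the per-pixel distance over the channel axis as the localization map, and its maximum as the image-level score, are assumptions made for illustration.

```python
import numpy as np

def anomaly_map(image, recon):
    """Per-pixel L2 distance between an (H, W, C) image and its reconstruction."""
    return np.sqrt(np.sum((image - recon) ** 2, axis=-1))

def anomaly_score(image, recon):
    """Image-level score: the largest per-pixel distance (an assumed choice)."""
    return float(anomaly_map(image, recon).max())
```

The map supports anomaly positioning by thresholding, while the scalar score supports image-level detection.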
Next, the present invention will be described in more detail.
The invention provides an image anomaly detection and anomaly positioning method based on a self-supervised mask; FIG. 1 is a flow chart of an embodiment of the method. The method randomly generates, for each image used for model training, a mask of the same size as the image, and applies the mask to the image to remove the information of part of the image. For the image to be tested, it generates multi-scale initialization masks and applies each of them to the image to generate multi-scale test images with the information of partial regions removed. A deep convolutional neural network extracts the high-dimensional features of the input image, and a deep convolutional neural network reconstructs the image from these features to obtain a reconstructed training image or a reconstructed test image; self-supervised learning is realized with an image reconstruction loss function. According to the reconstructed test image, a mask updating algorithm updates the multi-scale masks so that the mask concentrates on the anomalous part of the image; the mask update termination decision is made by judging whether the mask is identical to the mask before updating, and the image anomaly is then evaluated with an anomaly evaluation function.
Through self-supervised mask training, the invention extends the image reconstruction task used for image anomaly detection to both image anomaly detection and anomaly positioning. In practical applications such as medical diagnosis and industrial defect detection, anomalies often appear in only a small portion of the image pixels, and anomaly detection methods based on the image reconstruction task cannot accurately locate the anomalous region. Introducing self-supervised mask training improves the anomaly positioning capability and the interpretability of the anomaly detection algorithm, so that better performance is obtained on both the anomaly detection and the anomaly positioning tasks.
Specifically, with reference to fig. 1, the method comprises the steps of:
a mask random generation step: randomly generating a mask with the size consistent with that of an image for model training, applying the mask on the image and removing information of a partial region of the image;
mask initialization step: generating multi-scale initialization masks for the images to be tested, and respectively applying the masks to the images to be tested to generate multi-scale images to be tested with partial image area information removed;
an image feature extraction step: extracting high-dimensional features from the image obtained in the mask random generation step or the mask initialization step by using a deep convolutional neural network;
an image reconstruction step: reconstructing the image from the high-dimensional features obtained in the image feature extraction step by using a deep convolutional neural network to obtain a reconstructed training image or a reconstructed test image;
a reconstructed image alignment step: realizing self-supervised learning by applying an image reconstruction loss function to the reconstructed training image obtained in the image reconstruction step and the model training image;
a mask updating step: updating the multi-scale masks by applying a mask updating algorithm to the reconstructed test image obtained in the image reconstruction step, so that the mask concentrates on the anomalous part of the image;
a mask update termination decision step: judging whether the updated mask obtained in the mask updating step is identical to the mask before updating; if so, entering the anomaly evaluation step; if not, applying the updated mask to the image to be tested and entering the image feature extraction step again;
an anomaly evaluation step: according to the result of the mask update termination decision step, evaluating the image anomaly by using an anomaly evaluation function.
The invention also provides an image anomaly detection and anomaly positioning system based on a self-supervised mask, which specifically comprises the following modules:
a mask random generation module: randomly generating a mask of the same size as the model training image, and applying the mask to the image to remove the information in part of its regions;
a mask initialization module: generating multi-scale initialization masks for the image to be tested, and applying the masks to the image to be tested separately to generate multi-scale images to be tested with the information in some image regions removed;
an image feature extraction module: extracting high-dimensional features with a deep convolutional neural network from the image obtained by the mask random generation module or the mask initialization module;
an image reconstruction module: reconstructing an image from the high-dimensional features with a deep convolutional neural network to obtain a reconstructed training image or a reconstructed test image;
a reconstructed image alignment module: using an image reconstruction loss function on the reconstructed training image and the model training image to realize self-supervised learning;
a mask updating module: updating the multi-scale masks according to the reconstructed test image with a mask updating algorithm, so that the masks concentrate more on the abnormal part of the image;
a mask update termination decision module: determining whether the updated mask obtained by the mask updating module is identical to the mask before updating; if so, entering the anomaly evaluation module; if not, applying the updated mask to the image to be tested and entering the image feature extraction module again;
an anomaly evaluation module: performing image anomaly evaluation with an anomaly evaluation function on the result of the mask update termination decision module.
Specifically, a network framework of a training system consisting of a mask random generation module, an image feature extraction module, an image reconstruction module, a reconstructed image alignment module, a mask updating termination decision module and an anomaly evaluation module is shown in fig. 2, and the whole system framework can be trained end to end.
In the embodiment system framework shown in FIG. 2, images for model training are taken as input, and each input image is divided into $\frac{H}{k} \times \frac{W}{k}$ grids, where H and W are the height and width of the image and k controls the size of the grid. Each grid is a square of k × k pixels and serves as the basic unit of the mask. The size k is sampled from the set $K = \{k_i\}_{i=1}^{N_k}$, where $N_k$ is the cardinality of the set. In this implementation, K = {4, 8, 16, 32} is used because it covers a wide range of anomaly scales. To expand the mask exploration space, a random mask is generated dynamically for each image during each training phase. Each grid is randomly selected to be masked or retained, and the resulting mask matrix is denoted M. In this way, a set of random masks of different sizes and shapes can be generated, and each image is augmented into a varied set of training triples $(I, M, M \odot I)$, where I is the input image, M is the generated mask, $M \odot I$ is the generated model input image, and $\odot$ is the element-wise product in the spatial domain (the mask is replicated along the channel dimension).
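The random mask generation described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names `random_grid_mask` and `make_training_triple`, the `keep_prob` parameter, and the convention that 1 keeps a pixel and 0 removes it are all hypothetical and are not taken from the patent, which only states that each grid is randomly masked or retained.

```python
import numpy as np

def random_grid_mask(height, width, k, keep_prob=0.5, rng=None):
    """Return a (height, width) binary mask built from k x k pixel grids.

    1 keeps a pixel and 0 removes it. keep_prob is a hypothetical parameter;
    the source only says each grid is randomly masked or retained.
    """
    if height % k or width % k:
        raise ValueError("height and width must be divisible by k")
    rng = np.random.default_rng() if rng is None else rng
    # One random decision per grid: (H/k) x (W/k) grids in total.
    cells = (rng.random((height // k, width // k)) < keep_prob).astype(np.float32)
    # Expand every grid decision to a k x k block of pixels.
    return np.repeat(np.repeat(cells, k, axis=0), k, axis=1)

def make_training_triple(image, grid_sizes=(4, 8, 16, 32), rng=None):
    """Sample k from K, draw a mask M, and return (I, M, M * I)."""
    rng = np.random.default_rng() if rng is None else rng
    k = int(rng.choice(grid_sizes))
    mask = random_grid_mask(image.shape[0], image.shape[1], k, rng=rng)
    # Replicate the mask along the channel dimension (image is H x W x C).
    masked_image = image * mask[..., None]
    return image, mask, masked_image
```

Calling `make_training_triple` once per image per training pass gives a fresh mask each time, which is one way to realize the dynamic generation the text describes.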
In the system framework of the embodiment shown in FIG. 2, given an image to be tested, the procedure is initialized with a set of multi-scale masks. Initially, this set consists of eight checkerboard-like matrices of different scales, where the grid size k ∈ K. For each grid size k, the set includes a pair of complementary masks that together cover all pixels in the image, so that no potentially abnormal region is missed.
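As a minimal sketch of this initialization, the following NumPy code builds one complementary checkerboard pair per grid size, giving eight masks for K = {4, 8, 16, 32}. The function names are hypothetical, and which of the two values (0 or 1) marks a hidden pixel is left open here because the pair is symmetric.

```python
import numpy as np

def checkerboard_pair(height, width, k):
    """Return two complementary checkerboard masks with k x k pixel grids."""
    rows = np.arange(height) // k          # grid row index of every pixel row
    cols = np.arange(width) // k           # grid column index of every pixel column
    board = ((rows[:, None] + cols[None, :]) % 2).astype(np.float32)
    return board, 1.0 - board

def init_multiscale_masks(height, width, grid_sizes=(4, 8, 16, 32)):
    """Build the initial mask set: one complementary pair per grid size."""
    masks = []
    for k in grid_sizes:
        masks.extend(checkerboard_pair(height, width, k))
    return masks
```

Because the two masks of each pair sum to an all-ones matrix, every pixel is hidden in exactly one of them, which matches the stated goal of covering all pixels at every scale.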
In the system framework of the embodiment shown in FIG. 2, the image obtained by the mask random generation module or the mask initialization module is taken as input, and a deep convolutional neural network extracts its high-dimensional feature information. The image feature extraction network consists of several layers of convolution and down-sampling operations.
In the system framework of the embodiment shown in FIG. 2, the high-dimensional feature information obtained by the image feature extraction module is taken as input, and a deep convolutional neural network model performs image reconstruction to obtain a reconstructed image. The image attribute recovery network consists of several layers of convolution and up-sampling operations. If the input is a model training image, a reconstructed training image is output; if the input is an image to be tested, a reconstructed test image is output.
In the system framework of the embodiment shown in FIG. 2, for the model training image, the reconstructed training image obtained by the image reconstruction module is compared with the model training image, and the following loss functions are calculated respectively:
(1) mean square loss function:
$$L_2 = \left\| I - \hat{I} \right\|_2^2$$
wherein $\|\cdot\|_2$ represents the two-norm;
(2) gradient magnitude similarity loss function:
$$L_G = \frac{1}{N} \sum_{i,j} \left( \mathbf{1} - \mathrm{GMS}(I, \hat{I}) \right)_{(i,j)}$$
$$\mathrm{GMS}(I, \hat{I}) = \frac{1}{3} \sum_{c=1}^{3} \mathrm{GMS}(I_c, \hat{I}_c)$$
wherein $\mathbf{1}$ represents the all-ones matrix;
$I$ and $\hat{I}$ respectively represent the model training image and the reconstructed training image;
$I_c$ and $\hat{I}_c$ respectively represent the c-th color channel of the model training image and of the reconstructed training image;
$\mathrm{GMS}(I, \hat{I})$ represents the gradient magnitude similarity function of the model training image and the reconstructed training image;
$(i, j)$ represents the two-dimensional coordinates of the image;
$N$ represents the dimension of the matrix;
and the gradient magnitude similarity matrix for channel c is:
$$\mathrm{GMS}(I_c, \hat{I}_c) = \frac{2\, g(I_c)\, g(\hat{I}_c) + a}{g(I_c)^2 + g(\hat{I}_c)^2 + a}$$
$$g(I_c) = \sqrt{(I_c * h_x)^2 + (I_c * h_y)^2}$$
wherein $a$ represents a constant;
$h_x$ and $h_y$ are Prewitt filters in the x and y dimensions, and $*$ denotes convolution.
(3) structural similarity index loss function:
$$L_S = \frac{1}{N} \sum_{i,j} \left( 1 - \mathrm{SSIM}_{(i,j)}(I, \hat{I}) \right)$$
wherein $\mathrm{SSIM}_{(i,j)}(I, \hat{I})$ represents the structural similarity index function centered on the image two-dimensional coordinates $(i, j)$.
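To make the first two losses concrete, the following NumPy sketch computes a mean square loss and a gradient magnitude similarity loss with Prewitt filters. It is a hedged illustration rather than the patented implementation: the normalization (mean over pixels and channels), the edge padding, the value of the constant `a`, and all function names are assumptions. The structural similarity term is omitted for brevity.

```python
import numpy as np

# 3x3 Prewitt filters in the x and y dimensions (scaled by 1/3).
PREWITT_X = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]], dtype=np.float64) / 3.0
PREWITT_Y = PREWITT_X.T

def filter3x3(channel, kernel):
    """Correlate a 2-D array with a 3x3 kernel using edge padding."""
    padded = np.pad(channel, 1, mode="edge")
    h, w = channel.shape
    out = np.zeros((h, w), dtype=np.float64)
    for di in range(3):
        for dj in range(3):
            out += kernel[di, dj] * padded[di:di + h, dj:dj + w]
    return out

def gradient_magnitude(channel):
    """Gradient magnitude of one color channel from the two Prewitt responses."""
    gx = filter3x3(channel, PREWITT_X)
    gy = filter3x3(channel, PREWITT_Y)
    return np.sqrt(gx ** 2 + gy ** 2)

def gms_map(channel, recon_channel, a=0.0026):
    """Gradient magnitude similarity map; the value of a is an assumption."""
    g1 = gradient_magnitude(channel)
    g2 = gradient_magnitude(recon_channel)
    return (2.0 * g1 * g2 + a) / (g1 ** 2 + g2 ** 2 + a)

def mse_loss(image, recon):
    """Mean of squared pixel differences."""
    return float(np.mean((image - recon) ** 2))

def gms_loss(image, recon, a=0.0026):
    """Mean of (1 - GMS) over pixels and color channels (H x W x C input)."""
    per_channel = [np.mean(1.0 - gms_map(image[..., c], recon[..., c], a))
                   for c in range(image.shape[-1])]
    return float(np.mean(per_channel))
```

Both losses are zero when the reconstruction equals the input, and the similarity map stays in (0, 1], so `1 - GMS` is non-negative everywhere.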
In the embodiment system framework shown in FIG. 2, the purpose of the mask update is to remove from the mask the regions that probably correspond to normal parts of the image, so that the image reconstruction network focuses on the remaining abnormal regions. In each iteration, the mask is updated according to the reconstruction error: a region with a small reconstruction error is regarded as normal and is removed from the mask in the next iteration. The mask is updated in units of k × k pixel grids, which makes the algorithm more stable and reduces the number of iterations. Thus, given a grid size k, the image is divided into grids of k × k pixels. For each grid, the average reconstruction error over the grid is calculated, and the mask is updated according to a threshold: only the grids whose reconstruction error is above the threshold remain in the mask.
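One update of this kind can be sketched as follows. The sketch assumes a Boolean mask in which True marks a hidden pixel, a precomputed per-pixel error map, and a fixed scalar threshold; the function name `update_mask` and the way the threshold is chosen are hypothetical, since the text does not specify them.

```python
import numpy as np

def update_mask(masked, error_map, k, threshold):
    """One mask update on k x k pixel grids.

    masked:    (H, W) bool array, True where the image is currently hidden.
    error_map: (H, W) per-pixel reconstruction error.
    A grid stays masked only if it is masked now and its mean error
    exceeds the threshold; low-error grids are treated as normal.
    """
    h, w = error_map.shape
    if h % k or w % k:
        raise ValueError("image size must be divisible by k")
    # Average the error inside every k x k grid.
    grid_error = error_map.reshape(h // k, k, w // k, k).mean(axis=(1, 3))
    # Expand the per-grid decision back to pixel resolution.
    high = np.repeat(np.repeat(grid_error > threshold, k, axis=0), k, axis=1)
    return masked & high
```

Combining the decision with the current mask through a logical AND means the mask can only shrink, so a region removed as normal is never masked again.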
Referring to FIG. 2, when most of the area covered by the mask is abnormal, providing more image information does not significantly reduce the reconstruction error of the abnormal area. In this case, the overall reconstruction error is not significantly reduced and the corresponding mask remains unchanged. The mask update is then terminated and the final mask is obtained. When the procedure ends, the mask is expected to cover only the abnormal part of the image, and the final mask and the reconstructed image are taken as input to the anomaly evaluation module. If the mask still changes in the mask updating step, the anomaly evaluation module is not entered; instead, the image feature extraction module is entered again, until the output mask of the mask updating module remains unchanged.
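The termination rule amounts to iterating until the mask reaches a fixed point. The loop below is a sketch under assumptions: `reconstruct` is a hypothetical callable standing in for the feature extraction and reconstruction networks, True in `masked` marks a hidden pixel, the error is a squared difference averaged over channels, and `max_iters` is a safety cap that the text does not mention.

```python
import numpy as np

def refine_mask(image, masked, reconstruct, k, threshold, max_iters=20):
    """Iterate mask updates until the mask stops changing.

    reconstruct(image, masked) is a hypothetical stand-in for the networks;
    it must return an array with the same shape as image (H x W x C).
    """
    recon = reconstruct(image, masked)
    for _ in range(max_iters):
        error = ((image - recon) ** 2).mean(axis=-1)       # (H, W) error map
        h, w = error.shape
        grid_error = error.reshape(h // k, k, w // k, k).mean(axis=(1, 3))
        high = np.repeat(np.repeat(grid_error > threshold, k, axis=0), k, axis=1)
        new_masked = masked & high
        if np.array_equal(new_masked, masked):             # mask unchanged: stop
            break
        masked = new_masked
        recon = reconstruct(image, masked)                 # re-enter reconstruction
    return masked, recon
```

The returned pair corresponds to the final mask and the final reconstructed image that are passed on for anomaly evaluation.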
In the system framework of the embodiment shown in FIG. 2, for the image to be tested, the final reconstructed image obtained from the mask update termination decision module is compared with the image to be tested, and the following anomaly evaluation function is calculated:
Figure BDA0003370399620000131
wherein,
Figure BDA0003370399620000132
represents the $\ell_2$ distance between the test image and the reconstructed image.
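The exact anomaly evaluation function is given only as an equation image, so the following is merely a sketch of how an ℓ2 distance between the test image and its reconstruction could be turned into a localization map and an image-level score. Taking the per-pixel distance over color channels and using the maximum as the image score are both assumptions, as are the function names.

```python
import numpy as np

def anomaly_map(image, recon):
    """Per-pixel l2 distance between test image and reconstruction (H x W)."""
    return np.sqrt(((image - recon) ** 2).sum(axis=-1))

def image_score(image, recon):
    """Image-level score: the largest per-pixel distance (an assumption)."""
    return float(anomaly_map(image, recon).max())
```

Thresholding the map would give a localization result, while the scalar score could be used to rank images for detection.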
In summary, the embodiment of the invention provides an image anomaly detection and anomaly positioning method and system based on a self-supervised mask. Through self-supervised mask training, the image reconstruction task used for image anomaly detection is extended to cover both image anomaly detection and anomaly localization. In practical applications such as medical diagnosis and industrial defect detection, anomalies often occupy only a small portion of the pixels of an image, and anomaly detection methods based on the image reconstruction task alone cannot localize the abnormal region accurately. Introducing self-supervised mask training improves both the anomaly localization capability and the interpretability of the anomaly detection algorithm, and thereby achieves better performance on the anomaly detection and anomaly localization tasks.
Those skilled in the art will appreciate that, in addition to implementing the system and its various devices, modules, units provided by the present invention as pure computer readable program code, the system and its various devices, modules, units provided by the present invention can be fully implemented by logically programming method steps in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system and various devices, modules and units thereof provided by the invention can be regarded as a hardware component, and the devices, modules and units included in the system for realizing various functions can also be regarded as structures in the hardware component; means, modules, units for performing the various functions may also be regarded as structures within both software modules and hardware components for performing the method.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (10)

1. An image anomaly detection and anomaly positioning method based on a self-supervised mask, characterized by comprising the following steps:
a mask random generation step: randomly generating a mask of the same size as the model training image, and applying the mask to the image to remove the information in part of its regions;
a mask initialization step: generating multi-scale initialization masks for the image to be tested, and applying the masks to the image to be tested separately to generate multi-scale images to be tested with the information in some image regions removed;
an image feature extraction step: extracting high-dimensional features with a deep convolutional neural network from the image obtained in the mask random generation step or the mask initialization step;
an image reconstruction step: reconstructing an image from the high-dimensional features with a deep convolutional neural network to obtain a reconstructed training image or a reconstructed test image;
a reconstructed image alignment step: using an image reconstruction loss function on the reconstructed training image and the model training image to realize self-supervised learning;
a mask updating step: updating the multi-scale masks according to the reconstructed test image with a mask updating algorithm, so that the masks concentrate more on the abnormal part of the image;
a mask update termination decision step: determining whether the updated mask obtained in the mask updating step is identical to the mask before updating; if so, proceeding to the anomaly evaluation step; if not, applying the updated mask to the image to be tested and returning to the image feature extraction step;
an anomaly evaluation step: performing image anomaly evaluation with an anomaly evaluation function on the result of the mask update termination decision step.
2. The image anomaly detection and anomaly positioning method based on a self-supervised mask according to claim 1, wherein the mask random generation step comprises:
using images for model training as input, dividing each input image into $\frac{H}{k} \times \frac{W}{k}$ grids, wherein H and W are the height and width of the image and k controls the size of the grid;
each grid is a square of k × k pixels and is set as the basic unit of the mask;
k is sampled from the set $K = \{k_i\}_{i=1}^{N_k}$, wherein $N_k$ is the cardinality of the set and $k_i$ represents the i-th grid size;
each grid is randomly selected to be masked or retained, and the resulting mask matrix is denoted M.
3. The image anomaly detection and anomaly positioning method based on a self-supervised mask according to claim 1, wherein the mask initialization step comprises: given an image to be tested, initializing with a set of multi-scale masks;
the set consists of eight checkerboard-like matrices of different scales, wherein the grid size k ∈ K;
for each grid size k, the set includes a pair of complementary masks that together cover all pixels in the image.
4. The image anomaly detection and anomaly positioning method based on a self-supervised mask according to claim 1, wherein the image feature extraction step comprises: taking the image obtained in the mask random generation step or the mask initialization step as input, and extracting high-dimensional feature information of the image with a deep convolutional neural network, wherein the image feature extraction network consists of several layers of convolution and down-sampling operations.
5. The image anomaly detection and anomaly positioning method based on a self-supervised mask according to claim 1, wherein the image reconstruction step comprises: taking the high-dimensional feature information obtained in the image feature extraction step as input, and performing image reconstruction with a deep convolutional neural network model to obtain a reconstructed image, wherein the image attribute recovery network consists of several layers of convolution and up-sampling operations;
if the input is a model training image, a reconstructed training image is output, and if the input is an image to be tested, a reconstructed test image is output.
6. The image anomaly detection and anomaly positioning method based on a self-supervised mask according to claim 1, wherein the reconstructed image alignment step comprises:
for the model training image, comparing the reconstructed training image obtained in the image reconstruction step with the model training image, and calculating the following loss functions respectively:
(1) mean square loss function:
$$L_2 = \left\| I - \hat{I} \right\|_2^2$$
wherein $\|\cdot\|_2$ represents the two-norm;
(2) gradient magnitude similarity loss function:
$$L_G = \frac{1}{N} \sum_{i,j} \left( \mathbf{1} - \mathrm{GMS}(I, \hat{I}) \right)_{(i,j)}$$
$$\mathrm{GMS}(I, \hat{I}) = \frac{1}{3} \sum_{c=1}^{3} \mathrm{GMS}(I_c, \hat{I}_c)$$
wherein $\mathbf{1}$ represents the all-ones matrix;
$I$ and $\hat{I}$ respectively represent the model training image and the reconstructed training image;
$I_c$ and $\hat{I}_c$ respectively represent the c-th color channel of the model training image and of the reconstructed training image;
$\mathrm{GMS}(I, \hat{I})$ represents the gradient magnitude similarity function of the model training image and the reconstructed training image;
$(i, j)$ represents the two-dimensional coordinates of the image;
$N$ represents the dimension of the matrix;
and the gradient magnitude similarity matrix for channel c is:
$$\mathrm{GMS}(I_c, \hat{I}_c) = \frac{2\, g(I_c)\, g(\hat{I}_c) + a}{g(I_c)^2 + g(\hat{I}_c)^2 + a}$$
$$g(I_c) = \sqrt{(I_c * h_x)^2 + (I_c * h_y)^2}$$
wherein $a$ represents a constant;
$h_x$ and $h_y$ are Prewitt filters in the x and y dimensions, and $*$ denotes convolution;
(3) structural similarity index loss function:
$$L_S = \frac{1}{N} \sum_{i,j} \left( 1 - \mathrm{SSIM}_{(i,j)}(I, \hat{I}) \right)$$
wherein $\mathrm{SSIM}_{(i,j)}(I, \hat{I})$ represents the structural similarity index function centered on the image two-dimensional coordinates $(i, j)$.
7. The image anomaly detection and anomaly positioning method based on a self-supervised mask according to claim 1, wherein the mask updating step comprises:
in each iteration, regarding a region with a small reconstruction error as a normal region and removing it from the mask in the next iteration, so that the mask is updated according to the reconstruction error;
given the grid size k, dividing the image into grids of k × k pixels and updating the mask in units of these grids, which makes the algorithm more stable and reduces the number of iterations;
for each grid, calculating the average reconstruction error and updating the mask according to a threshold, so that only the grids whose reconstruction error is higher than the threshold remain in the mask.
8. The method of claim 1, wherein the mask update termination decision step comprises:
when most of the area covered by the mask is an abnormal area, stopping the mask update and obtaining the final mask;
after the procedure ends, the mask is expected to cover only the abnormal part of the image, and the final mask and the reconstructed image are used as the input of the anomaly evaluation step;
if the mask still changes in the mask updating step, not entering the anomaly evaluation step but re-entering the image feature extraction step, until the output mask of the mask updating step remains unchanged.
9. The method of claim 1, wherein the anomaly evaluation step comprises:
for the image to be tested, comparing the final reconstructed image obtained from the mask update termination decision step with the image to be tested, and calculating the following anomaly evaluation function:
Figure FDA0003370399610000035
wherein,
Figure FDA0003370399610000036
represents the $\ell_2$ distance between the test image and the reconstructed image.
10. An image anomaly detection and anomaly positioning system based on a self-supervised mask, comprising:
a mask random generation module: randomly generating a mask of the same size as the model training image, and applying the mask to the image to remove the information in part of its regions;
a mask initialization module: generating multi-scale initialization masks for the image to be tested, and applying the masks to the image to be tested separately to generate multi-scale images to be tested with the information in some image regions removed;
an image feature extraction module: extracting high-dimensional features with a deep convolutional neural network from the image obtained by the mask random generation module or the mask initialization module;
an image reconstruction module: reconstructing an image from the high-dimensional features with a deep convolutional neural network to obtain a reconstructed training image or a reconstructed test image;
a reconstructed image alignment module: using an image reconstruction loss function on the reconstructed training image and the model training image to realize self-supervised learning;
a mask updating module: updating the multi-scale masks according to the reconstructed test image with a mask updating algorithm, so that the masks concentrate more on the abnormal part of the image;
a mask update termination decision module: determining whether the updated mask obtained by the mask updating module is identical to the mask before updating; if so, entering the anomaly evaluation module; if not, applying the updated mask to the image to be tested and entering the image feature extraction module again;
an anomaly evaluation module: performing image anomaly evaluation with an anomaly evaluation function on the result of the mask update termination decision module.
CN202111397389.4A 2021-11-23 2021-11-23 Image anomaly detection and anomaly positioning method and system based on self-supervision mask Active CN114022475B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111397389.4A CN114022475B (en) 2021-11-23 2021-11-23 Image anomaly detection and anomaly positioning method and system based on self-supervision mask

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111397389.4A CN114022475B (en) 2021-11-23 2021-11-23 Image anomaly detection and anomaly positioning method and system based on self-supervision mask

Publications (2)

Publication Number Publication Date
CN114022475A true CN114022475A (en) 2022-02-08
CN114022475B CN114022475B (en) 2024-08-02

Family

ID=80066084

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111397389.4A Active CN114022475B (en) 2021-11-23 2021-11-23 Image anomaly detection and anomaly positioning method and system based on self-supervision mask

Country Status (1)

Country Link
CN (1) CN114022475B (en)


Also Published As

Publication number Publication date
CN114022475B (en) 2024-08-02


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant