WO2022095645A1 - Image anomaly detection method based on memory-enhanced latent space autoregression - Google Patents
- Publication number: WO2022095645A1 (PCT/CN2021/122056)
- Authority: WO — WIPO (PCT)
- Prior art keywords: memory, latent space, image, module, autoregressive
Classifications
- G06V10/98 — Detection or correction of errors, e.g. by rescanning the pattern or by human intervention; Evaluation of the quality of the acquired patterns
- G06V10/82 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
- G06F18/2433 — Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
- G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06N3/045 — Combinations of networks
- G06N3/048 — Activation functions
- G06N3/08 — Learning methods
- G06V10/764 — Arrangements for image or video recognition or understanding using classification, e.g. of video objects
- G06V10/7747 — Generating sets of training patterns; Organisation of the process, e.g. bagging or boosting
- G06V10/776 — Validation; Performance evaluation
- G06V10/778 — Active pattern-learning, e.g. online learning of image or video features
- G06V20/52 — Surveillance or monitoring of activities, e.g. for recognising suspicious objects
Definitions
- the present application relates to the field of anomaly detection in computer vision, and in particular to an image anomaly detection method based on memory-enhanced latent space autoregression.
- Anomaly detection, also known as outlier detection or novelty detection, is the process of finding objects whose behavior differs greatly from what is expected; such detected objects are called outliers or anomalies.
- Anomaly detection has a wide range of applications in production and daily life, such as credit card anti-fraud, advertising click anti-cheating, and network intrusion detection.
- Anomaly detection in computer vision satisfies the same definition, with images, videos and similar data as input: for example, finding pictures that do not conform to a collection within a large set of pictures, detecting wrongly produced parts in industrial production, or applying anomaly detection to surveillance videos to automatically analyze abnormal behaviors and objects. Precisely because of the rapid development of computers and the rapid expansion of data, there is an urgent need for technology that can analyze such image and video information.
- traditional anomaly detection requires manually analyzing the data distribution, designing appropriate features, and then modeling the data with classical machine learning algorithms (support vector machines, isolation forests, etc.).
- the anomaly detection methods in computer vision mainly include: methods based on reconstruction loss differences, methods based on classification learning, and methods based on density estimation.
- Methods based on reconstruction loss differences use the characteristics of the data itself: the input is reconstructed by a deep autoencoder, which memorizes the characteristics of normal samples, so whether a sample is abnormal can be judged from the reconstruction difference (abnormal samples are usually not reconstructed well, and a threshold can be set to detect them).
- This type of method is mainly used for outlier detection.
- Methods based on classification learning treat normal samples as a set of data with label information, and the probability that a sample belongs to each class is learned.
- The probability of a normal sample under some category is very large, while the probability of an abnormal sample under every category is very small because it does not belong to that distribution; this property is used to distinguish whether data is abnormal.
- existing anomaly detection methods struggle to achieve good results due to the lack of clear supervision information (abnormal data is difficult to collect, and collecting normal data is time-consuming and labor-intensive, so complete data is hard to obtain).
- models based on deep autoencoders also lack a good solution for data with broad distributions and large variance.
- the present application provides an image anomaly detection method based on memory-enhanced latent space autoregression capable of better judging abnormal images.
- the image anomaly detection method based on memory-enhanced latent space autoregression can include the following steps:
- Step 1: select the data set and divide it into a training set and a test set;
- Step 2: build the network structure of the memory-enhanced latent space autoregressive model;
- Step 3: preprocess the training set;
- Step 4: initialize the memory-enhanced latent space autoregressive model;
- Step 5: train the initialized memory-enhanced latent space autoregressive model with the preprocessed training set;
- Step 6: verify the trained memory-enhanced latent space autoregressive model on the test set, and use it to judge whether an input image is an abnormal image.
- the data set may include MNIST data set and CIFAR10 data set.
- the memory-enhanced latent space autoregressive model may include: an autoencoder, an autoregressive module and a memory module;
- the autoencoder can include an encoder and a decoder; it compresses the image into the latent space through the encoder to learn a feature representation, and then uses the decoder to decode the latent feature representation back into the image space;
- the autoregressive module can be configured to use the features of the latent space to model the data and fit the real distribution; the fitting process is represented by the following formula:
- p(z) = ∏_{i=1}^{d} p(z_i | z_{<i})
- where p(z) is the latent space distribution, p(z_i | z_{<i}) is the conditional probability distribution, d represents the dimension of the feature vector z, z_i represents the i-th dimension of z, and z_{<i} represents the part of z preceding the i-th dimension;
- the memory module can be configured to save the feature representations of the latent space; a feature representation that does not belong to the latent space is then forcibly converted by the memory module into the most relevant feature representations in memory. The process is:
- ẑ = wM = Σ_{i=1}^{N} w_i m_i, with w_i = exp(z·m_iᵀ / (‖z‖‖m_i‖)) / Σ_{j=1}^{N} exp(z·m_jᵀ / (‖z‖‖m_j‖))
- where M is the memory module (an N × feature_dim matrix), w represents the similarity between the latent feature and each piece of memory, m_i represents the i-th block of memory, w_i represents the similarity between the feature vector z and m_i, N represents the number of memory blocks in the memory module, exp() represents the exponential function with base e, ‖·‖ is the modulus (norm) operation, and m_j represents the j-th block of memory.
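The read operation can be sketched in NumPy. This is an illustrative sketch, not the patented implementation; cosine similarity followed by a softmax over the N memory blocks is assumed as the similarity measure:

```python
import numpy as np

def memory_read(z, M):
    """Read from memory M (N x feature_dim) with a latent vector z.

    w_i = exp(cos(z, m_i)) / sum_j exp(cos(z, m_j));  z_hat = sum_i w_i * m_i
    """
    # Cosine similarity between z and each memory block m_i.
    sim = (M @ z) / (np.linalg.norm(M, axis=1) * np.linalg.norm(z) + 1e-12)
    w = np.exp(sim)
    w /= w.sum()          # softmax: weights are positive and sum to 1
    return w @ M          # z_hat: convex combination of stored memories

rng = np.random.default_rng(0)
M = rng.normal(size=(100, 64))   # N = 100 memory blocks, feature_dim = 64
z = rng.normal(size=64)
z_hat = memory_read(z, M)
print(z_hat.shape)  # (64,)
```

Because ẑ is forced to be a combination of stored (normal) patterns, an abnormal latent vector is pulled toward the nearest normal memories, which later hurts its reconstruction.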
- in the network structure of step 2:
- the encoder network structure of the autoencoder can include two downsampling modules and a fully connected layer; each block uses the structure of the residual network and is composed of a cascade of three consecutive convolutional layer + batch normalization + activation function substructures;
- the decoder network structure of the autoencoder can include a fully connected layer, two upsampling modules and a convolutional layer;
- each block uses the structure of the residual network and is composed of a cascade of three substructures: transposed convolutional layer + batch normalization + activation function, convolutional layer + batch normalization + activation function, and transposed convolutional layer + batch normalization + activation function;
- the network structure of the autoregressive module can be composed of multiple autoregressive layers
- the autoregressive module estimates the latent distribution z_dist by the autoregressive network H(z) acting on z;
- the processing of a picture by the autoencoder may include the following steps:
- the channel dimension of the upsampling module changes from 64 to 32 to 16;
- step 2 may specifically include the following steps:
- Step 201 select a training set
- Step 202 analyze training set information, the training set information includes image size, image intensity and image noise;
- Step 203 construct a network structure suitable for current data according to the obtained information
- Step 204 Assemble the autoencoder, the autoregressive module and the memory module together.
- step 3 may specifically include the following steps:
- Step 301 read image data
- Step 302 adjust the image size to a specific size
- Step 303: process the pictures whose color space differs from that of the overall data, specifically: converting grayscale space to RGB space and RGB space to grayscale space;
- Step 304: perform a normalization operation on the image data.
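Steps 301-304 can be sketched as follows (the nearest-neighbour resize and the [0, 1] scaling are stand-ins for whatever resize and normalization routines an implementation would actually use):

```python
import numpy as np

def preprocess(img, size=32):
    """Sketch of steps 302-304 on one image array."""
    img = np.asarray(img, dtype=np.float32)
    # Step 303: unify colour spaces -- promote grayscale (H, W) to RGB (H, W, 3).
    if img.ndim == 2:
        img = np.stack([img] * 3, axis=-1)
    # Step 302: crude nearest-neighbour resize to size x size.
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    img = img[rows][:, cols]
    # Step 304: scale pixel values into [0, 1].
    return img / 255.0

x = preprocess(np.full((28, 28), 255), size=32)
print(x.shape)  # (32, 32, 3)
```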
- step 4 may specifically refer to: using different initialization methods to initialize the network, that is, using a random initialization method for the autoencoder and the autoregressive module, and using a uniform distribution initialization for the memory module.
- step 5 may specifically include the following steps:
- Step 501 loading the preprocessed data
- Step 502 setting the learning rate for the autoencoder, the autoregressive module and the memory module respectively;
- Step 503: fix the memory module and train the autoregressive module;
- Step 504: fix the autoregressive module and train the memory module;
- Step 505: iterate steps 503 and 504 until the memory-enhanced latent space autoregressive model converges.
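The alternating scheme of steps 503-505 can be sketched structurally; `DummyModel` and its update methods are hypothetical placeholders for the real gradient updates, so the structure rather than the arithmetic is the point:

```python
class DummyModel:
    """Hypothetical stand-in: counts updates and 'converges' after 3 epochs."""
    def __init__(self):
        self.ar_steps = 0
        self.mem_steps = 0
        self.epochs_seen = 0
    def update_autoregressive(self, batch):   # step 503 (memory frozen)
        self.ar_steps += 1
    def update_memory(self, batch):           # step 504 (AR module frozen)
        self.mem_steps += 1
    def converged(self):
        self.epochs_seen += 1
        return self.epochs_seen >= 3

def train(model, data, max_epochs=100):
    for _ in range(max_epochs):               # step 505: iterate to convergence
        for batch in data:
            model.update_autoregressive(batch)
            model.update_memory(batch)
        if model.converged():
            break

m = DummyModel()
train(m, data=[1, 2, 3, 4])
print(m.ar_steps, m.mem_steps)  # 12 12
```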
- the loss function of the model may be: L = λ·L_rec + α·L_llk
- where L_rec represents the reconstruction loss between the original image and the reconstructed image, and L_llk represents the negative log-likelihood loss;
- λ and α represent the weight coefficients of the loss terms, used to balance the proportion of the different losses.
- λ and α differ between datasets: for MNIST and CIFAR10, λ equals 1 and 0.1 respectively, and α equals 0.0002 for both.
- step 6 may specifically refer to: inputting the picture into the trained memory-enhanced latent space autoregressive model, and obtaining the probability output by the autoregressive module and the reconstruction difference between the picture reconstructed by the autoencoder and the original picture;
- these are regarded as two scores, the two scores are added to obtain the final score, and whether the picture is an abnormal image is determined by the previously set threshold.
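The scoring rule of step 6 can be sketched as follows; the concrete numbers are hypothetical stand-ins for a well-reconstructed, high-likelihood (normal) image and a poorly reconstructed, low-likelihood one:

```python
import numpy as np

def anomaly_score(x, x_rec, nll):
    """Final score = reconstruction difference + AR-module negative
    log-likelihood (the two scores named in step 6, simply added)."""
    rec_diff = np.mean((x - x_rec) ** 2)
    return rec_diff + nll

def is_abnormal(score, threshold):
    return score > threshold

normal = anomaly_score(np.ones(784), np.ones(784) * 0.99, nll=2.0)
abnormal = anomaly_score(np.ones(784), np.zeros(784), nll=9.0)
print(is_abnormal(normal, threshold=5.0), is_abnormal(abnormal, threshold=5.0))
# False True
```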
- the built and trained memory-enhanced latent space autoregressive model does not require a prior distribution to be set, so the distribution of the data itself is not destroyed; it also prevents the model from reconstructing abnormal images, and can therefore better judge abnormal images.
- FIG. 1 is a flowchart of an image anomaly detection method based on memory-enhanced latent space autoregression in an embodiment of the present application
- FIG. 2 is a schematic diagram of the network structure of the memory-based latent space autoregressive model in an embodiment of the present application
- FIG. 3 is a schematic diagram of an autoregressive module in an embodiment of the application.
- FIG. 4 is a schematic diagram of a memory module in an embodiment of the present application.
- FIG. 5 is a schematic diagram of an up-sampling module and a down-sampling module in an embodiment of the present application
- Figure 6 is a comparison table of model performance (AUC) on the MNIST dataset
- Figure 7 is a comparison table of model performance (AUC) on the CIFAR10 dataset.
- This embodiment proposes an image anomaly detection method based on memory-enhanced latent space autoregression, the flowchart of which is shown in Figure 1, wherein the method may include the following steps:
- two mainstream image anomaly detection datasets, MNIST and CIFAR10, are selected for the experiments.
- the MNIST dataset is a handwritten-digit dataset used in many tasks. It contains a training set of 60,000 examples and a test set of 10,000 examples.
- the dataset contains handwritten digits 0-9, ten categories in total; each picture is a grayscale image of size 28*28.
- the CIFAR10 dataset is a color image dataset closer to everyday objects. It contains 50,000 training examples and 10,000 test examples, covering ten categories of color RGB images: airplanes, cars, birds, cats, deer, dogs, frogs, horses, boats and trucks; each picture is a color image of size 32*32.
- the memory-enhanced latent space autoregressive model in this embodiment may include three parts: an autoencoder, an autoregressive module, and a memory module, wherein:
- the autoencoder can include an encoder and a decoder.
- the autoencoder compresses the image into the latent space through the encoder, learns the feature representation, and then uses the decoder to decode the feature representation of the latent space back into the image space;
- the autoregressive module can be configured to use the features of the latent space to model the data and fit the true distribution.
- the fitting process is expressed by the following formula:
- p(z) = ∏_{i=1}^{d} p(z_i | z_{<i})
- where p(z) is the latent space distribution, p(z_i | z_{<i}) is the conditional probability distribution, d represents the dimension of the feature vector z, z_i represents the i-th dimension of z, and z_{<i} represents the part of z preceding the i-th dimension.
- the memory module can be configured to save the feature representation of the latent space, and then the feature representation that does not belong to the latent space will be forcibly converted into the most relevant feature representation in the memory by the memory module.
- the process can be:
- ẑ = wM = Σ_{i=1}^{N} w_i m_i, with w_i = exp(z·m_iᵀ / (‖z‖‖m_i‖)) / Σ_{j=1}^{N} exp(z·m_jᵀ / (‖z‖‖m_j‖))
- where M is the memory module, w represents the similarity between the latent feature and each piece of memory, m_i represents the i-th block of memory, w_i represents the similarity between the feature vector z and m_i, N represents the number of memory blocks in the memory module, exp() represents the exponential function with base e, ‖·‖ is the modulus (norm) operation, and m_j represents the j-th block of memory.
- the memory module is used to store sparse feature representations of the distribution, which strengthens the generation ability of the autoencoder; limiting the weights effectively prevents the model from reconstructing abnormal images.
- the encoder network structure of the autoencoder can include two downsampling modules and a fully connected layer; each block uses the structure of a residual network and is composed of a cascade of three consecutive convolutional layer + batch normalization + activation function substructures.
- the decoder network structure of the autoencoder may include a fully connected layer, two upsampling modules and a convolutional layer; each block uses the structure of a residual network and is composed of a cascade of three substructures: transposed convolutional layer + batch normalization + activation function, convolutional layer + batch normalization + activation function, and transposed convolutional layer + batch normalization + activation function.
- the network structure of the autoregressive module is constructed using the structure shown in Figure 3.
- Figure 3 represents the operation process of one autoregressive layer: the number of input and output features remains unchanged, while the feature dimension changes.
- Each autoregressive layer is implemented with multiple masked fully connected layers: the current feature is generated using only the features preceding it in the feature vector, and the outputs are finally assembled into a feature vector.
- the autoregressive network is composed of multiple such autoregressive layers.
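Assuming the autoregressive layers are built from masked fully connected layers (an interpretation; this is the standard construction for autoregressive density estimators), the causal constraint can be illustrated with a strictly lower-triangular mask: output i may depend only on inputs z_{<i}:

```python
import numpy as np

d = 4
W = np.ones((d, d))                       # toy weight matrix, all ones
# Strictly lower-triangular mask enforces the autoregressive ordering.
mask = np.tril(np.ones((d, d)), k=-1)
z = np.array([1.0, 2.0, 3.0, 4.0])

h = (W * mask) @ z    # one masked "fully connected" step
# h[0] sees nothing, h[1] sees z[0], h[2] sees z[0:2], h[3] sees z[0:3].
print(h)  # [0. 1. 3. 6.]
```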
- the network structure of the memory module can be constructed using the structure shown in Figure 4.
- Figure 4 shows the reading mechanism of the memory module. First, an extra memory space is allocated as the memory, where each memory block has the same size as the input. The similarity between the input and each piece of memory is calculated, the similarities are then filtered (similarities with relatively small values are filtered out), and each piece of memory is multiplied by its similarity and the results are summed to obtain the output.
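The filtering step can be sketched as hard shrinkage on the addressing weights; the threshold `eps` here is an assumed value, not one given in the text:

```python
import numpy as np

def filtered_read(w, M, eps=0.02):
    """Zero out small addressing weights, renormalise, and read memory."""
    w = np.where(w > eps, w, 0.0)    # filter out relatively small similarities
    w = w / (w.sum() + 1e-12)        # renormalise the surviving weights
    return w @ M

w = np.array([0.5, 0.3, 0.19, 0.01])   # last weight falls below eps
M = np.eye(4)                          # toy memory: 4 blocks of dimension 4
z_hat = filtered_read(w, M)
print(z_hat.round(3))  # [0.505 0.303 0.192 0.   ]
```

Dropping the small weights makes the read sparse, so the output is assembled from only the few most relevant memory blocks.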
- the autoregressive module estimates the latent distribution z_dist by the autoregressive network H(z) acting on z;
- the processing of a picture by the autoencoder may include the following steps:
- the channel dimension of the upsampling module changes from 64 to 32 to 16;
- the solution adopted here is to use the random initialization method for the autoencoder module and the autoregressive module.
- the random initialization process keeps the network weights as small as possible and sets the biases to 0.
- N is the size of the memory module, i.e. the number of memory blocks;
- feature_dim represents the size of the information stored in each memory block, which is consistent with the latent space dimension.
- the image sizes input to the network are 28*28 and 32*32 respectively;
- feature_dim is set to 64;
- the output dimension of the autoregressive module is 100;
- the memory quantity is set to 100 and 500 respectively;
- the Batch_Size is 256 in both cases;
- the learning rates are set to 0.0001 and 0.001 respectively, the Adam optimizer is used for learning, the total number of epochs is set to 100, and the learning rate is multiplied by 0.1 every 20 epochs.
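The step decay described above (multiply the learning rate by 0.1 every 20 epochs) can be sketched as:

```python
def lr_at(epoch, base_lr=0.0001, drop=0.1, every=20):
    """Learning rate after applying a x0.1 decay every 20 epochs."""
    return base_lr * drop ** (epoch // every)

# Epochs 0-19 use the base rate, 20-39 one tenth of it, and so on.
for e in (0, 20, 45):
    print(e, lr_at(e))
```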
- for the memory module, it is proposed to use uniform-distribution initialization and a separate learning rate, which effectively solves the problem that the memory module is difficult to train.
- L = λ·L_rec + α·L_llk, where L_rec represents the reconstruction loss between the original image and the reconstructed image, and L_llk represents the negative log-likelihood loss;
- λ and α represent the weight coefficients of the loss terms, used to balance the proportion of the different losses.
- λ and α differ between datasets: for MNIST and CIFAR10, λ equals 1 and 0.1 respectively, and α equals 0.0002 for both.
- S6: Verify the trained memory-enhanced latent space autoregressive model on the test set, and use the trained model to determine whether the input image is an abnormal image.
- the area under the ROC curve (AUC) is mainly used to evaluate the quality of the method.
- this indicator is calculated from the four elements of the confusion matrix of the classification problem: true positive (TP), false positive (FP), false negative (FN) and true negative (TN); the confusion matrix is shown in Table 1 below:
- the ROC curve is plotted on two coordinates, with FPR on the abscissa and TPR on the ordinate.
- a curve is drawn by sweeping over different thresholds.
- AUC is the area under the curve.
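AUC can be computed equivalently as a rank statistic: the probability that a randomly chosen abnormal sample receives a higher score than a randomly chosen normal one (ties counting one half). A small sketch with made-up scores:

```python
import numpy as np

def auc(scores, labels):
    """AUC via the rank formulation, equivalent to the area under the
    ROC curve traced by sweeping the threshold."""
    pos = scores[labels == 1]   # abnormal samples
    neg = scores[labels == 0]   # normal samples
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

scores = np.array([0.9, 0.8, 0.3, 0.2])   # higher score = more abnormal
labels = np.array([1, 1, 0, 0])           # perfectly separated case
print(auc(scores, labels))  # 1.0
```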
- the present embodiment outperforms existing methods on each class of the MNIST dataset.
- the method of this embodiment achieves a final avg score of 0.981, which is the best performance so far.
- the performance of this embodiment is greatly improved on classes 4, 6 and 9 of the CIFAR10 dataset, and the final avg score reaches 0.673, the best performance at present.
- the present application provides an image anomaly detection method based on memory-enhanced latent space autoregression, which belongs to the field of anomaly detection in computer vision.
- the present application includes: selecting a training data set; constructing the network structure of the memory-enhanced latent space autoregressive model; preprocessing the training data set; initializing the model; training the model; and validating the model on the selected data set, using the trained model to determine whether an input image is an abnormal image.
- the present application does not need to set a prior distribution, so the distribution of the data itself is not destroyed; it can also prevent the model from reconstructing abnormal images, and can therefore better determine abnormal images.
- the memory-augmented latent space autoregressive-based image anomaly detection method of the present application is reproducible and can be used in a variety of industrial applications.
- the image anomaly detection method based on memory-enhanced latent space autoregression of the present application can be used in applications that require image anomaly detection.
Description
Table 1. Confusion matrix

| | Predicted abnormal | Predicted normal |
|---|---|---|
| Actually abnormal | True positive (TP) | False negative (FN) |
| Actually normal | False positive (FP) | True negative (TN) |
Claims (11)
- An image anomaly detection method based on memory-enhanced latent space autoregression, characterized in that it comprises the following steps: Step 1: select a data set and divide it into a training set and a test set; Step 2: build the network structure of the memory-enhanced latent space autoregressive model; Step 3: preprocess the training set; Step 4: initialize the memory-enhanced latent space autoregressive model; Step 5: train the initialized model using the preprocessed training set; Step 6: verify the trained model on the test set, and use the trained model to judge whether an input image is an abnormal image.
- The method according to claim 1, characterized in that in step 1 the data set includes the MNIST data set and the CIFAR10 data set.
- The method according to claim 1 or 2, characterized in that in step 2 the memory-enhanced latent space autoregressive model comprises an autoencoder, an autoregressive module and a memory module; the autoencoder comprises an encoder and a decoder: the encoder compresses the image into the latent space to learn a feature representation, and the decoder decodes the latent feature representation back into the image space; the autoregressive module models the data with the latent features and fits the true distribution, the fitting process being expressed by p(z) = ∏_{i=1}^{d} p(z_i | z_{<i}), where p(z) is the latent space distribution, p(z_i|z_{<i}) is the conditional probability distribution, d is the dimension of the feature vector z, z_i is the i-th dimension of z, and z_{<i} is the part of z before the i-th dimension; the memory module saves the feature representations of the latent space, and feature representations not belonging to that latent space are forcibly converted by the memory module into the most relevant feature representations in memory.
- The method according to any one of claims 1 to 3, characterized in that in step 2, in the network structure: the encoder of the autoencoder comprises two downsampling modules and a fully connected layer, each block using the residual-network structure and consisting of a cascade of three consecutive convolutional layer + batch normalization + activation function substructures; the decoder comprises a fully connected layer, two upsampling modules and a convolutional layer, each block using the residual-network structure and consisting of a cascade of three substructures: transposed convolutional layer + batch normalization + activation function, convolutional layer + batch normalization + activation function, and transposed convolutional layer + batch normalization + activation function; the autoregressive module consists of multiple autoregressive layers;
- The method according to claim 4, characterized in that the autoencoder processes a picture as follows: a. a picture of size N*N is input; in the encoding stage, after each downsampling module the feature map shrinks by a factor of 2 and the channel dimension goes from 1 to 32 to 64; a flattening operation then feeds the fully connected layer of the encoder, finally yielding the latent space z ∈ R^64; ... d. the last convolutional layer restores the features to the original image space.
- The method according to any one of claims 1 to 5, characterized in that step 2 specifically comprises: Step 201: select a training set; Step 202: analyze the training set information, including image size, image intensity and image noise; Step 203: construct a network structure suited to the current data according to the obtained information; Step 204: assemble the autoencoder, the autoregressive module and the memory module together.
- The method according to any one of claims 1 to 6, characterized in that step 3 specifically comprises: Step 301: read the image data; Step 302: adjust the image size to a specific size; Step 303: process the pictures whose color space differs from that of the overall data, specifically converting grayscale space to RGB space and RGB space to grayscale space; Step 304: perform a normalization operation on the image data.
- The method according to any one of claims 1 to 7, characterized in that step 4 specifically refers to initializing the network with different initialization methods: random initialization for the autoencoder and the autoregressive module, and uniform-distribution initialization for the memory module.
- The method according to any one of claims 1 to 8, characterized in that step 5 specifically comprises: Step 501: load the preprocessed data; Step 502: set learning rates for the autoencoder, the autoregressive module and the memory module respectively; Step 503: fix the memory module and train the autoregressive module; Step 504: fix the autoregressive module and train the memory module; Step 505: iterate steps 503 and 504 until the memory-enhanced latent space autoregressive model converges.
- The method according to any one of claims 1 to 10, characterized in that step 6 specifically refers to: using the trained memory-enhanced latent space autoregressive model, inputting the picture into the model, obtaining the probability output by the autoregressive module and the reconstruction difference between the autoencoder's reconstructed picture and the original picture as two scores, adding the two scores to obtain the final score, and judging whether the picture is an abnormal image by the previously set threshold.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/618,162 US20230154177A1 (en) | 2020-11-04 | 2021-09-30 | Autoregression Image Abnormity Detection Method of Enhancing Latent Space Based on Memory |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011212882.XA CN112036513B (zh) | 2020-11-04 | 2020-11-04 | 基于内存增强潜在空间自回归的图像异常检测方法 |
CN202011212882.X | 2020-11-04 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022095645A1 true WO2022095645A1 (zh) | 2022-05-12 |
Family
ID=73573153
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/122056 WO2022095645A1 (zh) | 2020-11-04 | 2021-09-30 | 基于内存增强潜在空间自回归的图像异常检测方法 |
Country Status (3)
Country | Link |
---|---|
US (1) | US20230154177A1 (zh) |
CN (1) | CN112036513B (zh) |
WO (1) | WO2022095645A1 (zh) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114998633A (zh) * | 2022-06-28 | 2022-09-02 | 河南大学 | 一种基于视图注意力驱动的多视图聚类方法 |
CN116736372A (zh) * | 2023-06-05 | 2023-09-12 | 成都理工大学 | 一种基于谱归一化生成对抗网络的地震插值方法及系统 |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112036513B (zh) * | 2020-11-04 | 2021-03-09 | 成都考拉悠然科技有限公司 | Image anomaly detection method based on memory-augmented latent-space autoregression |
CN112967251B (zh) * | 2021-03-03 | 2024-06-04 | 网易(杭州)网络有限公司 | Picture detection method, and training method and apparatus for picture detection model |
CN113222972B (zh) * | 2021-05-31 | 2024-03-19 | 辽宁工程技术大学 | Image anomaly detection method based on a variational autoencoder algorithm |
CN113658119B (zh) * | 2021-08-02 | 2024-06-18 | 上海影谱科技有限公司 | VAE-based human brain injury detection method and apparatus |
CN113985900B (zh) * | 2021-08-04 | 2023-09-08 | 铜陵有色金属集团股份有限公司金威铜业分公司 | Quadrotor UAV attitude dynamic characteristic model, identification method, and adaptive softened predictive control method |
CN115205650B (zh) * | 2022-09-15 | 2022-11-29 | 成都考拉悠然科技有限公司 | Unsupervised anomaly localization and detection method and apparatus based on multi-scale normalizing flows |
CN117077085B (zh) * | 2023-10-17 | 2024-02-09 | 中国科学技术大学 | Multimodal harmful social media content recognition method combining a large model with dual-path memory |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018096310A1 (en) * | 2016-11-24 | 2018-05-31 | Oxford University Innovation Limited | Patient status monitor and method of monitoring patient status |
CN109697974A (zh) * | 2017-10-19 | 2019-04-30 | 百度(美国)有限责任公司 | System and method for neural text-to-speech using convolutional sequence learning |
CN111708739A (zh) * | 2020-05-21 | 2020-09-25 | 北京奇艺世纪科技有限公司 | Anomaly detection method and apparatus for time-series data, electronic device, and storage medium |
CN112036513A (zh) * | 2020-11-04 | 2020-12-04 | 成都考拉悠然科技有限公司 | Image anomaly detection method based on memory-augmented latent-space autoregression |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7738683B2 (en) * | 2005-07-22 | 2010-06-15 | Carestream Health, Inc. | Abnormality detection in medical images |
US10685159B2 (en) * | 2018-06-27 | 2020-06-16 | Intel Corporation | Analog functional safety with anomaly detection |
US10878570B2 (en) * | 2018-07-17 | 2020-12-29 | International Business Machines Corporation | Knockout autoencoder for detecting anomalies in biomedical images |
CN109949278B (zh) * | 2019-03-06 | 2021-10-29 | 西安电子科技大学 | Hyperspectral anomaly detection method based on an adversarial autoencoder network |
CN110910982A (zh) * | 2019-11-04 | 2020-03-24 | 广州金域医学检验中心有限公司 | Autoencoder model training method, apparatus, device, and storage medium |
CN111104241A (zh) * | 2019-11-29 | 2020-05-05 | 苏州浪潮智能科技有限公司 | Autoencoder-based server memory anomaly detection method, system, and device |
CN111598881B (zh) * | 2020-05-19 | 2022-07-12 | 西安电子科技大学 | Image anomaly detection method based on a variational autoencoder |
2020
- 2020-11-04 CN CN202011212882.XA patent/CN112036513B/zh active Active
2021
- 2021-09-30 WO PCT/CN2021/122056 patent/WO2022095645A1/zh active Application Filing
- 2021-09-30 US US17/618,162 patent/US20230154177A1/en active Pending
Non-Patent Citations (2)
Title |
---|
ABATI DAVIDE; PORRELLO ANGELO; CALDERARA SIMONE; CUCCHIARA RITA: "Latent Space Autoregression for Novelty Detection", 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), IEEE, 15 June 2019 (2019-06-15), pages 481 - 490, XP033686628, DOI: 10.1109/CVPR.2019.00057 * |
GONG DONG; LIU LINGQIAO; LE VUONG; SAHA BUDHADITYA; MANSOUR MOUSSA REDA; VENKATESH SVETHA; VAN DEN HENGEL ANTON: "Memorizing Normality to Detect Anomaly: Memory-Augmented Deep Autoencoder for Unsupervised Anomaly Detection", 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), IEEE, 27 October 2019 (2019-10-27), pages 1705 - 1714, XP033724029, DOI: 10.1109/ICCV.2019.00179 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114998633A (zh) * | 2022-06-28 | 2022-09-02 | 河南大学 | Multi-view clustering method driven by view attention |
CN116736372A (zh) * | 2023-06-05 | 2023-09-12 | 成都理工大学 | Seismic interpolation method and system based on a spectral-normalization generative adversarial network |
CN116736372B (zh) | 2023-06-05 | 2024-01-26 | 成都理工大学 | Seismic interpolation method and system based on a spectral-normalization generative adversarial network |
Also Published As
Publication number | Publication date |
---|---|
US20230154177A1 (en) | 2023-05-18 |
CN112036513B (zh) | 2021-03-09 |
CN112036513A (zh) | 2020-12-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022095645A1 (zh) | Image anomaly detection method based on memory-augmented latent-space autoregression | |
CN110443143B (zh) | Remote sensing image scene classification method fusing multi-branch convolutional neural networks | |
CN111598881B (zh) | Image anomaly detection method based on a variational autoencoder | |
CN111353373B (zh) | Fault diagnosis method with correlation-alignment domain adaptation | |
CN109993236B (zh) | Few-shot Manchu matching method based on one-shot Siamese convolutional neural network | |
US11087452B2 (en) | False alarm reduction system for automatic manufacturing quality control | |
Graham et al. | Denoising diffusion models for out-of-distribution detection | |
CN110287777B (zh) | Golden snub-nosed monkey body segmentation algorithm for natural scenes | |
CN111222457B (zh) | Detection method for discriminating video authenticity based on depthwise separable convolution | |
CN109389166A (zh) | Deep transfer embedded clustering machine learning method based on local structure preservation | |
CN110321805B (zh) | Dynamic expression recognition method based on temporal relation reasoning | |
CN116910752B (zh) | Malicious code detection method based on big data | |
CN115526847A (zh) | Motherboard surface defect detection method based on semi-supervised learning | |
CN111371611B (zh) | Weighted network community discovery method and apparatus based on deep learning | |
CN114897764A (zh) | Pulmonary nodule false-positive exclusion method and apparatus based on normalized channel attention | |
CN116206227B (zh) | Picture review system and method for 5G rich-media information, electronic device, and medium | |
CN110705631B (zh) | SVM-based bulk carrier equipment condition detection method | |
CN112949344B (zh) | Feature autoregression method for anomaly detection | |
CN115862119A (zh) | Face age estimation method and apparatus based on attention mechanism | |
CN111797732B (zh) | Sampling-insensitive adversarial attack method for video action recognition | |
CN110728615B (zh) | Steganalysis method based on sequential hypothesis testing, terminal device, and storage medium | |
CN114529746B (zh) | Image clustering method based on low-rank subspace consistency | |
Pacheco Reina | Convolutional neural network for distortion Classification in face images. | |
US20240013523A1 (en) | Model training method and model training system | |
CN111340111B (zh) | Face image set recognition method based on wavelet-kernel extreme learning machine | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21888342 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 21888342 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 23.11.2023) |