WO2022095645A1 - Image anomaly detection method based on memory-augmented latent space autoregression - Google Patents

Image anomaly detection method based on memory-augmented latent space autoregression

Info

Publication number
WO2022095645A1
Authority
WO
WIPO (PCT)
Prior art keywords
memory
latent space
image
module
autoregressive
Prior art date
Application number
PCT/CN2021/122056
Other languages
English (en)
French (fr)
Inventor
徐行
王甜
沈复民
贾可
申恒涛
Original Assignee
成都考拉悠然科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 成都考拉悠然科技有限公司
Priority to US17/618,162 priority Critical patent/US20230154177A1/en
Publication of WO2022095645A1 publication Critical patent/WO2022095645A1/zh

Classifications

    • G06V 10/98 Detection or correction of errors, e.g. by rescanning the pattern or by human intervention; Evaluation of the quality of the acquired patterns
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06F 18/2433 Classification techniques: single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N 3/045 Neural network architectures: combinations of networks
    • G06N 3/048 Neural network architectures: activation functions
    • G06N 3/08 Neural networks: learning methods
    • G06V 10/764 Image or video recognition or understanding using classification, e.g. of video objects
    • G06V 10/7747 Generating sets of training patterns; organisation of the process, e.g. bagging or boosting
    • G06V 10/776 Validation; Performance evaluation
    • G06V 10/778 Active pattern-learning, e.g. online learning of image or video features
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects

Definitions

  • The present application relates to the field of anomaly detection in computer vision, and in particular to an image anomaly detection method based on memory-augmented latent space autoregression.
  • Anomaly detection, also known as outlier detection or novelty detection, is the process of finding objects whose behavior differs greatly from what is expected; the detected objects are also called anomalies or outliers.
  • Anomaly detection has a wide range of applications in production and life, such as credit card anti-fraud, advertising click anti-cheating, network intrusion detection, etc.
  • Anomaly detection in computer vision follows the same definition, with the inputs becoming images, videos, and similar data: for example, finding pictures that do not belong among a large set of pictures, detecting incorrectly produced parts in industrial production, or applying anomaly detection to surveillance video to automatically analyze abnormal behaviors and objects. Precisely because of the rapid development of computing and the rapid expansion of data, a technology that can analyze images, videos, and similar information is urgently needed.
  • In the traditional machine learning stage, anomaly detection required manually analyzing the data distribution, designing appropriate features, and then using traditional machine learning algorithms (support vector machines, isolation forests, etc.) to model and analyze the data.
  • the anomaly detection methods in computer vision mainly include: methods based on reconstruction loss differences, methods based on classification learning, and methods based on density estimation.
  • Methods based on reconstruction loss differences typically use a deep autoencoder to reconstruct the input data from its own features. Because the autoencoder memorizes the characteristics of normal samples, the reconstruction difference can be used to judge whether a sample is abnormal (abnormal samples are usually not reconstructed well, so a threshold can be set to detect them).
  • Methods based on classification learning are mainly used for outlier detection. The normal samples are a set of labeled data; applying a classification algorithm to this data learns the probability that a sample belongs to each class. At test time, a normal sample receives a high probability for some class, while an abnormal sample, not belonging to the distribution, receives a low probability for every class; this property is used to distinguish abnormal data.
  • However, existing anomaly detection methods struggle to achieve good results because of the lack of clear supervision information (abnormal data is difficult to collect, and collecting normal data is time-consuming and labor-intensive, making complete data hard to obtain).
  • In particular, models based on deep autoencoders lack a good solution for data with a wide distribution and large variance.
  • The present application provides an image anomaly detection method based on memory-augmented latent space autoregression that can better identify abnormal images.
  • The image anomaly detection method based on memory-augmented latent space autoregression can include the following steps:
  • Step 1: select a data set and divide it into a training set and a test set;
  • Step 2: build the network structure of the memory-augmented latent space autoregressive model;
  • Step 3: preprocess the training set;
  • Step 4: initialize the memory-augmented latent space autoregressive model;
  • Step 5: train the initialized memory-augmented latent space autoregressive model with the preprocessed training set;
  • Step 6: verify the trained memory-augmented latent space autoregressive model on the test set, and use it to determine whether the input image is an abnormal image.
  • In step 1, the data set may include the MNIST data set and the CIFAR10 data set.
  • In step 2, the memory-augmented latent space autoregressive model may include an autoencoder, an autoregressive module, and a memory module;
  • The autoencoder can include an encoder and a decoder; it compresses the image into the latent space through the encoder to learn a feature representation, and then uses the decoder to decode the latent-space feature representation back into the image space;
  • The autoregressive module can be configured to model the data using the latent-space features and fit the true distribution. The fitting process is represented by the following formula:
  • p(z) = ∏_{i=1}^{d} p(z_i | z_{<i})
  • where p(z) is the latent-space distribution, p(z_i | z_{<i}) is the conditional probability distribution, d is the dimension of the feature vector z, z_i is the i-th dimension of z, and z_{<i} is the part of z before the i-th dimension;
  • The memory module can be configured to store latent-space feature representations; a feature representation that does not belong to the latent space is then forcibly converted by the memory module into the most relevant feature representation in memory. The process is:
  • ẑ = wM = ∑_{i=1}^{N} w_i m_i
  • where M is the memory module, ẑ is the memory module's representation of the feature, w represents the similarity between the latent feature and each block of memory, m_i is the i-th block of memory, w_i is the similarity between the feature vector z and m_i, and N is the size (number of blocks) of the memory module, with
  • w_i = exp(z m_iᵀ / (‖z‖·‖m_i‖)) / ∑_{j=1}^{N} exp(z m_jᵀ / (‖z‖·‖m_j‖))
  • where exp() is the exponential function with base e, ‖·‖ is the norm, m_iᵀ is the transpose of m_i, and m_j is the j-th block of memory of the memory module.
  • In the network structure of step 2:
  • The encoder of the autoencoder can include two downsampling modules and a fully connected layer; each block uses a residual-network structure and is a cascade of three consecutive convolutional layer + batch normalization + activation function substructures;
  • The decoder of the autoencoder can include a fully connected layer, two upsampling modules, and a convolutional layer; each block uses a residual-network structure and is a cascade of three substructures: transposed convolutional layer + batch normalization + activation function, convolutional layer + batch normalization + activation function, and transposed convolutional layer + batch normalization + activation function;
  • The network structure of the autoregressive module can be composed of multiple autoregressive layers;
  • In the autoencoder, the encoder is expressed as z = en(X) and the decoder as X̂ = de(z); the autoregressive module z_dist = H(z) and the memory module ẑ = M(z) act on z, at which point X̂ = de(ẑ).
  • The autoencoder's processing of a picture may include several steps; in the decoder, the channel dimension of the upsampling modules changes from 64 to 32 and then to 16;
  • step 2 may specifically include the following steps:
  • Step 201 select a training set
  • Step 202 analyze training set information, the training set information includes image size, image intensity and image noise;
  • Step 203 construct a network structure suitable for current data according to the obtained information
  • Step 204 Assemble the autoencoder, the autoregressive module and the memory module together.
  • step 3 may specifically include the following steps:
  • Step 301 read image data
  • Step 302 adjust the image size to a specific size
  • Step 303: convert the small number of pictures whose color space differs from the overall data, specifically converting grayscale to RGB and RGB to grayscale;
  • Step 304 perform a regularization operation on the image data.
  • step 4 may specifically refer to: using different initialization methods to initialize the network, that is, using a random initialization method for the autoencoder and the autoregressive module, and using a uniform distribution initialization for the memory module.
  • step 5 may specifically include the following steps:
  • Step 501 loading the preprocessed data
  • Step 502 setting the learning rate for the autoencoder, the autoregressive module and the memory module respectively;
  • Step 503 the fixed memory module trains the autoregressive module
  • Step 504 fixing the autoregressive module to train the memory module
  • Step 505 iteratively perform steps 503 and 504 until the memory-based latent space autoregressive model converges.
  • In step 5, the loss function of the model may be:
  • L = L_rec + αL_llk + βL_mem
  • where L_rec is the reconstruction loss between the original and reconstructed images, L_llk is the negative log-likelihood loss, and L_mem is the entropy of the memory-weight coefficients; α and β are weight coefficients of the loss function, used to balance the proportions of the different losses.
  • α and β differ between datasets: for MNIST and CIFAR10, α equals 1 and 0.1, and β equals 0.0002 and 0.0002, respectively.
  • Step 6 may specifically refer to: inputting a picture into the trained memory-augmented latent space autoregressive model, obtaining the probability output by the autoregressive module and the reconstruction difference between the autoencoder's reconstructed picture and the original picture as two scores, adding the two scores to obtain a final score, and judging whether the image is abnormal against a previously set threshold.
  • The built and trained memory-augmented latent space autoregressive model does not need a prior distribution and therefore does not distort the distribution of the data itself; it can also prevent the model from reconstructing abnormal pictures, and so can better identify abnormal images.
  • FIG. 1 is a flowchart of an image anomaly detection method based on memory-enhanced latent space autoregression in an embodiment of the present application
  • FIG. 2 is a schematic diagram of the network structure of the memory-augmented latent space autoregressive model in an embodiment of the present application;
  • FIG. 3 is a schematic diagram of an autoregressive module in an embodiment of the application.
  • FIG. 4 is a schematic diagram of a memory module in an embodiment of the present application.
  • FIG. 5 is a schematic diagram of an up-sampling module and a down-sampling module in an embodiment of the present application
  • Figure 6 is a comparison table of model performance (AUC) on the MNIST dataset
  • Figure 7 is a comparison table of model performance (AUC) on the CIFAR10 dataset.
  • This embodiment proposes an image anomaly detection method based on memory-enhanced latent space autoregression, the flowchart of which is shown in Figure 1, wherein the method may include the following steps:
  • two mainstream image anomaly detection datasets are selected for experiments, which may include MNIST and CIFAR10.
  • The MNIST dataset is a handwritten-digit dataset used in many tasks. It contains a training set of 60,000 examples and a test set of 10,000 examples, covering the handwritten digits 0-9 (10 categories); each picture is a 28*28 grayscale image.
  • The CIFAR10 dataset is a color image dataset closer to everyday objects. It contains 50,000 training images and 10,000 test images in 10 categories of color RGB images: airplanes, cars, birds, cats, deer, dogs, frogs, horses, boats, and trucks; each picture is a 32*32 color image.
  • The memory-augmented latent space autoregressive model in this embodiment may include three parts: an autoencoder, an autoregressive module, and a memory module, wherein:
  • the autoencoder can include an encoder and a decoder.
  • the autoencoder compresses the image into the latent space through the encoder, learns the feature representation, and then uses the decoder to decode the feature representation of the latent space back into the image space;
  • The autoregressive module can be configured to model the data using the latent-space features and fit the true distribution.
  • The fitting process is expressed by the following formula:
  • p(z) = ∏_{i=1}^{d} p(z_i | z_{<i})
  • where p(z) is the latent-space distribution, p(z_i | z_{<i}) is the conditional probability distribution, d is the dimension of the feature vector z, z_i is its i-th dimension, and z_{<i} is the part of z before the i-th dimension.
  • The memory module can be configured to store latent-space feature representations; a feature representation that does not belong to the latent space is then forcibly converted by the memory module into the most relevant feature representation in memory.
  • The process can be:
  • ẑ = wM = ∑_{i=1}^{N} w_i m_i
  • where M is the memory module, ẑ is the memory module's representation of the feature, w is the similarity between the latent feature and each block of memory, m_i is the i-th block of memory, w_i is the similarity between the feature vector z and m_i, and N is the size of the memory module, with
  • w_i = exp(z m_iᵀ / (‖z‖·‖m_i‖)) / ∑_{j=1}^{N} exp(z m_jᵀ / (‖z‖·‖m_j‖))
  • where exp() is the exponential function with base e, ‖·‖ is the norm, m_iᵀ is the transpose of m_i, and m_j is the j-th block of memory of the memory module.
  • The memory module stores a sparse feature representation of the distribution, which strengthens the autoencoder's generation and, by limiting the weights, effectively prevents the model from being able to reconstruct abnormal images.
  • The encoder of the autoencoder can include two downsampling modules and a fully connected layer; each block uses a residual-network structure and is a cascade of three consecutive convolutional layer + batch normalization + activation function substructures.
  • The decoder of the autoencoder may include a fully connected layer, two upsampling modules, and a convolutional layer; each block uses a residual-network structure and is a cascade of three substructures: transposed convolutional layer + batch normalization + activation function, convolutional layer + batch normalization + activation function, and transposed convolutional layer + batch normalization + activation function.
  • the network structure of the autoregressive module is constructed using the structure shown in Figure 3.
  • Figure 3 shows the operation of one autoregressive layer: the number of features stays the same between input and output, while the feature dimension changes.
  • Each autoregressive layer is implemented with multiple modified (masked) fully connected layers: the current feature is generated only from the features preceding it in the feature vector, and the results are finally assembled into a feature vector.
  • The autoregressive network is composed of multiple such autoregressive layers.
  • the network structure of the memory module can be constructed using the structure shown in Figure 4.
  • Figure 4 shows the reading mechanism of the memory module. First, an extra block of memory space is selected as the memory, with each block the same size as the input. The similarity between the input and each block of memory is computed, the similarities pass through a filtering operation (filtering out relatively small values), and each block of memory is multiplied by its similarity and the results are summed to obtain the output.
  • In the autoencoder, the encoder is expressed as z = en(X) and the decoder as X̂ = de(z); the autoregressive module z_dist = H(z) and the memory module ẑ = M(z) act on z, at which point X̂ = de(ẑ).
  • The autoencoder's processing of a picture may include several steps; in the decoder, the channel dimension of the upsampling modules changes from 64 to 32 and then to 16.
  • The scheme adopted here uses random initialization for the autoencoder module and the autoregressive module; the random initialization keeps the network weights as small as possible and sets the biases to 0.
  • N is the size of the memory module, and feature_dim indicates that the size of the information stored in each block of memory matches the latent-space dimension.
  • The image sizes input to the network are 28*28 and 32*32, respectively; feature_dim is set to 64; the output dimension of the autoregressive module is 100; the memory size is set to 100 and 500, respectively; and the Batch_Size is 256 in both cases.
  • The learning rates were set to 0.0001 and 0.001, respectively, with the Adam optimizer used for learning; the total number of epochs was set to 100, and the learning rate was multiplied by 0.1 every 20 epochs.
  • For the memory module, uniform-distribution initialization and a separate learning rate are used, which effectively addresses the difficulty of training the memory module.
  • The loss is L = L_rec + αL_llk + βL_mem, where L_rec is the reconstruction loss between the original and reconstructed images, L_llk is the negative log-likelihood loss, and L_mem is the entropy of the memory-weight coefficients; α and β are weight coefficients balancing the proportions of the different losses.
  • α and β differ between datasets: for MNIST and CIFAR10, α equals 1 and 0.1, and β equals 0.0002 and 0.0002, respectively.
  • S6: Verify the trained memory-augmented latent space autoregressive model on the test set, and use it to determine whether the input image is an abnormal image.
  • The area under the ROC curve (AUC) is mainly used to evaluate the quality of the method.
  • This indicator is computed from the four elements of the classification confusion matrix: true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). The confusion matrix is shown in Table 1 below:
  • Table 1 (confusion matrix): for actually-positive samples, a positive prediction is a TP and a negative prediction an FN; for actually-negative samples, a positive prediction is an FP and a negative prediction a TN.
  • The ROC curve is plotted with the false positive rate (FPR) on the abscissa and the true positive rate (TPR) on the ordinate; a curve is traced by sweeping the threshold, and the AUC is the area under that curve.
  • the present embodiment outperforms existing methods on each class of the MNIST dataset.
  • the method of this embodiment achieves a final avg score of 0.981, which is the best performance so far.
  • the performance of this embodiment has been greatly improved on 4, 6, and 9 of the CIFAR10 dataset, and the final avg score reaches 0.673, which is the best performance at present.
  • the present application provides an image anomaly detection method based on memory-enhanced latent space autoregression, which belongs to the field of anomaly detection in computer vision.
  • The present application includes: selecting a training data set; constructing the network structure of a memory-augmented latent space autoregressive model; preprocessing the training data set; initializing the model; training the model; and validating the model on the selected data set and using the trained model to determine whether an input image is abnormal.
  • The present application does not need to set a prior distribution, so it does not distort the distribution of the data itself; it can prevent the model from reconstructing abnormal images, and thus better identifies abnormal images.
  • The memory-augmented latent space autoregressive image anomaly detection method of the present application is reproducible and can be used in a variety of industrial applications, wherever image anomaly detection is required.
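The AUC evaluation described earlier can be sketched numerically. A minimal implementation (an illustrative sketch, not the patent's code) computes the area under the ROC curve via the equivalent rank statistic: the probability that a randomly chosen anomalous sample receives a higher score than a randomly chosen normal one.

```python
import numpy as np

def auc(scores, labels):
    """Area under the ROC curve via the rank (Mann-Whitney) statistic:
    the probability that a random anomalous sample scores higher than a
    random normal sample, which equals the area under the TPR/FPR curve
    traced by sweeping the detection threshold."""
    pos = scores[labels == 1]               # anomalous samples
    neg = scores[labels == 0]               # normal samples
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

s = np.array([0.9, 0.8, 0.3, 0.2])   # anomaly scores
y = np.array([1, 1, 0, 0])           # 1 = anomalous, 0 = normal
# perfectly separated scores give AUC = 1.0
```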

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

An image anomaly detection method based on memory-augmented latent space autoregression, belonging to the field of anomaly detection in computer vision. The method includes: selecting a data set and dividing it into a training set and a test set (S1); building the network structure of a memory-augmented latent space autoregressive model (S2); preprocessing the training data set (S3); initializing the memory-augmented latent space autoregressive model (S4); training the initialized model with the preprocessed training set (S5); and verifying the trained model on the test set and using it to determine whether an input image is an abnormal image (S6). The method does not need to set a prior distribution, so it does not distort the distribution of the data itself; it can prevent the model from reconstructing abnormal pictures, and can ultimately better identify abnormal images.

Description

Image anomaly detection method based on memory-augmented latent space autoregression
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to Chinese patent application No. 202011212882.X, entitled "Image anomaly detection method based on memory-augmented latent space autoregression", filed with the China National Intellectual Property Administration on November 4, 2020, the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
The present application relates to the field of anomaly detection in computer vision, and in particular to an image anomaly detection method based on memory-augmented latent space autoregression.
BACKGROUND
Anomaly detection, also known as outlier detection or novelty detection, is the process of finding objects whose behavior differs greatly from what is expected; the detected objects are also called anomalies or outliers. Anomaly detection has a wide range of applications in production and daily life, such as credit-card fraud detection, advertising click anti-cheating, and network intrusion detection.
With the rise of deep learning in recent years, research applying anomaly detection to computer vision has flourished. Anomaly detection in computer vision follows the same definition, with the inputs becoming images, videos, and similar data: for example, finding pictures that do not belong among a large set of pictures, detecting incorrectly produced parts in industrial production, or applying anomaly detection to surveillance video to automatically analyze abnormal behaviors and objects. Precisely because of the rapid development of computing and the explosive growth of data, a technology that can analyze images, videos, and similar information is urgently needed.
With the development of machine learning, and deep learning in particular, image anomaly detection techniques based on machine learning keep emerging. Compared with traditional anomaly detection, images require a more compact information representation. In the traditional machine learning stage, anomaly detection required manually analyzing the data distribution, designing appropriate features, and then modeling the data with traditional machine learning algorithms (support vector machines, isolation forests, etc.). Compared with traditional machine learning, deep learning can automatically learn features from the data and then model those features, giving it greater robustness.
At present, anomaly detection methods in computer vision mainly include: methods based on reconstruction loss differences, methods based on classification learning, and methods based on density estimation.
1) Methods based on reconstruction loss differences: these methods typically use a deep autoencoder to reconstruct the input data from its own features. Because the autoencoder memorizes the characteristics of normal samples, the reconstruction difference can be used to judge whether a sample is abnormal (abnormal samples are usually not reconstructed well, so a threshold suffices to detect them).
2) Methods based on classification learning: these methods are mainly aimed at outlier detection. The normal samples are a set of labeled data; applying a classification algorithm to this data learns the probability that a sample belongs to each class. In the test phase, a normal sample receives a high probability for some class, while an abnormal sample, not belonging to the distribution, receives a low probability for every class; this property distinguishes abnormal data.
3) Methods based on density estimation: these methods target the case where a large data set contains a small fraction of abnormal samples. Features are extracted with traditional machine learning or deep learning, a density estimation method models the data, and abnormal data usually lies in the low-probability region.
Of course, many algorithms are variants and combinations of the above, including combinations of autoencoders with generative adversarial networks and of autoencoders with density estimation methods.
However, existing anomaly detection methods struggle to achieve good results because of the lack of clear supervision information (abnormal data is difficult to collect, and collecting normal data is time-consuming and labor-intensive, making complete data hard to obtain). In particular, models based on deep autoencoders lack a good solution for data with a wide distribution and large variance.
SUMMARY
The present application provides an image anomaly detection method based on memory-augmented latent space autoregression that can better identify abnormal images.
The technical solution adopted by the present application to solve its technical problem is as follows.
The image anomaly detection method based on memory-augmented latent space autoregression can include the following steps:
Step 1: select a data set and divide it into a training set and a test set;
Step 2: build the network structure of the memory-augmented latent space autoregressive model;
Step 3: preprocess the training set;
Step 4: initialize the memory-augmented latent space autoregressive model;
Step 5: train the initialized memory-augmented latent space autoregressive model with the preprocessed training set;
Step 6: verify the trained memory-augmented latent space autoregressive model on the test set, and use it to determine whether the input image is an abnormal image.
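The six steps above can be sketched as a single driver. Every stage name below is an illustrative stand-in, not an interface defined by the patent:

```python
class _DummyModel:
    """Minimal stand-in model so the driver below is runnable."""
    def initialize(self):            # step 4: initialization hook
        self.ready = True

def run_pipeline(dataset, build_model, preprocess, train, evaluate):
    """Mirror of steps 1-6: split, build, preprocess, initialize,
    train, and finally evaluate on the held-out test set."""
    train_set, test_set = dataset                     # step 1: split
    model = build_model(train_set)                    # step 2: build network
    train_set = [preprocess(x) for x in train_set]    # step 3: preprocess
    model.initialize()                                # step 4: initialize
    train(model, train_set)                           # step 5: train
    return evaluate(model, test_set)                  # step 6: verify

result = run_pipeline(
    dataset=([1, 2], [3]),
    build_model=lambda ts: _DummyModel(),
    preprocess=lambda x: x * 2,
    train=lambda m, ts: None,
    evaluate=lambda m, ts: "ok",
)
```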
Optionally, in step 1, the data set may include the MNIST data set and the CIFAR10 data set.
Optionally, in step 2, the memory-augmented latent space autoregressive model may include an autoencoder, an autoregressive module, and a memory module.
The autoencoder may include an encoder and a decoder; it compresses the image into the latent space through the encoder to learn a feature representation, and then uses the decoder to decode the latent-space feature representation back into the image space.
The autoregressive module may be configured to model the data using the latent-space features and fit the true distribution. The fitting process is represented by the following formula:
p(z) = ∏_{i=1}^{d} p(z_i | z_{<i})
where p(z) is the latent-space distribution, p(z_i | z_{<i}) is the conditional probability distribution, d is the dimension of the feature vector z, z_i is the i-th dimension of z, and z_{<i} is the part of z before the i-th dimension.
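As a concrete reading of this chain-rule factorization, the joint log-likelihood is simply the sum of per-dimension conditional log-probabilities. The probabilities below are hypothetical values standing in for an autoregressive estimator's outputs:

```python
import numpy as np

def joint_log_likelihood(cond_probs):
    """log p(z) = sum_i log p(z_i | z_{<i}) under the chain-rule
    factorization p(z) = prod_i p(z_i | z_{<i})."""
    return float(np.sum(np.log(cond_probs)))

# Hypothetical conditionals for a 4-dimensional latent vector.
probs = np.array([0.9, 0.8, 0.7, 0.95])
log_p = joint_log_likelihood(probs)
nll = -log_p    # low-probability (anomalous) latents give a high NLL
```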
The memory module may be configured to store latent-space feature representations; a feature representation that does not belong to the latent space is then forcibly converted by the memory module into the most relevant feature representation in memory. The process is:
ẑ = wM = ∑_{i=1}^{N} w_i m_i
where M is the memory module, ẑ is the memory module's representation of the feature, w represents the similarity between the latent feature and each block of memory, m_i is the i-th block of memory, w_i is the similarity between the feature vector z and m_i, and N is the size of the memory module, with
w_i = exp(z m_iᵀ / (‖z‖·‖m_i‖)) / ∑_{j=1}^{N} exp(z m_jᵀ / (‖z‖·‖m_j‖))
where exp() is the exponential function with base e, ‖·‖ is the norm, m_iᵀ is the transpose of m_i, and m_j is the j-th block of memory of the memory module.
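A numerical sketch of this read operation follows, assuming (as the symbols above suggest) a softmax over cosine similarities; the function and variable names are illustrative:

```python
import numpy as np

def memory_read(z, M):
    """Read from a memory matrix M of shape (N, d): compute
    w_i = exp(cos(z, m_i)) / sum_j exp(cos(z, m_j)), then return
    the addressed representation z_hat = w M = sum_i w_i m_i."""
    sims = (M @ z) / (np.linalg.norm(M, axis=1) * np.linalg.norm(z) + 1e-12)
    w = np.exp(sims)
    w = w / w.sum()                  # softmax normalization: w sums to 1
    return w @ M, w

rng = np.random.default_rng(0)
M = rng.normal(size=(100, 64))       # N = 100 blocks, feature_dim = 64
z = rng.normal(size=64)
z_hat, w = memory_read(z, M)         # z_hat lies in the span of the memory
```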
Optionally, in the network structure of step 2:
the encoder of the autoencoder may include two downsampling modules and a fully connected layer; each block uses a residual-network structure and is a cascade of three consecutive convolutional layer + batch normalization + activation function substructures;
the decoder of the autoencoder may include a fully connected layer, two upsampling modules, and a convolutional layer; each block uses a residual-network structure and is a cascade of three substructures: transposed convolutional layer + batch normalization + activation function, convolutional layer + batch normalization + activation function, and transposed convolutional layer + batch normalization + activation function;
the network structure of the autoregressive module may be composed of multiple autoregressive layers;
in the autoencoder, the encoder is expressed mathematically as z = en(X) and the decoder as X̂ = de(z); the autoregressive module z_dist = H(z) and the memory module ẑ = M(z) act on z, at which point X̂ = de(ẑ).
Optionally, the autoencoder's processing of a picture may include the following steps:
a. An N*N picture is input; in the encoding stage of the autoencoder, each pass through a downsampling module halves the spatial size, while the channel dimension goes from 1 to 32 and then to 64; finally, after a flattening operation, the features are fed into the fully connected layer of the encoder, yielding the latent space z ∈ R^64, at which point z = en(X).
b. z is fed into the memory module to obtain the similarity w between z and each block of memory; after a filtering operation, w becomes ŵ, and the memory representation ẑ = ŵM is finally obtained.
c. ẑ passes through the decoder's fully connected layer to obtain a feature map, which is then restored to the original size by two upsampling modules; the channel dimension of the upsampling modules changes from 64 to 32 and then to 16.
d. A final convolutional layer restores the features to the original image space.
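For step a, the shape bookkeeping through the encoder can be checked with a few lines, assuming each downsampling module halves the spatial size as stated:

```python
def encoder_shapes(n, channels=(1, 32, 64), latent_dim=64):
    """Track tensor shapes through the encoder described above: two
    downsampling modules, each halving the spatial size while growing
    the channel dimension, then flatten + fully connected -> latent."""
    h = n
    shapes = [(channels[0], h, h)]
    for c in channels[1:]:
        h //= 2                      # one downsampling module
        shapes.append((c, h, h))
    flat = shapes[-1][0] * h * h     # flattening before the FC layer
    return shapes, flat, latent_dim

shapes, flat, d = encoder_shapes(28)   # MNIST input: 28*28 grayscale
# shapes == [(1, 28, 28), (32, 14, 14), (64, 7, 7)]; flat == 3136; d == 64
```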
Optionally, step 2 may specifically include the following steps:
Step 201: select a training set;
Step 202: analyze the training set information, which includes image size, image intensity, and image noise;
Step 203: construct a network structure suited to the current data according to the obtained information;
Step 204: assemble the autoencoder, the autoregressive module, and the memory module together.
Optionally, step 3 may specifically include the following steps:
Step 301: read the image data;
Step 302: adjust the images to a specific size;
Step 303: convert the small number of pictures whose color space differs from the overall data, specifically grayscale to RGB and RGB to grayscale;
Step 304: perform a normalization operation on the image data.
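Steps 301-304 can be sketched as follows; nearest-neighbour resizing and the [0, 1] scaling are simplifying assumptions, since the text does not fix these details:

```python
import numpy as np

def preprocess(img, size=32):
    """Steps 302-304: resize to a fixed size (nearest neighbour here),
    convert grayscale pictures to 3-channel RGB so the whole data set
    shares one image space, and normalize intensities to [0, 1]."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    img = img[rows][:, cols]                 # step 302: resize
    if img.ndim == 2:                        # step 303: grayscale -> RGB
        img = np.stack([img] * 3, axis=-1)
    return img.astype(np.float64) / 255.0    # step 304: normalize

x = preprocess(np.full((28, 28), 255, dtype=np.uint8))
# x has shape (32, 32, 3), values in [0, 1]
```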
Optionally, step 4 may specifically mean initializing the network with different initialization methods: random initialization for the autoencoder and the autoregressive module, and uniform-distribution initialization for the memory module.
Optionally, step 5 may specifically include the following steps:
Step 501: load the preprocessed data;
Step 502: set learning rates separately for the autoencoder, the autoregressive module, and the memory module;
Step 503: fix the memory module and train the autoregressive module;
Step 504: fix the autoregressive module and train the memory module;
Step 505: iterate steps 503 and 504 until the memory-augmented latent space autoregressive model converges.
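The alternating scheme of steps 503-505 can be illustrated on a toy objective. The two scalars below stand in for the two parameter blocks (autoregressive module and memory module), each updated while the other is held fixed, with separate learning rates as in step 502; the quadratic objective is only a stand-in for the real model loss:

```python
# Toy alternating optimization: minimize (a*b - y)^2 by updating a with
# b fixed, then b with a fixed, iterating until convergence.
y = 6.0
a, b = 1.0, 1.0
lr_a, lr_b = 0.01, 0.01               # separate learning rates (step 502)
for _ in range(500):                  # step 505: iterate until convergence
    a -= lr_a * 2 * (a * b - y) * b   # step 503: fix "memory", train "AR"
    b -= lr_b * 2 * (a * b - y) * a   # step 504: fix "AR", train "memory"
loss = (a * b - y) ** 2               # converges toward a*b == y
```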
Optionally, in step 5, the loss function of the model may be:
L = L_rec + αL_llk + βL_mem
where L_rec is the reconstruction loss between the original and reconstructed pictures, L_llk is the negative log-likelihood loss, and L_mem is the entropy of the weight coefficients between the features and the memory module; α and β are weight coefficients of the loss function, used to balance the proportions of the different losses. α and β differ between datasets: for MNIST and CIFAR10, α equals 1 and 0.1, and β equals 0.0002 and 0.0002, respectively.
Optionally, step 6 may specifically mean: using the trained memory-augmented latent space autoregressive model, a picture is input into the model, the probability output by the autoregressive module and the reconstruction difference between the autoencoder's reconstructed picture and the original picture are obtained as two scores, the two scores are added to obtain a final score, and the previously set threshold determines whether the image is abnormal.
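The scoring in step 6 can be sketched as follows; using mean squared error for the reconstruction difference is an assumption, and the threshold is user-chosen:

```python
import numpy as np

def anomaly_score(x, x_rec, cond_probs):
    """Final score per step 6: reconstruction difference between the
    original and reconstructed image, plus the autoregressive module's
    negative log-likelihood, summed into one score."""
    rec = float(np.mean((x - x_rec) ** 2))
    nll = float(-np.sum(np.log(cond_probs)))
    return rec + nll

def is_anomalous(score, threshold):
    """Compare the final score against the previously set threshold."""
    return score > threshold

x = np.zeros(4)
x_rec = np.array([0.0, 0.1, 0.0, 0.1])
score = anomaly_score(x, x_rec, np.array([0.9, 0.8]))
```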
The beneficial effects of the present application are at least the following:
Through the above image anomaly detection method based on memory-augmented latent space autoregression, the constructed and trained model does not need a prior distribution and therefore does not distort the distribution of the data itself; it can prevent the model from reconstructing abnormal pictures and can ultimately better identify abnormal images.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a flowchart of the image anomaly detection method based on memory-augmented latent space autoregression in an embodiment of the present application;
Figure 2 is a schematic diagram of the network structure of the memory-augmented latent space autoregressive model in an embodiment of the present application;
Figure 3 is a schematic diagram of the autoregressive module in an embodiment of the present application;
Figure 4 is a schematic diagram of the memory module in an embodiment of the present application;
Figure 5 is a schematic diagram of the upsampling and downsampling modules in an embodiment of the present application;
Figure 6 is a comparison table of model performance (AUC) on the MNIST dataset;
Figure 7 is a comparison table of model performance (AUC) on the CIFAR10 dataset.
DETAILED DESCRIPTION
The technical solution of the present application is described in detail below with reference to the accompanying drawings and embodiments.
This embodiment proposes an image anomaly detection method based on memory-augmented latent space autoregression, whose flowchart is shown in Figure 1. The method can include the following steps.
S1: Select a data set and divide it into a training set and a test set.
In this embodiment, two mainstream image anomaly detection datasets, MNIST and CIFAR10, are selected for the experiments.
The MNIST dataset is a handwritten-digit dataset chosen for many tasks. It contains a training set of 60,000 examples and a test set of 10,000 examples, covering the handwritten digits 0-9 (10 categories); each picture is a 28*28 grayscale image.
The CIFAR10 dataset is a color image dataset closer to everyday objects. It contains 50,000 training images and 10,000 test images in 10 categories of color RGB images: airplanes, cars, birds, cats, deer, dogs, frogs, horses, boats, and trucks; each picture is a 32*32 color image.
These two datasets are chosen to verify the model's adaptability and robustness across different types of data. MNIST and CIFAR10 each contain 10 classes, most experiments choose these two datasets, the 10 classes fit the background setting of anomaly detection well, and the data is suitably diverse.
S2: constructing the network structure of the memory-augmented latent-space autoregressive model

As shown in Figs. 2, 3, 4 and 5, the memory-augmented latent-space autoregressive model in this embodiment may contain three parts: an autoencoder, an autoregressive module and a memory module, wherein:

the autoencoder may include an encoder and a decoder; the autoencoder compresses the image into the latent space through the encoder to learn a feature representation, and then uses the decoder to decode the latent-space feature representation back into the image space;

the autoregressive module may be configured to model the data with the latent-space features and fit the true distribution, the fitting process being expressed by the following formula:

p(z) = ∏_{i=1}^{d} p(z_i | z_{<i})

where p(z) is the latent-space distribution, p(z_i | z_{<i}) is a conditional probability distribution, d denotes the dimensionality of the feature vector z, z_i denotes the i-th dimension of z, and z_{<i} denotes the part of z before the i-th dimension. Here, the autoregressive module is used to learn the distribution of the data; unlike a variational autoencoder or an adversarial autoencoder, no prior distribution is imposed on the data. Imposing a prior distribution would corrupt the data's own distribution, and using the autoregressive module effectively avoids this problem.
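The factorization above can be sketched as follows; this is an illustrative snippet only, in which the conditional estimators are hypothetical placeholders standing in for the outputs of the autoregressive layers:

```python
import math

def autoregressive_log_likelihood(z, conditionals):
    """Evaluate log p(z) = sum_i log p(z_i | z_<i) for a latent vector z.

    `conditionals` is a list of callables; conditionals[i](z_i, z_prefix)
    returns the conditional probability p(z_i | z_<i).  In the model these
    conditionals would be produced by the autoregressive layers; here they
    are placeholders for illustration.
    """
    log_p = 0.0
    for i, zi in enumerate(z):
        p_i = conditionals[i](zi, z[:i])  # p(z_i | z_<i)
        log_p += math.log(p_i)
    return log_p

# Toy example: two dimensions with fixed conditional probabilities,
# so log p(z) is simply the sum of the individual log-probabilities.
conds = [lambda zi, prefix: 0.5, lambda zi, prefix: 0.25]
ll = autoregressive_log_likelihood([1.0, 0.0], conds)
```

Working in log space, as shown, is the usual way to evaluate such a product of conditionals without numerical underflow.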
the memory module may be configured to store the feature representations of the latent space; a feature representation that does not belong to this latent space is then forcibly converted by the memory module into the most relevant feature representation in memory, the process being:

ẑ = wM = Σ_{i=1}^{N} w_i m_i

where M is the memory module, ẑ is the memory module's representation of the feature, w denotes the similarity between the latent representation and each memory slot, m_i denotes the i-th slot of the memory module, w_i denotes the similarity between the feature vector z and m_i, and N denotes the size of the memory module, with

w_i = exp(z m_i^T / (‖z‖ ‖m_i‖)) / Σ_{j=1}^{N} exp(z m_j^T / (‖z‖ ‖m_j‖))

where exp() denotes the exponential function with base e, ‖·‖ is the norm operation, m_i^T is the transpose of m_i, and m_j denotes the j-th slot of the memory module.

Here, the memory module is used to store sparse feature representations of the distribution, which strengthens the generative effect of the autoencoder; by constraining the weights, it effectively prevents the model from being able to reconstruct abnormal pictures.
Referring to Fig. 5, which is a schematic diagram of the up- and downsampling modules, Conv2d denotes a convolutional layer, Bn denotes batch normalization, ReLu denotes the activation function, and DeConv denotes a transposed convolutional layer. Here, the encoder network of the autoencoder may include a downsampling module, a downsampling module and a fully connected layer; each block uses a residual-network structure and is composed of a cascade of three consecutive convolutional layer + batch normalization + activation function structures.

In this embodiment, the decoder network of the autoencoder may include a fully connected layer, an upsampling module, an upsampling module and a convolutional layer; each block uses a residual-network structure and is composed of a cascade of three substructures: transposed convolutional layer + batch normalization + activation function, convolutional layer + batch normalization + activation function, and transposed convolutional layer + batch normalization + activation function.

The autoregressive module network structure is constructed as shown in Fig. 3, which represents the operation of one autoregressive layer: the number of features of the input and output remains unchanged while the feature dimension changes; each autoregressive layer is implemented with multiple masked fully connected layers, generating the current feature from the features preceding it in the feature vector and finally assembling a feature vector. The autoregressive network is composed of multiple such autoregressive layers.

The memory module network structure may be constructed as shown in Fig. 4, which illustrates the read mechanism of the memory module: first, an additional block of memory is allocated, each slot of which has the same size as the input; the similarity between the input and each memory slot is computed, the similarities then pass through one filtering operation (filtering out small similarity values), and the output is obtained by multiplying each memory slot by its similarity and summing the results.
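The read mechanism described above (cosine-similarity softmax weights, filtering of small weights, weighted sum over the slots) may be sketched as follows; this is an illustrative, dependency-free sketch, and the `filter_threshold` parameter stands in for the filtering cutoff, whose exact form is not specified in the text:

```python
import math

def memory_read(z, memory, filter_threshold=0.0):
    """Compute the memory-based representation z_hat = sum_i w_i * m_i.

    w is a softmax over the cosine similarities between the latent vector z
    and every memory slot m_i; weights at or below `filter_threshold` are
    zeroed (the "filtering" step) and the rest are renormalized.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def norm(a):
        return math.sqrt(dot(a, a)) or 1.0

    # Softmax over cosine similarities between z and each slot.
    sims = [dot(z, m) / (norm(z) * norm(m)) for m in memory]
    exps = [math.exp(s) for s in sims]
    w = [e / sum(exps) for e in exps]

    # Filtering step: suppress small similarities, then renormalize.
    w = [wi if wi > filter_threshold else 0.0 for wi in w]
    total = sum(w) or 1.0
    w = [wi / total for wi in w]

    # z_hat = sum_i w_i * m_i (dimension-wise weighted sum of the slots).
    z_hat = [sum(wi * m[d] for wi, m in zip(w, memory)) for d in range(len(z))]
    return z_hat, w

memory = [[1.0, 0.0], [0.0, 1.0]]
z_hat, w = memory_read([1.0, 0.0], memory)
```

In this toy example the input aligns with the first slot, so the read result is pulled toward that slot, which is exactly the behaviour that prevents the decoder from reproducing features absent from memory.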
It should be noted that the encoder of the autoencoder is expressed mathematically as z = en(X) and the decoder as X̂ = de(ẑ); the autoregressive module z_dist = H(z) and the memory module ẑ = Mem(z) act on z, at which point X̂ = de(Mem(z)).
In a specific application, the processing of a picture by the autoencoder may include the following steps:
a. A picture of size N*N is input; in the encoding stage of the autoencoder, the spatial size of the feature map is halved after each downsampling module, the channel dimension goes from 1 to 32 and then to 64, and finally a flattening operation feeds the features into the fully connected layer of the encoder, yielding the latent representation z ∈ R^64, at which point z = en(X);
b. z is fed into the memory module to obtain the similarity w between z and each memory slot; w passes through one filtering operation (suppressing small similarity values) to obtain the filtered weights ŵ, and the memory-based representation ẑ = ŵM is then obtained;
c. ẑ is passed through the fully connected layer of the decoder to obtain a feature of size 64 × (N/4) × (N/4), which is finally restored to the original size by two upsampling modules, whose channel dimensions change from 64 to 32 and then to 16;
d. A final convolutional layer maps the features back to the original image space.
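For illustration, the tensor shapes through steps a-d above can be traced as follows; this shape trace is a sketch that assumes, per the description, two halving downsampling modules (channels 1 → 32 → 64), a 64-dimensional latent vector, and a symmetric decoder (channels 64 → 32 → 16) ending in one convolution back to the image space:

```python
def autoencoder_shape_trace(n, in_channels=1):
    """Return (stage, shape) pairs for an N*N input through the pipeline.

    Shapes are (channels, height, width), except the latent vector, which
    is a flat 64-dimensional representation.
    """
    return [
        ("input", (in_channels, n, n)),
        ("down1", (32, n // 2, n // 2)),          # first downsampling module
        ("down2", (64, n // 4, n // 4)),          # second downsampling module
        ("latent z", (64,)),                      # flatten + fully connected
        ("decoder fc", (64, n // 4, n // 4)),     # step c, before upsampling
        ("up1", (32, n // 2, n // 2)),            # first upsampling module
        ("up2", (16, n, n)),                      # second upsampling module
        ("output conv", (in_channels, n, n)),     # step d, back to image space
    ]

trace = autoencoder_shape_trace(32)  # e.g. a CIFAR10-sized input
```

Such a trace makes it easy to verify that the decoder exactly inverts the spatial reductions of the encoder.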
S3: preprocessing the training set

During model training, all pictures need to be resized to N*N and converted to the corresponding image space; operations such as random rotation, flipping and noise may be applied as appropriate according to the needs of the data.
S4: initializing the memory-augmented latent-space autoregressive model

Since model initialization can effectively help the network train and converge, the scheme adopted here is to use random initialization for the autoencoder module and the autoregressive module; the random initialization keeps the network weights as small as possible and sets the biases to zero.

For the memory module M ∈ R^{N*feature_dim}, where N denotes the size of the memory module and feature_dim denotes the amount of information stored in each memory slot, kept consistent with the dimensionality of the latent space, each slot m_n is initialized element-wise over its feature_dim entries with the uniform distribution π ~ U(0,1); that is, every slot n ∈ {1, ..., N} of the memory is initialized.
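The uniform initialization just described may be sketched as follows; the function name and the `seed` parameter (used only for reproducibility) are illustrative:

```python
import random

def init_memory(n_slots, feature_dim, seed=None):
    """Initialize a memory M in R^{N x feature_dim}, drawing every element
    of every slot independently from the uniform distribution U(0, 1)."""
    rng = random.Random(seed)
    return [[rng.uniform(0.0, 1.0) for _ in range(feature_dim)]
            for _ in range(n_slots)]

# e.g. the MNIST configuration in this embodiment: 100 slots of dimension 64.
M = init_memory(100, 64, seed=0)
```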
S5: training the initialized memory-augmented latent-space autoregressive model with the preprocessed training set

The training process mainly uses the two datasets MNIST and CIFAR10. Here, the input picture sizes are 28*28 and 32*32 respectively, feature_dim is set to 64 for both, the output dimension of the autoregressive module is 100 for both, the memory sizes are set to 100 and 500 respectively, the Batch_Size is 256 for both, the learning rates are set to 0.0001 and 0.001 respectively, the Adam optimizer is used for learning, the total number of epochs is set to 100, and the learning rate is multiplied by 0.1 every 20 epochs. Here, uniform-distribution initialization and a separate learning rate are proposed for the memory module, effectively solving the problem that the memory module is difficult to train.
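The step learning-rate schedule quoted above (multiply by 0.1 every 20 epochs over 100 epochs) can be sketched as follows; the function name is illustrative and epochs are counted from 0:

```python
def learning_rate_at(epoch, base_lr, decay=0.1, step=20):
    """Step schedule: the learning rate is multiplied by `decay`
    once every `step` epochs."""
    return base_lr * (decay ** (epoch // step))

# CIFAR10 base learning rate of 0.001, sampled at a few epochs.
schedule = [learning_rate_at(e, 0.001) for e in (0, 19, 20, 40, 99)]
```

The same helper applies to the MNIST configuration by passing base_lr=0.0001.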
In addition, the loss function of the model is as follows:

L = L_rec + αL_llk + βL_mem

where L_rec denotes the reconstruction loss between the original picture and the reconstructed picture, L_llk = −Σ_{i=1}^{d} log p(z_i | z_{<i}) denotes the negative log-likelihood loss, L_mem = Σ_{i=1}^{N} −w_i log(w_i) denotes the entropy of the weight coefficients between the features and the memory module, and α and β denote weight coefficients of the loss function used to balance the proportions of the different losses. α and β differ for different datasets: for MNIST and CIFAR10, α equals 1 and 0.1 respectively, and β equals 0.0002 for both.
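A minimal sketch of the weighted combination above follows; the individual loss values in the example are arbitrary placeholders, and only the combination rule itself comes from the text:

```python
def total_loss(l_rec, l_llk, l_mem, alpha, beta):
    """L = L_rec + alpha * L_llk + beta * L_mem, where alpha and beta
    balance the reconstruction, likelihood and memory-entropy terms."""
    return l_rec + alpha * l_llk + beta * l_mem

# MNIST weighting quoted above: alpha = 1, beta = 0.0002.
mnist_loss = total_loss(0.5, 2.0, 3.0, alpha=1.0, beta=0.0002)
```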
S6: validating the trained memory-augmented latent-space autoregressive model with the test set, and using the trained model to determine whether an input image is an abnormal image.

This embodiment mainly uses the area under the ROC curve (AUC) to evaluate the quality of the method. This metric is usually computed from the four elements of the confusion matrix of a classification problem: true positive (TP), false positive (FP), false negative (FN) and true negative (TN), where the confusion matrix is shown in Table 1 below:
Table 1

                    Predicted abnormal     Predicted normal
Actually abnormal   True positive (TP)     False negative (FN)
Actually normal     False positive (FP)    True negative (TN)
In addition, the true positive rate (TPR) and the false positive rate (FPR) are computed as follows:

TPR = TP / (TP + FN)

FPR = FP / (FP + TN)
The ROC curve is defined by two coordinates, with FPR on the horizontal axis and TPR on the vertical axis; by varying the threshold a curve can be drawn, and the AUC is the area of the region under that curve.
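As an illustrative sketch (not part of the claimed method), the AUC obtained by sweeping thresholds over the ROC curve is equivalent to a rank statistic, which can be computed directly:

```python
def roc_auc(normal_scores, abnormal_scores):
    """Rank-based AUC: the probability that a randomly chosen abnormal
    sample scores higher than a randomly chosen normal one, with ties
    counted as one half.  Equivalent to the area under the ROC curve
    traced by sweeping the decision threshold."""
    wins = 0.0
    for a in abnormal_scores:
        for n in normal_scores:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(abnormal_scores) * len(normal_scores))

auc = roc_auc([0.1, 0.2, 0.4], [0.3, 0.5, 0.9])
```

A perfect separation of normal and abnormal scores yields an AUC of 1.0, and chance-level scoring yields 0.5.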
In addition, the performance of the model was tested on the MNIST and CIFAR10 datasets, achieving good performance compared with currently popular methods. The comparison results are shown in Figs. 6 and 7, where Fig. 6 is the model-performance (AUC) comparison table on the MNIST dataset and Fig. 7 is the model-performance (AUC) comparison table on the CIFAR10 dataset.

As can be seen from Fig. 6, this embodiment outperforms existing methods on every class of the MNIST dataset, and the method achieves a final avg score of 0.981, which is the best performance to date. As can be seen from Fig. 7, this embodiment achieves large improvements on classes 4, 6 and 9 of the CIFAR10 dataset and reaches a final avg score of 0.673, also the best performance to date. This demonstrates that the memory-augmented latent-space autoregressive model proposed in the present application can be effectively applied to image anomaly detection and can greatly remedy the shortcomings of current methods.
Industrial Applicability

The present application provides an image anomaly detection method based on memory-augmented latent-space autoregression, belonging to the field of anomaly detection in computer vision. The present application includes: selecting a training dataset; constructing the network structure of a memory-augmented latent-space autoregressive model; preprocessing the training dataset; initializing the memory-augmented latent-space autoregressive model; training the memory-augmented latent-space autoregressive model; and validating the model on the selected dataset and using the trained model to determine whether an input image is an abnormal image. The present application does not need to set a prior distribution and thus does not corrupt the distribution of the data itself, and it can prevent the model from reconstructing abnormal pictures, so that abnormal images can ultimately be identified more accurately.

In addition, it can be understood that the image anomaly detection method based on memory-augmented latent-space autoregression of the present application is reproducible and can be used in a variety of industrial applications, for example in applications that require image anomaly detection.

Claims (11)

  1. An image anomaly detection method based on memory-augmented latent-space autoregression, characterized by comprising the following steps:
    Step 1: selecting a dataset, and dividing the dataset into a training set and a test set;
    Step 2: constructing a network structure of a memory-augmented latent-space autoregressive model;
    Step 3: preprocessing the training set;
    Step 4: initializing the memory-augmented latent-space autoregressive model;
    Step 5: training the initialized memory-augmented latent-space autoregressive model with the preprocessed training set;
    Step 6: validating the trained memory-augmented latent-space autoregressive model with the test set, and using the trained memory-augmented latent-space autoregressive model to determine whether an input image is an abnormal image.
  2. The image anomaly detection method based on memory-augmented latent-space autoregression according to claim 1, characterized in that, in step 1, the dataset comprises the MNIST dataset and the CIFAR10 dataset.
  3. The image anomaly detection method based on memory-augmented latent-space autoregression according to claim 1 or 2, characterized in that, in step 2, the memory-augmented latent-space autoregressive model comprises: an autoencoder, an autoregressive module and a memory module;
    the autoencoder comprises an encoder and a decoder; the autoencoder compresses the image into the latent space through the encoder to learn a feature representation, and then uses the decoder to decode the latent-space feature representation back into the image space;
    the autoregressive module is configured to model the data with the latent-space features and fit the true distribution, the fitting process being expressed by the following formula:
    p(z) = ∏_{i=1}^{d} p(z_i | z_{<i})
    where p(z) is the latent-space distribution, p(z_i | z_{<i}) is a conditional probability distribution, d denotes the dimensionality of the feature vector z, z_i denotes the i-th dimension of z, and z_{<i} denotes the part of z before the i-th dimension; the memory module is configured to store the feature representations of the latent space, a feature representation that does not belong to this latent space then being forcibly converted by the memory module into the most relevant feature representation in memory, the process being:
    ẑ = wM = Σ_{i=1}^{N} w_i m_i
    where M is the memory module, ẑ is the memory module's representation of the feature, w denotes the similarity between the latent representation and each memory slot, m_i denotes the i-th slot of the memory module, w_i denotes the similarity between the feature vector z and m_i, and N denotes the size of the memory module, with
    w_i = exp(z m_i^T / (‖z‖ ‖m_i‖)) / Σ_{j=1}^{N} exp(z m_j^T / (‖z‖ ‖m_j‖))
    where exp() denotes the exponential function with base e, ‖·‖ is the norm operation, m_i^T is the transpose of m_i, and m_j denotes the j-th slot of the memory module.
  4. The image anomaly detection method based on memory-augmented latent-space autoregression according to any one of claims 1 to 3, characterized in that, in step 2, in the network structure:
    the encoder network of the autoencoder comprises a downsampling module, a downsampling module and a fully connected layer, each block using a residual-network structure and being composed of a cascade of three consecutive convolutional layer + batch normalization + activation function structures;
    the decoder network of the autoencoder comprises a fully connected layer, an upsampling module, an upsampling module and a convolutional layer, each block using a residual-network structure and being composed of a cascade of three substructures, respectively transposed convolutional layer + batch normalization + activation function, convolutional layer + batch normalization + activation function, and transposed convolutional layer + batch normalization + activation function;
    the autoregressive module network structure is composed of multiple autoregressive layers;
    wherein the encoder of the autoencoder is expressed mathematically as z = en(X) and the decoder as X̂ = de(ẑ); the autoregressive module z_dist = H(z) and the memory module ẑ = Mem(z) act on z, at which point X̂ = de(Mem(z)).
  5. The image anomaly detection method based on memory-augmented latent-space autoregression according to claim 4, characterized in that the processing of a picture by the autoencoder comprises the following steps:
    a. A picture of size N*N is input; in the encoding stage of the autoencoder, the spatial size of the feature map is halved after each downsampling module, the channel dimension goes from 1 to 32 and then to 64, and finally a flattening operation feeds the features into the fully connected layer of the encoder, yielding the latent representation z ∈ R^64, at which point z = en(X);
    b. z is fed into the memory module to obtain the similarity w between z and each memory slot; w passes through one filtering operation (suppressing small similarity values) to obtain the filtered weights ŵ, and the memory-based representation ẑ = ŵM is then obtained;
    c. The representation ẑ ∈ R^64 is passed through the fully connected layer of the decoder to obtain a feature of size 64 × (N/4) × (N/4), which is finally restored to the original size by two upsampling modules, whose channel dimensions change from 64 to 32 and then to 16;
    d. A final convolutional layer maps the features back to the original image space.
  6. The image anomaly detection method based on memory-augmented latent-space autoregression according to any one of claims 1 to 5, characterized in that step 2 specifically comprises the following steps:
    Step 201: selecting a training set;
    Step 202: analyzing training-set information, the training-set information including image size, image intensity and image noise;
    Step 203: constructing, according to the obtained information, a network structure suited to the current data;
    Step 204: assembling the autoencoder, the autoregressive module and the memory module together.
  7. The image anomaly detection method based on memory-augmented latent-space autoregression according to any one of claims 1 to 6, characterized in that step 3 specifically comprises the following steps:
    Step 301: reading the image data;
    Step 302: resizing the images to a specified size;
    Step 303: processing a certain number of pictures whose image space differs from that of the overall data, specifically: converting grayscale space to RGB space and RGB space to grayscale space;
    Step 304: normalizing the image data.
  8. The image anomaly detection method based on memory-augmented latent-space autoregression according to any one of claims 1 to 7, characterized in that step 4 specifically refers to: initializing the network with different initialization methods, namely: using random initialization for the autoencoder and the autoregressive module, and uniform-distribution initialization for the memory module.
  9. The image anomaly detection method based on memory-augmented latent-space autoregression according to any one of claims 1 to 8, characterized in that step 5 specifically comprises the following steps:
    Step 501: loading the preprocessed data;
    Step 502: setting separate learning rates for the autoencoder, the autoregressive module and the memory module;
    Step 503: fixing the memory module and training the autoregressive module;
    Step 504: fixing the autoregressive module and training the memory module;
    Step 505: iterating steps 503 and 504 until the memory-augmented latent-space autoregressive model converges.
  10. The image anomaly detection method based on memory-augmented latent-space autoregression according to any one of claims 1 to 9, characterized in that, in step 5, the loss function of the model is:
    L = L_rec + αL_llk + βL_mem
    where L_rec denotes the reconstruction loss between the original picture and the reconstructed picture, L_llk = −Σ_{i=1}^{d} log p(z_i | z_{<i}) denotes the negative log-likelihood loss, L_mem = Σ_{i=1}^{N} −w_i log(w_i) denotes the entropy of the weight coefficients between the features and the memory module, and α and β denote weight coefficients of the loss function used to balance the proportions of the different losses; α and β differ for different datasets: for MNIST and CIFAR10, α equals 1 and 0.1 respectively, and β equals 0.0002 for both.
  11. The image anomaly detection method based on memory-augmented latent-space autoregression according to any one of claims 1 to 10, characterized in that step 6 specifically refers to: inputting a picture into the trained memory-augmented latent-space autoregressive model, obtaining the probability output by the autoregressive module and the reconstruction difference between the picture reconstructed by the autoencoder and the original picture as two scores, adding the two scores to obtain a final score, and determining whether the image is an abnormal image according to a previously set threshold.
PCT/CN2021/122056 2020-11-04 2021-09-30 Image anomaly detection method based on memory-augmented latent-space autoregression WO2022095645A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/618,162 US20230154177A1 (en) 2020-11-04 2021-09-30 Autoregression Image Abnormity Detection Method of Enhancing Latent Space Based on Memory

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011212882.XA CN112036513B (zh) 2020-11-04 2020-11-04 基于内存增强潜在空间自回归的图像异常检测方法
CN202011212882.X 2020-11-04

Publications (1)

Publication Number Publication Date
WO2022095645A1 true WO2022095645A1 (zh) 2022-05-12

Family

ID=73573153

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/122056 WO2022095645A1 (zh) 2020-11-04 2021-09-30 基于内存增强潜在空间自回归的图像异常检测方法

Country Status (3)

Country Link
US (1) US20230154177A1 (zh)
CN (1) CN112036513B (zh)
WO (1) WO2022095645A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114998633A (zh) * 2022-06-28 2022-09-02 河南大学 一种基于视图注意力驱动的多视图聚类方法
CN116736372A (zh) * 2023-06-05 2023-09-12 成都理工大学 一种基于谱归一化生成对抗网络的地震插值方法及系统

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112036513B (zh) * 2020-11-04 2021-03-09 成都考拉悠然科技有限公司 基于内存增强潜在空间自回归的图像异常检测方法
CN112967251B (zh) * 2021-03-03 2024-06-04 网易(杭州)网络有限公司 图片检测方法、图片检测模型的训练方法及装置
CN113222972B (zh) * 2021-05-31 2024-03-19 辽宁工程技术大学 基于变分自编码器算法的图像异常检测方法
CN113658119B (zh) * 2021-08-02 2024-06-18 上海影谱科技有限公司 一种基于vae的人脑损伤检测方法及装置
CN113985900B (zh) * 2021-08-04 2023-09-08 铜陵有色金属集团股份有限公司金威铜业分公司 一种四旋翼无人机姿态动态特性模型、辨识方法及自适应柔化预测控制方法
CN115205650B (zh) * 2022-09-15 2022-11-29 成都考拉悠然科技有限公司 基于多尺度标准化流的无监督异常定位与检测方法及装置
CN117077085B (zh) * 2023-10-17 2024-02-09 中国科学技术大学 大模型结合双路记忆的多模态有害社交媒体内容识别方法

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018096310A1 (en) * 2016-11-24 2018-05-31 Oxford University Innovation Limited Patient status monitor and method of monitoring patient status
CN109697974A (zh) * 2017-10-19 2019-04-30 百度(美国)有限责任公司 使用卷积序列学习的神经文本转语音的系统和方法
CN111708739A (zh) * 2020-05-21 2020-09-25 北京奇艺世纪科技有限公司 时序数据的异常检测方法、装置、电子设备及存储介质
CN112036513A (zh) * 2020-11-04 2020-12-04 成都考拉悠然科技有限公司 基于内存增强潜在空间自回归的图像异常检测方法

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7738683B2 (en) * 2005-07-22 2010-06-15 Carestream Health, Inc. Abnormality detection in medical images
US10685159B2 (en) * 2018-06-27 2020-06-16 Intel Corporation Analog functional safety with anomaly detection
US10878570B2 (en) * 2018-07-17 2020-12-29 International Business Machines Corporation Knockout autoencoder for detecting anomalies in biomedical images
CN109949278B (zh) * 2019-03-06 2021-10-29 西安电子科技大学 基于对抗自编码网络的高光谱异常检测方法
CN110910982A (zh) * 2019-11-04 2020-03-24 广州金域医学检验中心有限公司 自编码模型训练方法、装置、设备及存储介质
CN111104241A (zh) * 2019-11-29 2020-05-05 苏州浪潮智能科技有限公司 基于自编码器的服务器内存异常检测方法、系统及设备
CN111598881B (zh) * 2020-05-19 2022-07-12 西安电子科技大学 基于变分自编码器的图像异常检测方法

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018096310A1 (en) * 2016-11-24 2018-05-31 Oxford University Innovation Limited Patient status monitor and method of monitoring patient status
CN109697974A (zh) * 2017-10-19 2019-04-30 百度(美国)有限责任公司 使用卷积序列学习的神经文本转语音的系统和方法
CN111708739A (zh) * 2020-05-21 2020-09-25 北京奇艺世纪科技有限公司 时序数据的异常检测方法、装置、电子设备及存储介质
CN112036513A (zh) * 2020-11-04 2020-12-04 成都考拉悠然科技有限公司 基于内存增强潜在空间自回归的图像异常检测方法

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ABATI DAVIDE; PORRELLO ANGELO; CALDERARA SIMONE; CUCCHIARA RITA: "Latent Space Autoregression for Novelty Detection", 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), IEEE, 15 June 2019 (2019-06-15), pages 481 - 490, XP033686628, DOI: 10.1109/CVPR.2019.00057 *
GONG DONG; LIU LINGQIAO; LE VUONG; SAHA BUDHADITYA; MANSOUR MOUSSA REDA; VENKATESH SVETHA; VAN DEN HENGEL ANTON: "Memorizing Normality to Detect Anomaly: Memory-Augmented Deep Autoencoder for Unsupervised Anomaly Detection", 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), IEEE, 27 October 2019 (2019-10-27), pages 1705 - 1714, XP033724029, DOI: 10.1109/ICCV.2019.00179 *


Also Published As

Publication number Publication date
US20230154177A1 (en) 2023-05-18
CN112036513B (zh) 2021-03-09
CN112036513A (zh) 2020-12-04


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21888342

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21888342

Country of ref document: EP

Kind code of ref document: A1


32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 23.11.2023)