CN113077013A - High-dimensional data fault anomaly detection method and system based on generation countermeasure network - Google Patents
High-dimensional data fault anomaly detection method and system based on generation countermeasure network
- Publication number
- CN113077013A CN113077013A CN202110468859.5A CN202110468859A CN113077013A CN 113077013 A CN113077013 A CN 113077013A CN 202110468859 A CN202110468859 A CN 202110468859A CN 113077013 A CN113077013 A CN 113077013A
- Authority
- CN
- China
- Prior art keywords: network, data, generating, distribution, generation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06N3/04: Neural networks; architecture, e.g. interconnection topology
- G06N3/08: Neural networks; learning methods
- G06N3/084: Backpropagation, e.g. using gradient descent
Abstract
The invention provides a high-dimensional data fault anomaly detection method and system based on a generative adversarial network (GAN), relating to the technical field of anomaly detection in multidimensional data. The method comprises the following steps: step S1: constructing a generative adversarial network architecture; step S2: after the architecture is constructed, stably training the generative adversarial network to obtain a trained model; step S3: setting a scoring function according to the trained model and using the network to score samples for anomalies. The invention can detect anomalous data that has not appeared before, and can handle anomaly detection on two-dimensional and three-dimensional image data of semiconductor wafers.
Description
Technical Field
The invention relates to the technical field of anomaly detection in multidimensional data, in particular to a high-dimensional data fault anomaly detection method and system based on a generative adversarial network (GAN).
Background
Anomaly detection in multidimensional data is a problem of great practical significance, with a large number of real-world applications including network security, manufacturing, fraud detection, and medical imaging. A typical anomaly detection method models the pattern of normal data in order to identify anomalous samples that do not conform to that pattern. Although anomaly detection has been the subject of a great deal of research, developing efficient methods suitable for complex, high-dimensional data remains a significant challenge.
A generative adversarial network (GAN) is a powerful framework for modeling high-dimensional data that can address this challenge. The standard GAN consists of two neural networks: a generative network (G) and a discriminative network (J). During training, the generative network learns a mapping from latent variables z (assumed to follow a Gaussian or uniform distribution) to a synthetic data space, while the discriminative network learns to distinguish real data samples from the synthetic samples produced by the generative network. GANs have enjoyed great success in virtual image generation and are increasingly used in speech and medical imaging applications.
The invention patent publication US6292582B1 discloses a method and system for identifying defects in semiconductors that can classify specific types of anomalies, including image acquisition and processing; however, it searches a database of pre-extracted features for nearest-neighbor anomalies and therefore fails to detect new anomaly types.
The invention patent with publication number US8126681B2 discloses a method for identifying semiconductor outliers using a sequential combination of data transformations. It is based on simple statistical techniques, evaluates quickly, and has some theoretical basis, but it requires electrical test data that is costly to acquire, and the classical statistical outlier tests it uses may not be sufficient to capture anomalies in complex image data.
Disclosure of Invention
In view of the defects in the prior art, the present invention provides a method and system for detecting fault anomalies in high-dimensional data based on a generative adversarial network, so as to solve the above problems.
According to the high-dimensional data fault anomaly detection method and system based on a generative adversarial network provided by the invention, the scheme is as follows:
In a first aspect, a method for detecting fault anomalies in high-dimensional data based on a generative adversarial network is provided, the method including:
constructing a generative adversarial network architecture;
after the generative adversarial network architecture is constructed, stably training the generative adversarial network to obtain a trained model;
and setting a scoring function according to the trained model, scoring samples for anomalies with the generative adversarial network, and performing anomaly detection on the high-dimensional data using the generative adversarial network.
Preferably, the specific steps of constructing the generative adversarial network architecture are as follows:
the standard generative adversarial network comprises a generative network G and a discriminative network J, trained on a set of M data samples {x^(i)}, where i = 1, 2, ..., M;
in a latent data space that obeys a specific distribution, the generative network G maps a sampled random latent variable z to the input data space X;
the discriminative network J attempts to distinguish actual data samples x^(i) from samples G(z) generated by G;
p_X(x) is defined as the distribution of the real data in the sample space X, p_Z(z) as the distribution of the latent variable z in the latent space Z, and p_G(x) as the distribution induced by the generative network G in the sample space X;
the generative adversarial network model matches the joint distributions p_G(x, z) = p_Z(z) p_G(x|z) and p_E(x, z) = p_X(x) p_E(z|x) using an adversarial discriminative network J_xz that takes x and z as inputs;
the generative adversarial network determines the discriminative network J_xz, the generative network G, and the encoding network E as the solution of the saddle-point problem min_{G,E} max_{J_xz} V(J_xz, E, G), where V(J_xz, E, G) is defined as:
V(J_xz, E, G) = E_{x~p_X}[log J_xz(x, E(x))] + E_{z~p_Z}[log(1 - J_xz(G(z), z))]
where E_{x~p_X} and E_{z~p_Z} denote the probability expectation functions over the data distributions in the X and Z data spaces, respectively;
for fixed values of the encoding network E and the generative network G, the optimal discriminative network is:
J*_xz(x, z) = p_E(x, z) / (p_E(x, z) + p_G(x, z));
for the optimal discriminative network J*_xz, the training criterion C(E, G) = max_{J_xz} V(J_xz, E, G) attains its global minimum if and only if p_E(x, z) = p_G(x, z).
Preferably, the specific steps of obtaining the trained model are as follows:
the conditional entropy in the latent space, H^pi(x|z) = -E_{pi(x,z)}[log pi(x|z)], is regularized using an additional adversarially learned discriminative network J_zz, where pi(x, z) is a joint distribution over x and z, giving the following saddle-point target:
min_{G,E} max_{J_xz, J_xx, J_zz} V(J_xz, J_xx, J_zz, E, G)
where V(J_xz, J_xx, J_zz, E, G) is defined as
V(J_xz, J_xx, J_zz, E, G) = V(J_xz, E, G) + V(J_xx, E, G) + V(J_zz, E, G)
in which J_zz, J_xz, and J_xx each denote a discriminative network, G denotes the generative network, and E denotes the encoder.
Preferably, the specific steps of anomaly scoring are as follows:
modeling the data distribution effectively, using the generative network G to learn the normal data distribution p_G(x) = p_X(x), with the latent variable z following p_Z(z);
learning the distribution of the data so as to accurately recover the re-expression of the latent data space;
ensuring that normal samples can be accurately reconstructed.
Preferably, a normal sample is reconstructed as follows:
compute the distance between the two vectors projected into the feature space learned by J_xx:
A(x) = ||f_xx(x, x) - f_xx(x, G(E(x)))||_1
where f_xx(., .) is the last fully connected layer of the model J_xx;
after training the model on normal data to obtain E, G, J_xz, J_xx, and J_zz, a scoring function U(x) is defined, which measures the degree of abnormality of an example x from the difference between the sample and its reconstruction:
U(x) = ||f_xx(x, x) - f_xx(x, G(E(x)))||_2
samples with large values of U(x) are considered to have a high probability of being anomalous data.
In a second aspect, a system for detecting fault anomalies in high-dimensional data based on a generative adversarial network is provided, the system comprising:
Module M1: constructing a generative adversarial network architecture;
Module M2: after the generative adversarial network architecture is constructed, stably training the generative adversarial network to obtain a trained model;
Module M3: setting a scoring function according to the trained model, scoring samples for anomalies with the generative adversarial network, and performing anomaly detection on the high-dimensional data using the generative adversarial network.
Preferably, the module M1 includes:
the standard generative adversarial network comprises a generative network G and a discriminative network J, trained on a set of M data samples {x^(i)}, where i = 1, 2, ..., M;
in a latent data space that obeys a specific distribution, the generative network G maps a sampled random latent variable z to the input data space X;
the discriminative network J attempts to distinguish actual data samples x^(i) from samples G(z) generated by G;
p_X(x) is defined as the distribution of the real data in the sample space X, p_Z(z) as the distribution of the latent variable z in the latent space Z, and p_G(x) as the distribution induced by the generative network G in the sample space X;
the generative adversarial network model matches the joint distributions p_G(x, z) = p_Z(z) p_G(x|z) and p_E(x, z) = p_X(x) p_E(z|x) using an adversarial discriminative network J_xz that takes x and z as inputs;
the generative adversarial network determines the discriminative network J_xz, the generative network G, and the encoding network E as the solution of the saddle-point problem min_{G,E} max_{J_xz} V(J_xz, E, G), where V(J_xz, E, G) is defined as:
V(J_xz, E, G) = E_{x~p_X}[log J_xz(x, E(x))] + E_{z~p_Z}[log(1 - J_xz(G(z), z))]
where E_{x~p_X} and E_{z~p_Z} are the expectation functions over the data distributions in the X and Z data spaces, respectively.
For fixed values of the encoding network E and the generative network G, the optimal discriminative network is:
J*_xz(x, z) = p_E(x, z) / (p_E(x, z) + p_G(x, z));
for the optimal discriminative network J*_xz, the training criterion C(E, G) = max_{J_xz} V(J_xz, E, G) attains its global minimum if and only if p_E(x, z) = p_G(x, z).
Preferably, the module M2 includes:
the conditional entropy in the latent space, H^pi(x|z) = -E_{pi(x,z)}[log pi(x|z)], is regularized using an additional adversarially learned discriminative network J_zz, where pi(x, z) is a joint distribution over x and z, giving the following saddle-point target:
min_{G,E} max_{J_xz, J_xx, J_zz} V(J_xz, J_xx, J_zz, E, G)
where V(J_xz, J_xx, J_zz, E, G) is defined as
V(J_xz, J_xx, J_zz, E, G) = V(J_xz, E, G) + V(J_xx, E, G) + V(J_zz, E, G)
in which J_zz, J_xz, and J_xx each denote a discriminative network, G denotes the generative network, and E denotes the encoder.
Preferably, the module M3 includes:
modeling the data distribution effectively, using the generative network G to learn the normal data distribution p_G(x) = p_X(x), with the latent variable z following p_Z(z);
learning the distribution of the data so as to accurately recover the re-expression of the latent data space;
ensuring that normal samples can be accurately reconstructed.
Preferably, the method for ensuring accurate reconstruction of a normal sample specifically comprises the following steps:
compute the distance between the two vectors projected into the feature space learned by J_xx:
A(x) = ||f_xx(x, x) - f_xx(x, G(E(x)))||_1
where f_xx(., .) is the last fully connected layer of the model J_xx;
after training the model on normal data to obtain E, G, J_xz, J_xx, and J_zz, a scoring function U(x) is defined, which measures the degree of abnormality of an example x from the difference between the sample and its reconstruction:
U(x) = ||f_xx(x, x) - f_xx(x, G(E(x)))||_2
samples with large values of U(x) are considered to have a high probability of being anomalous data.
Compared with the prior art, the invention has the following beneficial effects:
1. the deep learning anomaly detection model provided by the invention does not need known anomaly data during training;
2. the invention can detect the abnormal data which does not appear before;
3. the invention can process the abnormal detection of two-dimensional and three-dimensional image data of the semiconductor wafer.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 shows the final generative adversarial network model;
FIG. 2 shows test data with outliers, their encoded representations, and their reconstructions.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the invention, all of which fall within the scope of the present invention.
The embodiment of the invention provides a high-dimensional data fault anomaly detection method based on a generative adversarial network. As shown in FIG. 1, a generative adversarial network architecture is first constructed:
the standard generation countermeasure network comprises a generation network G and a discrimination network J, and the generation network G and the discrimination network J are arranged in a group of M data samplesTraining, wherein i ═ 1, 2, …, M; in a hidden data space which obeys specific distribution, a generated network G maps a collected random hidden variable z to an input data space X; discrimination network J attempts to combine actual data samples x(i)And G (z) is distinguished from the sample G (z) generated by G.
In the overall structure, the two networks compete with each other: the generative network G attempts to generate samples similar to the real data, while the discriminative network J is used to distinguish the pseudo samples generated by G from real data samples. Training the generative adversarial network then typically alternates gradient steps, so that the generative network G learns to better "fool" the discriminative network J, while J learns to better identify the pseudo samples generated by G.
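The alternating-gradient competition described above can be sketched on a one-dimensional toy problem. This is a minimal illustration under our own assumptions (an affine generator, a logistic discriminator, and hand-derived gradients), not the patent's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: real data ~ N(3, 1); latent z ~ N(0, 1).
# Generator G(z) = a*z + b, discriminator D(x) = sigmoid(w*x + c).
a, b = 1.0, 0.0          # generator parameters
w, c = 0.1, 0.0          # discriminator parameters
lr, steps, batch = 0.05, 3000, 64

def sigmoid(t):
    t = np.clip(t, -50.0, 50.0)   # avoid overflow in exp
    return 1.0 / (1.0 + np.exp(-t))

for _ in range(steps):
    x_real = rng.normal(3.0, 1.0, batch)
    z = rng.normal(0.0, 1.0, batch)
    x_fake = a * z + b

    # Discriminator step: ascend V = E[log D(real)] + E[log(1 - D(fake))]
    d_real = sigmoid(w * x_real + c)
    d_fake = sigmoid(w * x_fake + c)
    grad_w = np.mean((1 - d_real) * x_real) + np.mean(-d_fake * x_fake)
    grad_c = np.mean(1 - d_real) + np.mean(-d_fake)
    w += lr * grad_w
    c += lr * grad_c

    # Generator step: ascend the non-saturating objective E[log D(G(z))]
    x_fake = a * z + b
    d_fake = sigmoid(w * x_fake + c)
    g_grad = (1 - d_fake) * w          # d log D / d x_fake
    a += lr * np.mean(g_grad * z)      # chain rule: d x_fake / d a = z
    b += lr * np.mean(g_grad)          # d x_fake / d b = 1

fake_mean = float(np.mean(a * rng.normal(0.0, 1.0, 10000) + b))
print(fake_mean)   # should drift toward the real mean of 3
```

The generator has no access to real samples; it improves only through the discriminator's gradient signal, which is exactly the alternating scheme used to train the full model.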
Formally, p_X(x) is defined as the distribution of the real data in the sample space X, p_Z(z) as the distribution of the latent variable z in the latent space Z, and p_G(x) as the distribution induced by the generative network G in the sample space X.
Training the discriminative network J and the generative network G requires solving the saddle-point problem min_G max_J V(J, G), where
V(J, G) = E_{x~p_X}[log J(x)] + E_{z~p_Z}[log(1 - J(G(z)))].
The optimal generative network produces a distribution p_G(x) that matches the real data distribution p_X(x).
For the optimal discriminative network J*(x) = p_X(x) / (p_X(x) + p_G(x)), the training criterion C(G) = max_J V(J, G) attains its global minimum if and only if p_G(x) = p_X(x).
In practical applications, J and G are usually trained by alternating gradient steps: the parameters of one network are fixed while V(J, G) is maximized (with respect to J) or minimized (with respect to G) accordingly. After training the generative adversarial network, the generative network can be used to draw synthetic samples G(z) that resemble real samples from p_X. It should be noted that, for a given data sample x, it is not possible to compute its probability under the model explicitly, nor to compute the distribution of its latent code.
Anomaly detection with the generative adversarial network:
The standard generative adversarial network supports only sampling of data, and it can be adapted in several ways for anomaly detection. For example, for a data point x, samples could be used to estimate its probability under the model and thereby decide whether it is an outlier. While sampling from a generative adversarial network is efficient, accurately estimating probabilities this way typically requires a very large number of samples, making the computation prohibitively expensive. Another approach is to "invert" the generative network, finding the latent variable z by stochastic gradient descent on the reconstruction error or a related objective; since every gradient step must be back-propagated through the generative network, this method is also computationally expensive and difficult to apply in practice.
To improve computational efficiency, we build a generative adversarial network containing an encoding network E that maps data samples x to the latent space z during training. In such a model, computing the (approximate) latent representation of a data point x only requires a single forward pass of x through the encoding network E. Our model incorporates a further improvement: an additional discriminative network is added to enforce encoder-decoder consistency, i.e. G(E(x)) ≈ x.
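The efficiency gap between encoder-based and optimization-based latent recovery can be illustrated on a toy one-dimensional example. G, E, and all constants below are hypothetical stand-ins of our own construction, not the patent's networks:

```python
# Toy illustration: scalar generator G(z) = 2*z + 1 and an encoder E
# that has learned to invert it (here, the exact closed-form inverse).
def G(z):
    return 2.0 * z + 1.0

def E(x):                      # one forward pass: cheap latent recovery
    return (x - 1.0) / 2.0

x = 7.0
z_enc = E(x)                   # single forward pass through the encoder
assert abs(G(z_enc) - x) < 1e-9   # codec consistency: G(E(x)) ≈ x

# Without an encoder, "inverting" G needs iterative gradient descent on
# the reconstruction error (G(z) - x)^2 -- one backward pass per step.
z = 0.0
for _ in range(200):
    grad = 2.0 * (G(z) - x) * 2.0   # d/dz (G(z) - x)^2, with dG/dz = 2
    z -= 0.05 * grad
print(z_enc, z)   # both recover the latent code, at very different cost
```

Both routes arrive at the same latent code, but the encoder does it in one pass, while the inversion route pays a full gradient computation per iteration, which is the motivation stated above.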
Theoretically, the adversarial feature learning model of reference [1] (Jeff Donahue, Philipp Krähenbühl, and Trevor Darrell. Adversarial feature learning. International Conference on Learning Representations, 2017) and the adversarially learned inference model of reference [2] (Vincent Dumoulin, Ishmael Belghazi, Ben Poole, Alex Lamb, Martin Arjovsky, Olivier Mastropietro, and Aaron Courville. Adversarially learned inference. International Conference on Learning Representations, 2017) match the joint distributions p_G(x, z) = p_Z(z) p_G(x|z) and p_E(x, z) = p_X(x) p_E(z|x) using an adversarial discriminative network J_xz that takes x and z as inputs. The models of [1] and [2] determine the discriminative network J_xz, the generative network G, and the encoding network E as the solution of the saddle-point problem min_{G,E} max_{J_xz} V(J_xz, E, G), where V(J_xz, E, G) is defined as:
V(J_xz, E, G) = E_{x~p_X}[log J_xz(x, E(x))] + E_{z~p_Z}[log(1 - J_xz(G(z), z))]
where E_{x~p_X} and E_{z~p_Z} denote the probability expectation functions over the data distributions in the X and Z data spaces, respectively.
For fixed values of the encoding network E and the generative network G, the optimal discriminative network is:
J*_xz(x, z) = p_E(x, z) / (p_E(x, z) + p_G(x, z));
for the optimal discriminative network J*_xz, the training criterion C(E, G) = max_{J_xz} V(J_xz, E, G) attains its global minimum if and only if p_E(x, z) = p_G(x, z).
Although in theory the joint distributions p_E(x, z) and p_G(x, z) should be equal at the solution, in practice this is often not the case, because training does not necessarily converge to the solution of the saddle-point problem. This can violate encoder-decoder consistency, resulting in G(E(x)) ≠ x.
To solve this problem, the ALICE framework [3] (Chunyuan Li, Hao Liu, Changyou Chen, Yunchen Pu, Liqun Chen, Ricardo Henao, and Lawrence Carin. ALICE: Towards understanding adversarial learning for joint distribution matching. In Advances in Neural Information Processing Systems 30, pages 5495-5503. Curran Associates, Inc., 2017) estimates the conditional entropy H^pi(x|z) = -E_{pi(x,z)}[log pi(x|z)] in an adversarial manner (where pi(x, z) is the joint distribution over x and z), thereby enforcing encoder-decoder consistency. The saddle-point problem min_{G,E} max_{J_xz} V_ALICE(J_xz, E, G) augments the objective with a conditional entropy regularization term (V_CE) on the encoding network E and the generative network G:
V_ALICE(J_xz, E, G) = V(J_xz, E, G) + V_CE(E, G)
The conditional entropy regularization applied to the encoder E and the generative network G can be approximated using an additional discriminative network J_xx that takes a pair of data samples as input, and reference [3] proves that this discriminative network effectively ensures encoder-decoder consistency.
Stable training of the generative adversarial network:
Referring to FIG. 1, an additional adversarially learned discriminative network J_zz is used to regularize the conditional entropy in the latent space, H^pi(x|z) = -E_{pi(x,z)}[log pi(x|z)], where pi(x, z) is a joint distribution over x and z.
In conclusion, our proposed adversarial learning anomaly detection strategy solves the following saddle-point problem during training:
min_{G,E} max_{J_xz, J_xx, J_zz} V(J_xz, J_xx, J_zz, E, G)
where V(J_xz, J_xx, J_zz, E, G) is defined as
V(J_xz, J_xx, J_zz, E, G) = V(J_xz, E, G) + V(J_xx, E, G) + V(J_zz, E, G)
in which J_zz, J_xz, and J_xx each denote a discriminative network, G denotes the generative network, and E denotes the encoder.
Anomaly scoring:
Referring to FIG. 2, for the anomaly detection task we want to model the data distribution effectively and use the generative network G to learn the normal data distribution p_G(x) = p_X(x), with the latent variable z following p_Z(z). In addition, the distribution of the data is learned so as to accurately recover the re-expression of the latent data space, and to ensure that normal samples can be accurately reconstructed.
Reconstruction-based anomaly detection techniques evaluate the distance between a sample and its reconstructed output: normal samples will be reconstructed accurately, while the reconstructions of anomalous samples are likely to be poor.
Our generative adversarial network model ensures effective modeling of the data distribution, and uses two symmetric conditional-entropy consistency regularizations to guarantee both that the data distribution is learned and that normal samples are reconstructed accurately.
Secondly, we need a good anomaly score to quantify the distance between a real sample and its reconstruction. We give an explanation of why the chosen metric should work well; this is confirmed by the ablation studies described in the experimental section.
The Euclidean distance between an original image and its reconstruction in image space is not a reliable measure of dissimilarity: images with similar visual characteristics are not necessarily close to each other in Euclidean distance, and the distance may contain a great deal of noise.
Therefore, the two vectors must be projected into a feature space, and the reconstruction distance computed in that new space.
We compute the distance between the two vectors projected into the feature space learned by J_xx: A(x) = ||f_xx(x, x) - f_xx(x, G(E(x)))||_1, where f_xx(., .) is the last fully connected layer of the model J_xx.
For the anomaly criterion, the feature layer of J_xx is preferred over the raw output of the model J_xx. One could use the output of J_xx to measure the dissimilarity of the two images; but if the system reaches a stable equilibrium, the generated distribution fits the true distribution exactly, the prediction of J_xx becomes essentially random, and it is clearly not a suitable metric.
After training the model on normal data to obtain E, G, J_xz, J_xx, and J_zz, a scoring function U(x) is defined, which measures the degree of abnormality of an example x from the difference between the sample and its reconstruction:
U(x) = ||f_xx(x, x) - f_xx(x, G(E(x)))||_2
This exploits the feature space of the discriminative network: encoding and reconstructing a sample with our generative network yields a sample from the true data distribution only when the input was normal. Samples with large values of U(x) are considered to have a high probability of being anomalous data.
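As a concrete sketch of this scoring rule, the snippet below computes a feature-space reconstruction distance on synthetic two-dimensional data. The encoder E, generator G, and feature map f here are hypothetical linear stand-ins for the trained E, G, and f_xx (a projection onto a line plays the role of a model trained only on normal data):

```python
import numpy as np

rng = np.random.default_rng(1)

# Normal data lies near the line spanned by unit vector u;
# E projects onto it, G maps the latent coordinate back.
u = np.array([1.0, 2.0]) / np.sqrt(5.0)

def E(p):                 # encoder stand-in: coordinate along u
    return p @ u

def G(t):                 # generator stand-in: latent code -> data space
    return np.outer(t, u)

W = rng.normal(size=(2, 4))   # stand-in for f_xx, the discriminator's
def f(p):                     # last fully connected (feature) layer
    return p @ W

def U(p):                 # anomaly score: feature-space reconstruction distance
    return np.linalg.norm(f(p) - f(G(E(p))), axis=1)

t = rng.uniform(0.0, 1.0, 200)
normal = np.outer(t, u) + rng.normal(scale=0.01, size=(200, 2))
anomalies = rng.uniform(-1.0, 1.0, size=(20, 2))   # scattered off the line

print(U(normal).mean(), U(anomalies).mean())   # anomalies score much higher
```

Normal points reconstruct almost exactly (small U), while points off the learned manifold reconstruct poorly (large U), which is the behavior the scoring function relies on.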
The specific test set-up was as follows:
data set: the counterlearning anomaly detection method was evaluated on a publicly available image data set. We used the SVHN dataset [9] Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y Ng.reading digits in natural images with unsupervised features leaving.012011, which contains house number images, and the CIFAR10 dataset [8] Alex Krizhevsky.leaving multiple layers of features from animals & trucks, including animals or vehicles such as horses, dogs, cars and trucks. The statistics of the data set are shown in table 1.
Data quantity distribution: we generated 10 different datasets from the SVHN dataset [9] and the CIFAR10 dataset [8] by treating one category as the normal category and the remaining 9 categories as the abnormal instances in turn.
For each data set, we first trained 80% of the full formal data set, and the rest was used for the test set.
Table 1: common reference dataset statistics
A further 25% of the training set was held out as a validation set, and anomalous samples were removed from the training and validation sets for the novelty detection task. We compared models using the area under the receiver operating characteristic curve (AUROC). For image data, we used early stopping on the validation set to determine the number of epochs for training the model, with the reconstruction loss derived from the features of the reconstruction discriminative network serving as the validation loss for early stopping.
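The AUROC used for comparison can be computed directly from anomaly scores via the rank-based (Mann-Whitney) formulation; the sketch below uses our own toy scores and is independent of any model:

```python
import numpy as np

def auroc(scores, labels):
    """Area under the ROC curve from anomaly scores.

    Mann-Whitney formulation: the probability that a randomly chosen
    anomalous sample (label 1) scores higher than a randomly chosen
    normal sample (label 0), with ties counted as 1/2 via average ranks.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    for s in np.unique(scores):          # average ranks over ties
        mask = scores == s
        ranks[mask] = ranks[mask].mean()
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

print(auroc([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]))   # 1.0: perfect separation
print(auroc([0.9, 0.2, 0.8, 0.1], [0, 0, 1, 1]))   # 0.25: mostly inverted
```

AUROC is threshold-free, which is why it is the standard metric for comparing anomaly scoring functions whose scales differ across models.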
Comparing models:
one type of support vector machine (OC-SVM) [7 ]]Andonia creating and oil creating bharath.invoking the generator of a generating adaptive network. NIPS Workshop on adaptive Training, 2016: the method is a classic abnormal detection method, and a judgment boundary is learned around a normal example. We set the v parameter to the assumed known expected anomaly proportion in the dataset and the gamma parameter to 1/m using the radial basis function kernel, where m is the number of input features. After a grid search of this parameter in all experiments (γ ═ 1/m or 10)nWhere n ═ 3, -2, -1, 0, 1), we have found that setting in a completely unsupervised manner is a viable option.
Isolation Forest (IF) [10] (Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou. Isolation Forest. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, ICDM '08, pages 413-422, 2008): a more recent classical machine learning technique that isolates anomalous data points rather than modeling the normal data distribution. The method builds decision trees using randomly selected split values on randomly selected features. The anomaly score is then defined as the average path length from a particular sample to the root. In all experiments we used the standard parameters provided by scikit-learn [11] (F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011).
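The corresponding scikit-learn sketch with its standard parameters (toy data hypothetical):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
x_train = rng.standard_normal((256, 4))
iso = IsolationForest(random_state=0).fit(x_train)   # default scikit-learn parameters

# score_samples is lower for anomalies; negate so that higher = more anomalous.
x_test = np.array([[0.0, 0.0, 0.0, 0.0],
                   [8.0, 8.0, 8.0, 8.0]])            # far outside the training data
scores = -iso.score_samples(x_test)
```

The distant point is isolated in few splits, so it receives the larger anomaly score.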
Deep structured energy-based model (DSEBM) [12] (Shuangfei Zhai, Yu Cheng, Weining Lu, and Zhongfei Zhang. Deep structured energy based models for anomaly detection. International Conference on Machine Learning, pages 1100-1109, 2016): one of the most advanced autoencoder-based methods. The main idea is to accumulate energy across layers, similarly to a denoising autoencoder. Two anomaly criteria are studied in this method: energy and reconstruction error. We included both criteria in the experiments, namely DSEBM-r (reconstruction) and DSEBM-e (energy).
AnoGAN [5] (Thomas Schlegl, Philipp Seeböck, Sebastian M. Waldstein, Ursula Schmidt-Erfurth, and Georg Langs. Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. International Conference on Information Processing in Medical Imaging, pages 146-157, 2017): the only published anomaly detection method based on generative adversarial networks. It trains a DCGAN [4] and, during inference, freezes the weights of the network in order to recover a latent representation of the test data. The anomaly criterion is a combination of a reconstruction component and a discrimination component. The reconstruction component measures the ability of the generative adversarial network to reconstruct data through the generating network, while the discrimination component considers a score based on the discrimination network. Document [5] compared two anomaly scoring methods, and we selected the variant settings that work best here.
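The expensive part of AnoGAN inference is this iterative recovery of a latent code with frozen generator weights. A minimal numpy sketch with a hypothetical linear "generator" G(z) = Wz (a stand-in for a trained network) illustrates the idea:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 4))            # frozen "generator" weights (stand-in)
z_true = np.array([1.0, -0.5, 0.3, 2.0])
x = W @ z_true                              # a test sample the generator can represent

z = np.zeros(4)                             # initial latent guess
lr = 0.01
for _ in range(500):                        # gradient descent on ||x - G(z)||^2
    grad = 2.0 * W.T @ (W @ z - x)          # only z is updated; W stays frozen
    z -= lr * grad

reconstruction_error = np.linalg.norm(W @ z - x)   # reconstruction part of the score
```

This per-sample optimization loop is why AnoGAN inference is slow; the encoder network learned by the present method replaces it with a single forward pass E(x).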
Image data experiment:
On the SVHN dataset, we observed that our model outperformed all baselines. On the CIFAR10 dataset, our method remains competitive with the other compared methods. The intuitive explanation is that when our model is trained on one class, it only learns how to reconstruct samples from that class, and may reconstruct an anomalous sample as the closest image from the normal class, producing false negatives when the features of the reconstruction discrimination network are evaluated.
Table 2: image dataset performance
We report inference time comparisons between AnoGAN [5] and our model. The inference experiments in Table 3 were run sequentially on the same GPU, which was used only for inference operations. Inference times for the first class of SVHN [9] and CIFAR10 [8] are shown.
We therefore observe that this model is orders of magnitude faster than the other anomaly detection method based on generative adversarial networks.
Table 3: average inference time (ms) on GeForce GTX TITAN X
Details of the experiment:
CIFAR10 and SVHN experimental details
Preprocessing: pixel values are scaled to the range [-1, 1].
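A one-line sketch of this preprocessing step for 8-bit images:

```python
import numpy as np

def scale_pixels(x_uint8):
    """Map 8-bit pixel values in [0, 255] to the range [-1, 1]."""
    return x_uint8.astype(np.float32) / 127.5 - 1.0

img = np.array([0, 127, 255], dtype=np.uint8)
scaled = scale_pixels(img)   # endpoints map exactly to -1.0 and 1.0
```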
DSEBM:
For CIFAR10 and SVHN, we used the following architecture: one convolutional layer with kernel size 3, stride 2, 64 filters and "same" padding, one max pooling layer, and one fully connected layer with 128 units.
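Under these layer hyperparameters the feature-map sizes work out as follows (a framework-free shape sketch; the 2x2 pool size is an assumption, since the text does not specify it):

```python
import math

def same_conv_out(size, stride):
    """Output spatial size of a 'same'-padded convolution with the given stride."""
    return math.ceil(size / stride)

size = 32                               # CIFAR10 / SVHN input resolution
size = same_conv_out(size, stride=2)    # conv: kernel 3, stride 2, 64 filters -> 16x16x64
size = size // 2                        # assumed 2x2 max pooling -> 8x8x64
flat = size * size * 64                 # flattened feature count
dense_units = 128                       # final fully connected layer
```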
AnoGAN:
We ran these experiments using the official DCGAN architecture and hyperparameters. For the anomaly detection task, we used the same hyperparameters as the original paper. Weights were estimated with an exponential moving average with decay rate 0.999.
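The exponential moving average with decay 0.999 can be sketched generically (in practice the update runs over all network parameters at every training step):

```python
import numpy as np

def ema_update(shadow, weights, decay=0.999):
    """One EMA step: shadow <- decay * shadow + (1 - decay) * weights."""
    return decay * shadow + (1.0 - decay) * weights

shadow = np.zeros(3)                    # EMA copy, initialized at zero
weights = np.ones(3)                    # stand-in for the current trained weights
for _ in range(5):
    shadow = ema_update(shadow, weights)
# After 5 steps toward constant weights 1.0, shadow = 1 - 0.999**5
```

With decay this close to 1, the shadow weights change slowly, which is what stabilizes the estimates used at inference time.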
The invention performs anomaly scoring using the output of the last layer of the discrimination network, and all convolutional layers use the same padding.
The embodiment of the invention provides a method for fault anomaly detection in high-dimensional data based on a generative adversarial network, which greatly improves the accuracy of fault anomaly detection and significantly improves detection speed. The method uses a class of generative adversarial networks that learn an encoder network simultaneously during training, enabling efficient inference at test time. In addition, recent techniques are employed to further improve the encoder network and to stabilize adversarial training, and ablation studies show that these techniques improve performance on the anomaly detection task. Experiments on a range of high-dimensional image data demonstrate the efficiency and effectiveness of the method.
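The feature-matching anomaly score described in the embodiments can be sketched with hypothetical stand-ins: here E projects onto a learned subspace, G maps back, and f plays the role of the last hidden layer of the J_xx discrimination network; all three are toy substitutes for trained networks, not the actual models of the invention.

```python
import numpy as np

rng = np.random.default_rng(0)
B, _ = np.linalg.qr(rng.standard_normal((16, 4)))  # basis of the "normal" subspace

def E(x):                 # toy encoder: data space -> latent space
    return B.T @ x

def G(z):                 # toy generator: latent space -> data space
    return B @ z

F = rng.standard_normal((8, 32))
def f(a, b):              # stand-in for the last layer of J_xx on the pair (a, b)
    return F @ np.concatenate([a, b])

def anomaly_score(x):     # A(x) = || f(x, x) - f(x, G(E(x))) ||_1
    return np.abs(f(x, x) - f(x, G(E(x)))).sum()

x_normal = G(rng.standard_normal(4))    # lies in the subspace: reconstructed exactly
x_anomalous = rng.standard_normal(16)   # generic point: large reconstruction residual
```

Normal samples reconstruct almost perfectly and score near zero, while anomalous samples leave a residual that the feature distance exposes, matching the scoring behavior described above.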
Those skilled in the art will appreciate that, in addition to implementing the system and its various devices, modules, units provided by the present invention as pure computer readable program code, the system and its various devices, modules, units provided by the present invention can be fully implemented by logically programming method steps in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system and various devices, modules and units thereof provided by the invention can be regarded as a hardware component, and the devices, modules and units included in the system for realizing various functions can also be regarded as structures in the hardware component; means, modules, units for performing the various functions may also be regarded as structures within both software modules and hardware components for performing the method.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.
Claims (10)
1. A high-dimensional data fault anomaly detection method based on a generation countermeasure network is characterized by comprising the following steps:
step S1: constructing and generating an antagonistic network architecture;
step S2: after the generation countermeasure network architecture is constructed, stably training the generation countermeasure network to obtain a training model;
step S3: setting a scoring function according to the training model, performing anomaly scoring on the generation countermeasure network, and performing anomaly detection on the high-dimensional data using the generation countermeasure network.
2. The method for detecting fault and anomaly in high-dimensional data based on generation countermeasure network as claimed in claim 1, wherein the step S1 includes the following steps:
step S1.1: the standard generation countermeasure network comprises a generating network G and a discrimination network J, trained on a set of M data samples {x^(i)}, where i = 1, 2, ..., M;
in a hidden data space obeying a specific distribution, the generating network G maps a sampled random hidden variable z to the input data space X;
the discrimination network J attempts to distinguish the actual data samples x^(i) from the samples G(z) generated by G;
p_X(x) is defined as the distribution probability of the real data x in the sample space X, and p_Z(z) as the distribution probability of the hidden data z in the hidden data space Z; p_G(x) is defined as the distribution probability of the generating network G in the sample space X;
the generation countermeasure network model matches the joint distributions p_G(x, z) = p(z) p_G(x|z) and p_E(x, z) = p_X(x) p_E(z|x) using the countermeasure discrimination network J_xz with x and z as inputs;
the generation countermeasure network determines the discrimination network J_xz, the generating network G and the coding network E as the solution of the saddle point problem min_{G,E} max_{J_xz} V(J_xz, E, G), where V(J_xz, E, G) is defined as:
V(J_xz, E, G) = E_{x~p_X}[ E_{z~p_E(.|x)}[ log J_xz(x, z) ] ] + E_{z~p_Z}[ E_{x~p_G(.|z)}[ log(1 - J_xz(x, z)) ] ]
where E_{x~p_X} and E_{z~p_Z} denote the probability expectation functions over the data distributions in the X and Z data spaces, respectively;
step S1.2: for fixed values of the coding network E and the generating network G, the optimal discrimination network J*_xz is:
J*_xz(x, z) = p_E(x, z) / (p_E(x, z) + p_G(x, z))
3. The method for detecting fault and anomaly in high-dimensional data based on generation countermeasure network as claimed in claim 1, wherein the step S2 includes the following steps:
the conditional entropy in the hidden space, H_pi(x|z) = -E_{pi(x,z)}[log pi(x|z)], is regularized using an additional adversarially learned discrimination network J_zz, where pi(x, z) is a joint distribution over x and z, with the following saddle point objective:
min_{G,E} max_{J_xz, J_xx, J_zz} V(J_xz, J_xx, J_zz, E, G)
where V(J_xz, J_xx, J_zz, E, G) is defined as
V(J_xz, J_xx, J_zz, E, G) = V(J_xz, E, G) + V(J_xx, E, G) + V(J_zz, E, G)
where J_zz, J_xz and J_xx each denote a discrimination network, G denotes the generating network, and E denotes the encoder.
4. The method for detecting fault and anomaly in high-dimensional data based on generation countermeasure network as claimed in claim 1, wherein the step S3 includes the following steps:
step S3.1: effectively modeling the data distribution by using the generating network G to learn the distribution of normal data, p_G(x) = p_X(x), where p_G(x) = integral of p(z) p_G(x|z) dz;
Step S3.2: learning the distribution of the data so as to accurately recover the re-expression of the latent data space;
step S3.3: ensure that normal samples can be accurately reconstructed.
5. The method for detecting fault anomaly in high-dimensional data based on generation countermeasure network according to claim 4, characterized in that the step S3.3 is as follows:
the distance between two vectors projected in the feature space learned by J_xx is calculated: A(x) = || f_xx(x, x) - f_xx(x, G(E(x))) ||_1;
where f_xx(., .) denotes the activations of the last fully connected layer of the model J_xx;
after training the model on normal data to obtain E, G, J_xz, J_xx and J_zz, a scoring function U(x) is defined, which measures the degree of abnormality of an example x from the difference between the sample and its reconstruction:
U(x) = || f_xx(x, x) - f_xx(x, G(E(x))) ||_2
samples with large U(x) values are considered likely to be anomalous data.
6. A system for detecting fault abnormality of high-dimensional data based on a generation countermeasure network, comprising:
module M1: constructing a generation countermeasure network architecture;
module M2: after the generation countermeasure network architecture is constructed, stably training the generation countermeasure network to obtain a training model;
module M3: setting a scoring function according to the training model, performing anomaly scoring on the generation countermeasure network, and performing anomaly detection on the high-dimensional data using the generation countermeasure network.
7. The system for high-dimensional data failure anomaly detection based on generation countermeasure network according to claim 6, wherein the module M1 comprises:
the standard generation countermeasure network comprises a generating network G and a discrimination network J, trained on a set of M data samples {x^(i)}, where i = 1, 2, ..., M;
in a hidden data space obeying a specific distribution, the generating network G maps a sampled random hidden variable z to the input data space X;
the discrimination network J attempts to distinguish the actual data samples x^(i) from the samples G(z) generated by G;
p_X(x) is defined as the distribution probability of the real data x in the sample space X, and p_Z(z) as the distribution probability of the hidden data z in the hidden data space Z; p_G(x) is defined as the distribution probability of the generating network G in the sample space X;
the generation countermeasure network model matches the joint distributions p_G(x, z) = p(z) p_G(x|z) and p_E(x, z) = p_X(x) p_E(z|x) using the countermeasure discrimination network J_xz with x and z as inputs;
the generation countermeasure network determines the discrimination network J_xz, the generating network G and the coding network E as the solution of the saddle point problem min_{G,E} max_{J_xz} V(J_xz, E, G), where V(J_xz, E, G) is defined as:
V(J_xz, E, G) = E_{x~p_X}[ E_{z~p_E(.|x)}[ log J_xz(x, z) ] ] + E_{z~p_Z}[ E_{x~p_G(.|z)}[ log(1 - J_xz(x, z)) ] ]
where E_{x~p_X} and E_{z~p_Z} denote the probability expectation functions over the data distributions in the X and Z data spaces, respectively;
for fixed values of the coding network E and the generating network G, the optimal discrimination network J*_xz is:
J*_xz(x, z) = p_E(x, z) / (p_E(x, z) + p_G(x, z))
8. The system for high-dimensional data failure anomaly detection based on generation countermeasure network according to claim 6, wherein the module M2 comprises:
the conditional entropy in the hidden space, H_pi(x|z) = -E_{pi(x,z)}[log pi(x|z)], is regularized using an additional adversarially learned discrimination network J_zz, where pi(x, z) is a joint distribution over x and z, with the following saddle point objective:
min_{G,E} max_{J_xz, J_xx, J_zz} V(J_xz, J_xx, J_zz, E, G)
where V(J_xz, J_xx, J_zz, E, G) is defined as
V(J_xz, J_xx, J_zz, E, G) = V(J_xz, E, G) + V(J_xx, E, G) + V(J_zz, E, G)
where J_zz, J_xz and J_xx each denote a discrimination network, G denotes the generating network, and E denotes the encoder.
9. The system for high-dimensional data failure anomaly detection based on generation countermeasure network according to claim 6, wherein the module M3 comprises:
effectively modeling the data distribution by using the generating network G to learn the distribution of normal data, p_G(x) = p_X(x), where p_G(x) = integral of p(z) p_G(x|z) dz;
Learning the distribution of the data so as to accurately recover the re-expression of the latent data space;
ensure that normal samples can be accurately reconstructed.
10. The system for detecting the high-dimensional data fault abnormality based on the generative countermeasure network according to claim 9, wherein the ensuring of the accurate reconstruction of the normal sample is as follows:
the distance between two vectors projected in the feature space learned by J_xx is calculated: A(x) = || f_xx(x, x) - f_xx(x, G(E(x))) ||_1;
where f_xx(., .) denotes the activations of the last fully connected layer of the model J_xx;
after training the model on normal data to obtain E, G, J_xz, J_xx and J_zz, a scoring function U(x) is defined, which measures the degree of abnormality of an example x from the difference between the sample and its reconstruction:
U(x) = || f_xx(x, x) - f_xx(x, G(E(x))) ||_2
samples with large U(x) values are considered likely to be anomalous data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110468859.5A CN113077013A (en) | 2021-04-28 | 2021-04-28 | High-dimensional data fault anomaly detection method and system based on generation countermeasure network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113077013A true CN113077013A (en) | 2021-07-06 |
Family
ID=76619031
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110468859.5A Pending CN113077013A (en) | 2021-04-28 | 2021-04-28 | High-dimensional data fault anomaly detection method and system based on generation countermeasure network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113077013A (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107330444A (en) * | 2017-05-27 | 2017-11-07 | 苏州科技大学 | A kind of image autotext mask method based on generation confrontation network |
CN108009628A (en) * | 2017-10-30 | 2018-05-08 | 杭州电子科技大学 | A kind of method for detecting abnormality based on generation confrontation network |
CN109165735A (en) * | 2018-07-12 | 2019-01-08 | 杭州电子科技大学 | Based on the method for generating confrontation network and adaptive ratio generation new samples |
CN109410179A (en) * | 2018-09-28 | 2019-03-01 | 合肥工业大学 | A kind of image abnormity detection method based on generation confrontation network |
CN109584221A (en) * | 2018-11-16 | 2019-04-05 | 聚时科技(上海)有限公司 | A kind of abnormal image detection method generating confrontation network based on supervised |
CN109580215A (en) * | 2018-11-30 | 2019-04-05 | 湖南科技大学 | A kind of wind-powered electricity generation driving unit fault diagnostic method generating confrontation network based on depth |
CN110991027A (en) * | 2019-11-27 | 2020-04-10 | 华南理工大学 | Robot simulation learning method based on virtual scene training |
CN112435221A (en) * | 2020-11-10 | 2021-03-02 | 东南大学 | Image anomaly detection method based on generative confrontation network model |
-
2021
- 2021-04-28 CN CN202110468859.5A patent/CN113077013A/en active Pending
Non-Patent Citations (1)
Title |
---|
HOUSSAM ZENATI et al.: "Adversarially Learned Anomaly Detection", 2018 IEEE International Conference on Data Mining *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Choi et al. | Gan-based anomaly detection and localization of multivariate time series data for power plant | |
Sun et al. | Robust co-training | |
CN111581405A (en) | Cross-modal generalization zero sample retrieval method for generating confrontation network based on dual learning | |
CN113902926A (en) | General image target detection method and device based on self-attention mechanism | |
Peng et al. | Fault feature extractor based on bootstrap your own latent and data augmentation algorithm for unlabeled vibration signals | |
Hu et al. | You only segment once: Towards real-time panoptic segmentation | |
CN112785526B (en) | Three-dimensional point cloud restoration method for graphic processing | |
EP4246458A1 (en) | System for three-dimensional geometric guided student-teacher feature matching (3dg-stfm) | |
Xu et al. | A zero-shot fault semantics learning model for compound fault diagnosis | |
Alawieh et al. | Identifying wafer-level systematic failure patterns via unsupervised learning | |
Rahul et al. | Detection and correction of abnormal data with optimized dirty data: a new data cleaning model | |
Abu-Gellban et al. | Livedi: An anti-theft model based on driving behavior | |
Han et al. | L-Net: lightweight and fast object detector-based ShuffleNetV2 | |
CN115587335A (en) | Training method of abnormal value detection model, abnormal value detection method and system | |
Zhao et al. | Fault diagnosis based on space mapping and deformable convolution networks | |
Kang et al. | Htnet: Anchor-free temporal action localization with hierarchical transformers | |
Haurum et al. | Multi-scale hybrid vision transformer and Sinkhorn tokenizer for sewer defect classification | |
CN113077013A (en) | High-dimensional data fault anomaly detection method and system based on generation countermeasure network | |
CN115222998B (en) | Image classification method | |
Malik et al. | Teacher-class network: A neural network compression mechanism | |
He et al. | A diffusion-based framework for multi-class anomaly detection | |
Wang et al. | Unsupervised anomaly detection with local-sensitive VQVAE and global-sensitive transformers | |
Wang et al. | Deep embedded clustering with asymmetric residual autoencoder | |
Wickramasinghe et al. | Deep embedded clustering with ResNets | |
Olin-Ammentorp et al. | Bridge networks: Relating inputs through vector-symbolic manipulations |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20210706 |