CN111240279A - Adversarial enhancement fault classification method for industrial imbalanced data - Google Patents
- Publication number: CN111240279A (application CN201911369696.4A)
- Authority
- CN
- China
- Prior art keywords
- data
- generated
- samples
- generator
- classification
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B19/00—Programme-control systems
- G05B19/02—Programme-control systems electric
- G05B19/418—Total factory control, i.e. centrally controlling a plurality of machines, e.g. direct or distributed numerical control [DNC], flexible manufacturing systems [FMS], integrated manufacturing systems [IMS] or computer integrated manufacturing [CIM]
- G05B19/41875—Total factory control, i.e. centrally controlling a plurality of machines, e.g. direct or distributed numerical control [DNC], flexible manufacturing systems [FMS], integrated manufacturing systems [IMS] or computer integrated manufacturing [CIM] characterised by quality surveillance of production
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B2219/00—Program-control systems
- G05B2219/30—Nc systems
- G05B2219/31—From computer integrated manufacturing till monitoring
- G05B2219/31359—Object oriented model for fault, quality control
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/02—Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]
Abstract
The invention discloses an adversarial enhancement fault classification method for industrial imbalanced data, belonging to the field of fault diagnosis and classification in industrial processes. Through adversarial training between a multi-class discriminator and small-sample generators, the generators produce data in a targeted manner for each imbalanced small-sample class, and data screening based on the Mahalanobis distance in the principal component space retains the generated data that lie closer to the real data. A dynamic table of a supplementary database is established, and the real data are supplemented by a quantitatively updated sample set to obtain a new data set, resolving the imbalance of the industrial data. The classification method couples the training processes of the generators and the multi-class discriminator, making more effective use of computing resources.
Description
Technical Field
The invention belongs to the field of industrial process fault diagnosis and classification, and in particular relates to an adversarial enhancement fault classification method for industrial imbalanced data.
Background
With the development of modern industry, industrial data have accumulated in large quantities, providing a basis for data-driven process analysis. Among such applications, fault diagnosis is a typical use of industrial data. Many data-driven methods, such as support vector machines and back-propagation neural networks, have been widely applied to fault classification in industrial processes.
However, since faults occur rarely in industry, the collected fault data are very limited. Compared with the large amount of non-fault data, i.e. data collected under normal conditions, the proportion of fault data is low. In addition, faults of different occurrence probabilities are themselves imbalanced in number. This causes difficulties for classification algorithms that assume a balanced data distribution. Industrial fault diagnosis is therefore essentially a multi-class classification problem on imbalanced data, and urgently needs to be solved.
Supplementing data at the data level is the most direct way to address the imbalance problem. The generative adversarial network is currently a promising generative model, composed of a generator and a discriminator. Through adversarial training between the generator and the discriminator, the generator learns to produce data that can fool the discriminator. The data generated by a generative adversarial network can thus be used to supplement small-sample classes. However, the training process of a generative adversarial network is highly unstable: it easily produces noise points that deviate from the real data, or suffers from mode collapse, both of which degrade the authenticity of the generated data. Moreover, training the generative adversarial network and training the classifier are two independent processes, which increases model complexity and wastes computing resources.
Disclosure of Invention
To address the classification problem of industrial imbalanced data, the invention provides an adversarial enhancement fault classification method that achieves accurate classification using a generative adversarial network structure consisting of small-sample generators and a multi-class discriminator.
The purpose of the invention is achieved by the following technical scheme. An adversarial enhancement fault classification method for industrial imbalanced data comprises the following steps:
(1) Collect offline data from a historical industrial process as the original data set X, where X contains m classes of large-sample data X_big and n classes of small-sample data X_small, i.e. X = {X_big, X_small} = {(x_i, y_i)}, where y_i ∈ {1, 2, ..., m+n} and i indexes the samples; the m large-sample classes each contain the same number of samples, while each small-sample class contains fewer samples than the large-sample classes. The original data set X is preprocessed by linearly scaling it into the [0, 1] range, yielding the normalized training data set.
(2) The data generation stage specifically comprises the following substeps:
(2.1) For the n small-sample classes of the imbalanced data, the small-sample generator module constructs n generators. Each generator is a fully connected neural network with the same structure; its input is Gaussian noise z of dimension p, and its output is a generated feature vector of dimension q. The n generators are mutually independent, forming a parallel multi-generator structure. Before the Gaussian noise z is fed into the generators, the hyperparameters of each generator network are initialized.
(2.2) Gaussian noise z with mean 0 and variance 0.1 is produced by a random-number function. The noise z is fed into the small-sample generators, which output n groups of generated data G(z) = {G_1, G_2, ..., G_n}.
(3) The data screening stage specifically comprises the following steps:
(3.1) Perform principal component analysis on the training data set to obtain the reference principal component matrix T_1, and project the generated data G(z) into the reference principal component space to obtain the corresponding principal component matrix T_2. The number of principal components in T_1 is determined by accumulating the variance percentage up to 98%; T_2 retains the same number of principal components as T_1.
(3.2) Within the reference principal component matrix T_1, compute the Mahalanobis distance from each sample to the centroid of its small-sample class, and take the largest distance MD_1max of each class as the farthest distance under that class. Using MD_1max, the upper threshold for data screening is set to k·MD_1max, where k is a screening coefficient.
(3.3) For the corresponding principal component matrix T_2, compute the Mahalanobis distance MD_2 between each generated sample and the centroid of the corresponding small-sample class in T_1, and compare MD_2 with k·MD_1max. If MD_2 < k·MD_1max, the generated sample is considered close to the training data and is a valid point G_valid; if MD_2 > k·MD_1max, the generated sample deviates from the training data and is outlier noise G_invalid.
(3.4) Extract the valid points G_valid from the generated data set G(z) and assign them the corresponding class labels y, while removing the outlier noise G_invalid.
(4) The dynamic table stage of the supplementary database specifically comprises the following steps:
(4.1) Import the valid points G_valid into the dynamic table L of the supplementary database. The dynamic table L allocates one sample sequence to the generated data of each small-sample class, i.e. n sample sequences in total. The length of each sample sequence in the supplementary database plus the number of real samples of the corresponding small-sample class equals the number of real samples of each large-sample class.
(4.2) While the accumulated number of generated samples is smaller than the sequence length, newly generated samples are continuously written into the dynamic table L of the supplementary database during the iterations. Once the accumulated number of generated samples reaches or exceeds the sequence length, the oldest generated data at the end of the sequence are eliminated and the newly generated data are written in, yielding an updated supplementary data set X'.
(5) The classifier training stage specifically comprises the following steps:
(5.1) Construct a multi-class neural network discriminator D(x) composed of multiple hidden layers combined with a softmax layer; its input is p-dimensional data x and its output covers m+n+1 sample class labels y, of which m items are large-sample class labels, n items are small-sample class labels, and the (m+n+1)-th item is the label for generated (fake) data.
(5.2) Mix the training data set with the supplementary data set X' and treat the mixture as real data x ~ P_data; the generated data are denoted x ~ P_G. Feed x into the multi-class discriminator to obtain the softmax probability p(y|x) of each class.
(5.3) constructing a loss function of the classification discriminator:
(5.4) Update the network parameters by the error back-propagation algorithm and optimize the discriminator model until the discriminator's loss function converges.
(6) the generator training stage specifically comprises the following steps:
(6.1) Construct a loss function for each of the n independent generators in the small-sample generator module; the loss function of the i-th generator is:
(6.2) Update the network parameters by the error back-propagation algorithm and optimize the generator models until the generated data can fool the discriminator's authenticity judgment, i.e. until the loss function converges.
(6.3) Repeat steps (2.1)-(6.2) until every sample sequence in the dynamic table of the supplementary data set is filled, completing the training of the adversarial enhancement fault classifier.
(7) When new data require fault classification, the data are fed into the trained adversarial enhancement fault classifier. The probability of the (m+n+1)-th item of the softmax layer is ignored, the posterior probabilities of the fault classes are obtained, and the data are assigned to the class with the maximum posterior probability, completing the fault classification of the data.
Compared with the prior art, the invention has the following beneficial effects. Through continuous iteration of the above process, the data produced by the generators gradually approach the real samples, the sample sequences in the dynamic table of the supplementary data set are updated with the generated data, and the small-sample classes are supplemented with generated data, eliminating the imbalance of the original data set. Meanwhile, data screening retains the high-quality portion of the generated data, further improving classifier performance. The proposed adversarial enhancement classification method is an end-to-end model and makes more convenient use of the generated data for data augmentation.
Drawings
FIG. 1 is a flow chart of the training of the adversarial enhancement fault classifier for industrial imbalanced data;
FIG. 2 is a flow chart of the Tennessee Eastman (TE) process;
FIG. 3 is a comparison of the classification results of the adversarial enhancement fault classifier and other oversampling methods.
Detailed Description
The present invention is further described in detail with reference to the accompanying drawings.
The adversarial enhancement fault classifier adopted by the invention consists of four parts. The first part is the small-sample generator module, which generates data for each small-sample class. The second part is a data filter, which screens the generated data based on the Mahalanobis distance in the principal component space. The third part is the dynamic table of the supplementary database, which quantitatively stores the screened data and mixes them with the real data. The fourth part is the multi-class discriminator, formed by a multi-hidden-layer neural network combined with a softmax layer; the output of the last layer, the softmax layer, has m+n+1 items, where m is the number of large-sample classes and n is the number of small-sample classes.
An adversarial enhancement fault classification method for industrial imbalanced data comprises the following steps:
(1) Collect offline data from a historical industrial process as the original data set X, where X contains m classes of large-sample data X_big and n classes of small-sample data X_small, i.e. X = {X_big, X_small} = {(x_i, y_i)}, where y_i ∈ {1, 2, ..., m+n} and i indexes the samples; the m large-sample classes each contain the same number of samples, while each small-sample class contains fewer samples than the large-sample classes. The original data set X is preprocessed by linearly scaling it into the [0, 1] range, yielding the normalized training data set.
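The linear scaling in step (1) can be sketched as follows; the toy numbers and the helper name are illustrative, not part of the patent:

```python
import numpy as np

def minmax_scale(X):
    """Linearly map each column (process variable) of X into the [0, 1] range."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    return (X - lo) / span

# toy raw data: 4 samples, 2 process variables
X_raw = np.array([[10.0, 200.0],
                  [20.0, 400.0],
                  [15.0, 300.0],
                  [10.0, 200.0]])
X_train = minmax_scale(X_raw)  # normalized training data in [0, 1]
```

Each variable is scaled independently, so variables with different physical units become comparable before PCA and network training.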
The training of the adversarial enhancement fault classifier is a game-like adversarial process that must be iterated cyclically. One iteration cycle can be divided into 5 stages; the specific flow is shown in FIG. 1:
(2) the data generation stage specifically comprises the following substeps:
(2.1) For the n small-sample classes of the imbalanced data, the small-sample generator module constructs n generators. Each generator is a fully connected neural network with the same structure; its input is Gaussian noise z of dimension p, and its output is a generated feature vector of dimension q. The n generators are mutually independent, forming a parallel multi-generator structure. Before the Gaussian noise z is fed into the generators, the hyperparameters of each generator network are initialized.
(2.2) Gaussian noise z with mean 0 and variance 0.1 is produced by a random-number function. The noise z is fed into the small-sample generators, which output n groups of generated data G(z) = {G_1, G_2, ..., G_n}.
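Steps (2.1)-(2.2) can be sketched as follows. The layer sizes, hidden activation, and random weight initialization are illustrative assumptions, and the generators are shown untrained:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q, hidden = 3, 8, 16, 32  # hypothetical: n generators, noise dim p, output dim q

def init_generator():
    """One fully connected generator: p -> hidden -> q, randomly initialized."""
    return {"W1": rng.normal(0, 0.1, (p, hidden)), "b1": np.zeros(hidden),
            "W2": rng.normal(0, 0.1, (hidden, q)), "b2": np.zeros(q)}

def forward(gen, z):
    h = np.tanh(z @ gen["W1"] + gen["b1"])                   # hidden layer
    return 1 / (1 + np.exp(-(h @ gen["W2"] + gen["b2"])))    # outputs in (0, 1), like the scaled data

generators = [init_generator() for _ in range(n)]  # parallel, mutually independent structure
z = rng.normal(0.0, np.sqrt(0.1), size=(60, p))    # Gaussian noise: mean 0, variance 0.1
G = [forward(g, z) for g in generators]            # n groups of generated data G(z)
```

In practice each generator is trained against the discriminator; this sketch only shows the parallel multi-generator forward pass.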
(3) The data screening stage specifically comprises the following steps:
(3.1) Perform principal component analysis on the training data set to obtain the reference principal component matrix T_1, and project the generated data G(z) into the reference principal component space to obtain the corresponding principal component matrix T_2. The number of principal components in T_1 is determined by accumulating the variance percentage up to 98%; T_2 retains the same number of principal components as T_1.
(3.2) Within the reference principal component matrix T_1, compute the Mahalanobis distance from each sample to the centroid of its small-sample class, and take the largest distance MD_1max of each class as the farthest distance under that class. Using MD_1max, the upper threshold for data screening is set to k·MD_1max, where k is a screening coefficient.
(3.3) For the corresponding principal component matrix T_2, compute the Mahalanobis distance MD_2 between each generated sample and the centroid of the corresponding small-sample class in T_1, and compare MD_2 with k·MD_1max. If MD_2 < k·MD_1max, the generated sample is considered close to the training data and is a valid point G_valid; if MD_2 > k·MD_1max, the generated sample deviates from the training data and is outlier noise G_invalid.
(3.4) Extract the valid points G_valid from the generated data set G(z) and assign them the corresponding class labels y, while removing the outlier noise G_invalid.
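The screening stage (3.1)-(3.4) can be sketched as follows for a single small-sample class. The toy data, the screening coefficient k = 1.1, and all variable names are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
X_train = rng.normal(size=(200, 6)) @ rng.normal(size=(6, 6))  # toy correlated training data
G = rng.normal(size=(50, 6)) * 2.0                             # toy generated samples, wider spread

# step (3.1): PCA on the training data, retaining 98% cumulative variance
mean = X_train.mean(axis=0)
U, s, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
ratio = np.cumsum(s**2) / np.sum(s**2)
k_pc = int(np.searchsorted(ratio, 0.98) + 1)  # number of retained principal components
P = Vt[:k_pc].T                               # loadings

T1 = (X_train - mean) @ P   # reference principal component matrix
T2 = (G - mean) @ P         # generated data projected into the same space, same number of PCs

# steps (3.2)-(3.4): Mahalanobis screening around the class centroid
centroid = T1.mean(axis=0)
cov_inv = np.linalg.inv(np.atleast_2d(np.cov(T1, rowvar=False)))

def mahalanobis(T):
    d = T - centroid
    return np.sqrt(np.einsum('ij,jk,ik->i', d, cov_inv, d))

md1_max = mahalanobis(T1).max()     # farthest real sample: MD_1max
k = 1.1                             # screening coefficient (hypothetical value)

md2 = mahalanobis(T2)
G_valid = T2[md2 < k * md1_max]     # kept and given the class label
G_invalid = T2[md2 >= k * md1_max]  # discarded as outlier noise
```

The same loadings P are reused for the generated data, so real and generated samples are compared in one common reference principal component space.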
(4) The dynamic table stage of the supplementary database specifically comprises the following steps:
(4.1) Import the valid points G_valid into the dynamic table L of the supplementary database. The dynamic table L allocates one sample sequence to the generated data of each small-sample class, i.e. n sample sequences in total. The length of each sample sequence in the supplementary database plus the number of real samples of the corresponding small-sample class equals the number of real samples of each large-sample class.
(4.2) While the accumulated number of generated samples is smaller than the sequence length, newly generated samples are continuously written into the dynamic table L of the supplementary database during the iterations. Once the accumulated number of generated samples reaches or exceeds the sequence length, the oldest generated data at the end of the sequence are eliminated and the newly generated data are written in, yielding an updated supplementary data set X'.
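A minimal sketch of the dynamic table in steps (4.1)-(4.2), using one fixed-length FIFO sequence per small-sample class. The class counts are taken from Table 1 of the embodiment; the deque-based implementation itself is an assumption:

```python
from collections import deque

n_big = 1000                                     # real samples per large-sample class
n_small = {1: 50, 2: 50, 3: 50, 4: 20, 5: 20}    # real samples per small-sample class

# Dynamic table L: sequence length + real small-sample count = large-sample count.
# maxlen makes each sequence first-in-first-out: once full, writing a new
# generated sample evicts the oldest one, as described in step (4.2).
L = {c: deque(maxlen=n_big - cnt) for c, cnt in n_small.items()}

def write(cls, sample):
    """Write one screened generated sample into the sequence of its class."""
    L[cls].append(sample)

# fill class 4's sequence beyond its capacity: the oldest entries are evicted
for i in range(1500):
    write(4, i)
```

After training, the union of each class's real samples and its full sequence gives every class the same size, which is the balanced data set X' fed to the discriminator.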
(5) The classifier training stage specifically comprises the following steps:
(5.1) Construct a multi-class neural network discriminator D(x) composed of multiple hidden layers combined with a softmax layer; its input is p-dimensional data x and its output covers m+n+1 sample class labels y, of which m items are large-sample class labels, n items are small-sample class labels, and the (m+n+1)-th item is the label for generated (fake) data.
(5.2) Mix the training data set with the supplementary data set X' and treat the mixture as real data x ~ P_data; the generated data are denoted x ~ P_G. Feed x into the multi-class discriminator to obtain the softmax probability p(y|x) of each class.
(5.3) constructing a loss function of the classification discriminator:
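The formula itself is not reproduced in this text (it was an image in the original). A plausible form, following the standard semi-supervised GAN discriminator objective with m+n real classes plus one fake class — an assumption, not necessarily the patent's exact formula:

```latex
\mathcal{L}_D \;=\; -\,\mathbb{E}_{(x,y)\sim P_{\mathrm{data}}}\!\left[\log p(y \mid x)\right]
\;-\; \mathbb{E}_{x\sim P_G}\!\left[\log p\big(y = m{+}n{+}1 \mid x\big)\right]
```

The first term drives correct classification of real (and supplementary) samples into their m+n classes; the second drives generated samples toward the (m+n+1)-th "fake" label.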
(5.4) Update the network parameters by the error back-propagation algorithm and optimize the discriminator model until the discriminator's loss function converges.
(6) the generator training stage specifically comprises the following steps:
(6.1) Construct a loss function for each of the n independent generators in the small-sample generator module; the loss function of the i-th generator is:
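The generator loss is likewise missing from this text. A plausible form consistent with the targeted, per-class generation described above — an assumption, not necessarily the patent's exact formula — has generator i push its samples toward its own small-sample class label y_i rather than the fake label:

```latex
\mathcal{L}_{G_i} \;=\; -\,\mathbb{E}_{z}\!\left[\log p\big(y = y_i \mid G_i(z)\big)\right]
```

where y_i denotes the class label of the i-th small-sample class and G_i(z) is the output of the i-th generator.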
(6.2) Update the network parameters by the error back-propagation algorithm and optimize the generator models until the generated data can fool the discriminator's authenticity judgment, i.e. until the loss function converges.
(6.3) Repeat steps (2.1)-(6.2) until every sample sequence in the dynamic table of the supplementary data set is filled, completing the training of the adversarial enhancement fault classifier. Through continuous iteration of this process, the data produced by the generators gradually approach the real samples, the sample sequences in the dynamic table of the supplementary data set are updated with the generated data, the imbalance of the data set is resolved by supplementing it with valid generated data, and the classification performance of the multi-class discriminator improves through the adversarial training.
(7) When new data require fault classification, the data are fed into the trained adversarial enhancement fault classifier. The probability of the (m+n+1)-th item of the softmax layer is ignored, the posterior probabilities of the fault classes are obtained, and the data are assigned to the class with the maximum posterior probability, completing the fault classification of the data.
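At inference time, step (7) reduces to dropping the (m+n+1)-th softmax item and taking the argmax over the remaining class posteriors; a sketch with hypothetical numbers:

```python
import numpy as np

m, n = 1, 5  # e.g. 1 normal (large-sample) class and 5 fault classes, as in the embodiment
# hypothetical softmax output of the trained discriminator, m+n+1 = 7 items
probs = np.array([0.10, 0.05, 0.40, 0.15, 0.10, 0.05, 0.15])

posterior = probs[: m + n]        # ignore the (m+n+1)-th "generated data" item
pred = int(np.argmax(posterior))  # class with the maximum posterior probability
```

Because the fake-data item is only a training device, discarding it at test time leaves a standard m+n-way classifier.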
Examples
The performance of the adversarial enhancement fault classifier for industrial imbalanced data is described below with a specific TE process example. The TE process is a standard data set commonly used in the fields of fault diagnosis and fault classification; the whole data set contains 53 process variables, and its process flow is shown in FIG. 2. The process consists of 5 operating units, namely a gas-liquid separation column, a continuous stirred-tank reactor, a dephlegmator, a centrifugal compressor, and a reboiler. It can be described by a number of algebraic and differential equations and is mainly characterized by the nonlinearity and strong coupling of its process sensor data.
The TE process defines 21 fault types, comprising 16 known faults and 5 unknown faults. The fault types include step changes in flow, slow ramp increases, valve sticking, and so on, covering typical nonlinear and dynamic faults. In this embodiment, normal data and five fault states are selected for study; the descriptions and sample counts of the different states are given in Table 1.
Table 1: fault list of the present embodiment
Number | Type | State description | Number of samples
0 | Normal | None | 1000
1 | Step fault | The A/C feed flow ratio changes while the content of component B is kept constant (stream 4) | 50
2 | Step fault | The content of component B changes while the A/C feed flow ratio is constant (stream 4) | 50
3 | Step fault | Loss of material A (stream 1) | 50
4 | Random variation fault | The condenser cooling water inlet temperature changes | 20
5 | Unknown fault | Unknown | 20
In this example, a total of 16 variables were selected for analysis, as shown in table 2.
Table 2: variable list of the present embodiment
Number | Measured variable | Number | Measured variable
1 | A feed rate | 9 | Product separator temperature
2 | D feed rate | 10 | Product separator humidity
3 | E feed rate | 11 | Product separator bottoms flow
4 | Total feed rate | 12 | Stripper pressure
5 | Recirculation flow rate | 13 | Stripper temperature
6 | Reactor feed rate | 14 | Stripper flow rate
7 | Reactor temperature | 15 | Reactor cooling water outlet temperature
8 | Discharge rate | 16 | Separator cooling water outlet temperature
In this example, the small-sample generator module contains 5 generators; each generator has 2 hidden layers with 32 and 64 nodes respectively, uses the ADAM optimizer, and has a learning rate of 0.01. The multi-class discriminator has 2 hidden layers with 100 and 200 nodes respectively, and is trained with the SGD optimizer at a learning rate of 0.1. A batch of data is selected for each training step with a batch size of 60; all samples are traversed in each epoch, and 100 epochs are iterated.
100 samples from each state were selected as test data. FIG. 3 compares the classification results (classification accuracy) of the adversarial enhancement discriminator with those of neural network classifiers combined with common data oversampling methods. As the figure shows, the proposed method achieves higher classification accuracy than the SMOTE+BPNN and SMOTEENN-based approaches, demonstrating its superiority.
Claims (1)
1. An adversarial enhancement fault classification method for industrial imbalanced data, characterized by comprising the following steps:
(1) Collect offline data from a historical industrial process as the original data set X, where X contains m classes of large-sample data X_big and n classes of small-sample data X_small, i.e. X = {X_big, X_small} = {(x_i, y_i)}, where y_i ∈ {1, 2, ..., m+n} and i indexes the samples; the m large-sample classes each contain the same number of samples, while each small-sample class contains fewer samples than the large-sample classes. The original data set X is preprocessed by linearly scaling it into the [0, 1] range, yielding the normalized training data set.
(2) The data generation stage specifically comprises the following substeps:
(2.1) For the n small-sample classes of the imbalanced data, the small-sample generator module constructs n generators. Each generator is a fully connected neural network with the same structure; its input is Gaussian noise z of dimension p, and its output is a generated feature vector of dimension q. The n generators are mutually independent, forming a parallel multi-generator structure. Before the Gaussian noise z is fed into the generators, the hyperparameters of each generator network are initialized.
(2.2) Gaussian noise z with mean 0 and variance 0.1 is produced by a random-number function. The noise z is fed into the small-sample generators, which output n groups of generated data G(z) = {G_1, G_2, ..., G_n}.
(3) The data screening stage specifically comprises the following steps:
(3.1) Perform principal component analysis on the training data set to obtain the reference principal component matrix T_1, and project the generated data G(z) into the reference principal component space to obtain the corresponding principal component matrix T_2. The number of principal components in T_1 is determined by accumulating the variance percentage up to 98%; T_2 retains the same number of principal components as T_1.
(3.2) Within the reference principal component matrix T_1, compute the Mahalanobis distance from each sample to the centroid of its small-sample class, and take the largest distance MD_1max of each class as the farthest distance under that class. Using MD_1max, the upper threshold for data screening is set to k·MD_1max, where k is a screening coefficient.
(3.3) For the corresponding principal component matrix T_2, compute the Mahalanobis distance MD_2 between each generated sample and the centroid of the corresponding small-sample class in T_1, and compare MD_2 with k·MD_1max. If MD_2 < k·MD_1max, the generated sample is considered close to the training data and is a valid point G_valid; if MD_2 > k·MD_1max, the generated sample deviates from the training data and is outlier noise G_invalid.
(3.4) Extract the valid points G_valid from the generated data set G(z) and assign them the corresponding class labels y, while removing the outlier noise G_invalid.
(4) The dynamic table stage of the supplementary database specifically comprises the following steps:
(4.1) Import the valid points G_valid into the dynamic table L of the supplementary database. The dynamic table L allocates one sample sequence to the generated data of each small-sample class, i.e. n sample sequences in total. The length of each sample sequence in the supplementary database plus the number of real samples of the corresponding small-sample class equals the number of real samples of each large-sample class.
(4.2) While the accumulated number of generated samples is smaller than the sequence length, newly generated samples are continuously written into the dynamic table L of the supplementary database during the iterations. Once the accumulated number of generated samples reaches or exceeds the sequence length, the oldest generated data at the end of the sequence are eliminated and the newly generated data are written in, yielding an updated supplementary data set X'.
(5) The classifier training stage specifically comprises the following steps:
(5.1) Construct a neural-network multi-class discriminator D(x) that combines multiple hidden layers with a softmax output layer; it takes p-dimensional data x as input and outputs m+n+1 sample class labels y, where the first m labels correspond to large-sample classes, the next n labels correspond to small-sample classes, and the (m+n+1)-th label marks generated (fake) data.
(5.2) Mix the training data set with the supplementary data set X' and treat the mixture as real data x ~ Pdata; the generated data is denoted x ~ PG. Input x into the multi-class discriminator to obtain the softmax output probability p(y|x) for each class.
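The forward pass of the (m+n+1)-way softmax discriminator in (5.1)–(5.2) could be sketched as below. The single hidden layer, tanh activation, and layer sizes are illustrative assumptions; the patent only specifies multiple hidden layers plus a softmax layer:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def discriminator_forward(x, W1, b1, W2, b2):
    """One hidden layer + softmax output over m+n+1 labels:
    m large-sample classes, n small-sample classes, and one
    extra label for generated (fake) data."""
    h = np.tanh(x @ W1 + b1)               # hidden activation (illustrative)
    return softmax(h @ W2 + b2)            # p(y|x) for every label

rng = np.random.default_rng(2)
p, m, n = 8, 3, 2                          # input dim, large/small class counts
W1 = rng.normal(scale=0.1, size=(p, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.1, size=(16, m + n + 1)); b2 = np.zeros(m + n + 1)
x = rng.normal(size=(4, p))                # batch of real + generated data
probs = discriminator_forward(x, W1, b1, W2, b2)
assert probs.shape == (4, m + n + 1)
assert np.allclose(probs.sum(axis=1), 1.0)
```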
(5.3) Construct the loss function of the classification discriminator:
(5.4) Update the network parameters via the error back-propagation algorithm and optimize the classification discriminator model until the discriminator's loss function converges.
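The discriminator loss in (5.3) is given as a formula image that is not reproduced in this text, so the sketch below shows one common choice for such an (m+n+1)-way GAN discriminator: cross-entropy on labeled real data plus cross-entropy assigning generated data to the extra fake label. This is an assumption for illustration, not necessarily the patent's exact loss:

```python
import numpy as np

def discriminator_loss(p_real, y_real, p_gen, n_labels):
    """Cross-entropy on real labeled samples plus cross-entropy
    pushing generated samples toward the fake label (index n_labels-1).
    A standard semi-supervised GAN discriminator objective, used here
    as a stand-in for the patent's unreproduced formula."""
    eps = 1e-12
    real_term = -np.mean(np.log(p_real[np.arange(len(y_real)), y_real] + eps))
    fake_term = -np.mean(np.log(p_gen[:, n_labels - 1] + eps))
    return real_term + fake_term

p_real = np.array([[0.7, 0.2, 0.1],     # softmax outputs for real samples
                   [0.1, 0.8, 0.1]])
y_real = np.array([0, 1])               # their true class labels
p_gen = np.array([[0.05, 0.05, 0.9]])   # a generated sample judged fake
loss = discriminator_loss(p_real, y_real, p_gen, n_labels=3)
assert loss > 0
```

Minimizing this loss by back-propagation, as in step (5.4), simultaneously improves classification of the real fault classes and detection of generated data.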
(6) The generator training stage specifically comprises the following steps:
(6.1) Construct a loss function for each of the n independent generators in the small-sample generator; the loss function of the i-th generator is:
(6.2) Update the network parameters via the error back-propagation algorithm and optimize the generator model until the generated data can fool the discriminator's authenticity judgment, i.e., until the discriminator's loss function converges.
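The i-th generator's loss in (6.1) is likewise given as an unreproduced formula image. A standard choice, shown below purely as an assumption, drives each generator's samples toward being classified as their target small-sample class (and hence away from the fake label):

```python
import numpy as np

def generator_loss(p_gen, target_class):
    """Cross-entropy of the discriminator's softmax output against the
    generator's target small-sample class: low when generated samples
    are confidently assigned to their intended class. An illustrative
    stand-in for the patent's unreproduced per-generator loss."""
    eps = 1e-12
    return -np.mean(np.log(p_gen[:, target_class] + eps))

# As training progresses, generated samples that fool the discriminator
# (high probability on their target class) yield a smaller loss.
p_early = np.array([[0.1, 0.1, 0.8]])    # mostly judged fake
p_late = np.array([[0.9, 0.05, 0.05]])   # judged as target class 0
assert generator_loss(p_late, 0) < generator_loss(p_early, 0)
```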
(6.3) Repeat steps (2.1)–(6.2) until every sample sequence in the dynamic table of the supplementary data set is filled, completing the training of the adversarially enhanced fault classifier.
(7) When new data requires fault classification, input it into the trained adversarially enhanced fault classifier, ignore the probability of the (m+n+1)-th item of the softmax layer, obtain the posterior probability of each fault class, and assign the data to the class with the maximum posterior probability, realizing fault classification of the data.
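The inference rule of step (7) could be sketched as follows; dropping the fake label and renormalizing before taking the argmax is a natural reading of "ignore the (m+n+1)-th item", though the renormalization itself is an assumption (the argmax is unchanged either way):

```python
import numpy as np

def classify(probs, m, n):
    """Step (7): drop the (m+n+1)-th softmax item (the fake-data label),
    renormalize over the m+n fault classes, and pick the class with the
    maximum posterior probability."""
    post = probs[:, :m + n]
    post = post / post.sum(axis=1, keepdims=True)  # posterior per fault class
    return post.argmax(axis=1), post

probs = np.array([[0.10, 0.20, 0.05, 0.35, 0.30],  # m=2 large, n=2 small,
                  [0.40, 0.10, 0.25, 0.05, 0.20]]) # last item = fake label
labels, post = classify(probs, m=2, n=2)
assert labels.tolist() == [3, 0]
assert np.allclose(post.sum(axis=1), 1.0)
```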
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911369696.4A CN111240279B (en) | 2019-12-26 | 2019-12-26 | Confrontation enhancement fault classification method for industrial unbalanced data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111240279A true CN111240279A (en) | 2020-06-05 |
CN111240279B CN111240279B (en) | 2021-04-06 |
Family
ID=70874084
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911369696.4A Active CN111240279B (en) | 2019-12-26 | 2019-12-26 | Confrontation enhancement fault classification method for industrial unbalanced data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111240279B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112328588A (en) * | 2020-11-27 | 2021-02-05 | 哈尔滨工程大学 | Industrial fault diagnosis unbalanced time sequence data expansion method |
WO2022166325A1 (en) * | 2021-02-05 | 2022-08-11 | 华为技术有限公司 | Multi-label class equalization method and device |
CN117932457A (en) * | 2024-03-22 | 2024-04-26 | 南京信息工程大学 | Model fingerprint identification method and system based on error classification |
CN118133212A (en) * | 2024-05-07 | 2024-06-04 | 国网福建省电力有限公司 | Industrial control anomaly detection method for complex uncertain unbalanced data set |
Citations (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6882992B1 (en) * | 1999-09-02 | 2005-04-19 | Paul J. Werbos | Neural networks for intelligent control |
DE10349094A1 (en) * | 2003-10-22 | 2005-05-25 | Rieter Ingolstadt Spinnereimaschinenbau Ag | Textile machine and method for improving the production process |
US20080010531A1 (en) * | 2006-06-12 | 2008-01-10 | Mks Instruments, Inc. | Classifying faults associated with a manufacturing process |
JP4312930B2 (en) * | 2000-06-09 | 2009-08-12 | 富士重工業株式会社 | Automobile failure diagnosis device |
CN102254177A (en) * | 2011-04-22 | 2011-11-23 | 哈尔滨工程大学 | Bearing fault detection method for unbalanced data SVM (support vector machine) |
US20140365195A1 (en) * | 2013-06-07 | 2014-12-11 | Scientific Design Company, Inc. | System and method for monitoring a process |
US8983882B2 (en) * | 2006-08-17 | 2015-03-17 | The United States Of America As Represented By The Administrator Of The National Aeronautics Space Administration | Autonomic and apoptopic systems in computing, robotics, and security |
WO2016144586A1 (en) * | 2015-03-11 | 2016-09-15 | Siemens Industry, Inc. | Prediction in building automation |
CN107239789A (en) * | 2017-05-09 | 2017-10-10 | 浙江大学 | A kind of industrial Fault Classification of the unbalanced data based on k means |
CN107657274A (en) * | 2017-09-20 | 2018-02-02 | 浙江大学 | A kind of y-bend SVM tree unbalanced data industry Fault Classifications based on k means |
CN107884706A (en) * | 2017-11-09 | 2018-04-06 | 合肥工业大学 | The analog-circuit fault diagnosis method approached based on vector value canonical kernel function |
CN108875771A (en) * | 2018-03-30 | 2018-11-23 | 浙江大学 | A kind of failure modes model and method being limited Boltzmann machine and Recognition with Recurrent Neural Network based on sparse Gauss Bernoulli Jacob |
CN109062177A (en) * | 2018-06-29 | 2018-12-21 | 无锡易通精密机械股份有限公司 | A kind of Trouble Diagnostic Method of Machinery Equipment neural network based and system |
CN109190665A (en) * | 2018-07-30 | 2019-01-11 | 国网上海市电力公司 | A kind of general image classification method and device based on semi-supervised generation confrontation network |
CN109800895A (en) * | 2019-01-18 | 2019-05-24 | 广东电网有限责任公司 | A method of based on augmented reality in the early warning of metering automation pipeline stall and maintenance |
CN109858352A (en) * | 2018-12-26 | 2019-06-07 | 华中科技大学 | A kind of method for diagnosing faults based on compressed sensing and the multiple dimensioned network of improvement |
US20190187688A1 (en) * | 2016-05-09 | 2019-06-20 | Strong Force Iot Portfolio 2016, Llc | Systems and methods for data collection and frequency analysis |
CN109977094A (en) * | 2019-01-30 | 2019-07-05 | 中南大学 | A method of the semi-supervised learning for structural data |
CN110059631A (en) * | 2019-04-19 | 2019-07-26 | 中铁第一勘察设计院集团有限公司 | The contactless monitoring defect identification method of contact net |
CN110070060A (en) * | 2019-04-26 | 2019-07-30 | 天津开发区精诺瀚海数据科技有限公司 | A kind of method for diagnosing faults of bearing apparatus |
CN110208660A (en) * | 2019-06-05 | 2019-09-06 | 国网江苏省电力有限公司电力科学研究院 | A kind of training method and device for power equipment shelf depreciation defect diagonsis |
US20190354094A1 (en) * | 2018-05-17 | 2019-11-21 | National Cheng Kung University | System and method that consider tool interaction effects for identifying root causes of yield loss |
CN110567720A (en) * | 2019-08-07 | 2019-12-13 | 东北电力大学 | method for diagnosing depth confrontation of fault of fan bearing under unbalanced small sample scene |
Non-Patent Citations (5)
Title |
---|
LIXIANG DUAN: "Support vector data description for machinery multi-fault classification with unbalanced datasets", 《2016 IEEE INTERNATIONAL CONFERENCE ON PROGNOSTICS AND HEALTH MANAGEMENT (ICPHM)》 * |
ZHU ZHIBO: "Fault reconstruction and diagnosis of non-Gaussian processes based on support vector data description", China Doctoral Dissertations Full-text Database, Information Science and Technology *
GE ZHIQIANG: "Research on statistical monitoring methods for processes under varying load conditions", China Doctoral Dissertations Full-text Database, Information Science and Technology *
DENG WENKAI: "Research on imbalanced data classification and its application in wastewater treatment systems", China Master's Theses Full-text Database, Engineering Science and Technology I *
CHEN GECHENG: "Research on clustering-based classification methods for imbalanced industrial fault data", China Master's Theses Full-text Database, Information Science and Technology *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111240279B (en) | Confrontation enhancement fault classification method for industrial unbalanced data | |
CN108228716B (en) | SMOTE _ Bagging integrated sewage treatment fault diagnosis method based on weighted extreme learning machine | |
CN108875772B (en) | Fault classification model and method based on stacked sparse Gaussian Bernoulli limited Boltzmann machine and reinforcement learning | |
US7362892B2 (en) | Self-optimizing classifier | |
CN101903895B (en) | Method and apparatus for generating chemical toxicity prediction model | |
CN108875771B (en) | Fault classification model and method based on sparse Gaussian Bernoulli limited Boltzmann machine and recurrent neural network | |
CN106503867A (en) | A kind of genetic algorithm least square wind power forecasting method | |
CN110287983A (en) | Based on maximal correlation entropy deep neural network single classifier method for detecting abnormality | |
CN107239789A (en) | A kind of industrial Fault Classification of the unbalanced data based on k means | |
CN102750286A (en) | Novel decision tree classifier method for processing missing data | |
CN105760888A (en) | Neighborhood rough set ensemble learning method based on attribute clustering | |
CN109164794B (en) | Multivariable industrial process Fault Classification based on inclined F value SELM | |
CN114896228B (en) | Industrial data stream cleaning model and method based on filtering rule multistage combination optimization | |
CN113609480B (en) | Multipath learning intrusion detection method based on large-scale network flow | |
CN114357870A (en) | Metering equipment operation performance prediction analysis method based on local weighted partial least squares | |
CN107728476B (en) | SVM-forest based method for extracting sensitive data from unbalanced data | |
CN112837145A (en) | Customer credit classification method based on improved random forest | |
CN111488520B (en) | Crop planting type recommendation information processing device, method and storage medium | |
CN113222046A (en) | Feature alignment self-encoder fault classification method based on filtering strategy | |
CN103020864B (en) | Corn fine breed breeding method | |
CN116776245A (en) | Three-phase inverter equipment fault diagnosis method based on machine learning | |
CN115017978A (en) | Fault classification method based on weighted probability neural network | |
CN110909238B (en) | Association mining algorithm considering competition mode | |
El-Amin | Detection of Hydrogen Leakage Using Different Machine Learning Techniques | |
CN113657441A (en) | Classification algorithm based on weighted Pearson correlation coefficient and combined with feature screening |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||