CN111240279B - Confrontation enhancement fault classification method for industrial unbalanced data - Google Patents

Confrontation enhancement fault classification method for industrial unbalanced data

Info

Publication number
CN111240279B
CN111240279B (application CN201911369696.4A)
Authority
CN
China
Prior art keywords
data
generator
generated
samples
small sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911369696.4A
Other languages
Chinese (zh)
Other versions
CN111240279A (en)
Inventor
葛志强
江肖禹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN201911369696.4A priority Critical patent/CN111240279B/en
Publication of CN111240279A publication Critical patent/CN111240279A/en
Application granted granted Critical
Publication of CN111240279B publication Critical patent/CN111240279B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B19/00 Programme-control systems
    • G05B19/02 Programme-control systems electric
    • G05B19/418 Total factory control, i.e. centrally controlling a plurality of machines, e.g. direct or distributed numerical control [DNC], flexible manufacturing systems [FMS], integrated manufacturing systems [IMS], computer integrated manufacturing [CIM]
    • G05B19/41875 Total factory control characterised by quality surveillance of production
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00 Program-control systems
    • G05B2219/30 Nc systems
    • G05B2219/31 From computer integrated manufacturing till monitoring
    • G05B2219/31359 Object oriented model for fault, quality control
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Abstract

The invention discloses a confrontation enhancement fault classification method for industrial unbalanced data, belonging to the field of fault diagnosis and classification in industrial processes. Through adversarial training between a multi-class discriminator and small-sample generators, the small-sample generators directionally generate data for each unbalanced small-sample class, and data screening based on the Mahalanobis distance in the principal component space retains the generated data that are closer to the real data. A dynamic table in a supplementary database is established, and the real data are supplemented with a quantitatively updated sample set to obtain a new data set, resolving the imbalance of the industrial data. The classification method couples the training processes of the generators and the multi-class discriminator, making more effective use of computing resources.

Description

Confrontation enhancement fault classification method for industrial unbalanced data
Technical Field
The invention belongs to the field of industrial process fault diagnosis and classification, and particularly relates to a confrontation enhancement fault classification method for industrial unbalanced data.
Background
With the development of modern industry, industrial data have accumulated in large quantities, providing a basis for data-driven process analysis. Fault diagnosis is a typical application of such data. Many data-driven methods, such as support vector machines and back-propagation neural networks, have been widely used for fault classification in industrial processes.
However, fault conditions occur rarely in industry, so the collected fault data are very limited. Compared with the large amount of non-fault data, i.e. data collected under normal conditions, the proportion of fault data is low. In addition, the amounts of data for faults with different occurrence probabilities are themselves imbalanced. This creates difficulties for classification algorithms that assume a balanced data distribution. Industrial fault diagnosis is therefore essentially a multi-class classification problem on imbalanced data, and it urgently needs to be solved.
Supplementing data at the data level is the most direct way to address the imbalance problem. The generative adversarial network, composed of a generator and a discriminator, is currently a promising generative model. Through adversarial training between the generator and the discriminator, the generator learns to produce data that deceive the discriminator, so data generated by a generative adversarial network can be used to supplement small-sample classes. However, the training process of a generative adversarial network is extremely unstable and prone to producing noise points that deviate from the real data, or to mode collapse, which degrades the authenticity of the generated data. Moreover, training the generative adversarial network and training the classifier are two independent processes, which increases model complexity and wastes computing resources.
Disclosure of Invention
Aiming at the classification problem of industrial unbalanced data, the invention provides a confrontation enhancement fault classification method that achieves accurate classification using a generative adversarial network structure composed of small-sample generators and a multi-class discriminator.
The purpose of the invention is achieved by the following technical scheme: a confrontation enhancement fault classification method for industrial unbalanced data, comprising the following steps:
(1) Collect offline data from the historical industrial process as an original data set X, where X comprises m kinds of large-sample data X_big and n kinds of small-sample data X_small, i.e. X = {X_big, X_small} = {(x_i, y_i)}, with y_i ∈ {1, 2, …, m+n} and i indexing the samples; the m large-sample classes contain the same number of samples, while each small-sample class contains fewer samples than a large-sample class. The original data set X is preprocessed by linearly mapping the raw data into the [0, 1] range, yielding the training data set X̃.
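As a concrete illustration of this preprocessing, the following minimal Python sketch maps each variable into [0, 1]; the function and variable names are illustrative, not from the patent.

import numpy as np

def minmax_scale(X_raw: np.ndarray) -> np.ndarray:
    # Linearly map each feature (column) of X_raw into the [0, 1] range.
    x_min = X_raw.min(axis=0)
    x_max = X_raw.max(axis=0)
    # Guard against constant columns to avoid division by zero.
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return (X_raw - x_min) / span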
(2) The data generation stage specifically comprises the following substeps:
(2.1) For the n kinds of small-sample data in the unbalanced data set, construct n generators. Each small-sample generator is a fully connected neural network with the same structure, whose input is Gaussian noise z of dimension p and whose output is a generated feature vector of dimension q. The n generators are mutually independent, forming a parallel multi-generator structure. Before the Gaussian noise z is input into a generator, the hyperparameters of each small-sample generator network are initialized.
(2.2) Gaussian noise z with mean 0 and variance 0.1 is produced by a random function. The Gaussian noise z is input into the small-sample generators, which output n groups of generated data G(z) = {G_1, G_2, …, G_n}.
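The parallel multi-generator structure of steps (2.1)-(2.2) could look like the following sketch, assuming PyTorch; the hidden-layer widths (32 and 64, borrowed from the embodiment below) and the example values of n, p, and q are illustrative.

import torch
import torch.nn as nn

def make_generator(p: int, q: int) -> nn.Module:
    # One small-sample generator: a fully connected net mapping noise z (dim p)
    # to a generated feature vector (dim q); Sigmoid keeps outputs in [0, 1],
    # matching the scaled training data.
    return nn.Sequential(
        nn.Linear(p, 32), nn.ReLU(),
        nn.Linear(32, 64), nn.ReLU(),
        nn.Linear(64, q), nn.Sigmoid(),
    )

n, p, q = 5, 16, 16                                     # illustrative sizes
generators = [make_generator(p, q) for _ in range(n)]   # mutually independent

z = (0.1 ** 0.5) * torch.randn(60, p)                   # Gaussian noise: mean 0, variance 0.1
G_z = [G(z) for G in generators]                        # n groups of generated data {G_1, ..., G_n}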
(3) The data screening stage specifically comprises the following steps:
(3.1) Perform principal component analysis on the training data set X̃ to obtain a reference principal component matrix T1, and project the generated data G(z) into the reference principal component space to obtain the corresponding principal component matrix T2. The number of principal components in T1 is determined by the cumulative variance percentage reaching 98%; the number of principal components in T2 equals that in T1.
(3.2) For each small-sample class, compute the Mahalanobis distance from every score vector t of that class in the reference principal component matrix T1 to the class centroid c, MD(t) = sqrt((t - c)^T S^{-1} (t - c)), where S is the covariance matrix of the class scores, and take the maximum to obtain the farthest distance MD_1max under each class. The upper threshold for data screening is determined from MD_1max as k·MD_1max, where k is a screening coefficient.
(3.3) For each generated sample in the corresponding principal component matrix T2, compute the Mahalanobis distance MD_2 to the centroid of the corresponding small-sample class in the reference principal component matrix T1, and compare MD_2 with k·MD_1max. If MD_2 < k·MD_1max, the generated sample is considered close to the training data set X̃ and is a valid point G_valid; if MD_2 ≥ k·MD_1max, the generated sample is considered to deviate from the training data set X̃ and is outlier noise G_invalid.
(3.4) Extract the valid points G_valid from the generated data set G(z) and assign them the corresponding class labels y; the outlier noise G_invalid is discarded.
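A hedged sketch of this screening stage for a single small-sample class, assuming numpy and scikit-learn: PCA retains components up to 98% cumulative variance, and generated points farther than k·MD_1max from the class centroid are discarded. All names are illustrative.

import numpy as np
from sklearn.decomposition import PCA

def screen_generated(X_class: np.ndarray, G_z: np.ndarray, k: float = 1.0) -> np.ndarray:
    # X_class: real training samples of one small-sample class; G_z: generated samples.
    pca = PCA(n_components=0.98).fit(X_class)    # components up to 98% cumulative variance
    T1 = pca.transform(X_class)                  # reference principal component matrix T1
    T2 = pca.transform(G_z)                      # generated data projected into the same space

    centroid = T1.mean(axis=0)
    S_inv = np.linalg.pinv(np.cov(T1, rowvar=False))

    def md(t):                                   # Mahalanobis distance to the class centroid
        d = t - centroid
        return float(np.sqrt(d @ S_inv @ d))

    md1_max = max(md(t) for t in T1)             # farthest real point: MD_1max
    md2 = np.array([md(t) for t in T2])          # MD_2 for every generated sample
    return G_z[md2 < k * md1_max]                # valid points G_valid; the rest is noise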
(4) The dynamic table stage of the supplementary database specifically comprises the following steps:
(4.1) Import the valid points G_valid into the dynamic table L of the supplementary database. The dynamic table L allocates one sample sequence to the generated data of each small-sample class, i.e. n sample sequences in total; the length of each sample sequence in the supplementary database plus the number of real samples of the corresponding small-sample class equals the number of real samples of each large-sample class.
(4.2) While the accumulated number of generated samples is smaller than the sequence length, generated samples are continuously written into the dynamic table L of the supplementary database during the iterations. When the accumulated number of generated samples is greater than or equal to the sample sequence length, the generated data at the end of the sequence are eliminated as new generated data are written in, yielding the updated supplementary data set X'.
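The dynamic-table behavior of this stage corresponds to a bounded FIFO queue per small-sample class; the following sketch, assuming Python's collections.deque, shows the update rule with illustrative counts.

from collections import deque

def make_dynamic_table(n_big: int, small_counts: list) -> list:
    # One bounded sequence per small-sample class: its length plus the class's
    # real-sample count equals the large-sample count n_big.
    return [deque(maxlen=n_big - c) for c in small_counts]

L = make_dynamic_table(n_big=1000, small_counts=[50, 50, 50, 20, 20])
for x_valid in ([0.1] * 16, [0.2] * 16):   # screened valid points for class 1 (illustrative)
    L[0].append(x_valid)                   # a full sequence drops its oldest entry automatically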
(5) The classifier training stage specifically comprises the following steps:
(5.1) Construct a neural network multi-class discriminator D(x) combining multiple hidden layers with a softmax output layer. Its input is p-dimensional data x and its output covers m+n+1 sample class labels y, where the first m items are large-sample class labels, the next n items are small-sample class labels, and the (m+n+1)-th item is the generated (fake) data label.
(5.2) Mix the training data set X̃ with the supplementary data set X' and treat the mixture as real data x ~ P_data; data produced by the generators are denoted x ~ P_G. Input x into the multi-class discriminator to obtain the softmax output probability p(y|x) for each class.
(5.3) Construct the loss function of the classification discriminator: L_D = -E_{x~P_data}[log p(y|x)] - E_{x~P_G}[log p(y = m+n+1 | x)], where x ~ P_G denotes data produced by the n generators.
(5.4) Update the network parameters by error back-propagation, optimizing the classification discriminator model until the discriminator's loss function converges.
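One discriminator update under the reconstructed loss of step (5.3) might be sketched as follows, assuming PyTorch: real (and supplemented) samples are pushed toward their true labels, generated samples toward the extra (m+n+1)-th class. Layer widths follow the embodiment below, and the random tensors stand in for real batches.

import torch
import torch.nn as nn

m, n, p = 1, 5, 16
D = nn.Sequential(nn.Linear(p, 100), nn.ReLU(),
                  nn.Linear(100, 200), nn.ReLU(),
                  nn.Linear(200, m + n + 1))        # softmax is applied inside the loss
ce = nn.CrossEntropyLoss()                          # cross-entropy = -E[log p(y|x)]
opt_D = torch.optim.SGD(D.parameters(), lr=0.1)

x_real = torch.rand(60, p)                          # stand-in for X-tilde mixed with X'
y_real = torch.randint(0, m + n, (60,))             # true labels, 0-based
x_fake = torch.rand(60, p)                          # stand-in for generator output G(z)
y_fake = torch.full((60,), m + n)                   # the (m+n+1)-th class has index m+n

loss_D = ce(D(x_real), y_real) + ce(D(x_fake), y_fake)
opt_D.zero_grad(); loss_D.backward(); opt_D.step()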
(6) The generator training stage specifically comprises the following steps:
(6.1) Construct a loss function for each of the n independent generators; the loss function of the i-th generator is L_Gi = -E_z[log p(y = m+i | G_i(z))], i = 1, …, n.
(6.2) Update the network parameters by error back-propagation, optimizing the generator models until the generated data can deceive the authenticity judgment of the discriminator, i.e. until the generator's loss function converges.
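A matching sketch of one generator update, assuming PyTorch and the reconstructed generator loss above: generator i pushes the discriminator toward the corresponding real small-sample class, so its samples pass the authenticity judgment. Sizes and names are illustrative.

import torch
import torch.nn as nn

m, n, p, q = 1, 5, 16, 16
G_1 = nn.Sequential(nn.Linear(p, 32), nn.ReLU(),
                    nn.Linear(32, 64), nn.ReLU(),
                    nn.Linear(64, q), nn.Sigmoid())
D = nn.Sequential(nn.Linear(q, 100), nn.ReLU(),
                  nn.Linear(100, 200), nn.ReLU(),
                  nn.Linear(200, m + n + 1))
ce = nn.CrossEntropyLoss()
opt_G = torch.optim.Adam(G_1.parameters(), lr=0.01)   # only G_1's parameters are updated

z = (0.1 ** 0.5) * torch.randn(60, p)                 # Gaussian noise, mean 0, variance 0.1
logits = D(G_1(z))                                    # discriminator judges the generated data
target = torch.full((60,), m)                         # 0-based index of class m+1, i.e. small class i=1
loss_G = ce(logits, target)                           # -E[log p(y = m+i | G_i(z))]
opt_G.zero_grad(); loss_G.backward(); opt_G.step()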
(6.3) Repeat steps (2.1)-(6.2) until every sample sequence in the dynamic table of the supplementary data set is filled; training of the confrontation enhancement fault classifier is then complete.
(7) When new data need to be fault-classified, input the data into the trained confrontation enhancement fault classifier, ignore the probability of the (m+n+1)-th softmax output item to obtain the posterior probability of each fault class, and assign the data to the class with the maximum posterior probability, thereby realizing fault classification of the data.
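The decision rule of step (7) amounts to dropping the generated-data output and taking the maximum a posteriori class, as in this sketch (assuming PyTorch and a trained discriminator D):

import torch

def classify(D, x_new, m, n):
    # Ignore the (m+n+1)-th "generated data" item (index m+n) and take the
    # maximum posterior over the m+n real fault classes.
    with torch.no_grad():
        probs = torch.softmax(D(x_new), dim=1)   # posterior p(y|x) over all m+n+1 items
        return probs[:, :m + n].argmax(dim=1)    # 0-based class with maximum posterior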
Compared with the prior art, the invention has the following beneficial effects. Through continuous iteration of the above process, the data produced by the generators gradually approach the real samples, the sample sequences in the dynamic table of the supplementary data set are updated with the generated data, and the small classes eliminate the imbalance of the original data set through this supplement of generated data. Meanwhile, data screening retains the high-quality portion of the generated data, further improving the classifier's performance. The confrontation enhancement classification method provided by the invention is an end-to-end model and a data enhancement method that makes more convenient use of generated data.
Drawings
FIG. 1 is a flow chart of the training of the confrontation enhancement fault classifier for industrial unbalanced data;
FIG. 2 is a flow chart of the Tennessee Eastman (TE) process;
FIG. 3 is a comparison of the classification results of the confrontation enhancement fault classifier with those of other oversampling methods.
Detailed Description
The present invention is further described in detail with reference to the accompanying drawings.
The confrontation enhancement fault classifier adopted by the invention is structurally divided into four parts. The first part is the small-sample generator module, which directionally generates data for each small-sample class. The second part is a data filter, which screens the generated data based on the Mahalanobis distance in the principal component space. The third part is the dynamic table of the supplementary database, which quantitatively stores the screened data and mixes them with the real data. The fourth part is the multi-class discriminator, formed by combining a multi-hidden-layer neural network with a softmax output layer; the outputs of the last network layer and of the softmax layer have m+n+1 items, where m is the number of large-sample classes and n is the number of small-sample classes.
A confrontation enhancement fault classification method for industrial unbalanced data specifically comprises the following steps:
(1) Collect offline data from the historical industrial process as an original data set X, where X comprises m kinds of large-sample data X_big and n kinds of small-sample data X_small, i.e. X = {X_big, X_small} = {(x_i, y_i)}, with y_i ∈ {1, 2, …, m+n} and i indexing the samples; the m large-sample classes contain the same number of samples, while each small-sample class contains fewer samples than a large-sample class. The original data set X is preprocessed by linearly mapping the raw data into the [0, 1] range, yielding the training data set X̃.
Training the confrontation enhancement fault classifier is an adversarial game process that must be iterated cyclically. One iteration cycle can be divided into 5 stages; the specific flow is shown in FIG. 1:
(2) The data generation stage specifically comprises the following substeps:
(2.1) For the n kinds of small-sample data in the unbalanced data set, construct n generators. Each small-sample generator is a fully connected neural network with the same structure, whose input is Gaussian noise z of dimension p and whose output is a generated feature vector of dimension q. The n generators are mutually independent, forming a parallel multi-generator structure. Before the Gaussian noise z is input into a generator, the hyperparameters of each small-sample generator network are initialized.
(2.2) Gaussian noise z with mean 0 and variance 0.1 is produced by a random function. The Gaussian noise z is input into the small-sample generators, which output n groups of generated data G(z) = {G_1, G_2, …, G_n}.
(3) The data screening stage specifically comprises the following steps:
(3.1) Perform principal component analysis on the training data set X̃ to obtain a reference principal component matrix T1, and project the generated data G(z) into the reference principal component space to obtain the corresponding principal component matrix T2. The number of principal components in T1 is determined by the cumulative variance percentage reaching 98%; the number of principal components in T2 equals that in T1.
(3.2) For each small-sample class, compute the Mahalanobis distance from every score vector t of that class in the reference principal component matrix T1 to the class centroid c, MD(t) = sqrt((t - c)^T S^{-1} (t - c)), where S is the covariance matrix of the class scores, and take the maximum to obtain the farthest distance MD_1max under each class. The upper threshold for data screening is determined from MD_1max as k·MD_1max, where k is a screening coefficient.
(3.3) For each generated sample in the corresponding principal component matrix T2, compute the Mahalanobis distance MD_2 to the centroid of the corresponding small-sample class in the reference principal component matrix T1, and compare MD_2 with k·MD_1max. If MD_2 < k·MD_1max, the generated sample is considered close to the training data set X̃ and is a valid point G_valid; if MD_2 ≥ k·MD_1max, the generated sample is considered to deviate from the training data set X̃ and is outlier noise G_invalid.
(3.4) Extract the valid points G_valid from the generated data set G(z) and assign them the corresponding class labels y; the outlier noise G_invalid is discarded.
(4) The dynamic table stage of the supplementary database specifically comprises the following steps:
(4.1) Import the valid points G_valid into the dynamic table L of the supplementary database. The dynamic table L allocates one sample sequence to the generated data of each small-sample class, i.e. n sample sequences in total; the length of each sample sequence in the supplementary database plus the number of real samples of the corresponding small-sample class equals the number of real samples of each large-sample class.
(4.2) While the accumulated number of generated samples is smaller than the sequence length, generated samples are continuously written into the dynamic table L of the supplementary database during the iterations. When the accumulated number of generated samples is greater than or equal to the sample sequence length, the generated data at the end of the sequence are eliminated as new generated data are written in, yielding the updated supplementary data set X'.
(5) The classifier training stage specifically comprises the following steps:
(5.1) Construct a neural network multi-class discriminator D(x) combining multiple hidden layers with a softmax output layer. Its input is p-dimensional data x and its output covers m+n+1 sample class labels y, where the first m items are large-sample class labels, the next n items are small-sample class labels, and the (m+n+1)-th item is the generated (fake) data label.
(5.2) Mix the training data set X̃ with the supplementary data set X' and treat the mixture as real data x ~ P_data; data produced by the generators are denoted x ~ P_G. Input x into the multi-class discriminator to obtain the softmax output probability p(y|x) for each class.
(5.3) Construct the loss function of the classification discriminator: L_D = -E_{x~P_data}[log p(y|x)] - E_{x~P_G}[log p(y = m+n+1 | x)], where x ~ P_G denotes x produced by the n generators.
(5.4) Update the network parameters by error back-propagation, optimizing the classification discriminator model until the discriminator's loss function converges.
(6) The generator training stage specifically comprises the following steps:
(6.1) Construct a loss function for each of the n independent generators; the loss function of the i-th generator is L_Gi = -E_z[log p(y = m+i | G_i(z))], i = 1, …, n.
(6.2) Update the network parameters by error back-propagation, optimizing the generator models until the generated data can deceive the authenticity judgment of the discriminator, i.e. until the generator's loss function converges.
(6.3) Repeat steps (2.1)-(6.2) until every sample sequence in the dynamic table of the supplementary data set is filled; training of the confrontation enhancement fault classifier is then complete. Through continuous iteration of this process, the data produced by the generators gradually approach the real samples, the sample sequences in the dynamic table are updated with generated data, the imbalance of the data set is resolved by supplementing valid generated data, and the classification performance of the multi-class discriminator improves through the adversarial training.
(7) When new data needs to be subjected to fault classification, the data is input into a trained countermeasure enhancement fault classifier, the probability of the item y of the softmax network layer which is m + n +1 is ignored, the posterior probability of each fault category is obtained, and the data is matched with the category according to the maximum posterior probability to realize the fault classification of the data.
Examples
The performance of the confrontation enhancement fault classifier for industrial unbalanced data is described below with a concrete Tennessee Eastman (TE) process example. The TE process is a standard data set commonly used in the fields of fault diagnosis and fault classification; the whole data set includes 53 process variables, and its process flow is shown in FIG. 2. The process consists of 5 operation units: a gas-liquid separator, a continuous stirred-tank reactor, a partial condenser, a centrifugal compressor, and a reboiler. It can be described by a number of algebraic and differential equations, and its process sensing data are mainly characterized by nonlinearity and strong coupling.
The TE process defines 21 fault types, of which 16 are known faults and 5 are unknown. The fault types include step changes of flow, slow ramp increases, valve sticking, and so on, covering typical nonlinear and dynamic faults. In this embodiment, normal data and five fault states are selected for study; the descriptions and proportions of the different states are shown in Table 1.
Table 1: fault list of the present embodiment

| Number | Type                   | State description                                                         | Count |
| 0      | Normal                 | None                                                                      | 1000  |
| 1      | Step fault             | A/C feed flow ratio varies, component B content kept constant (stream 4)  | 50    |
| 2      | Step fault             | Component B content varies, A/C feed flow ratio constant (stream 4)       | 50    |
| 3      | Step fault             | Loss of material A (stream 1)                                             | 50    |
| 4      | Random variation fault | Condenser cooling water inlet temperature varies                          | 20    |
| 5      | Unknown fault          | Unknown                                                                   | 20    |
In this example, a total of 16 variables were selected for analysis, as shown in table 2.
Table 2: variable list of the present embodiment

| Number | Measured variable   | Number | Measured variable                          |
| 1      | A feed rate         | 9      | Product separator temperature              |
| 2      | D feed rate         | 10     | Product separator level                    |
| 3      | E feed rate         | 11     | Product separator bottoms flow             |
| 4      | Total feed rate     | 12     | Stripper pressure                          |
| 5      | Recycle flow rate   | 13     | Stripper temperature                       |
| 6      | Reactor feed rate   | 14     | Stripper flow rate                         |
| 7      | Reactor temperature | 15     | Reactor cooling water outlet temperature   |
| 8      | Purge rate          | 16     | Separator cooling water outlet temperature |
In this example, the number of small-sample generators is 5; each generator has 2 hidden layers with 32 and 64 nodes respectively, uses the ADAM optimizer, and has a learning rate of 0.01. The multi-class discriminator has 2 hidden layers with 100 and 200 nodes respectively and uses the SGD optimizer with a learning rate of 0.1. Training proceeds on batches of size 60; all samples are traversed in each epoch, and 100 epochs are iterated.
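For reference, the hyperparameters of this embodiment can be collected in one place; this illustrative Python mapping merely restates the values above (the key names are not from the patent).

config = {
    "n_generators": 5,
    "generator_hidden_nodes": (32, 64),
    "generator_optimizer": "ADAM",
    "generator_learning_rate": 0.01,
    "discriminator_hidden_nodes": (100, 200),
    "discriminator_optimizer": "SGD",
    "discriminator_learning_rate": 0.1,
    "batch_size": 60,
    "epochs": 100,
}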
100 samples from each state were selected as test data. FIG. 3 compares the classification results (classification accuracies) of the confrontation enhancement discriminator with those of common data oversampling methods combined with a neural network classifier. As the figure shows, the method achieves higher classification accuracy than the SMOTE+BPNN and SMOTEENN+BPNN methods, proving its superiority.

Claims (1)

1. A confrontation enhancement fault classification method for industrial unbalanced data, characterized by comprising the following steps:
(1) collecting offline data from the historical industrial process as an original data set X, wherein X comprises m kinds of large-sample data X_big and n kinds of small-sample data X_small, i.e. X = {X_big, X_small} = {(x_i, y_i)}, with y_i ∈ {1, 2, …, m+n} and i indexing the samples; the m large-sample classes contain the same number of samples, while each small-sample class contains fewer samples than a large-sample class; the original data set X is preprocessed by linearly mapping the raw data into the [0, 1] range, yielding the training data set X̃;
(2) The data generation stage specifically comprises the following substeps:
(2.1) for the n kinds of small-sample data in the unbalanced data set, constructing n generators, each small-sample generator being a fully connected neural network with the same structure, whose input is Gaussian noise z of dimension p and whose output is a generated feature vector of dimension q; the n generators are mutually independent, forming a parallel multi-generator structure; before the Gaussian noise z is input into a generator, the hyperparameters of each small-sample generator network are initialized;
(2.2) generating Gaussian noise z with mean 0 and variance 0.1 by a random function; inputting the Gaussian noise z into the small-sample generators, which output n groups of generated data G(z) = {G_1, G_2, …, G_n};
(3) The data screening stage specifically comprises the following steps:
(3.1) performing principal component analysis on the training data set X̃ to obtain a reference principal component matrix T1, and projecting the generated data G(z) into the reference principal component space to obtain the corresponding principal component matrix T2; the number of principal components in T1 is determined by the cumulative variance percentage reaching 98%, and the number of principal components in T2 equals that in T1;
(3.2) for each small-sample class, computing the Mahalanobis distance from every score vector t of that class in the reference principal component matrix T1 to the class centroid c, MD(t) = sqrt((t - c)^T S^{-1} (t - c)) with S the covariance matrix of the class scores, and taking the maximum to obtain the farthest distance MD_1max under each class; the upper threshold for data screening is determined from MD_1max as k·MD_1max, k being a screening coefficient;
(3.3) for each generated sample in the corresponding principal component matrix T2, computing the Mahalanobis distance MD_2 to the centroid of the corresponding small-sample class in the reference principal component matrix T1 and comparing MD_2 with k·MD_1max; if MD_2 < k·MD_1max, the generated sample is considered close to the training data set X̃ and is a valid point G_valid; if MD_2 ≥ k·MD_1max, the generated sample is considered to deviate from the training data set X̃ and is outlier noise G_invalid;
(3.4) extracting the valid points G_valid from the generated data set G(z) and assigning them the corresponding class labels y, the outlier noise G_invalid being discarded;
(4) The dynamic table stage of the supplementary database specifically comprises the following steps:
(4.1) importing the valid points G_valid into the dynamic table L of the supplementary database, wherein the dynamic table L allocates one sample sequence to the generated data of each small-sample class, i.e. n sample sequences in total; the length of each sample sequence in the supplementary database plus the number of real samples of the corresponding small-sample class equals the number of real samples of each large-sample class;
(4.2) while the accumulated number of generated samples is smaller than the sequence length, continuously writing generated samples into the dynamic table L of the supplementary database during the iterations; when the accumulated number of generated samples is greater than or equal to the sample sequence length, eliminating the generated data at the end of the sequence as new generated data are written in, to obtain an updated supplementary data set X';
(5) the classifier training stage specifically comprises the following steps:
(5.1) constructing a neural network multi-class discriminator D(x) combining multiple hidden layers with a softmax output layer, whose input is p-dimensional data x and whose output covers m+n+1 sample class labels y, wherein the first m items are large-sample class labels, the next n items are small-sample class labels, and the (m+n+1)-th item is the generated (fake) data label;
(5.2) mixing the training data set X̃ with the supplementary data set X' and treating the mixture as real data x ~ P_data, the generated data being denoted x ~ P_G; inputting x into the multi-class discriminator to obtain the softmax output probability p(y|x) for each class;
(5.3) constructing the loss function of the classification discriminator: L_D = -E_{x~P_data}[log p(y|x)] - E_{x~P_G}[log p(y = m+n+1 | x)], where x ~ P_G denotes data produced by the n generators;
(5.4) updating the network parameters by error back-propagation, optimizing the classification discriminator model until the discriminator's loss function converges;
(6) The generator training stage specifically comprises the following steps:
(6.1) constructing a loss function for each of the n independent generators, the loss function of the i-th generator being L_Gi = -E_z[log p(y = m+i | G_i(z))], i = 1, …, n;
(6.2) updating the network parameters by error back-propagation, optimizing the generator models until the generated data can deceive the authenticity judgment of the discriminator, i.e. until the generator's loss function converges;
(6.3) repeating steps (2.1)-(6.2) until every sample sequence in the dynamic table of the supplementary data set is filled, completing the training of the confrontation enhancement fault classifier;
(7) when new data need to be fault-classified, inputting the data into the trained confrontation enhancement fault classifier, ignoring the probability of the (m+n+1)-th softmax output item to obtain the posterior probability of each fault class, and assigning the data to the class with the maximum posterior probability, thereby realizing fault classification of the data.
CN201911369696.4A 2019-12-26 2019-12-26 Confrontation enhancement fault classification method for industrial unbalanced data Active CN111240279B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911369696.4A CN111240279B (en) 2019-12-26 2019-12-26 Confrontation enhancement fault classification method for industrial unbalanced data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911369696.4A CN111240279B (en) 2019-12-26 2019-12-26 Confrontation enhancement fault classification method for industrial unbalanced data

Publications (2)

Publication Number Publication Date
CN111240279A CN111240279A (en) 2020-06-05
CN111240279B (en) 2021-04-06

Family

ID=70874084

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911369696.4A Active CN111240279B (en) 2019-12-26 2019-12-26 Confrontation enhancement fault classification method for industrial unbalanced data

Country Status (1)

Country Link
CN (1) CN111240279B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112328588B (en) * 2020-11-27 2022-07-15 哈尔滨工程大学 Industrial fault diagnosis unbalanced time sequence data expansion method
CN114881096A (en) * 2021-02-05 2022-08-09 华为技术有限公司 Multi-label class balancing method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109190665A (en) * 2018-07-30 2019-01-11 国网上海市电力公司 A kind of general image classification method and device based on semi-supervised generation confrontation network
CN109977094A (en) * 2019-01-30 2019-07-05 中南大学 A method of the semi-supervised learning for structural data

Family Cites Families (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6882992B1 (en) * 1999-09-02 2005-04-19 Paul J. Werbos Neural networks for intelligent control
JP4312930B2 (en) * 2000-06-09 2009-08-12 富士重工業株式会社 Automobile failure diagnosis device
DE10349094A1 (en) * 2003-10-22 2005-05-25 Rieter Ingolstadt Spinnereimaschinenbau Ag Textile machine and method for improving the production process
US20080010531A1 (en) * 2006-06-12 2008-01-10 Mks Instruments, Inc. Classifying faults associated with a manufacturing process
US8983882B2 (en) * 2006-08-17 2015-03-17 The United States Of America As Represented By The Administrator Of The National Aeronautics Space Administration Autonomic and apoptopic systems in computing, robotics, and security
CN102254177B (en) * 2011-04-22 2013-06-05 哈尔滨工程大学 Bearing fault detection method for unbalanced data SVM (support vector machine)
US9892238B2 (en) * 2013-06-07 2018-02-13 Scientific Design Company, Inc. System and method for monitoring a process
CA2979193C (en) * 2015-03-11 2021-09-14 Siemens Industry, Inc. Diagnostics in building automation
US11646808B2 (en) * 2016-05-09 2023-05-09 Strong Force Iot Portfolio 2016, Llc Methods and systems for adaption of data storage and communication in an internet of things downstream oil and gas environment
CN107239789A (en) * 2017-05-09 2017-10-10 浙江大学 A kind of industrial Fault Classification of the unbalanced data based on k means
CN107657274A (en) * 2017-09-20 2018-02-02 浙江大学 A kind of y-bend SVM tree unbalanced data industry Fault Classifications based on k means
CN107884706B (en) * 2017-11-09 2020-04-07 合肥工业大学 Analog circuit fault diagnosis method based on vector value regular kernel function approximation
CN108875771B (en) * 2018-03-30 2020-04-10 浙江大学 Fault classification model and method based on sparse Gaussian Bernoulli limited Boltzmann machine and recurrent neural network
TWI660322B (en) * 2018-05-17 2019-05-21 國立成功大學 System and method that consider tool interaction effects for identifying root causes of yield loss
CN109062177A (en) * 2018-06-29 2018-12-21 无锡易通精密机械股份有限公司 A kind of Trouble Diagnostic Method of Machinery Equipment neural network based and system
CN109858352B (en) * 2018-12-26 2020-09-18 华中科技大学 Fault diagnosis method based on compressed sensing and improved multi-scale network
CN109800895A (en) * 2019-01-18 2019-05-24 广东电网有限责任公司 A method of based on augmented reality in the early warning of metering automation pipeline stall and maintenance
CN110059631B (en) * 2019-04-19 2020-04-03 中铁第一勘察设计院集团有限公司 Contact net non-contact type monitoring defect identification method
CN110070060B (en) * 2019-04-26 2021-06-04 天津开发区精诺瀚海数据科技有限公司 Fault diagnosis method for bearing equipment
CN110208660B (en) * 2019-06-05 2021-07-27 国网江苏省电力有限公司电力科学研究院 Training method and device for diagnosing partial discharge defects of power equipment
CN110567720B (en) * 2019-08-07 2020-12-18 东北电力大学 Method for diagnosing depth confrontation of fault of fan bearing under unbalanced small sample scene

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109190665A (en) * 2018-07-30 2019-01-11 国网上海市电力公司 A kind of general image classification method and device based on semi-supervised generation confrontation network
CN109977094A (en) * 2019-01-30 2019-07-05 中南大学 A method of the semi-supervised learning for structural data

Also Published As

Publication number Publication date
CN111240279A (en) 2020-06-05

Similar Documents

Publication Publication Date Title
CN108228716B (en) SMOTE _ Bagging integrated sewage treatment fault diagnosis method based on weighted extreme learning machine
CN110033021B (en) Fault classification method based on one-dimensional multipath convolutional neural network
US7362892B2 (en) Self-optimizing classifier
CN108875772B (en) Fault classification model and method based on stacked sparse Gaussian Bernoulli limited Boltzmann machine and reinforcement learning
CN112288191B (en) Ocean buoy service life prediction method based on multi-class machine learning method
CN108875771B (en) Fault classification model and method based on sparse Gaussian Bernoulli limited Boltzmann machine and recurrent neural network
CN111240279B (en) Confrontation enhancement fault classification method for industrial unbalanced data
CN105760888B (en) A kind of neighborhood rough set integrated learning approach based on hierarchical cluster attribute
CN110287983A (en) Based on maximal correlation entropy deep neural network single classifier method for detecting abnormality
CN112087447B (en) Rare attack-oriented network intrusion detection method
CN109781411A (en) A kind of combination improves the Method for Bearing Fault Diagnosis of sparse filter and KELM
CN103914705A (en) Hyperspectral image classification and wave band selection method based on multi-target immune cloning
CN102750286A (en) Novel decision tree classifier method for processing missing data
CN107239789A (en) A kind of industrial Fault Classification of the unbalanced data based on k means
CN115048874A (en) Aircraft design parameter estimation method based on machine learning
Raju et al. Predicting the outcome of english premier league matches using machine learning
CN111738086B (en) Composition method and system for point cloud segmentation and point cloud segmentation system and device
Shen et al. A novel meta learning framework for feature selection using data synthesis and fuzzy similarity
CN107728476B (en) SVM-forest based method for extracting sensitive data from unbalanced data
CN114896228B (en) Industrial data stream cleaning model and method based on filtering rule multistage combination optimization
CN114997378A (en) Inductive graph neural network pruning method, system, device and storage medium
CN115017978A (en) Fault classification method based on weighted probability neural network
CN112837145A (en) Customer credit classification method based on improved random forest
CN110909238B (en) Association mining algorithm considering competition mode
CN115017125B (en) Data processing method and device for improving KNN method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant