CN107239789A - Industrial fault classification method for unbalanced data based on k-means - Google Patents
Industrial fault classification method for unbalanced data based on k-means

- Publication number: CN107239789A
- Application number: CN201710321424.1A
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/23213—Non-hierarchical clustering techniques using statistics or function optimisation with a fixed number of clusters, e.g. k-means clustering
- G06F18/214—Generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/24—Classification techniques
Abstract
The invention discloses a method for classifying unbalanced data based on k-means. The method first applies k-means to cluster the majority class according to the degree of imbalance, dividing it into N subclasses; together with the M minority classes, this yields an (M+N)-class multi-classification problem, which is then learned with a naive Bayes classifier. Compared with other existing methods, the invention retains the information of the original data to the greatest extent and better solves the classification of imbalanced class data while preventing over-fitting; classification accuracy is improved and the occurrence of over-fitting is reduced.
Description
Technical field
The invention belongs to the field of industrial process control, and in particular relates to an industrial process fault classification method for imbalanced class data.
Background technology
In industrial fault classification work, conventional classification techniques share a common premise: the amounts of data of the various classes in the training set are comparable. Reality is often otherwise; when one class has far more samples than the others, or one class has very few, i.e. when class-imbalanced data arise, directly applying traditional classification methods produces large errors.

In recent years, research on imbalanced class data has been a focus. Existing methods mainly approach the problem from two directions: the algorithm level and the sampling level; the present invention mainly improves conventional classification methods at the sampling level. Improved sampling methods fall broadly into two classes. One is over-sampling, i.e. resampling the minority class to balance the data; a major drawback of this approach is that it can inflate the data and produce over-fitting, so the practical effect is often unsatisfactory. The other is under-sampling, i.e. selecting part of the majority class according to some rule as training data and discarding the rest to balance the data; because part of the majority-class information is ignored, the trained classifier may lack precision. The advantage of the present invention is that, without changing the structure of the original data samples and without discarding or artificially adding sample data, a classifier with good performance is trained.
The content of the invention
In view of the above deficiencies of the prior art, the object of the present invention is to provide an industrial fault classification method for unbalanced data based on k-means.

The object of the invention is achieved through the following technical solution: an industrial fault classification method for unbalanced data based on k-means, comprising the following steps:
(1) Collect data of the process under the normal operating condition and data of various faults to form the labeled training sample set for modeling. Assume there are C fault categories; with one normal class added, the total number of classes is C+1. Each class of sample data is $X_i \in R^{n_i \times m}$, i = 1, 2, ..., C+1, where n_i is the number of training samples, m is the number of process variables, and R is the set of real numbers. The complete labeled training sample set is then X = [X_1; X_2; ...; X_{C+1}]. Record the label information of all data: the label under the normal condition is 1, the label of fault 1 is 2, and so on, i.e. Y_i = [i, i, ..., i], i = 1, 2, ..., C+1, and the complete label set is Y = [Y_1; Y_2; ...; Y_{C+1}]. The normal-class data X_1 form the majority class and the remaining data form the minority classes; the degree of imbalance is N = 100, and the fault classes are assumed to contain similar amounts of data, i.e. n_2 ≈ n_3 ≈ ... ≈ n_{C+1}.
(2) Using the k-means clustering method, divide X_1 into N subsets whose sizes do not differ much, i.e. X_1 = [X_{11}; X_{12}; ...; X_{1N}], and assign new labels Y_1 = [Y_{11}; Y_{12}; ...; Y_{1N}].

(3) Combine the N subclasses from (2) with the C fault classes as the training set of an (N+C)-class multi-classification problem, and build a classifier using the naive Bayes method.

(4) Test the classifier from (3) on a test set, and classify all samples whose labels belong to Y_1 as the normal class.
The beneficial effect of the invention is that, by clustering the majority class, i.e. after processing the data samples, the method better solves the classification of unbalanced data. It neither changes the internal structure of the data nor adds or removes data, so the characteristic information of the original samples is preserved to the greatest extent; compared with other methods, classification accuracy is improved and the occurrence of over-fitting is reduced.
Brief description of the drawings
Fig. 1 is a schematic diagram of the results of naive Bayes applied directly;
Fig. 2 is a schematic diagram of the results of naive Bayes based on k-means.
Embodiment
The present invention addresses the fault classification problem of industrial processes. The method first uses k-means to cluster the majority class according to the degree of imbalance, dividing it into N subclasses; together with the M minority classes this forms an (M+N)-class multi-classification problem, which is finally learned with a naive Bayes classifier.
The key steps of the technical solution adopted by the present invention are as follows:
First step: collect data of the process under the normal operating condition and data of various faults to form the labeled training sample set for modeling. Assume there are C fault categories; with one normal class added, the total number of classes is C+1. Each class of sample data is $X_i \in R^{n_i \times m}$, i = 1, 2, ..., C+1, where n_i is the number of training samples, m is the number of process variables, and R is the set of real numbers. The complete labeled training sample set is then X = [X_1; X_2; ...; X_{C+1}]. Record the label information of all data: the label under the normal condition is 1, the label of fault 1 is 2, and so on, i.e. Y_i = [i, i, ..., i], i = 1, 2, ..., C+1, and the complete label set is Y = [Y_1; Y_2; ...; Y_{C+1}]. The normal-class data X_1 form the majority class and the remaining data form the minority classes; the degree of imbalance is N = 100, and the fault classes are assumed to contain similar amounts of data, i.e. n_2 ≈ n_3 ≈ ... ≈ n_{C+1}.
Second step: using the k-means clustering method, divide X_1 into N subsets whose sizes do not differ much, i.e. X_1 = [X_{11}; X_{12}; ...; X_{1N}], and assign new labels Y_1 = [Y_{11}; Y_{12}; ...; Y_{1N}].
(a) To divide X_1 into N classes, choose N suitable initial mean vectors, one per class, generally by selecting N samples at random, i.e. {q_1, q_2, ..., q_N}, with q_a = [q_{a1}, ..., q_{am}], a = 1, 2, ..., N.

(b) For each sample x_j = [p_{j1}, ..., p_{jm}], j = 1, 2, ..., n_1, compute its distance to each of the mean vectors; the Euclidean distance between the j-th sample and the a-th mean vector is
$$d_{ja}=\sum_{k=1}^{m}\left(p_{jk}-q_{ak}\right)^{2}$$
where j = 1, 2, ..., n_1 and a = 1, 2, ..., N. For sample x_j, if d_{ja} is minimal, x_j is assigned to class a.
(c) To avoid cluster results whose sizes differ greatly, which would defeat the purpose of the clustering, a threshold K is added in (b): once the number of samples in class a reaches K, d_{ja} is removed from the comparisons for the rest of this round, so that no further samples are added to class a until the next round.
(d) After G iterations, N subclasses are obtained, i.e. X_1 = [X_{11}; X_{12}; ...; X_{1N}], and the sample labels of the subclasses are replaced in turn by 1, 2, ..., N, giving Y_1 = [1, 2, ..., N]. At the same time the labels of the fault classes are changed in turn so that Y_b = [b, b, ..., b], where b = N+1, N+2, ..., N+C. The training set is then X = [X_1; X_2; ...; X_{N+C}], with $X_i \in R^{n_i \times m}$, i = 1, 2, ..., C+N, where n_i is the number of samples of the i-th class; likewise each sample is x_j = [p_{j1}, ..., p_{jm}], for every class i = 1, 2, ..., C+N.
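Steps (a)-(d) of the thresholded k-means can be sketched as follows. This is a minimal illustration, assuming numpy; the function name `capped_kmeans`, the toy data, and the randomized processing order are illustrative choices, not from the patent:

```python
import numpy as np

def capped_kmeans(X, N, K, G=20, seed=0):
    """Split majority-class samples X (n x m) into N subclusters,
    refusing to add more than K samples to any one cluster per round."""
    rng = np.random.default_rng(seed)
    # (a) pick N random samples as the initial mean vectors
    centers = X[rng.choice(len(X), size=N, replace=False)]
    for _ in range(G):
        counts = np.zeros(N, dtype=int)
        labels = np.empty(len(X), dtype=int)
        # (b)-(c) assign each sample to its nearest cluster that is not full
        for j in rng.permutation(len(X)):
            d = ((X[j] - centers) ** 2).sum(axis=1)  # squared Euclidean, Eq. (1)
            d[counts >= K] = np.inf                  # full clusters are skipped
            a = int(np.argmin(d))
            labels[j] = a
            counts[a] += 1
        # (d) recompute each cluster's mean vector
        for a in range(N):
            if counts[a] > 0:
                centers[a] = X[labels == a].mean(axis=0)
    return labels, centers

# toy majority class: two blobs, split into 4 capped subclusters
X = np.vstack([np.random.default_rng(1).normal(c, 0.1, (50, 2)) for c in (0.0, 1.0)])
labels, _ = capped_kmeans(X, N=4, K=30)
print(np.bincount(labels, minlength=4).max() <= 30)  # → True (no subset exceeds K)
```

The cap K plays the role of the patent's threshold: it keeps the N subclasses roughly balanced so the later naive Bayes priors are comparable.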
Third step: combine the N subclasses from the second step with the C fault classes as the training set of an (N+C)-class multi-classification problem, and build the classifier using the naive Bayes method.
(a) Compute the mean Mean_{ic} and variance Var_{ic} of each dimension c within each class i, together with the prior probability p_i of each class:
$$Mean_{ic}=\frac{1}{n_i}\sum_{t=1}^{n_i}p_{tc}$$
$$Var_{ic}=\frac{1}{n_i}\sqrt{\sum_{t=1}^{n_i}\left(p_{tc}-Mean_{ic}\right)^{2}}$$
$$p_i=\frac{n_i}{\sum_{t=1}^{C+N}n_t}$$
where i = 1, 2, ..., C+N and c = 1, 2, ..., m.
(b) According to the naive Bayes classification principle, for each sample z_k = [z_{k1}, z_{k2}, ..., z_{km}] of a test set containing U samples, compute its posterior probability p_{ki} of belonging to each class:
$$p_{ki}=p_i\times\prod_{j=1}^{m}\frac{1}{\sqrt{2\pi}\,Var_{ij}}\,e^{-\frac{\left(z_{kj}-Mean_{ij}\right)^{2}}{2\,Var_{ij}^{2}}}$$
where k = 1, 2, ..., U and i = 1, 2, ..., C+N. According to the computed posterior probabilities, each sample is assigned the class label with the largest probability.
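The scoring above is essentially Gaussian naive Bayes. A minimal sketch follows, assuming numpy; note that it uses the conventional per-class sample variance in the density (the patent's variance formula scales a root-sum-of-squares instead) and works in log space for numerical stability:

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Per-class feature means, variances and priors, as in Eqs. (2)-(4)."""
    stats = {}
    for i in np.unique(y):
        Xi = X[y == i]
        stats[i] = (Xi.mean(axis=0), Xi.var(axis=0) + 1e-9, len(Xi) / len(X))
    return stats

def posterior(z, stats):
    """Return the class with the largest posterior, cf. Eq. (5)."""
    scores = {}
    for i, (mean, var, prior) in stats.items():
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (z - mean) ** 2 / var)
        scores[i] = np.log(prior) + log_lik  # log posterior up to a constant
    return max(scores, key=scores.get)

# toy two-class check: class 0 centered at 0, class 1 centered at 5
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (40, 3)), rng.normal(5, 1, (40, 3))])
y = np.array([0] * 40 + [1] * 40)
stats = fit_gaussian_nb(X, y)
print(posterior(np.array([5.0, 5.0, 5.0]), stats))  # → 1
```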
Fourth step: for the data classified by the classifier of the third step, change the labels of samples labeled 1 to N back to 1, i.e. the normal class, and change the labels of samples labeled N+1 to N+C to 2 through C+1 respectively, thus completing the test of the classifier.
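The fourth step's relabeling is a simple mapping from the (N+C) internal labels back to the original C+1 classes. A minimal sketch (`restore_labels` is an illustrative name, not from the patent):

```python
def restore_labels(pred, N):
    """Collapse subcluster labels 1..N back to the normal class (label 1)
    and shift fault labels N+1..N+C back to 2..C+1."""
    return [1 if p <= N else p - N + 1 for p in pred]

# with N = 100: subclusters 3 and 100 -> normal (1); 101 -> fault 1 (2); 104 -> fault 4 (5)
print(restore_labels([3, 100, 101, 104], N=100))  # → [1, 1, 2, 5]
```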
The effectiveness of the invention is illustrated below with the example of a specific industrial process. The data come from the Tennessee Eastman (TE) chemical process experiment of the USA, whose prototype is an actual process flow of the Eastman Chemical Company. At present, the TE process has been widely studied as a typical object for chemical process fault detection and diagnosis. The whole TE process includes 41 measured variables and 12 manipulated (control) variables; the 41 measured variables comprise 22 continuous process measurements and 19 composition measurements, sampled every 3 minutes. The data include 21 batches of fault data; among these faults, 16 are known and 5 are unknown. Faults 1-7 are related to step changes of process variables, such as changes in the cooling water inlet temperature or in the feed components. Faults 8-12 are associated with a marked increase in the variability of some process variables. Fault 13 is a slow drift in the reaction kinetics, and faults 14, 15 and 21 are related to sticking valves. Faults 16-20 are unknown. To monitor the process, 44 process variables were chosen in total, as shown in Table 1. The implementation steps of the invention are set forth below in combination with this process:
1. Collect normal data and 4 kinds of fault data as training sample data, and carry out data preprocessing and normalization. In this experiment the normal condition and faults 1, 2, 6 and 14 were selected as training samples. Faults 1 and 2 are composition changes in stream 4. Fault 6 is caused by the loss of the A feed in stream 1, which eventually affects the A composition in stream 4. Fault 14 relates to the product separator bottom flow. The sampling time is 3 min; the normal condition contains 1000 labeled samples, and 10 labeled samples were selected for each of the remaining fault classes.
2. Divide the normal-condition data samples into 100 classes by k-means, ensuring that the sizes of the classes do not differ much. Then, using the naive Bayes method, learn on the training set of 104 classes in total formed with the 4 classes of fault data.

3. Test online classification: samples classified into the first 100 classes are mapped back to the normal class, and the labels of the 4 fault classes are reset accordingly.
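Under the assumption that scikit-learn is available, the experiment can be sketched end to end. `KMeans` here stands in for the thresholded variant of the second step, and the data are synthetic stand-ins for the TE measurements; the sizes (1000 normal samples, 4 faults of 10 samples, 100 subclasses) follow the experiment above:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
m = 5                                                         # toy number of process variables
X_norm = rng.normal(0, 1, (1000, m))                          # 1000 normal-condition samples
faults = [rng.normal(3 + i, 0.5, (10, m)) for i in range(4)]  # 4 faults, 10 samples each

# step 2: split the majority class into 100 subclasses (labels 1..100)
sub = KMeans(n_clusters=100, n_init=4, random_state=0).fit_predict(X_norm)

# step 3: train naive Bayes on the resulting 104-class problem
X = np.vstack([X_norm] + faults)
y = np.concatenate([sub + 1] + [np.full(10, 101 + i) for i in range(4)])
clf = GaussianNB().fit(X, y)

# step 4: at test time, any label in 1..100 counts as "normal"
pred = clf.predict(np.vstack([X_norm[:5], [[3.0] * m]]))
print(pred <= 100)  # first five (normal) samples map to normal; the last point is a fault
```

The 100 balanced subclasses keep the naive Bayes priors of the normal region comparable to the fault priors, which is the mechanism the invention relies on.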
Table 1: Description of the monitored variables

Variable No. | Measured variable | Variable No. | Measured variable |
---|---|---|---|
1 | A feed rate | 22 | Separator cooling water outlet temperature |
2 | D feed rate | 23 | A mole fraction in stream 6 |
3 | E feed rate | 24 | B mole fraction in stream 6 |
4 | A+C feed rate | 25 | C mole fraction in stream 6 |
5 | Recycle flow | 26 | D mole fraction in stream 6 |
6 | Reactor feed rate | 27 | E mole fraction in stream 6 |
7 | Reactor pressure | 28 | F mole fraction in stream 6 |
8 | Reactor level | 29 | A mole fraction in stream 9 |
9 | Reactor temperature | 30 | B mole fraction in stream 9 |
10 | Purge rate | 31 | C mole fraction in stream 9 |
11 | Product separator temperature | 32 | D mole fraction in stream 9 |
12 | Product separator level | 33 | E mole fraction in stream 9 |
13 | Product separator pressure | 34 | F mole fraction in stream 9 |
14 | Product separator bottom flow | 35 | G mole fraction in stream 9 |
15 | Stripper level | 36 | H mole fraction in stream 9 |
16 | Stripper pressure | 37 | D mole fraction in stream 11 |
17 | Stripper bottom flow | 38 | E mole fraction in stream 11 |
18 | Stripper temperature | 39 | F mole fraction in stream 11 |
19 | Stripper steam flow | 40 | G mole fraction in stream 11 |
20 | Compressor work | 41 | H mole fraction in stream 11 |
21 | Reactor cooling water outlet temperature | | |
The above embodiment is intended to illustrate the present invention rather than to limit it; within the spirit of the present invention and the scope of the claims, any modifications and changes made to the present invention fall within the protection scope of the present invention.
Claims (4)
1. An industrial fault classification method for unbalanced data based on k-means, characterized by comprising the following steps:

(1) Collect data of the process under the normal operating condition and data of various faults to form the labeled training sample set for modeling. Assume there are C fault categories; with one normal class added, the total number of classes is C+1. Each class of sample data is $X_i \in R^{n_i \times m}$, i = 1, 2, ..., C+1, where n_i is the number of training samples, m is the number of process variables, and R is the set of real numbers. The complete labeled training sample set is then X = [X_1; X_2; ...; X_{C+1}]; record the label information of all data. The label under the normal condition is 1, the label of fault 1 is 2, and so on, i.e. the label set Y_i = [i, i, ..., i], i = 1, 2, ..., C+1, and the complete label set is Y = [Y_1; Y_2; ...; Y_{C+1}]. The normal-class data X_1 form the majority class and the remaining data form the minority classes; the degree of imbalance is N = 100, and the fault classes are assumed to contain similar amounts of data, i.e. n_2 ≈ n_3 ≈ ... ≈ n_{C+1}.

(2) Using the k-means clustering method, divide X_1 into N subsets whose sizes do not differ much, i.e. X_1 = [X_{11}; X_{12}; ...; X_{1N}], and assign new labels Y_1 = [Y_{11}; Y_{12}; ...; Y_{1N}].

(3) Combine the N subclasses from step (2) with the C fault classes as the training set of an (N+C)-class multi-classification problem, and build a classifier using the naive Bayes method.

(4) Test the classifier from step (3) on a test set, and classify all samples whose labels belong to Y_1 as the normal class.
2. The industrial fault classification method for unbalanced data based on k-means according to claim 1, characterized in that step (2) is specifically: first choose suitable initial mean vectors in the normal class X_1 and compute the distance between each sample x_j, j = 1, 2, ..., n_1, and these mean vectors; determine the cluster mark λ_j of each sample according to its nearest mean vector, then recompute the mean vector of each cluster, and repeat the above procedure G times. To keep the sizes of the clusters in the final result roughly equal, a threshold K is designed during the iterations: once the sample count of a cluster reaches the threshold, no further samples are added to that cluster. The specific thresholded k-means method is as follows:

(2.1) To divide X_1 into N classes, choose N suitable initial mean vectors as the initial mean vector of each class, generally by selecting N sample values at random, i.e. {q_1, q_2, ..., q_N}, with q_a = [q_{a1}; ...; q_{am}], a = 1, 2, ..., N.
(2.2) Compute the distance between each sample and the N mean vectors; the Euclidean distance between the j-th sample and the a-th mean vector is
$$d_{ja}=\sum_{k=1}^{m}\left(p_{jk}-q_{ak}\right)^{2}\qquad(1)$$
where j = 1, 2, ..., n_1 and a = 1, 2, ..., N. For sample x_j, if d_{ja} is minimal, x_j is assigned to class a, i.e. λ_j = a.
(2.3) To avoid cluster results whose sizes differ greatly, which would defeat the purpose of the clustering, the threshold K is added in step (2.2): once the number of samples in class a reaches K, d_{ja} is removed from the comparisons for the rest of this round, so that no further samples are added to class a until the next round.
(2.4) After G iterations, N subclasses are obtained, i.e. X_1 = [X_{11}; X_{12}; ...; X_{1N}], and the sample labels of the subclasses are replaced in turn by 1, 2, ..., N, giving Y_1 = [1, 2, ..., N]. At the same time the labels of the fault classes are changed in turn so that Y_b = [b, b, ..., b], where b = N+1, N+2, ..., N+C. The training set is then X = [X_1; X_2; ...; X_{N+C}], with $X_i \in R^{n_i \times m}$, where n_i is the number of samples of the i-th class; likewise each sample is x_j = [p_{j1}, ..., p_{jm}].
3. The industrial fault classification method for unbalanced data based on k-means according to claim 1, characterized in that step (3) is specifically: compute the mean and variance of each dimension of the (N+C) classes; then, for each sample of the test set, compute its posterior probability of belonging to each class, choose the class with the largest posterior probability, and assign the sample the corresponding label. The specific steps are as follows:

(3.1) Compute the mean Mean_{ic} and variance Var_{ic} of each dimension c within each class i, together with the prior probability p_i of each class, by the following formulas:
$$Mean_{ic}=\frac{1}{n_i}\sum_{t=1}^{n_i}p_{tc}\qquad(2)$$
$$Var_{ic}=\frac{1}{n_i}\sqrt{\sum_{t=1}^{n_i}\left(p_{tc}-Mean_{ic}\right)^{2}}\qquad(3)$$
$$p_i=\frac{n_i}{\sum_{t=1}^{C+N}n_t}\qquad(4)$$
where i = 1, 2, ..., C+N and c = 1, 2, ..., m.
(3.2) According to the naive Bayes classification principle, for each sample z_k = [z_{k1}, z_{k2}, ..., z_{km}] of a test set containing U samples, compute its posterior probability p_{ki} of belonging to each class by the following formula:
$$p_{ki}=p_i\times\prod_{j=1}^{m}\frac{1}{\sqrt{2\pi}\,Var_{ij}}\,e^{-\frac{\left(z_{kj}-Mean_{ij}\right)^{2}}{2\,Var_{ij}^{2}}}\qquad(5)$$
where k = 1, 2, ..., U and i = 1, 2, ..., C+N. According to the computed posterior probabilities, the sample is assigned the class label with the largest probability.
4. The industrial fault classification method for unbalanced data based on k-means according to claim 1, characterized in that step (4) is specifically: for the data classified in step (3), the labels of data samples labeled 1 to N are changed back to 1, i.e. the normal class, and the labels of data samples labeled N+1 to N+C are changed to 2 through C+1 respectively, thus completing the test of the classifier.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710321424.1A CN107239789A (en) | 2017-05-09 | 2017-05-09 | A kind of industrial Fault Classification of the unbalanced data based on k means |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107239789A true CN107239789A (en) | 2017-10-10 |
Family
ID=59984939
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109086412A (en) * | 2018-08-03 | 2018-12-25 | 北京邮电大学 | A kind of unbalanced data classification method based on adaptive weighted Bagging-GBDT |
CN109978009A (en) * | 2019-02-27 | 2019-07-05 | 广州杰赛科技股份有限公司 | Behavior classification method, device and storage medium based on wearable intelligent equipment |
WO2019169700A1 (en) * | 2018-03-08 | 2019-09-12 | 平安科技(深圳)有限公司 | Data classification method and device, equipment, and computer readable storage medium |
CN110309885A (en) * | 2019-07-05 | 2019-10-08 | 黑龙江电力调度实业有限公司 | Computer room state judging method based on big data |
CN111240279A (en) * | 2019-12-26 | 2020-06-05 | 浙江大学 | Confrontation enhancement fault classification method for industrial unbalanced data |
CN111833171A (en) * | 2020-03-06 | 2020-10-27 | 北京芯盾时代科技有限公司 | Abnormal operation detection and model training method, device and readable storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030204507A1 (en) * | 2002-04-25 | 2003-10-30 | Li Jonathan Qiang | Classification of rare events with high reliability |
CN101980202A (en) * | 2010-11-04 | 2011-02-23 | 西安电子科技大学 | Semi-supervised classification method of unbalance data |
CN104951809A (en) * | 2015-07-14 | 2015-09-30 | 西安电子科技大学 | Unbalanced data classification method based on unbalanced classification indexes and integrated learning |
CN106444706A (en) * | 2016-09-22 | 2017-02-22 | 宁波大学 | Industrial process fault detection method based on data neighborhood feature preservation |
Non-Patent Citations (4)
Title |
---|
潘俊等: "基于推进的非平衡数据分类算法研究", 《计算机工程与应用》 * |
蹇涛,李宏,郭跃健: "结合代价敏感及多数类分解的非平衡分类", 《计算机工程与应用》 * |
阿曼: "朴素贝叶斯分类算法的研究与应用", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
齐雯: "大型风电场等值建模及其并网稳定性研究", 《中国优秀硕士学位论文全文数据库 工程科技Ⅱ辑》 * |
Legal Events

Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20171010 |