CN107239789A - A k-means-based industrial fault classification method for imbalanced data - Google Patents

A k-means-based industrial fault classification method for imbalanced data Download PDF

Info

Publication number
CN107239789A
CN107239789A (application CN201710321424.1A)
Authority
CN
China
Prior art keywords
data
sample
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710321424.1A
Other languages
Chinese (zh)
Inventor
葛志强
陈革成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN201710321424.1A priority Critical patent/CN107239789A/en
Publication of CN107239789A publication Critical patent/CN107239789A/en
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a k-means-based method for classifying imbalanced data. The method first applies k-means to cluster the majority class according to the degree of imbalance, dividing it into N sub-classes; together with the M minority classes this yields an (M+N)-class multi-classification problem, which is then learned with a naive Bayes classifier. Compared with existing methods, the invention retains the information of the original data to the greatest extent and better resolves the problem of classifying imbalanced class data while avoiding over-fitting, improving classification accuracy and reducing the occurrence of over-fitting.

Description

A k-means-based industrial fault classification method for imbalanced data
Technical field
The invention belongs to the field of industrial process control, and in particular relates to a fault classification method for industrial processes with imbalanced class data.
Background technology
In industrial fault classification, conventional classification techniques share a common premise: the amounts of data for the various classes in the training set are comparable. In practice this is often not the case. When one class has far more samples than the others, or far fewer, i.e., when class-imbalanced data arise, applying a traditional classifier directly produces large classification errors.
In recent years, imbalanced data has been an active research topic. Existing methods approach the problem from two directions: the algorithm level and the sampling level; the present invention improves conventional classification methods mainly at the sampling level. Improved sampling methods fall broadly into two classes. One is over-sampling, which resamples the minority classes to balance the data; its major drawback is that it artificially enlarges the data set and tends to produce over-fitting, so its practical effect is often unsatisfactory. The other is under-sampling, which selects, by some rule, only part of the majority class as training data and discards the rest to balance the data; because part of the majority-class information is ignored, the trained classifier tends to lack precision. The advantage of the present invention is that it trains an effective classifier without changing the structure of the original data samples, and without discarding or artificially adding sample data.
The content of the invention
In view of the above deficiencies in the prior art, the object of the present invention is to provide a k-means-based industrial fault classification method for imbalanced data.
The object of the invention is achieved through the following technical solution: a k-means-based industrial fault classification method for imbalanced data, comprising the following steps:
(1) Use the data collected by the system under normal operating conditions, together with the various fault data, to form a labeled training sample set for modeling. Suppose there are C fault categories; adding the normal class gives C+1 categories in total, with each class of sample data X_i = [x_1; x_2; ...; x_{n_i}] ∈ R^{n_i×m}, i = 1, 2, ..., C+1, where n_i is the number of training samples, m is the number of process variables, and R is the set of real numbers. The complete labeled training sample set is then X_l = [X_1; X_2; ...; X_{C+1}]. Record the label information of all data: the label under normal conditions is 1, the label of fault 1 is 2, and so on, i.e. Y_i = [i, i, ..., i], i = 1, 2, ..., C+1, and the complete label set is Y = [Y_1; Y_2; ...; Y_{C+1}]. The normal-class data X_1 form the majority class and the remaining data form the minority classes; the imbalance ratio is N = 100, and the fault classes are assumed to have roughly equal amounts of data, i.e. n_2 ≈ n_3 ≈ ... ≈ n_{C+1}.
(2) Using k-means clustering, divide X_1 into N subsets of roughly equal size, i.e. X_1 = [X_11; X_12; ...; X_1N], and assign new labels Y_1 = [Y_11; Y_12; ...; Y_1N];
(3) Combine the N sub-classes from (2) with the C fault classes to form the training set of an (N+C)-class multi-classification problem, and build a classifier using the naive Bayes method.
(4) Test the classifier from (3) on a test set, and merge all labels belonging to Y_1 back into the normal class.
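The four steps above can be sketched in Python. This is a minimal illustration using scikit-learn's KMeans and GaussianNB as stand-ins for the thresholded k-means variant and the naive Bayes classifier detailed later (the cluster-size cap of step (2) is omitted); the names fit_kmeans_nb and predict_merged are illustrative, not from the patent:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.naive_bayes import GaussianNB

def fit_kmeans_nb(X_major, X_minor_list, n_sub, random_state=0):
    """Split the majority class into n_sub pseudo-classes via k-means,
    then train Gaussian naive Bayes on the resulting (n_sub + C) classes."""
    km = KMeans(n_clusters=n_sub, n_init=10, random_state=random_state).fit(X_major)
    X_parts = [X_major]
    y_parts = [km.labels_]                     # pseudo-labels 0 .. n_sub-1
    for i, Xi in enumerate(X_minor_list):      # fault i gets label n_sub + i
        X_parts.append(Xi)
        y_parts.append(np.full(len(Xi), n_sub + i))
    clf = GaussianNB().fit(np.vstack(X_parts), np.concatenate(y_parts))
    return clf, n_sub

def predict_merged(clf, n_sub, X_test):
    """Predict, then fold every majority sub-class back into class 0
    and renumber the fault classes 1, 2, ..., C (step (4))."""
    raw = clf.predict(X_test)
    return np.where(raw < n_sub, 0, raw - n_sub + 1)
```

The merge in predict_merged is what makes the sub-classes invisible to the end user: the pseudo-labels exist only inside the classifier.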
The beneficial effects of the invention are: by clustering the majority class, i.e. by processing the data samples, the method better solves the problem of classifying imbalanced data. It neither changes the internal structure of the data nor adds or removes data, so the feature information of the original samples is preserved to the greatest extent; compared with other methods, classification accuracy is improved and over-fitting is reduced.
Brief description of the drawings
Fig. 1 is a schematic diagram of the results of applying naive Bayes directly;
Fig. 2 is a schematic diagram of the results of the k-means-based naive Bayes method.
Embodiment
The present invention addresses the fault classification problem of industrial processes. The method first uses k-means to cluster the majority class according to the degree of imbalance, dividing it into N sub-classes; together with the M minority classes this forms an (M+N)-class multi-classification problem, which is finally learned with a naive Bayes classifier.
The main steps of the technical solution adopted by the invention are as follows:
First step: use the data collected by the system under normal operating conditions, together with the various fault data, to form a labeled training sample set for modeling. Suppose there are C fault categories; adding the normal class gives C+1 categories in total, with each class of sample data X_i = [x_1; x_2; ...; x_{n_i}] ∈ R^{n_i×m}, i = 1, 2, ..., C+1, where n_i is the number of training samples, m is the number of process variables, and R is the set of real numbers. The complete labeled training sample set is then X_l = [X_1; X_2; ...; X_{C+1}]. Record the label information of all data: the label under normal conditions is 1, the label of fault 1 is 2, and so on, i.e. Y_i = [i, i, ..., i], i = 1, 2, ..., C+1, and the complete label set is Y = [Y_1; Y_2; ...; Y_{C+1}]. The normal-class data X_1 form the majority class and the remaining data form the minority classes; the imbalance ratio is N = 100, and the fault classes are assumed to have roughly equal amounts of data, i.e. n_2 ≈ n_3 ≈ ... ≈ n_{C+1}.
Second step: using k-means clustering, divide X_1 into N subsets of roughly equal size, i.e. X_1 = [X_11; X_12; ...; X_1N], and assign new labels Y_1 = [Y_11; Y_12; ...; Y_1N].
(a) To divide X_1 into N classes, choose N suitable initial mean vectors, one per class; typically N sample values are selected at random as the initial mean vectors. Let the mean vector of class a be q_a = [q_{a1}; ...; q_{am}], where a = 1, 2, ..., N.
(b) For each sample x_j = [p_{j1}, p_{j2}, ..., p_{jm}], j = 1, 2, ..., n_1, compute the distance to each of the mean vectors; the squared Euclidean distance between the j-th sample and the a-th mean vector is

d_{ja} = Σ_{k=1}^{m} (p_{jk} - q_{ak})²    (1)

where j = 1, 2, ..., n_1 and a = 1, 2, ..., N. If d_{ja} is the minimum for sample x_j, then x_j is assigned to class a.
(c) To avoid clusters of very different sizes, which would defeat the purpose of the clustering, a threshold K is added to (b): once the number of samples in class a reaches K, d_{ja} is excluded from the comparisons for the rest of this round, so no more samples are added to class a until the next round.
(d) After G iterations, N sub-classes are obtained, i.e. X_1 = [X_11; X_12; ...; X_1N]. The sample labels of the sub-classes are replaced in turn by 1, 2, ..., N, giving Y_1 = [1, 2, ..., N], and at the same time the labels of the fault-class data are changed in turn to Y_b = [b, b, ..., b], where b = N+1, N+2, ..., N+C. The training set is then X = [X_1; X_2; ...; X_{N+C}], with X_i ∈ R^{n_i×m}, i = 1, 2, ..., C+N, where n_i is the number of samples in class i; likewise each sample is x_j = [p_{j1}, ..., p_{jm}], i = 1, 2, ..., C+N.
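Steps (a) through (d) can be sketched as follows. This is one illustrative reading of the thresholded k-means: it assumes samples are scanned in index order within each round and that cluster means are recomputed after each round; the name capped_kmeans and the tie-breaking details are assumptions, not from the patent:

```python
import numpy as np

def capped_kmeans(X, n_clusters, cap, n_iter, seed=0):
    """k-means with a per-cluster size cap: once a cluster holds `cap`
    samples, it is removed from consideration for the rest of the round."""
    rng = np.random.default_rng(seed)
    # step (a): random samples as initial mean vectors
    means = X[rng.choice(len(X), n_clusters, replace=False)].astype(float)
    for _ in range(n_iter):                           # G rounds
        counts = np.zeros(n_clusters, dtype=int)
        labels = np.empty(len(X), dtype=int)
        for j in range(len(X)):
            d = ((X[j] - means) ** 2).sum(axis=1)     # eq. (1), step (b)
            d[counts >= cap] = np.inf                 # step (c): full clusters excluded
            labels[j] = np.argmin(d)
            counts[labels[j]] += 1
        for a in range(n_clusters):                   # recompute mean vectors
            if counts[a] > 0:
                means[a] = X[labels == a].mean(axis=0)
    return labels, means
```

Note the cap makes the assignment order-dependent, which the patent's description leaves unspecified; any fixed scan order satisfies the size constraint.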
Third step: combine the N sub-classes from the second step with the C fault classes to form the training set of an (N+C)-class multi-classification problem, and build a classifier using the naive Bayes method.
(a) For each class i, compute the mean Mean_{ic} and variance Var_{ic} of each data dimension c, and the prior probability p_i of each class:

Mean_{ic} = (1/n_i) Σ_{t=1}^{n_i} p_{tc}    (2)

Var_{ic} = (1/n_i) √( Σ_{t=1}^{n_i} (p_{tc} - Mean_{ic})² )    (3)

p_i = n_i / Σ_{t=1}^{C+N} n_t    (4)

where i = 1, 2, ..., C+N and c = 1, 2, ..., m.
(b) According to the naive Bayes classification principle, for each sample z_k = [z_{k1}, z_{k2}, ..., z_{km}] of a test set containing U samples, compute the posterior probability p_{ki} that it belongs to each class:

p_{ki} = p_i × Π_{j=1}^{m} (1 / (√(2π) Var_{ij})) exp( -(z_{kj} - Mean_{ij})² / (2 Var_{ij}²) )    (5)

where k = 1, 2, ..., U and i = 1, 2, ..., C+N. Each sample is then assigned the class label with the largest posterior probability.
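A from-scratch sketch of the naive Bayes step, following equations (2) through (5). One hedge: the quantity Var_ic of equation (3) appears in equation (5) in the position a standard deviation would normally occupy, so this sketch uses the per-dimension standard deviation (with a small epsilon to avoid division by zero) and works in log space for numerical stability; the function names are illustrative:

```python
import numpy as np

def fit_nb(X, y):
    """Per-class, per-dimension mean and spread (eqs. 2-3) and class
    priors from sample counts (eq. 4)."""
    stats = {}
    for c in np.unique(y):
        Xc = X[y == c]
        stats[c] = (Xc.mean(axis=0),          # eq. (2): Mean_ic
                    Xc.std(axis=0) + 1e-9,    # eq. (3) analogue: std. dev. + eps
                    len(Xc) / len(X))         # eq. (4): prior p_i
    return stats

def predict_nb(stats, Z):
    """Assign each test sample the class with the largest posterior (eq. 5),
    computed as a log-probability for stability."""
    labels = []
    for z in Z:
        best, best_logp = None, -np.inf
        for c, (mu, sd, prior) in stats.items():
            logp = np.log(prior) - np.sum(np.log(np.sqrt(2 * np.pi) * sd)
                                          + (z - mu) ** 2 / (2 * sd ** 2))
            if logp > best_logp:
                best, best_logp = c, logp
        labels.append(best)
    return np.array(labels)
```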
Fourth step: for the classification results labeled in the third step, change the labels of the samples labeled 1 through N back to 1, i.e. the normal class, and change the labels N+1 through N+C to 2 through C+1, respectively, completing the test of the classifier.
The effectiveness of the invention is illustrated below with a specific industrial process. The data come from the TE (Tennessee Eastman) chemical process benchmark in the United States, whose prototype is an actual process flow of the Eastman Chemical Company. The TE process has been studied extensively as a typical object for chemical process fault detection and diagnosis. The whole TE process contains 41 measured variables and 12 manipulated (control) variables; the 41 measured variables comprise 22 continuous process measurements and 19 composition measurements, sampled every 3 minutes. The data include 21 batches of fault data; of these faults, 16 are known and 5 are unknown. Faults 1-7 are related to step changes in process variables, such as changes in cooling water inlet temperature or feed composition. Faults 8-12 involve significantly increased variability of some process variables. Fault 13 is a slow drift in the reaction kinetics, and faults 14, 15 and 21 are related to sticking valves. Faults 16-20 are unknown. To monitor the process, 44 process variables were selected in total, as shown in Table 1. The implementation steps of the invention are described in detail below in connection with this process:
1. Collect normal data and 4 kinds of fault data as training samples, and carry out data preprocessing and normalization. In this experiment the normal condition and faults 1, 2, 6 and 14 were selected as training samples. Faults 1 and 2 are composition changes in stream 4. Fault 6 is caused by the A feed loss in stream 1, which eventually affects the A composition in stream 4. Fault 14 concerns the product separator bottoms flow. The sampling time is 3 min; the normal condition contains 1000 labeled samples, and 10 labeled samples were selected for each fault class.
2. Divide the normal-condition data samples into 100 classes by k-means, ensuring that the class sizes do not differ greatly. Then, adding the 4 fault classes, learn the resulting 104-class training set with the naive Bayes method.
3. Test the online classification: samples classified into the first 100 classes are merged back into the normal class, and the labels of the 4 fault classes are reset.
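The shape of this experiment can be sketched on synthetic data (the TE data themselves are not reproduced here): a majority "normal" class of 1000 samples drawn from several operating regions, plus 4 faults with only 10 labeled samples each, mirroring the 1000:10 imbalance above; the number of sub-classes is reduced from the patent's 100 to 5 to suit the smaller synthetic sample, and scikit-learn's KMeans and GaussianNB stand in for the patent's own variants:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
centers = rng.normal(0.0, 10.0, (5, 4))                  # 5 normal operating regions
X_normal = np.vstack([c + rng.normal(0.0, 0.5, (200, 4)) for c in centers])
X_faults = [rng.normal(30.0 + 10 * i, 0.5, (10, 4)) for i in range(4)]

# Step 2: cluster the majority class into N_SUB pseudo-classes.
N_SUB = 5
km = KMeans(n_clusters=N_SUB, n_init=10, random_state=0).fit(X_normal)
X_train = np.vstack([X_normal] + X_faults)
y_train = np.concatenate([km.labels_] +
                         [np.full(10, N_SUB + i) for i in range(4)])
clf = GaussianNB().fit(X_train, y_train)

# Step 3: online classification, folding the N_SUB sub-classes back into
# the normal class (0) and renumbering the fault classes 1..4.
raw = clf.predict(X_train)
merged = np.where(raw < N_SUB, 0, raw - N_SUB + 1)
```

On data this well separated both the plain and the clustered classifier do well; the clustering pays off when the normal class is multi-modal enough that a single Gaussian fits it poorly.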
Table 1: Description of the monitored variables

| No. | Measured variable | No. | Measured variable |
|-----|-------------------|-----|-------------------|
| 1 | A feed rate | 22 | Separator cooling water outlet temperature |
| 2 | D feed rate | 23 | A molar content in stream 6 |
| 3 | E feed rate | 24 | B molar content in stream 6 |
| 4 | A+C feed rate | 25 | C molar content in stream 6 |
| 5 | Recycle flow | 26 | D molar content in stream 6 |
| 6 | Reactor feed rate | 27 | E molar content in stream 6 |
| 7 | Reactor pressure | 28 | F molar content in stream 6 |
| 8 | Reactor level | 29 | A molar content in stream 9 |
| 9 | Reactor temperature | 30 | B molar content in stream 9 |
| 10 | Purge rate | 31 | C molar content in stream 9 |
| 11 | Product separator temperature | 32 | D molar content in stream 9 |
| 12 | Product separator level | 33 | E molar content in stream 9 |
| 13 | Product separator pressure | 34 | F molar content in stream 9 |
| 14 | Product separator bottoms flow | 35 | G molar content in stream 9 |
| 15 | Stripper level | 36 | H molar content in stream 9 |
| 16 | Stripper pressure | 37 | D molar content in stream 11 |
| 17 | Stripper bottoms flow | 38 | E molar content in stream 11 |
| 18 | Stripper temperature | 39 | F molar content in stream 11 |
| 19 | Stripper flow | 40 | G molar content in stream 11 |
| 20 | Compressor work | 41 | H molar content in stream 11 |
| 21 | Reactor cooling water outlet temperature | | |
The above embodiment is used to illustrate the present invention rather than to limit it; within the spirit of the invention and the scope of the claims, any modifications and changes made to the invention fall within the protection scope of the invention.

Claims (4)

1. A k-means-based industrial fault classification method for imbalanced data, characterized by comprising the following steps:
(1) Use the data collected by the system under normal operating conditions, together with the various fault data, to form a labeled training sample set for modeling. Suppose there are C fault categories; adding the normal class gives C+1 categories in total, with each class of sample data X_i = [x_1; x_2; ...; x_{n_i}] ∈ R^{n_i×m}, where n_i is the number of training samples, m is the number of process variables, and R is the set of real numbers. The complete labeled training sample set is then X = [X_1; X_2; ...; X_{C+1}]; record the label information of all data. The label under normal conditions is 1, the label of fault 1 is 2, and so on, i.e. the label set Y_i = [i, i, ..., i], i = 1, 2, ..., C+1, and the complete label set is Y = [Y_1; Y_2; ...; Y_{C+1}]. The normal-class data X_1 form the majority class and the remaining data form the minority classes; the imbalance ratio is N = 100, and the fault classes are assumed to have roughly equal amounts of data, i.e. n_2 ≈ n_3 ≈ ... ≈ n_{C+1}.
(2) Using k-means clustering, divide X_1 into N subsets of roughly equal size, i.e. X_1 = [X_11; X_12; ...; X_1N], and assign new labels Y_1 = [Y_11; Y_12; ...; Y_1N].
(3) Combine the N sub-classes from step 2 with the C fault classes to form the training set of an (N+C)-class multi-classification problem, and build a classifier using the naive Bayes method.
(4) Test the classifier from step 3 on a test set, and merge all labels belonging to Y_1 back into the normal class.
2. The k-means-based industrial fault classification method for imbalanced data according to claim 1, characterized in that step (2) is specifically: first choose suitable initial mean vectors in the normal class X_1; compute the distance between each sample x_j = [p_{j1}, ..., p_{jm}] and these mean vectors, where j = 1, 2, ..., n_1; determine the cluster mark λ_j of each sample from its nearest mean vector; then recompute the mean vector of each cluster, and repeat this procedure G times. To keep the cluster sizes in the final result roughly equal, a threshold K is used during the iterations: once the sample count of a cluster reaches the threshold, no more samples are added to that cluster. The thresholded k-means method is as follows:
(2.1) To divide X_1 into N classes, choose N suitable initial mean vectors, one per class; generally N sample values are selected at random as the initial mean vectors. Let the mean vector of class a be q_a = [q_{a1}; ...; q_{am}], where a = 1, 2, ..., N.
(2.2) Compute the distance between each sample and the N mean vectors; the squared Euclidean distance between the j-th sample and the a-th mean vector is

d_{ja} = Σ_{k=1}^{m} (p_{jk} - q_{ak})²    (1)

where j = 1, 2, ..., n_1 and a = 1, 2, ..., N. If d_{ja} is the minimum for sample x_j, then x_j is assigned to class a, i.e. λ_j = a.
(2.3) To avoid clusters of very different sizes, which would defeat the purpose of the clustering, the threshold K is added in step (2.2): once the number of samples in class a reaches K, d_{ja} is excluded from the comparisons for the rest of this round, so no more samples are added to class a until the next round.
(2.4) After G iterations, N sub-classes are obtained, i.e. X_1 = [X_11; X_12; ...; X_1N]. The sample labels of the sub-classes are replaced in turn by 1, 2, ..., N, giving Y_1 = [1, 2, ..., N], and at the same time the labels of the fault-class data are changed in turn to Y_b = [b, b, ..., b], where b = N+1, N+2, ..., N+C. The training set is then X = [X_1; X_2; ...; X_{N+C}], with X_i ∈ R^{n_i×m}, where n_i is the number of samples of class i; likewise each sample is x_j = [p_{j1}, ..., p_{jm}].
3. The k-means-based industrial fault classification method for imbalanced data according to claim 1, characterized in that step (3) is specifically: compute the mean and variance of each dimension of the (N+C) classes; then, for each sample of the test set, compute its posterior probability of belonging to each class, choose the class with the largest posterior probability, and assign the sample the corresponding label. The specific steps are as follows:
(3.1) For each class i, compute the mean Mean_{ic} and variance Var_{ic} of each data dimension c, and the prior probability p_i of each class:

Mean_{ic} = (1/n_i) Σ_{t=1}^{n_i} p_{tc}    (2)

Var_{ic} = (1/n_i) √( Σ_{t=1}^{n_i} (p_{tc} - Mean_{ic})² )    (3)

p_i = n_i / Σ_{t=1}^{C+N} n_t    (4)

where i = 1, 2, ..., C+N and c = 1, 2, ..., m.
(3.2) According to the naive Bayes classification principle, for each sample z_k = [z_{k1}, z_{k2}, ..., z_{km}] of a test set containing U samples, compute the posterior probability p_{ki} that it belongs to each class:

p_{ki} = p_i × Π_{j=1}^{m} (1 / (√(2π) Var_{ij})) exp( -(z_{kj} - Mean_{ij})² / (2 Var_{ij}²) )    (5)

where k = 1, 2, ..., U and i = 1, 2, ..., C+N. Each sample is then assigned the class label with the largest posterior probability.
4. The k-means-based industrial fault classification method for imbalanced data according to claim 1, characterized in that step (4) is specifically: for the classification results labeled in step (3), change the labels of the samples labeled 1 through N back to 1, i.e. the normal class, and change the labels N+1 through N+C to 2 through C+1, respectively, completing the test of the classifier.
CN201710321424.1A 2017-05-09 2017-05-09 A k-means-based industrial fault classification method for imbalanced data Pending CN107239789A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710321424.1A CN107239789A (en) 2017-05-09 2017-05-09 A k-means-based industrial fault classification method for imbalanced data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710321424.1A CN107239789A (en) 2017-05-09 2017-05-09 A k-means-based industrial fault classification method for imbalanced data

Publications (1)

Publication Number Publication Date
CN107239789A true CN107239789A (en) 2017-10-10

Family

ID=59984939

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710321424.1A Pending CN107239789A (en) A k-means-based industrial fault classification method for imbalanced data

Country Status (1)

Country Link
CN (1) CN107239789A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109086412A (en) * 2018-08-03 2018-12-25 Beijing University of Posts and Telecommunications An imbalanced data classification method based on adaptively weighted Bagging-GBDT
CN109978009A (en) * 2019-02-27 2019-07-05 GCI Science & Technology Co., Ltd. Behavior classification method, device and storage medium based on wearable smart devices
WO2019169700A1 (en) * 2018-03-08 2019-09-12 Ping An Technology (Shenzhen) Co., Ltd. Data classification method and device, equipment, and computer readable storage medium
CN110309885A (en) * 2019-07-05 2019-10-08 Heilongjiang Electric Power Dispatching Industry Co., Ltd. Machine room state assessment method based on big data
CN111240279A (en) * 2019-12-26 2020-06-05 Zhejiang University Adversarial-enhancement fault classification method for industrial imbalanced data
CN111833171A (en) * 2020-03-06 2020-10-27 Beijing Xindun Shidai Technology Co., Ltd. Abnormal operation detection and model training method, device and readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030204507A1 (en) * 2002-04-25 2003-10-30 Li Jonathan Qiang Classification of rare events with high reliability
CN101980202A (en) * 2010-11-04 2011-02-23 西安电子科技大学 Semi-supervised classification method of unbalance data
CN104951809A (en) * 2015-07-14 2015-09-30 西安电子科技大学 Unbalanced data classification method based on unbalanced classification indexes and integrated learning
CN106444706A (en) * 2016-09-22 2017-02-22 宁波大学 Industrial process fault detection method based on data neighborhood feature preservation


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Pan Jun et al.: "Research on boosting-based imbalanced data classification algorithms", Computer Engineering and Applications *
Jian Tao, Li Hong, Guo Yuejian: "Imbalanced classification combining cost-sensitive learning and majority-class decomposition", Computer Engineering and Applications *
Aman: "Research and application of the naive Bayes classification algorithm", China Master's Theses Full-text Database, Information Science and Technology Series *
Qi Wen: "Equivalent modeling of large wind farms and grid-connection stability", China Master's Theses Full-text Database, Engineering Science and Technology II Series *


Similar Documents

Publication Publication Date Title
CN107239789A (en) A k-means-based industrial fault classification method for imbalanced data
CN106843195B (en) Fault classification method based on adaptive ensemble semi-supervised Fisher discriminant analysis
CN109931678B (en) Air conditioner fault diagnosis method based on deep learning LSTM
CN107656154B (en) Based on the Diagnosis Method of Transformer Faults for improving Fuzzy C-Means Cluster Algorithm
CN106371427B (en) Industrial process Fault Classification based on analytic hierarchy process (AHP) and fuzzy Fusion
CN107657274A (en) A k-means-based binary-tree SVM industrial fault classification method for imbalanced data
CN108875772B (en) Fault classification model and method based on stacked sparse Gaussian Bernoulli limited Boltzmann machine and reinforcement learning
CN103914064A (en) Industrial process fault diagnosis method based on multiple classifiers and D-S evidence fusion
CN106649789A (en) Integrated semi-supervised Fisher's discrimination-based industrial process fault classification method
CN104699606A (en) Method for predicting state of software system based on hidden Markov model
CN104914850B (en) Industrial process method for diagnosing faults based on switching linear dynamic system model
CN105334823B (en) Industrial process fault detection method based on the linear dynamic system model for having supervision
CN112922582B (en) Gas well wellhead choke tip gas flow analysis and prediction method based on Gaussian process regression
CN110689069A (en) Transformer fault type diagnosis method based on semi-supervised BP network
CN111709454B (en) Multi-wind-field output clustering evaluation method based on optimal copula model
CN105510729A (en) Overheating fault diagnosis method of transformer
CN115115090A (en) Wind power short-term prediction method based on improved LSTM-CNN
CN103559542A (en) Extension neural network pattern recognition method based on priori knowledge
Tang et al. Review and perspectives of machine learning methods for wind turbine fault diagnosis
CN111240279B (en) Adversarial-enhancement fault classification method for industrial imbalanced data
CN103616889B (en) A kind of chemical process Fault Classification of reconstructed sample center
CN105425777A (en) Chemical process fault monitoring method based on active learning
CN107728476B (en) SVM-forest based method for extracting sensitive data from unbalanced data
CN109164794B (en) Multivariable industrial process Fault Classification based on inclined F value SELM
CN104537383A (en) Massive organizational structure data classification method and system based on particle swarm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20171010