CN103914064B - Industrial process fault diagnosis method based on multi-classifier and D-S evidence fusion - Google Patents


Info

Publication number
CN103914064B
Authority
CN
China
Prior art keywords
matrix
data
new
sample
spe
Prior art date
Application number
CN201410128630.7A
Other languages
Chinese (zh)
Other versions
CN103914064A (en)
Inventor
张富元
葛志强
Original Assignee
浙江大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浙江大学
Priority to CN201410128630.7A
Publication of CN103914064A
Application granted
Publication of CN103914064B


Abstract

The present invention discloses an industrial process fault diagnosis method based on multi-classifier and D-S evidence fusion. The method first applies independent repeated sampling to the fault data of the industrial process, then trains multiple classifier methods on the resampled training data to obtain their respective offline models; the performance of each classifier method is summarized in the form of a fusion matrix. Next, the basic probability assignment functions of the different classes are computed by D-S evidence theory, and the decisions of the classifiers are selectively integrated according to a similarity index to obtain the joint basic probability assignment function; the final classification and diagnosis result is obtained by comparing its values. Compared with other existing methods, the invention greatly improves the diagnosis performance for industrial processes, reduces diagnosis delay, raises diagnosis accuracy, improves monitoring performance, strengthens operators' understanding of the process and their confidence in operating it, and facilitates automated implementation in industrial processes.

Description

Industrial process fault diagnosis method based on multi-classifier and D-S evidence fusion
Technical field
The invention belongs to the field of industrial process control, and in particular relates to an industrial process fault diagnosis method based on multi-classifier and Dempster-Shafer (D-S) evidence fusion.
Background art
In recent years, the monitoring of industrial production processes has received increasing attention from both industry and academia. On the one hand, real industrial processes are complex and have many operating variables, and they exhibit nonlinear, non-Gaussian, and dynamic characteristics; a single method built on a single assumption therefore has very limited monitoring performance. On the other hand, if a process is not monitored well and faults that may occur are not diagnosed, accidents can follow: minor ones degrade product quality, and serious ones cause loss of life and property. Finding better process monitoring methods that give correct and timely warnings has therefore become one of the research hotspots and pressing problems for industrial production processes.
Apart from methods based on mechanistic models, traditional industrial process monitoring mostly uses multivariate statistical analysis, such as principal component analysis (PCA) and partial least squares (PLS). When a mechanistic model is hard to obtain, data-driven multivariate statistical analysis becomes the mainstream approach to industrial process monitoring. However, traditional multivariate statistical methods all rest on basic assumptions. PCA, for example, assumes that the data are independent and identically distributed and that the process is linear and Gaussian, whereas a real process is more complex: part of it may be linear, part nonlinear, and part non-Gaussian. No single method can be expected to suit every situation. By contrast, integrating methods built on several different assumptions, that is, information fusion, has its own advantages for monitoring and fault diagnosis of complex industrial processes, and the present invention adopts it in place of a single multivariate statistical method to diagnose process faults. To improve the fusion result and increase classifier diversity, the training data can first be preprocessed by repeated sampling, and the classifiers can then be fused selectively according to a similarity index. Traditional monitoring methods assume that the process operates under a single condition, which cannot meet the detection requirements of real industrial processes. Even modeling each operating condition separately does not give satisfactory monitoring, because when new process data are monitored, process knowledge is needed to judge the operating condition of the data and to choose the corresponding monitoring model. This greatly increases the dependence of the monitoring method on process knowledge and hinders automated implementation in industrial processes.
Summary of the invention
The object of the invention is to address the deficiencies of the prior art by providing an industrial process fault diagnosis method based on multi-classifier and D-S evidence fusion.
The object of the invention is achieved through the following technical solution: an industrial process fault diagnosis method based on multi-classifier and D-S evidence fusion, comprising the following steps:
(1) Use the system to collect data from normal process operation and from the various faults to form the training sample set for modeling. Assuming there are $C$ fault classes, plus one normal class, the modeling data contain $C+1$ classes in total, that is, $X_i = [x_1; x_2; \ldots; x_n]$, $i = 1, 2, \ldots, C+1$, where $X_i \in R^{n \times m}$, $R$ is the set of real numbers, $R^{n \times m}$ denotes the set of $n \times m$ real matrices, $n$ is the number of samples in each class, and $m$ is the number of process variables. The complete training sample set is then $X = [X_1; X_2; \ldots; X_{C+1}]$, $X \in R^{((C+1)*n) \times m}$. These data are stored in the historical database;
(2) Collect additional fault data, different from the training data, as offline test data, $C$ classes in total, that is, $Y_j = [y_1; y_2; \ldots; y_N]$, $j = 1, 2, \ldots, C$, where $Y_j \in R^{N \times m}$, $N$ is the number of samples in each class, and $m$ is the number of process variables. The complete test sample set is then $Y = [Y_1; Y_2; \ldots; Y_C]$, $Y \in R^{(C*N) \times m}$. These data are also stored in the historical database;
(3) Retrieve the training samples $X$ from the database and rearrange each class data matrix by independent repeated sampling, keeping the reordering rule consistent, to obtain the resampled data matrix set;
(4) Preprocess and normalize the resampled data set so that, within each class, every process variable has zero mean and unit variance, obtaining the new data matrix set $\bar{\bar{X}} \in R^{((C+1)*n) \times m}$;
(5) Preprocess and normalize the data set $Y$ using the per-class means and variances of the training samples obtained in step (4), so that within each class every process variable has zero mean and unit variance, obtaining the new data matrix set $\bar{\bar{Y}} \in R^{(C*N) \times m}$;
(6) Select $G$ classifier methods, including unsupervised and supervised methods, and build the corresponding classifier models on the training data set $\bar{\bar{X}}$. For the unsupervised models, compute the confidence limits of the $T^2$ and SPE statistics; for the supervised models, compute the corresponding class labels;
(7) On the test data set $\bar{\bar{Y}}$, use the classifier models and their parameters to compute offline the fusion matrix of each classifier method;
(8) Compute the similarity between the different classifier methods according to the proposed similarity index, in preparation for the later selective fusion;
(9) Store the modeling data and all model parameters in the historical database and the real-time database for later use;
(10) Collect new online process data and preprocess and normalize them;
(11) Monitor the new data with each classifier model: for the unsupervised models, compute the $T^2$ and SPE statistics; for the supervised models, obtain the corresponding class labels;
(12) Using D-S evidence theory and the prior knowledge of the detection rates for the different faults contained in the fusion matrices, compute the combined classification rate of the current sample over all classifier methods, fuse the classifiers selectively according to the similarity index computed earlier, and make the final decision.
The beneficial effects of the invention are as follows. The invention analyzes and models each class of fault data separately under the different classifier methods. It then introduces a similarity index and uses D-S evidence theory to integrate the diagnostic information from the different methods into a final diagnosis result. Compared with other existing fault diagnosis methods, the invention not only greatly improves the monitoring performance for industrial processes, reduces diagnosis delay, and increases diagnosis accuracy, but also greatly reduces the dependence of the monitoring method on process knowledge, strengthens operators' understanding of the process and their confidence in operating it, and facilitates automated implementation in industrial processes.
Brief description of the drawings
Fig. 1 shows the diagnosis of TE process fault data by an unsupervised method (PCA) under the method of the invention;
Fig. 2 shows the diagnosis of TE process fault data by a supervised method (FDA) under the method of the invention;
Fig. 3 shows the fusion matrices of the unsupervised method (PCA) and the supervised method (FDA) under the method of the invention.
Detailed description of the embodiments
The invention is described in detail below with reference to the drawings and specific embodiments.
The invention provides an industrial process fault diagnosis method based on multi-classifier and D-S evidence fusion. Addressing the fault diagnosis problem of industrial processes, the method first uses a distributed control system to collect data under normal operation together with data from the various known faults. These data then undergo a uniform diversity treatment, namely independent repeated sampling, to give a new training data set. On this basis, the different classifier methods are called to build the corresponding classifier models: for the unsupervised methods, the two monitoring statistics $T^2$ and SPE and their corresponding limits $T^2_{lim}$ and $SPE_{lim}$ are established; for the supervised methods, class labels are established. The test data set is then run through the classifier models to obtain the fusion matrices, which capture each classifier's classification performance, and all model parameters are stored in the database for later use. When new online data are monitored and diagnosed, they are first monitored with each classifier model to obtain the corresponding results; suitable classifier methods are then selected using the similarity index; finally, the decision on the state of the data is obtained by D-S evidence theory on the basis of the fusion matrices.
The industrial process fault diagnosis method based on multi-classifier and D-S evidence fusion comprises the following steps:
Step 1: Use the system to collect data from normal process operation and from the various faults to form the training sample set for modeling. Assuming there are $C$ fault classes, plus one normal class, the modeling data contain $C+1$ classes in total, that is, $X_i = [x_1; x_2; \ldots; x_n]$, $i = 1, 2, \ldots, C+1$, where $X_i \in R^{n \times m}$, $m$ is the number of process variables, and $n$ is the number of samples in each class. The complete training sample set is then $X = [X_1; X_2; \ldots; X_{C+1}]$, $X \in R^{((C+1)*n) \times m}$. These data are stored in the historical database.
Step 2: Collect additional fault data, different from the training data, as offline test data, $C$ classes in total, that is, $Y_j = [y_1; y_2; \ldots; y_N]$, $j = 1, 2, \ldots, C$, where $Y_j \in R^{N \times m}$, $N$ is the number of samples in each class, and $m$ is the number of process variables. The complete test sample set is then $Y = [Y_1; Y_2; \ldots; Y_C]$, $Y \in R^{(C*N) \times m}$. These data are also stored in the historical database.
Step 3: Retrieve the training data from the database and rearrange the data matrices by independent repeated sampling to obtain the resampled data matrix set.
To increase data diversity and thereby improve the final fusion result, the data are processed by independent repeated sampling. Statistically, samples that occur with high probability are more likely to be drawn, so the method filters out part of the low-probability samples, that is, part of the uninformative data.
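The resampling in step 3 can be sketched as sampling row indices with replacement and applying the same index sequence to every class matrix. This is only one reading of "keeping the reordering rule consistent"; the function names `resample_indices` and `resample_classes`, the fixed seed, and the toy data are illustrative assumptions, not part of the patent.

```python
import random

def resample_indices(n, seed=0):
    """Draw n row indices with replacement (independent repeated sampling)."""
    rng = random.Random(seed)
    return [rng.randrange(n) for _ in range(n)]

def resample_classes(class_matrices, seed=0):
    """Apply one shared index sequence to every class matrix.

    Assumption: 'consistent reordering rule' means all classes reuse
    the same sampled indices. Each matrix is a list of n rows.
    """
    n = len(class_matrices[0])
    idx = resample_indices(n, seed)
    return [[X[i] for i in idx] for X in class_matrices], idx

# Toy data: two classes, n = 5 samples, m = 2 variables each
X1 = [[float(i), float(i) + 0.5] for i in range(5)]
X2 = [[10.0 + i, 20.0 + i] for i in range(5)]
resampled, idx = resample_classes([X1, X2], seed=42)
```

Rows drawn more than once gain weight in the later models, and rows never drawn are dropped, which matches the filtering effect described in the text.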
Step 4: Preprocess and normalize the resampled data matrix set so that, within each class, every process variable has zero mean and unit variance, obtaining the new data matrix set $\bar{\bar{X}}$.
The collected process data are preprocessed in the historical database to remove outliers and obvious gross errors. So that the scale of the process data does not affect the monitoring result, the data of each variable are normalized separately to zero mean and unit variance. The data of the different process variables are then on the same scale and do not affect the subsequent monitoring performance.
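The normalization in step 4, and its reuse on test and online data, can be sketched as follows. The helper names `fit_scaler` and `transform` are hypothetical, and the sample standard deviation (divisor n-1) is an assumed convention.

```python
from statistics import mean, stdev

def fit_scaler(X):
    """Per-variable mean and sample standard deviation of one class matrix."""
    cols = list(zip(*X))
    mu = [mean(c) for c in cols]
    sd = [stdev(c) for c in cols]  # assumption: divisor n - 1
    return mu, sd

def transform(X, mu, sd):
    """Subtract the modeling mean and divide by the modeling std."""
    return [[(v - m) / s for v, m, s in zip(row, mu, sd)] for row in X]

# Toy class matrix: n = 4 samples, m = 2 variables
X_class = [[1.0, 10.0], [2.0, 14.0], [3.0, 12.0], [4.0, 20.0]]
mu, sd = fit_scaler(X_class)
X_norm = transform(X_class, mu, sd)

# A new sample is scaled with the stored training parameters
x_new_norm = transform([[2.5, 15.0]], mu, sd)[0]
```

Storing `mu` and `sd` per class is what allows the test set and the online samples to be scaled with the training parameters later in the procedure.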
Step 5: Preprocess and normalize the data set $Y$ using the per-class means and variances of the training samples from step 4, so that within each class every process variable has zero mean and unit variance, obtaining the new data matrix set $\bar{\bar{Y}}$.
Step 6: Call the different classifier methods, including unsupervised and supervised methods, and build the corresponding classifier models on the new data matrix set $\bar{\bar{X}}$. For the unsupervised models, construct the confidence limits of the $T^2$ and SPE statistics; for the supervised models, construct the corresponding class labels.
Because the training data set $\bar{\bar{X}}$ contains information on $C+1$ classes, each class sample set is modeled separately, $C+1$ times in total. The processing of one class is described below:
A) For the unsupervised methods, the specific steps are as follows:
1) PCA analysis yields the covariance matrix $\Sigma \in R^{n \times n}$ of the data matrix, the unitary matrix $U \in R^{n \times m}$, and the diagonal matrix of eigenvalues $D \in R^{m \times m}$:
$$\begin{aligned} \Sigma &= \bar{\bar{X}} \bar{\bar{X}}^T / (n-1) \\ \Sigma &= U D U^T \\ D &= \mathrm{diag}(\lambda_i), \quad i = 1, \ldots, m \\ U &= [u_1, u_2, \ldots, u_m] \end{aligned} \qquad (15)$$
From these, the loading matrix $P \in R^{m \times k}$, the residual loading matrix $\bar{P}$, the principal component scores $t \in R^{n \times k}$, and the residual matrix $\bar{C}$ are obtained:
$$\begin{aligned} P &= [u_1, u_2, \ldots, u_k] \\ \bar{P} &= [u_{k+1}, u_{k+2}, \ldots, u_m] \\ t &= \bar{\bar{X}} P \\ \bar{C} &= \bar{P} \bar{P}^T \end{aligned} \qquad (16)$$
where $k$ is the number of retained principal components, chosen mainly by the cumulative variance contribution rate (> 80%). The $T^2$ statistic is then constructed and its confidence limit $T^2_{lim}$ is given by the F-distribution; the SPE statistic is built from the residual matrix $\bar{C}$ and its confidence limit $SPE_{lim}$ is computed.
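A minimal NumPy sketch of the PCA model and the two statistics follows. It assumes rows are samples, so the covariance is computed as the $m \times m$ matrix $X^T X/(n-1)$; the function names and the synthetic data are illustrative. The confidence limits are not computed here, since they need F and chi-square quantiles from a statistics library.

```python
import numpy as np

def fit_pca(X, cum_var=0.80):
    """X: (n, m) normalized data. Returns loadings P, retained eigenvalues, k."""
    n = X.shape[0]
    cov = X.T @ X / (n - 1)                      # m x m sample covariance
    eigval, eigvec = np.linalg.eigh(cov)         # ascending eigenvalues
    order = np.argsort(eigval)[::-1]             # sort descending
    eigval, eigvec = eigval[order], eigvec[:, order]
    ratio = np.cumsum(eigval) / eigval.sum()
    # smallest k whose cumulative contribution exceeds cum_var
    k = int(np.searchsorted(ratio, cum_var, side="right")) + 1
    k = min(k, X.shape[1])
    return eigvec[:, :k], eigval[:k], k

def t2_spe(x, P, lam):
    """T^2 and SPE of one normalized sample x (length m)."""
    t = P.T @ x                                  # scores
    t2 = float(np.sum(t ** 2 / lam))             # Hotelling T^2
    resid = x - P @ t                            # part outside the PC subspace
    spe = float(resid @ resid)                   # squared prediction error
    return t2, spe

# Synthetic data: 200 samples, 5 correlated variables (illustration only)
rng = np.random.default_rng(0)
raw = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(200, 5))
X = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)
P, lam, k = fit_pca(X)
t2, spe = t2_spe(X[0], P, lam)
```

The residual `x - P @ t` equals $\bar{P}\bar{P}^T x$ because the retained and discarded eigenvectors together form an orthonormal basis, so `spe` agrees with the squared norm of $\bar{C}x$.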
2) KPCA analysis maps the raw data into a high-dimensional space using a radial basis kernel function, giving the eigenvalues, eigenvectors, and scores in that space. The number of principal components $d$ is chosen by the cumulative variance contribution rate (> 80%), and the corresponding loading matrix and principal components are obtained as in the PCA method above. Likewise, the $T^2$ statistic is constructed with its confidence limit $T^2_{lim}$ given by the F-distribution, and the SPE statistic is built from the residual matrix with its confidence limit $SPE_{lim}$.
3) ICA analysis yields the independent component matrix $S \in R^{r \times n}$ of the data matrix, the mixing matrix $A \in R^{m \times r}$, the separation matrix $W \in R^{r \times m}$, and the residual matrix $\bar{\bar{E}}$:
$$\begin{aligned} \bar{\bar{X}} &= A S + \bar{\bar{E}} \\ S &= W \bar{\bar{X}} \\ \bar{\bar{E}} &= \bar{\bar{X}} - A S \end{aligned} \qquad (17)$$
where $r$ is the number of independent components selected. The $I^2$ statistic is then constructed and its confidence limit $I^2_{lim}$ is obtained by kernel density estimation:
$$\hat{f}(I^2, H) = \frac{1}{n} \sum_{i=1}^{n} K\left(H^{-1/2}\left(I^2 - I_i^2\right)\right) \qquad (18)$$
where $K(\cdot)$ is the kernel function, usually chosen to be Gaussian; $H$ is the bandwidth parameter matrix of the kernel, which can simply be chosen diagonal; $I^2$ is the $I^2$ statistic of the current sample; and $I_i^2$ is the $I^2$ statistic of the $i$-th training sample. This gives the probability density of the $I^2$ statistic, from which its limit $I^2_{lim}$ at a given confidence level is easily obtained. The confidence limit of the SPE statistic is then constructed from the residual matrix $\bar{\bar{E}}$:
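For a scalar statistic, the limit from formula (18) can be sketched by integrating a Gaussian kernel density estimate into a cumulative distribution and solving for the 99% point by bisection. The bandwidth rule (Silverman's rule of thumb), the function names, and the toy $I^2$ values are assumptions made for illustration.

```python
import math
from statistics import stdev

def kde_cdf(x, samples, h):
    """CDF of a Gaussian KDE: average of normal CDFs centered on each sample."""
    return sum(0.5 * (1.0 + math.erf((x - s) / (h * math.sqrt(2.0))))
               for s in samples) / len(samples)

def kde_limit(samples, h, conf=0.99, tol=1e-9):
    """Smallest x whose KDE cumulative probability reaches conf (bisection)."""
    lo = min(samples) - 10.0 * h
    hi = max(samples) + 10.0 * h
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kde_cdf(mid, samples, h) < conf:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Toy I^2 values of training samples (illustration only)
i2_train = [0.8, 1.1, 1.9, 2.4, 0.6, 3.2, 1.5, 2.8, 0.9, 4.1]
h = 1.06 * stdev(i2_train) * len(i2_train) ** (-0.2)  # assumed bandwidth rule
i2_lim = kde_limit(i2_train, h)
```

A new sample whose $I^2$ value exceeds `i2_lim` would be flagged at the 99% confidence level.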
Building on the previous step, the SPE statistic is set up from the residual matrix $\bar{\bar{E}}$ and its confidence limit $SPE_{lim}$ is computed:
$$SPE_{lim} = \bar{\bar{E}} \bar{\bar{E}}^T \qquad (19)$$
where SPE is taken to follow a scaled $\chi^2$ distribution with parameters $g$ and $h$, determined by
$$\begin{aligned} g \cdot h &= \mathrm{mean}(SPE) \\ 2 g^2 h &= \mathrm{var}(SPE) \end{aligned} \qquad (20)$$
The confidence limit of the SPE statistic can therefore also be obtained easily, namely
$$SPE_{lim} = g \chi^2_{h,\alpha} \qquad (21)$$
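Solving the two moment conditions of formula (20) gives $g = \mathrm{var}(SPE) / (2\,\mathrm{mean}(SPE))$ and $h = 2\,\mathrm{mean}(SPE)^2 / \mathrm{var}(SPE)$. The short sketch below computes them from training SPE values; the function name and the sample values are illustrative, and the chi-square quantile needed for the limit itself is left to a statistics library.

```python
from statistics import mean, variance

def gh_parameters(spe_values):
    """Moment matching: g*h = mean(SPE) and 2*g^2*h = var(SPE)."""
    mu = mean(spe_values)
    var = variance(spe_values)   # sample variance
    g = var / (2.0 * mu)
    h = 2.0 * mu ** 2 / var
    return g, h

# Toy SPE values of training samples (illustration only)
spe_train = [1.2, 0.8, 2.5, 1.9, 0.6, 3.1]
g, h = gh_parameters(spe_train)
```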
B) For the supervised methods, the specific steps are as follows:
1) With Fisher discriminant analysis, find the most suitable projection directions between the classes and determine the center point of each class;
2) With the K-nearest neighbor method, set 5 neighbors and add class labels to the modeling data;
3) With the neural network method, select a two-layer BP network with three hidden nodes, using the tansig function in the hidden layer and the purelin function in the output layer, and train the network model.
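As an illustration of item 2), a K-nearest neighbor classifier with 5 neighbors can be sketched in a few lines. The Euclidean distance, the majority vote, and the toy two-class data are assumptions; the patent does not specify the distance measure for this method.

```python
import math
from collections import Counter

def knn_predict(x, train_X, train_y, k=5):
    """Label of x by majority vote among its k nearest training samples."""
    dists = sorted((math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy labeled modeling data: class 1 near (0, 0), class 2 near (5, 5)
train_X = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0], [0.1, 0.2], [0.0, -0.2], [0.3, 0.1],
           [5.0, 5.1], [5.2, 4.9], [4.9, 5.0], [5.1, 5.2], [5.0, 4.8], [5.3, 5.1]]
train_y = [1] * 6 + [2] * 6

label_a = knn_predict([0.2, 0.1], train_X, train_y)
label_b = knn_predict([4.8, 5.1], train_X, train_y)
```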
Step 7: For the test data set $\bar{\bar{Y}}$, call the different classifier methods and compute the $G$ fusion matrices offline. The fusion matrix has the following form:
$$CM_g = \begin{bmatrix} N_{11}^g & N_{12}^g & \cdots & N_{1(C+1)}^g \\ \vdots & \vdots & & \vdots \\ N_{C1}^g & N_{C2}^g & \cdots & N_{C(C+1)}^g \end{bmatrix}, \quad g = 1, 2, \ldots, G$$
where $CM_g$ is the fusion matrix of the $g$-th classifier and $G$ is the number of classifiers. The rows represent the $C$ fault classes, the first $C$ columns represent the different faults, and the last column represents the normal class. For example, $N_{21}^g$ is the number of samples belonging to class 2 that the classifier assigns to class 1. The specific steps are as follows.
A) For the unsupervised classifier methods, the specific steps are as follows:
1) PCA method: using the parameters of each class obtained in the steps above, compute the $T^2$ and SPE statistics of the 6 classes of fault samples under the 6 classifier methods, and combine the $T^2$ and SPE statistics into a single index for fault reconstruction, called the joint discriminant index $\lambda_i$:
$$\lambda_i = (1 - \alpha_i) \times T^2 / T^2_{lim.i} + \alpha_i \times SPE / SPE_{lim.i}, \quad i = 1, 2, \ldots, C+1 \qquad (22)$$
where $\alpha_i$ is the cumulative contribution rate of the scores $t_i$, $T^2$ and SPE are the statistics of the test sample, and $T^2_{lim.i}$ and $SPE_{lim.i}$ are the confidence limits obtained from the modeling in step (6). The class with the smallest of the $C+1$ index values is taken as the predicted class of the sample;
2) KPCA method: same as the PCA method above;
3) ICA method: similar to the PCA method above, except that the $T^2$ statistic is replaced by the $I^2$ statistic; the subsequent fault reconstruction is similar.
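The decision rule built on formula (22) can be sketched as follows: evaluate the joint index of the test sample against each of the $C+1$ class models and pick the class with the smallest value. The dictionary layout, the function names, and all numbers are illustrative assumptions.

```python
def joint_index(t2, spe, t2_lim, spe_lim, alpha):
    """Joint discriminant index of formula (22) for one class model."""
    return (1.0 - alpha) * t2 / t2_lim + alpha * spe / spe_lim

def classify(stats_per_model):
    """stats_per_model[i] holds the sample's statistics under class model i+1.

    Returns the 1-based class with the smallest joint index, plus all scores.
    """
    scores = [joint_index(**s) for s in stats_per_model]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best + 1, scores

# One test sample evaluated against three class models (made-up numbers)
stats = [
    {"t2": 12.0, "spe": 8.0, "t2_lim": 10.0, "spe_lim": 5.0, "alpha": 0.85},
    {"t2": 4.0, "spe": 2.0, "t2_lim": 10.0, "spe_lim": 5.0, "alpha": 0.80},
    {"t2": 30.0, "spe": 1.0, "t2_lim": 10.0, "spe_lim": 5.0, "alpha": 0.90},
]
label, scores = classify(stats)
```

Dividing each statistic by its own limit puts $T^2$ and SPE on a common scale, so a value below 1 means the sample lies within that class model's limits.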
B) For the supervised classifier methods, the specific steps are as follows:
1) For Fisher discriminant analysis, compute the Euclidean distance between the test sample and the center point of each class; the label of the class with the smallest distance is the final label;
2) For the K-nearest neighbor method, compute the distances between the test sample and the samples with known class labels to obtain the final label;
3) For the neural network method, use the trained network model to compute the output label of the test sample.
Step 8: Compute the similarity between the different classifier methods according to the proposed similarity index, in preparation for the later selective fusion. To measure the similarity between different classifier methods, the invention proposes a similarity index computed from the fusion matrices:
$$corr_{ij} = \frac{\mathrm{cov}(c_{i\_m}, c_{j\_m})}{\sqrt{D(c_{i\_m})} \sqrt{D(c_{j\_m})}} \qquad (23)$$
where $c_{i\_m}$ and $c_{j\_m}$ are the fusion matrices of different classifier methods, $D(c_{i\_m})$ and $D(c_{j\_m})$ are the variances of those fusion matrices, and $corr_{ij}$ is the linear correlation index between the two classifiers.
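Steps 7 and 8 can be sketched together: count (true class, predicted class) pairs into a fusion matrix for each classifier, then compute the correlation of formula (23) between two matrices. Flattening each matrix into a vector before taking the covariance is an assumption, as are the function names and the toy labels.

```python
from statistics import mean

def fusion_matrix(true_labels, pred_labels, n_fault, n_total):
    """Rows: true fault class 1..C. Columns: predicted class 1..C+1 (last = normal)."""
    cm = [[0] * n_total for _ in range(n_fault)]
    for t, p in zip(true_labels, pred_labels):
        cm[t - 1][p - 1] += 1
    return cm

def similarity(cm_a, cm_b):
    """Linear correlation of two fusion matrices, flattened row by row."""
    a = [v for row in cm_a for v in row]
    b = [v for row in cm_b for v in row]
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

# Toy test set: C = 2 fault classes, label 3 stands for "normal"
true = [1, 1, 1, 2, 2, 2]
pred_a = [1, 1, 2, 2, 2, 3]   # outputs of a first classifier
pred_b = [1, 2, 2, 2, 3, 3]   # outputs of a second classifier
cm_a = fusion_matrix(true, pred_a, n_fault=2, n_total=3)
cm_b = fusion_matrix(true, pred_b, n_fault=2, n_total=3)
sim = similarity(cm_a, cm_b)
```

A value near 1 means the two classifiers make nearly the same pattern of decisions, so one of them adds little diversity to the fusion.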
Step 9: Store the modeling data and all model parameters in the historical database and the real-time database for later use.
Step 10: Collect new process data and preprocess and normalize them.
For a data sample newly collected from the process, besides preprocessing, the data point is normalized with the model parameters from modeling, that is, the modeling mean is subtracted and the result is divided by the modeling standard deviation.
Step 11: Monitor the new data with each classifier model, that is, establish the $T^2$ and SPE statistics and the labels, so that each method gives a decision on whether the sample is normal or faulty.
A) For the unsupervised methods, the corresponding monitoring statistics are established as follows:
1) For PCA analysis:
$$\begin{aligned} t_{new} &= \bar{\bar{X}}_{new} P \\ SPE_{new} &= \| \bar{C} \bar{\bar{X}}_{new} \|^2 \\ T^2_{new} &= \| D_r^{-1/2} t_{new} \|^2 = \| D_r^{-1/2} P^T \bar{\bar{X}}_{new} \|^2 \end{aligned} \qquad (24)$$
where $\bar{\bar{X}}_{new}$ is the new online sample after normalization, $t_{new}$ is the principal component score of the new sample, $\bar{C}$ is the residual matrix, $P$ is the loading matrix, $SPE_{new}$ is the SPE statistic of the new online sample, $\| \cdot \|^2$ denotes the squared 2-norm, $T^2_{new}$ is the $T^2$ statistic of the new sample, and the superscript $T$ denotes the matrix transpose.
2) For KPCA analysis, the procedure is the same as for PCA above.
3) For ICA analysis:
$$\begin{aligned} s_{new} &= W \bar{\bar{x}}_{new} \\ \bar{\bar{e}}_{new} &= \bar{\bar{x}}_{new} - A s_{new} \\ I^2_{new} &= s_{new}^T s_{new} \end{aligned} \qquad (25)$$
where $\bar{\bar{x}}_{new}$ is the new data after normalization, $s_{new}$ is the independent component vector extracted from the new data, and $I^2_{new}$ is the $I^2$ statistic of the new data. The SPE statistic $SPE_{new}$ is then established from the residual vector $\bar{\bar{e}}_{new}$:
$$SPE_{new} = \bar{\bar{e}}_{new} \bar{\bar{e}}_{new}^T \qquad (26)$$
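Formulas (25) and (26) amount to two matrix-vector products and two sums of squares. The sketch below uses plain Python lists; the matrices `W` and `A` are made-up placeholders standing in for a trained ICA model, and the function names are hypothetical.

```python
def matvec(M, v):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def ica_statistics(x_new, W, A):
    """I^2 and SPE of one normalized sample under an ICA model."""
    s = matvec(W, x_new)                            # s_new = W x_new
    recon = matvec(A, s)                            # A s_new
    e = [xi - ri for xi, ri in zip(x_new, recon)]   # e_new = x_new - A s_new
    i2 = sum(v * v for v in s)                      # I^2_new = s_new^T s_new
    spe = sum(v * v for v in e)                     # SPE_new = e_new e_new^T
    return i2, spe

# Placeholder model with m = 3 variables and r = 2 independent components
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
A = [[1.0, 0.0],
     [0.0, 1.0],
     [0.0, 0.0]]
i2_new, spe_new = ica_statistics([1.0, 2.0, 3.0], W, A)
```

With this placeholder model the first two variables are captured by the independent components and the third falls entirely into the residual.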
B) For the supervised methods, the corresponding class labels are obtained:
1) For Fisher discriminant analysis, compute the Euclidean distance between the new sample and the center point of each class; the label of the class with the smallest distance is the final label;
2) For the K-nearest neighbor method, compute the distances between the new sample and the samples with known class labels to obtain the final label;
3) For the neural network method, use the trained network model to compute the output label of the new sample.
Step 12: Using D-S evidence theory and each method's prior knowledge of the detection rates for the different faults, compute the combined detection rate of the current monitoring data over all classifier methods and make the final decision.
A) First call the different classifier methods and compute the corresponding basic probability assignment functions $m_g(C_i)$:
$$m_g(C_i) = \frac{N_{ij}^g}{\sum_{i=1}^{G} N_{ij}^g}, \quad g = 1, 2, \ldots, G \qquad (27)$$
where $N_{ij}^g$ is the element in row $i$ and column $j$ of the fusion matrix of the $g$-th classifier method, $m_g(C_i)$ is the probability with which the $g$-th classifier assigns the sample to class $C_i$, that is, the basic probability assignment function value, and $G$ is the number of selected classifiers.
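One way to read formula (27) is sketched below: when a classifier outputs class $j$ for the current sample, column $j$ of its fusion matrix is normalized to give the masses of the candidate classes. Summing over the rows of that column (the true classes) is an assumed interpretation of the summation index, and the uniform fallback for an empty column is an added convention, not something the patent states.

```python
def basic_probability(cm, predicted):
    """Masses over the true classes, given the classifier's output.

    cm: fusion matrix, rows = true classes, columns = predicted classes.
    predicted: 1-based column index j of the class the classifier output.
    """
    col = [row[predicted - 1] for row in cm]
    total = sum(col)
    if total == 0:
        # assumption: a column with no test evidence gives a uniform assignment
        return [1.0 / len(col)] * len(col)
    return [v / total for v in col]

# Toy fusion matrix with C = 2 fault rows and C + 1 = 3 columns
cm_g = [[8, 1, 1],
        [2, 7, 1]]
m_g = basic_probability(cm_g, predicted=1)
```

Here the classifier output class 1, and on the test data 8 of the 10 samples it labeled class 1 truly belonged to class 1, so that class receives mass 0.8.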
Assume the fault classes in the fault library are $C = 1, 2, \ldots, L$. For the PCA method:
$$m_{PCA}(1) = p_{PCA}(1); \quad m_{PCA}(2) = p_{PCA}(2); \quad \ldots \quad m_{PCA}(L) = p_{PCA}(L) \qquad (28)$$
where $m_{PCA}(1)$ is the probability $p_{PCA}(1)$ with which the PCA method assigns the sample to class 1, and likewise $m_{PCA}(L)$ is the probability $p_{PCA}(L)$ with which the PCA method assigns the sample to class $L$.
For the other classifier methods, the corresponding basic probability assignment function values are obtained in the same way:
$$m_{method}(1) = p_{method}(1); \quad m_{method}(2) = p_{method}(2); \quad \ldots \quad m_{method}(L) = p_{method}(L) \qquad (29)$$
where $m_{method}(L)$ is the probability $p_{method}(L)$ with which the classifier method "method" assigns the sample to class $L$.
B) At the same sampling instant, take the outputs of the six selected classifier methods, choose classifier methods selectively according to the similarity index computed earlier, take their detection results as the probabilities that a fault has occurred, and apply the following D-S fusion rule to obtain the final basic probability assignment function:
$$m_{1,2}(A) = (m_1 \oplus m_2)(A) = \frac{1}{1 - \kappa} \sum_{B \cap C = A} m_1(B) \, m_2(C) \qquad (30)$$
$$\kappa = \sum_{B \cap C = \emptyset} m_1(B) \, m_2(C) \qquad (31)$$
where $\oplus$ denotes the orthogonal sum defined by formula (30); the sets $A$, $B$, and $C$ are different sets of fault classes, with $A$ the intersection of $B$ and $C$; $\kappa$ is the joint probability assignment value of the pairs whose intersection is empty; $m_1$ and $m_2$ are the first and second classifiers; and $m_{1,2}(A)$ is the probability after the two classifier methods are combined and normalized.
$$m_{1,2,\ldots,K} = m_1 \oplus m_2 \oplus \ldots \oplus m_K = (((m_1 \oplus m_2) \oplus m_3) \oplus \ldots \oplus m_K) = ((m_{1,2} \oplus m_3) \oplus \ldots \oplus m_K) \ldots \qquad (32)$$
$m_{1,2,\ldots,K}$ is the probability after $K$ classifiers are combined: the first two methods are combined, the result is combined with the third method, and so on.
C) From the fused basic probability assignment function values, the largest value is selected as the final result $Final(A_i)$, that is:
$$Final(A_i) = \arg\max_i \left[ m_{1,2,\ldots,G}(A_i) \right], \quad i = 1, 2, \ldots, C+1 \qquad (33)$$
where $m_{1,2,\ldots,G}(A_i)$ is the overall joint probability assignment function under the $G$ classifier methods.
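The combination and decision in parts B) and C) can be sketched with Dempster's rule over mass functions whose focal elements are sets of class labels, folded over the classifiers one at a time as in formula (32). The function names and the three example mass functions are illustrative; the selective step that drops overly similar classifiers is not shown, since the patent text here gives no selection threshold.

```python
from functools import reduce

def combine(m1, m2):
    """Dempster's rule for two mass functions keyed by frozenset focal elements."""
    joint, conflict = {}, 0.0
    for b, pb in m1.items():
        for c, pc in m2.items():
            a = b & c
            if a:
                joint[a] = joint.get(a, 0.0) + pb * pc
            else:
                conflict += pb * pc          # mass landing on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    return {a: v / (1.0 - conflict) for a, v in joint.items()}

def fuse(masses):
    """Sequential combination ((m1 + m2) + m3) + ... of several mass functions."""
    return reduce(combine, masses)

def decide(m):
    """Focal element with the largest fused mass."""
    return max(m, key=m.get)

def singleton_masses(probs):
    """Helper: map class probabilities {label: p} to singleton focal elements."""
    return {frozenset({c}): p for c, p in probs.items()}

# Made-up outputs of three classifiers for one sample over classes 1..3
m_a = singleton_masses({1: 0.6, 2: 0.3, 3: 0.1})
m_b = singleton_masses({1: 0.5, 2: 0.4, 3: 0.1})
m_c = singleton_masses({1: 0.7, 2: 0.2, 3: 0.1})
fused = fuse([m_a, m_b, m_c])
final = decide(fused)
```

When all focal elements are single classes, the fused mass of a class is proportional to the product of its individual masses, which is why agreement between classifiers sharpens the final decision.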
The effectiveness of the invention is illustrated below with a concrete industrial process example. The data come from the Tennessee Eastman (TE) chemical process benchmark in the United States, whose prototype is a real process of the Eastman Chemical Company. The TE process has 41 measured variables and 12 manipulated variables (control variables); the 41 measured variables comprise 22 continuous measurements and 19 composition measurements, sampled every 3 minutes. The data include 21 sets of fault data. Of these faults, 16 are known and 5 are unknown. Faults 1 to 7 are associated with step changes in process variables, such as the cooling water inlet temperature or the feed composition. Faults 8 to 12 are associated with increased variability of some process variables. Fault 13 is a slow drift in the reaction kinetics, and faults 14, 15, and 21 are associated with sticking valves. Faults 16 to 20 are unknown. To monitor this process, 16 process variables were selected, as listed in Table 1:
Table 1: Monitored variables
| No. | Variable | No. | Variable |
|---|---|---|---|
| 1 | A feed (stream 1) | 9 | Product separator temperature |
| 2 | D feed (stream 2) | 10 | Product separator pressure |
| 3 | E feed (stream 3) | 11 | Product separator underflow (stream 10) |
| 4 | Total feed (stream 4) | 12 | Stripper pressure |
| 5 | Recycle flow (stream 8) | 13 | Stripper temperature |
| 6 | Reactor feed rate (stream 6) | 14 | Stripper flow |
| 7 | Reactor temperature | 15 | Reactor cooling water outlet temperature |
| 8 | Purge rate (stream 9) | 16 | Separator cooling water outlet temperature |
The implementation steps of the invention are described in detail below for this specific process:
1. Collect normal process data and the various fault data, and preprocess and normalize them.
2. For the training data, call the different classifier methods, build the classifier models, and determine the confidence limits of the corresponding statistics and the labels.
A model is built on the new data matrix $\bar{\bar{X}}$ for each class. For the data of one class:
1) PCA analysis and modeling: 6 principal components are retained to obtain the PCA model. The $T^2$ statistic is then constructed and its confidence limit determined with the F-distribution. Similarly, the monitoring confidence limit of the SPE statistic is determined with the chi-square distribution. The confidence level chosen for both statistics is 99%.
2) KPCA analysis and modeling: 5 principal components are retained to obtain the KPCA model. The $T^2$ statistic is then constructed and its confidence limit determined with the F-distribution. Similarly, the monitoring confidence limit of the SPE statistic is determined with the chi-square distribution. The confidence level chosen for both statistics is 99%.
3) ICA analysis and modeling: 4 independent components are selected to obtain the ICA model parameters, that is, the independent component matrix $S \in R^{4 \times 960}$, the mixing matrix $A \in R^{16 \times 4}$, the separation matrix $W \in R^{4 \times 16}$, and the residual matrix $\bar{\bar{E}}$. The $I^2$ statistic is then constructed and its confidence limit determined by kernel density estimation. Similarly, the confidence limit of the SPE statistic is determined. The confidence level chosen for both statistics is 99%.
4) Fisher discriminant modeling: the projection directions best suited to the classes are found and class labels are added to the modeling data; the labels are the 7 natural numbers 1 to 7.
5) K-nearest neighbor modeling: 5 neighbors are set and class labels are added to the modeling data; the labels are the 7 natural numbers 1 to 7.
6) Neural network modeling: a two-layer BP network with three hidden nodes is selected, the multi-class problem is converted into two-class problems, and the labels are 0 and 1.
3. Call the offline test data and obtain the corresponding fusion matrices from the classifier models built above.
4. Obtain the current online data and preprocess and normalize them.
To test the effectiveness of the new method, normal samples and fault samples are tested separately. A set of process data is chosen at random and processed with the normalization parameters of each classifier method. A typical fault is chosen for testing and is normalized in the same way.
5. Online fault diagnosis
The training data are modeled first, with six faults selected: faults 1, 2, 5, 6, 8, and 12. Partial diagnosis results obtained by the unsupervised and supervised methods are shown in Fig. 1 and Fig. 2 respectively. As the figures show, single classifier methods such as PCA and Fisher discriminant analysis diagnose the faults fairly well. The test samples are then called to obtain the fusion matrices; partial results, again for PCA and Fisher discriminant analysis, are shown in Fig. 3, which illustrates that a single classifier method has a high detection rate for some faults but is not sensitive enough to others. Using the proposed similarity index formula, the diversity among the six selected classifier methods is computed, as shown in Table 2, which provides the basis for the final fusion.
Table 2: Correlation indices among the classifier methods
Finally, new samples are diagnosed; the diagnosis results of the new method and of the single classifier methods are compared in Table 3:
Table 3: Comparison of the diagnosis performance of the method of the invention and single classifier methods
It is apparent that the new method successfully detects the process faults and diagnoses the corresponding faults accurately, with a smaller sample delay and a higher diagnosis accuracy.
The above embodiment is intended to explain the invention, not to limit it. Any modification or change made to the invention within its spirit and the protection scope of the claims falls within the protection scope of the invention.

Claims (7)

1. An industrial process fault diagnosis method based on multi-classifier and D-S evidence fusion, characterized in that it comprises the following steps:
(1) Use the system to collect data from normal process operation and from the various faults to form the training sample set for modeling. Assuming there are C fault classes, plus one normal class, the modeling data contain C+1 classes in total, i.e., $X_i = [x_1; x_2; \ldots; x_n]$, $i = 1, 2, \ldots, C+1$, where $X_i \in R^{n\times m}$, R is the set of real numbers, $R^{n\times m}$ denotes that $X_i$ is a two-dimensional array of size n × m, n is the number of samples in each class, and m is the number of process variables. The complete training sample set is therefore $X = [X_1; X_2; \ldots; X_{C+1}]$, $X \in R^{((C+1)n)\times m}$. These data are stored in the historical database;
(2) Collect fault data different from the training data as off-line test data, C classes in total, i.e., $Y_j = [y_1; y_2; \ldots; y_N]$, $j = 1, 2, \ldots, C$, where $Y_j \in R^{N\times m}$, N is the number of samples in each class, and m is the number of process variables. The complete test sample set is therefore $Y = [Y_1; Y_2; \ldots; Y_C]$, $Y \in R^{(CN)\times m}$. These data are also stored in the historical database;
(3) Call the training samples X from the database and rearrange each class data matrix by independent repeated sampling, keeping the reordering rule consistent across classes, to obtain a resampled data matrix set;
(4) Preprocess and normalize the resampled data set, i.e., within each class make every process variable have zero mean and unit variance, to obtain the new data matrix set $\bar{\bar{X}}$;
(5) Preprocess and normalize the data set Y using the mean and variance of each class of training samples obtained in step (4), so that within each class every process variable has zero mean and unit variance, to obtain a new data matrix set;
(6) Select G classifier methods, comprising unsupervised and supervised methods. Call the different classifiers and build the different classifier models on the normalized training data set. For the unsupervised models, calculate the confidence limits of the corresponding $T^2$ and SPE statistics; for the supervised models, calculate the corresponding label indices;
(7) On the normalized test data set, use the different classifier models and their parameters to calculate off-line the fusion matrix of each classifier method;
(8) Calculate the similarity between the different classifier methods according to the proposed similarity index, in preparation for the subsequent selective fusion;
(9) Store the modeling data and the model parameters together in the historical database and the real-time database for later use;
(10) Collect new online process data and preprocess and normalize them;
(11) Monitor the new data with each classifier model separately: for the unsupervised models, compute the $T^2$ and SPE statistics; for the supervised models, obtain the corresponding class label;
(12) Using D-S evidence theory and the prior knowledge of the detection rates of the different faults contained in the fusion matrices, calculate the combined classification rate of the current sample under all classifier methods, fuse the classifiers selectively using the previously calculated similarity index, and make the final decision.
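The normalization in steps (4), (5) and (10) reuses the mean and variance learned from the training data of each class. A minimal sketch, assuming rows are samples and using the hypothetical helper names `fit_scaler` and `apply_scaler`:

```python
import numpy as np

def fit_scaler(X_train):
    """Per-variable mean and standard deviation of one class of training data (n x m)."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # guard against constant variables (assumption, not in the text)
    return mean, std

def apply_scaler(X, mean, std):
    """Scale any data set (training, off-line test, or online) with stored training statistics."""
    return (X - mean) / std
```

Off-line test data and new online samples would be passed through `apply_scaler` with the stored `mean` and `std`, rather than being normalized with their own statistics.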
2. The industrial process fault diagnosis method based on multi-classifier and D-S evidence fusion according to claim 1, characterized in that step (3) is specifically: perform random repeated sampling on an array of natural numbers whose length equals the sample number n to obtain a new natural-number array; rearrange each class of data in the training sample set X according to this array to form a new data matrix.
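The resampling of claim 2 can be illustrated as below. The sketch assumes that "random repeated sampling" means sampling indices with replacement and that the same index array is applied to every class so that the reordering rule stays consistent; `resample_classes` and its `seed` parameter are hypothetical names.

```python
import numpy as np

def resample_classes(class_matrices, seed=0):
    """Reorder every class matrix (n x m) with one shared bootstrap index array."""
    rng = np.random.default_rng(seed)
    n = class_matrices[0].shape[0]
    # Draw n row indices from 0..n-1 with replacement.
    idx = rng.integers(0, n, size=n)
    # Apply the identical index array to each class.
    return [Xi[idx] for Xi in class_matrices], idx
```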
3. The industrial process fault diagnosis method based on multi-classifier and D-S evidence fusion according to claim 1, characterized in that step (6) is specifically: according to the characteristics of the selected object, and in order to handle linear, nonlinear and non-Gaussian process behavior simultaneously, G is chosen as 6. The multi-classifier methods comprise, as unsupervised methods, principal component analysis (PCA), kernel principal component analysis (KPCA) and independent component analysis (ICA); and, as supervised methods, Fisher discriminant analysis (FDA), k-nearest neighbor (KNN) and a neural network (BP-ANN);
(a) For the unsupervised methods, the implementation steps are as follows:
(1) By PCA analysis, obtain the covariance matrix $\Sigma \in R^{n\times n}$ of the data matrix, the unitary matrix $U \in R^{n\times m}$, and the diagonal matrix of eigenvalues $D \in R^{m\times m}$, as follows:
$$
\begin{aligned}
\Sigma &= \bar{\bar{X}}\,\bar{\bar{X}}^{T}/(n-1) \\
\Sigma &= U D U^{T} \\
D &= \operatorname{diag}(\lambda_i),\quad i = 1, \ldots, m \\
U &= [u_1, u_2, \ldots, u_m]
\end{aligned}
\tag{1}
$$
where $\bar{\bar{X}}$ denotes the new data matrix set, $\Sigma$ the covariance matrix, U the unitary matrix, n the number of training samples, m the number of variables, and the superscript T the matrix transpose; D denotes the diagonal matrix formed by the eigenvalues $\lambda_i$, with its diagonal elements arranged in descending order; diag(·) denotes arranging the quantities in the brackets along the diagonal; and $u_m$ denotes the m-th column vector of U;
On this basis, obtain the loading matrix $P \in R^{m\times k}$, the residual loading matrix $\bar{P}$, the principal component scores $t \in R^{n\times k}$, and the residual matrix $\bar{C}$, as follows:
$$
\begin{aligned}
P &= [u_1, u_2, \ldots, u_k] \\
\bar{P} &= [u_{k+1}, u_{k+2}, \ldots, u_m] \\
t &= \bar{\bar{X}} P \\
\bar{C} &= \bar{P}\,\bar{P}^{T}
\end{aligned}
\tag{2}
$$
where k is the number of principal components retained, determined from a cumulative variance contribution rate > 80%. Then construct the $T^2$ statistic and give its confidence limit $T^2_{lim}$ using the F-distribution; build the SPE statistic on the residual matrix $\bar{C}$ and calculate its confidence limit $SPE_{lim}$;
(2) By KPCA analysis, map the raw data into a high-dimensional space using a radial basis kernel function, obtain the eigenvalues, eigenvectors and scores in the high-dimensional space, and determine the number of principal components k from a cumulative variance contribution rate > 80%; the corresponding loading matrix and principal components are then obtained as in the PCA method above;
Likewise, construct the $T^2$ statistic and give its confidence limit $T^2_{lim}$ using the F-distribution; build the SPE statistic on the residual matrix and calculate its confidence limit $SPE_{lim}$;
(3) By ICA analysis, obtain the independent component matrix $S \in R^{r\times n}$ of the data matrix, the mixing matrix $A \in R^{m\times r}$, the separation matrix $W \in R^{r\times m}$, and the residual matrix $\bar{\bar{E}}$, as follows:
$$
\begin{aligned}
\bar{\bar{X}} &= A S + \bar{\bar{E}} \\
S &= W \bar{\bar{X}} \\
\bar{\bar{E}} &= \bar{\bar{X}} - A S
\end{aligned}
\tag{3}
$$
where r is the number of independent components chosen. Then construct the $I^2$ statistic and give its confidence limit by kernel density estimation; build the SPE statistic on the residual matrix $\bar{\bar{E}}$ and calculate its confidence limit $SPE_{lim}$;
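The PCA part of the unsupervised modeling can be sketched as follows. The sketch assumes the normalized data are stored with rows as samples (n × m), so the covariance is formed as an m × m matrix to give a loading matrix of shape m × k; the names `fit_pca_monitor` and `pca_statistics` are hypothetical, and the F-distribution limits are omitted.

```python
import numpy as np

def fit_pca_monitor(X, cum_var=0.80):
    """Fit a PCA monitoring model on normalized data X (rows = samples)."""
    n, m = X.shape
    cov = X.T @ X / (n - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]            # descending eigenvalues
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()
    # Smallest k whose cumulative variance contribution exceeds cum_var.
    k = int(np.searchsorted(ratio, cum_var, side="right") + 1)
    P = eigvecs[:, :k]                           # loading matrix
    P_res = eigvecs[:, k:]                       # residual loading matrix
    return {"P": P, "C_res": P_res @ P_res.T, "eigvals": eigvals[:k], "k": k}

def pca_statistics(X, model):
    """Return per-sample T^2 and SPE values for rows of X."""
    t = X @ model["P"]                           # scores
    t2 = np.sum(t ** 2 / model["eigvals"], axis=1)
    resid = X @ model["C_res"]                   # projection onto residual space
    spe = np.sum(resid ** 2, axis=1)
    return t2, spe
```

The KPCA variant would replace the covariance eigendecomposition with one on a centred kernel matrix; that step is not shown here.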
(b) For the supervised methods, the implementation steps are as follows:
(1) By the Fisher discriminant method, find the most suitable projection directions between the classes and determine the position of the center point of each class;
(2) By the k-nearest-neighbor method, set 5 neighbors and add class labels to the modeling data;
(3) By the neural network method, select a two-layer BP network with three hidden nodes, with the tansig function in the hidden layer and the purelin function in the output layer, and train the network model.
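The Fisher discriminant step (projection directions plus class centers) might look like the sketch below. It assumes the standard within-class and between-class scatter formulation and a small ridge term for numerical stability; neither detail is stated in the claim, and `fit_fda` and `predict_fda` are hypothetical names.

```python
import numpy as np

def fit_fda(X, y, n_dirs=None):
    """Return projection matrix W, class labels, and projected class centers."""
    classes = np.unique(y)
    m = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((m, m))                      # within-class scatter
    Sb = np.zeros((m, m))                      # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        d = (mc - mean_all)[:, None]
        Sb += Xc.shape[0] * (d @ d.T)
    # Eigenvectors of Sw^{-1} Sb; the ridge term is an assumption.
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw + 1e-8 * np.eye(m), Sb))
    order = np.argsort(evals.real)[::-1]
    n_dirs = n_dirs or min(len(classes) - 1, m)
    W = evecs[:, order[:n_dirs]].real
    centers = np.vstack([X[y == c].mean(axis=0) @ W for c in classes])
    return W, classes, centers

def predict_fda(X, W, classes, centers):
    """Assign each row of X to the class whose projected center is nearest (Euclidean)."""
    Z = X @ W
    dist = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
    return classes[np.argmin(dist, axis=1)]
```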
4. The industrial process fault diagnosis method based on multi-classifier and D-S evidence fusion according to claim 1, characterized in that step (7) is specifically: for the normalized test data set,
(a) Call the unsupervised classifier methods; the implementation steps are as follows:
(1) PCA method: using the parameters of each class obtained in the steps above, calculate the $T^2$ and SPE statistics of the fault samples of each class under the G classifier methods, with G = 6, and use an index combining the two statistics for fault reconstruction, i.e., the combined discriminant index $\lambda_i$, as follows:
$$
\lambda_i = (1 - \alpha_i) \times T^2 / T^2_{lim,i} + \alpha_i \times SPE / SPE_{lim,i}, \quad i = 1, 2, \ldots, C+1 \tag{4}
$$
where $\alpha_i$ is the cumulative variance contribution rate of the scores $t_i$, i indexes the different classifier models, $T^2$ and SPE are the statistics of the test sample, and $T^2_{lim,i}$ and $SPE_{lim,i}$ are the confidence limits obtained from the modeling in step (6). The class with the minimum of the C+1 index values is taken as the test classification of the sample, and a C × (C+1) matrix is finally obtained;
(2) KPCA method: the same as the PCA method above;
(3) ICA method: replace the $T^2$ statistic in the PCA method above with the $I^2$ statistic; the subsequent fault reconstruction is the same as for the PCA method, and a C × (C+1) matrix is likewise obtained;
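The combined discriminant index and the minimum-index rule can be illustrated as below. The sketch assumes one model per class, so that each argument is an array with one entry per class model, and that class labels start at 1; `combined_index` and `assign_class` are hypothetical names.

```python
import numpy as np

def combined_index(t2, spe, t2_lim, spe_lim, alpha):
    """Combine T^2 (or I^2) and SPE, each scaled by its limit, into one index per class model."""
    t2, spe, t2_lim, spe_lim, alpha = (
        np.asarray(v, dtype=float) for v in (t2, spe, t2_lim, spe_lim, alpha)
    )
    return (1 - alpha) * t2 / t2_lim + alpha * spe / spe_lim

def assign_class(t2, spe, t2_lim, spe_lim, alpha):
    """Return the 1-based label of the class model with the smallest combined index."""
    lam = combined_index(t2, spe, t2_lim, spe_lim, alpha)
    return int(np.argmin(lam)) + 1, lam
```

For example, a sample whose statistics sit well inside the limits of the first class model and far outside those of the second would be assigned label 1.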
(b) Call the supervised classifier methods; the implementation steps are as follows:
(1) For the Fisher discriminant method, calculate the Euclidean distance between the test sample and the center point of each class; the label of the class with the minimum distance is taken as the final label, and a C × (C+1) matrix is likewise obtained;
(2) For the k-nearest-neighbor method, calculate the distances between the test sample and the samples with known class labels to obtain the final label, and a C × (C+1) matrix is likewise obtained;
(3) For the neural network method, use the trained network model to calculate the output label of the test sample, and a C × (C+1) matrix is likewise obtained.
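Whatever the classifier, the C × (C+1) fusion matrix can be accumulated from the true fault class of each test sample and the label the classifier assigns to it. A sketch under the assumption that rows are the true fault classes 1..C and columns are the predicted labels 1..C+1 (which label denotes the normal class is not stated in the text); `fusion_matrix` is a hypothetical name.

```python
import numpy as np

def fusion_matrix(true_labels, pred_labels, n_fault_classes):
    """Count how often samples of true class i (row) receive predicted label j (column)."""
    C = n_fault_classes
    N = np.zeros((C, C + 1), dtype=int)
    for t, p in zip(true_labels, pred_labels):
        N[t - 1, p - 1] += 1   # labels are 1-based
    return N
```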
5. The industrial process fault diagnosis method based on multi-classifier and D-S evidence fusion according to claim 1, characterized in that step (8) is specifically: to measure the similarity between different classifier methods, a similarity index calculated from the fusion matrices is used, as follows:
$$
corr_{ij} = \frac{\operatorname{cov}(c_{i\_m}, c_{j\_m})}{\sqrt{D(c_{i\_m})}\,\sqrt{D(c_{j\_m})}} \tag{5}
$$
where $c_{i\_m}$ and $c_{j\_m}$ denote the fusion matrices of two different classifier methods, $D(c_{i\_m})$ and $D(c_{j\_m})$ are the variances of the respective fusion matrices, and $corr_{ij}$ denotes the linear correlation index between the two classifiers.
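A linear correlation index between two fusion matrices can be computed as in the sketch below. Flattening each matrix into a vector before computing covariance and variance is an assumption, since the claim does not say how the matrices are treated; `similarity_index` is a hypothetical name.

```python
import numpy as np

def similarity_index(ci_m, cj_m):
    """Pearson-type correlation between two fusion matrices of equal shape."""
    a = np.asarray(ci_m, dtype=float).ravel()
    b = np.asarray(cj_m, dtype=float).ravel()
    cov = np.mean((a - a.mean()) * (b - b.mean()))
    # Population variances; undefined if either matrix is constant.
    return cov / np.sqrt(a.var() * b.var())
```

Values close to 1 indicate two classifiers that behave almost identically on the test data, which is the kind of redundancy a selective fusion step would want to detect.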
6. The industrial process fault diagnosis method based on multi-classifier and D-S evidence fusion according to claim 1, characterized in that step (11) is specifically: for the new data after normalization, monitor them with each of the different models;
(a) For the unsupervised methods, build the corresponding monitoring statistics as follows:
(1) PCA analysis:
$$
\begin{aligned}
t_{new} &= \bar{\bar{X}}_{new} P \\
SPE_{new} &= \| \bar{C}\,\bar{\bar{X}}_{new} \|^2 \\
T^2_{new} &= \| D_r^{-1/2} t_{new} \|^2 = \| D_r^{-1/2} P^{T} \bar{\bar{X}}_{new} \|^2
\end{aligned}
\tag{6}
$$
where $\bar{\bar{X}}_{new}$ is the new data after normalization, $t_{new}$ the principal components of the new data, $\bar{C}$ the residual matrix, P the loading matrix, $SPE_{new}$ the SPE statistic of the new data, $\|\cdot\|$ the 2-norm, $D_r$ the diagonal matrix formed by the first r eigenvalues, $T^2_{new}$ the $T^2$ statistic of the new data, and the superscript T the matrix transpose;
(2) KPCA analysis: the same as the PCA procedure above;
(3) ICA analysis:
$$
\begin{aligned}
s_{new} &= W \bar{\bar{x}}_{new} \\
\bar{\bar{e}}_{new} &= \bar{\bar{x}}_{new} - A s_{new} \\
I^2_{new} &= s_{new}^{T} s_{new}
\end{aligned}
\tag{7}
$$
where W is the separation matrix, $\bar{\bar{x}}_{new}$ is the new data after normalization, $s_{new}$ is the independent component vector extracted from the new data, $I^2_{new}$ is the $I^2$ statistic of the new data, and $\bar{\bar{e}}_{new}$ is the residual vector, on which the SPE statistic $SPE_{new}$ is built:
$$
SPE_{new} = \bar{\bar{e}}_{new}^{T}\,\bar{\bar{e}}_{new} \tag{8}
$$
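For the ICA model, the two online statistics follow directly from the separation and mixing matrices. A minimal sketch, assuming the new sample is a normalized vector of length m and that W (r × m) and A (m × r) come from the off-line model; `ica_statistics` is a hypothetical name.

```python
import numpy as np

def ica_statistics(x_new, W, A):
    """Return (I^2, SPE) for one normalized sample x_new of length m."""
    s_new = W @ x_new                 # extracted independent components
    e_new = x_new - A @ s_new         # part of the sample not explained by the model
    i2 = float(s_new @ s_new)         # I^2: squared length of the component vector
    spe = float(e_new @ e_new)        # SPE: squared length of the residual
    return i2, spe
```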
(b) For the supervised methods, obtain the corresponding class label:
(1) For the Fisher discriminant method, calculate the Euclidean distance between the new sample and the center point of each class; the label of the class with the minimum distance is taken as the final label;
(2) For the k-nearest-neighbor method, calculate the distances between the new sample and the samples with known class labels to obtain the final label;
(3) For the neural network method, use the trained network model to calculate the output label of the new sample.
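The k-nearest-neighbor labelling of a new sample can be sketched as follows, assuming Euclidean distance and a majority vote among the 5 nearest training samples (the voting rule is an assumption; the claim only says that distances are calculated). `knn_predict` is a hypothetical name.

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, k=5):
    """Label x_new by majority vote among its k nearest training samples."""
    dist = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(dist)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```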
7. The industrial process fault diagnosis method based on multi-classifier and D-S evidence fusion according to claim 1, characterized in that step (12) is specifically:
(a) First call the different classifier methods and calculate the corresponding basic probability assignment functions, as follows:
$$
m_g(C_i) = \frac{N^{g}_{ij}}{\sum_{i=1}^{G} N^{g}_{ij}}, \quad g = 1, 2, \ldots, G \tag{9}
$$
where $N^{g}_{ij}$ is the element in row i and column j of the fusion matrix of the g-th classifier method, $m_g(C_i)$ is the probability with which the g-th classifier assigns the sample to class $C_i$, i.e., the basic probability assignment value, and G is the number of selected classifier methods;
Assuming the fault database contains the fault classes C = 1, 2, ..., L, then for a given classifier method, denoted method:
$$
m_{method}(1) = p_{method}(1);\quad m_{method}(2) = p_{method}(2);\quad \ldots;\quad m_{method}(L) = p_{method}(L) \tag{10}
$$
where $m_{method}(1)$ denotes the probability $p_{method}(1)$ with which the method assigns the sample to the first class, and likewise $m_{method}(L)$ denotes the probability $p_{method}(L)$ with which the method assigns the sample to the L-th class;
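Turning a fusion matrix into a basic probability assignment amounts to normalizing one of its columns. The sketch below assumes that column j is selected by the label the classifier outputs for the current sample and that the sum runs over the rows (true classes) of that column; the uniform fallback for an empty column is also an assumption. `basic_probability_assignment` is a hypothetical name.

```python
import numpy as np

def basic_probability_assignment(N, predicted_label):
    """Normalize the column of fusion matrix N that matches the predicted label (1-based)."""
    col = np.asarray(N, dtype=float)[:, predicted_label - 1]
    total = col.sum()
    if total == 0:
        # No test sample ever received this label: fall back to a uniform assignment.
        return np.full(col.shape, 1.0 / col.size)
    return col / total
```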
(b) At the same sampling instant, call the outputs of the G selected classifier methods, with G = 6; select classifier methods selectively according to the previously calculated similarity index; take the corresponding detection results as the probabilities that each fault has occurred; and use the following D-S fusion rule to obtain the final basic probability assignment function:
where $\oplus$ denotes the orthogonal sum, defined as shown in formula (11); the sets A, B and C denote different sets of fault classes, with A the intersection of B and C; the conflict term is the joint probability assignment value of the combinations whose intersection is the empty set; $m_1$ and $m_2$ denote the first and second classifiers, respectively; and $m_{1,2}(A)$ is the probability value after the two classifier methods are combined, i.e., the result after normalization;
$$
m_{1,2,\ldots,K} = m_1 \oplus m_2 \oplus \cdots \oplus m_K = (((m_1 \oplus m_2) \oplus m_3) \oplus \cdots \oplus m_K) = ((m_{1,2} \oplus m_3) \oplus \cdots \oplus m_K) \cdots \tag{13}
$$
$m_{1,2,\ldots,K}$ denotes the probability value after K classifiers are combined; it is obtained by first combining two methods, then combining the result with the third method, and so on; $m_K$ denotes the K-th classifier;
(c) For the fused basic probability assignment values, select the largest value as the final result $Final(A_i)$:
$$
Final(A_i) = \arg\max_i \left[ m_{1,2,\ldots,G}(A_i) \right], \quad i = 1, 2, \ldots, C+1 \tag{14}
$$
where $m_{1,2,\ldots,G}(A_i)$ denotes the overall joint probability assignment function under the G classifier methods.
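The sequential combination and the final decision can be sketched with the standard form of Dempster's rule, $m_{1,2}(A) = \frac{1}{1-K}\sum_{B\cap C=A} m_1(B)\,m_2(C)$ with $K = \sum_{B\cap C=\emptyset} m_1(B)\,m_2(C)$. That this standard form is the rule intended here is an assumption, as are the representation of each assignment as a dict keyed by `frozenset` and the names `dempster_combine` and `fuse_and_decide`. The similarity-based selection of classifiers is not shown.

```python
from functools import reduce

def dempster_combine(m1, m2):
    """Combine two basic probability assignments given as {frozenset: mass} dicts."""
    combined = {}
    conflict = 0.0
    for B, mass_b in m1.items():
        for C, mass_c in m2.items():
            A = B & C
            if A:
                combined[A] = combined.get(A, 0.0) + mass_b * mass_c
            else:
                conflict += mass_b * mass_c   # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: the two assignments cannot be combined")
    return {A: v / (1.0 - conflict) for A, v in combined.items()}

def fuse_and_decide(bpas):
    """Fold the assignments pairwise, then return the set with the largest fused mass."""
    fused = reduce(dempster_combine, bpas)
    best = max(fused, key=fused.get)
    return best, fused
```

With two classifiers that assign masses 0.8/0.2 and 0.6/0.4 to classes 1 and 2, the fused mass of class 1 is 0.48/0.56, so agreement between the classifiers strengthens the decision.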
CN201410128630.7A 2014-04-01 2014-04-01 Based on the commercial run method for diagnosing faults that multi-categorizer and D-S evidence merge CN103914064B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410128630.7A CN103914064B (en) 2014-04-01 2014-04-01 Based on the commercial run method for diagnosing faults that multi-categorizer and D-S evidence merge

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410128630.7A CN103914064B (en) 2014-04-01 2014-04-01 Based on the commercial run method for diagnosing faults that multi-categorizer and D-S evidence merge

Publications (2)

Publication Number Publication Date
CN103914064A CN103914064A (en) 2014-07-09
CN103914064B true CN103914064B (en) 2016-06-08

Family

ID=51039824

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410128630.7A CN103914064B (en) 2014-04-01 2014-04-01 Based on the commercial run method for diagnosing faults that multi-categorizer and D-S evidence merge

Country Status (1)

Country Link
CN (1) CN103914064B (en)

Also Published As

Publication number Publication date
CN103914064A (en) 2014-07-09

Legal Events

Date Code Title Description
PB01 Publication
C06 Publication
SE01 Entry into force of request for substantive examination
C10 Entry into substantive examination
GR01 Patent grant
C14 Grant of patent or utility model
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160608

Termination date: 20190401
