CN106681305A - Online fault diagnosing method for Fast RVM (relevance vector machine) sewage treatment - Google Patents

Online fault diagnosing method for Fast RVM (relevance vector machine) sewage treatment

Info

Publication number
CN106681305A
Authority
CN
China
Prior art keywords
sample
data
fast
vector machine
historical data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710000827.6A
Other languages
Chinese (zh)
Inventor
许玉格
邓文凯
陈立定
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT
Priority to CN201710000827.6A
Publication of CN106681305A
Legal status: Pending

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B23/00 Testing or monitoring of control systems or parts thereof
    • G05B23/02 Electric testing or monitoring
    • G05B23/0205 Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults
    • G05B23/0218 Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults, characterised by the fault detection method dealing with either existing or incipient faults
    • G05B23/0243 Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults, characterised by the fault detection method dealing with either existing or incipient faults; model based detection method, e.g. first-principles knowledge model
    • G05B23/0254 Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults, characterised by the fault detection method dealing with either existing or incipient faults; model based detection method, e.g. first-principles knowledge model; based on a quantitative model, e.g. mathematical relationships between inputs and outputs; functions: observer, Kalman filter, residual calculation, Neural Networks

Abstract

The invention discloses an online fault diagnosis method for sewage treatment based on the fast relevance vector machine (Fast RVM). The method comprises the following steps: first, removing samples with incomplete attributes from the sewage data, normalizing the samples into the interval [0, 1], and determining a historical data set and an update test set; second, compressing the majority-class data of the historical data set using a clustering-based relevance vector machine method; third, expanding the minority-class data of the historical data set using the synthetic minority over-sampling technique (SMOTE); fourth, building a 'one-to-one' (one-versus-one) fast relevance vector machine multi-class training model; fifth, testing new samples from the update test set in the model and updating the historical data set; sixth, returning to the second step, reprocessing the imbalanced historical data, retraining the model, and repeating the process until all online data have been tested. The method effectively reduces the imbalance of the sewage data, improves classification accuracy, and speeds up online updating, so that operating faults can be diagnosed in real time and the safe operation of the sewage treatment plant is ensured.

Description

An online fault diagnosis method for sewage treatment based on Fast RVM
Technical field
The present invention relates to the field of sewage treatment, and in particular to an online fault diagnosis method for sewage treatment based on the fast relevance vector machine (Fast RVM).
Background
Environmental protection has become an important foundation for China's sustainable economic development. As China's industrial economy develops rapidly and urbanization accelerates, industrial wastewater discharge grows quickly along with industrial water consumption. Most of this wastewater is discharged directly, which seriously pollutes rivers and other water bodies, disrupts the ecological balance, and indirectly affects people's lives. Sewage treatment plants are a key protective barrier for natural water bodies, and how well they operate directly affects the safety of the water environment. Biochemical sewage treatment is a complex process with many influencing factors, so a plant can hardly maintain stable operation over long periods. Once an operating fault occurs, it often leads to serious problems such as substandard effluent quality, higher operating costs, and secondary environmental pollution. It is therefore necessary to monitor the operating state of a sewage treatment plant so that process faults are diagnosed and handled in time.
Fault diagnosis of the operating state of a sewage treatment process is essentially a pattern classification problem. When real operating states are classified, the class distribution of the sewage data set is often imbalanced. The prior art has limitations here: when it is applied to imbalanced data, the classification accuracy of the model does not meet requirements, which makes fault diagnosis of biochemical sewage treatment very difficult. Moreover, in practice fault diagnosis is a continuous learning process. Its distinguishing feature is that learning is not carried out once offline; instead, data arrive one by one and the model is optimized continually. An online learning method must finish training before the next data arrive, otherwise the next decision is delayed. Because the fault information that arises during plant operation is particularly important, an online fault diagnosis system places more emphasis on speed and accuracy.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art by providing an online fault diagnosis method for sewage treatment based on Fast RVM with clustering of imbalanced data. The majority-class data are compressed by a clustering-based fast relevance vector machine method, and the minority-class data are expanded by the synthetic minority over-sampling technique (SMOTE). This reduces the imbalance of the sewage data and improves classification accuracy. At the same time, a multi-class model of the biochemical sewage treatment process is built with Fast RVM, which speeds up online updating and thereby ensures the accuracy and real-time performance of online fault diagnosis for the sewage treatment process.
To achieve the above object, the technical solution provided by the present invention is an online fault diagnosis method for Fast RVM sewage treatment, comprising the following steps:
S1. Remove the samples with incomplete attributes from the sewage data. Because the input variables have different units, normalize them into the interval [0,1], and determine the historical data set $x_{old}$ and the update test set $x_{new}$;
S2. Compress the majority-class samples in the historical data using the clustering-based fast relevance vector machine method;
S3. Expand the minority-class samples in the historical data using the synthetic minority over-sampling technique (SMOTE);
S4. Recombine the processed sample data of all classes in the historical data into a new historical training set, and build the "one-to-one" (one-versus-one) fast relevance vector machine multi-class training model;
S5. Take k new samples from the update test set $x_{new}$ and test them in the model, save the classification results, add the samples to the historical data set, and remove the first k samples of the historical data set;
S6. Return to step S2, process the imbalanced historical data again, and retrain the model. Repeat this process until all online update data have been tested, which gives the final online test result and thereby identifies the online operating state of the sewage treatment process.
Step S2 is specifically:
S201. Let the majority-class sample set $X=\{x_1,x_2,\dots,x_i,\dots,x_n\}$ consist of n data points in $R^d$, where d is the dimension of the sample attributes. Randomly select k objects from the n data objects as the initial cluster centers;
S202. Assign each remaining sample to the nearest cluster center according to its distance to every cluster center. Let $c_j$ be the center of the j-th class; the distance between $x_i$ and $c_j$ is:
$$d(x_i,c_j)=\sqrt{\sum_{m=1}^{d}(x_{im}-c_{jm})^2} \quad (1)$$
S203. Update the center of each class from the points assigned to it. Let the samples of the j-th class be $\{x_{j1},x_{j2},\dots,x_{jn_j}\}$, i.e. $n_j$ samples; the cluster center of the class is $c_j=(c_{j1},c_{j2},\dots,c_{jd})$, where $c_{jm}$ is the m-th attribute of the class center $c_j$, computed as:
$$c_{jm}=\frac{1}{n_j}\sum_{l=1}^{n_j}x_{jl,m} \quad (2)$$
S204. Repeat steps S202 and S203 until the criterion function converges. The mean square error is used as the criterion function, in the form:
$$E=\sum_{j=1}^{k}\sum_{l=1}^{n_j}\lVert x_{jl}-c_j\rVert^2 \quad (3)$$
S205. Build a fast relevance vector machine classification model on the clustered majority-class samples to obtain a certain number of relevance vectors. These relevance vectors are far fewer than the original majority-class data yet remain representative. The selected relevance vectors then replace the original majority-class samples, which compresses the majority class.
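The clustering part of step S2 (S201 to S204) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the patent's implementation; the function names `euclidean` and `kmeans`, the parameters `max_iter`, `tol`, and `seed`, and the handling of empty clusters are assumptions made for the example. The Fast RVM modelling of S205 is not included.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length attribute vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def kmeans(samples, k, max_iter=100, tol=1e-9, seed=0):
    """Cluster `samples` (a list of lists) into k groups.

    Returns (centers, labels, sse). Hypothetical helper for illustration.
    """
    rng = random.Random(seed)
    # S201: pick k samples at random as the initial cluster centers
    centers = [list(c) for c in rng.sample(samples, k)]
    labels, sse, prev_sse = [], float("inf"), float("inf")
    for _ in range(max_iter):
        # S202: assign every sample to its nearest center
        labels = [min(range(k), key=lambda j: euclidean(x, centers[j]))
                  for x in samples]
        # S203: move each center to the attribute-wise mean of its members
        for j in range(k):
            members = [x for x, lab in zip(samples, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
        # S204: squared-error criterion; stop once it no longer decreases
        sse = sum(euclidean(x, centers[lab]) ** 2
                  for x, lab in zip(samples, labels))
        if prev_sse - sse <= tol:
            break
        prev_sse = sse
    return centers, labels, sse
```

In the method described here, the clustered majority-class samples would then be passed to a Fast RVM classifier, and only the resulting relevance vectors would be kept.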
Step S3 is specifically:
S301. For each sample x in the minority class, compute its Euclidean distance to every sample in the minority-class sample set, obtain its k nearest neighbours, and record the indices of the neighbour samples;
S302. According to the up-sampling rate N, for each minority-class sample x randomly select N samples from its k nearest neighbours, denoted $y_1,y_2,\dots,y_N$;
S303. Perform random linear interpolation between the original sample x and $y_j$ (j=1,2,…,N) to construct new minority-class samples $p_j$:
$$p_j=x+\mathrm{rand}(0,1)\cdot(y_j-x),\quad j=1,2,\dots,N \quad (4)$$
where rand(0,1) denotes a random number in the interval (0,1).
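Steps S301 to S303 can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' code: the function name `smote` and its parameters are hypothetical, neighbours are drawn with replacement (an assumption, so that N may exceed k), and the minority class is assumed to contain at least two samples.

```python
import math
import random


def smote(minority, n_new_per_sample, k=5, seed=0):
    """Create synthetic minority samples by random linear interpolation.

    `minority` is a list of attribute lists; returns the new samples only.
    """
    rng = random.Random(seed)
    synthetic = []
    for i, x in enumerate(minority):
        # S301: indices of the k nearest minority neighbours of x (x excluded)
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbours = others[:k]
        for _ in range(n_new_per_sample):
            # S302: pick one neighbour at random (with replacement)
            y = minority[rng.choice(neighbours)]
            # S303: p = x + rand * (y - x); random() is in [0, 1), which
            # differs from the open interval only at the endpoint 0
            gap = rng.random()
            synthetic.append([xi + gap * (yi - xi) for xi, yi in zip(x, y)])
    return synthetic
```

Every synthetic point lies on a line segment between a minority sample and one of its neighbours, so the new samples stay inside the region already occupied by the minority class.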
In step S4, the "one-to-one" (one-versus-one) fast relevance vector machine multi-class training model is built as follows:
The processed historical data can be defined as $\{z_n,t_n\}_{n=1}^{N}$ with $z_n\in R^d$, where N is the number of samples in the data set, n is the sample index, d is the dimension of the sample attributes, $z_n$ is the input of a sample, and $t_n$ is its target value. The prediction function is given by formula (5):
$$t_n=y(z_n;w)+\varepsilon_n \quad (5)$$
where y(z) is defined by formula (6):
$$y(z;w)=\sum_{i=1}^{N}w_iK(z,z_i)+w_0 \quad (6)$$
Here $K(z,z_i)$ is the kernel function, $w_i$ is the weight of the corresponding basis function, $w=[w_0,w_1,\dots,w_N]^T$, and $\varepsilon_n$ is noise with $\varepsilon_n\sim N(0,\sigma^2)$, so that $t_n\sim N(y(z_n;w),\sigma^2)$. Assuming the prediction targets $t_n$ are mutually independent, it follows that:
$$p(t\mid w,\sigma^2)=(2\pi\sigma^2)^{-N/2}\exp\left(-\frac{\lVert t-\Phi w\rVert^2}{2\sigma^2}\right) \quad (7)$$
where Φ is the N×(N+1) design matrix. To avoid over-fitting, the weights w in the model must be constrained; they are assumed to follow the Gaussian distribution $p(w\mid\alpha)=\prod_{i=0}^{N}N(w_i\mid 0,\alpha_i^{-1})$, where α is the hyperparameter. When a new input is given, the distribution $p(t_*\mid t)$ of the corresponding target value $t_*$ depends on the posterior $p(w,\alpha,\sigma^2\mid t)$. From the prior distribution and the likelihood, the posterior distribution of the weights is obtained:
$$p(w,\alpha,\sigma^2\mid t)=p(w\mid t,\alpha,\sigma^2)\,p(\alpha,\sigma^2\mid t) \quad (8)$$
After an approximation of the above formula, the problem finally becomes maximizing $p(\alpha,\sigma^2\mid t)\propto p(t\mid\alpha,\sigma^2)\,p(\alpha)\,p(\sigma^2)$, that is, finding the most probable values $\alpha_{MP}$ and $\sigma^2_{MP}$ of the parameters α and σ².
During training, the fast relevance vector machine starts from an empty set and dynamically adds columns to the basis matrix Φ to increase the marginal likelihood, or removes redundant columns of Φ to increase the objective function. Taking the logarithm of the marginal likelihood $p(t\mid\alpha,\sigma^2)$ and writing $L(\alpha)=\log p(t\mid\alpha,\sigma^2)$ gives, after rearrangement:
$$L(\alpha)=L(\alpha_{-i})+\frac{1}{2}\left[\log\alpha_i-\log(\alpha_i+S_i)+\frac{Q_i^2}{\alpha_i+S_i}\right]=L(\alpha_{-i})+l(\alpha_i) \quad (9)$$
where $L(\alpha_{-i})$ is the log marginal likelihood with the basis vector $\phi_i$ removed, i.e. with $\alpha_i=\infty$, and $l(\alpha_i)$ is the part of the log marginal likelihood that depends only on $\alpha_i$. $S_i$ is defined as the sparsity factor and $Q_i$ as the quality factor. With respect to $\alpha_i$, L(α) has a unique maximum at:
$$\alpha_i=\begin{cases}\dfrac{S_i^2}{Q_i^2-S_i}, & Q_i^2>S_i\\ \infty, & Q_i^2\le S_i\end{cases} \quad (10)$$
To maximize L(α), iterate according to formula (10) to find suitable weights; the hyperparameter α is updated continually along with the weights w. Through continual updating, the final training model $y(z;w)=\sum_{i=1}^{N}w_iK(z,z_i)+w_0$ is obtained. The weights of some sample points are zero, and the points whose weights are non-zero are the relevance vectors. In summary, the basic steps of the fast relevance vector machine classification algorithm are as follows:
(1) Initialize $\sigma^2=0$;
(2) Initialize $\alpha_i$ with a single basis vector $\phi_i$; analysis of formula (10) gives $\alpha_i=\dfrac{\lVert\phi_i\rVert^2}{\lVert\phi_i^Tt\rVert^2/\lVert\phi_i\rVert^2-\sigma^2}$, and all other $\alpha_m$ (m≠i) are set to infinity;
(3) Compute the covariance matrix Σ and the weight matrix μ, and initialize $S_m$ and $Q_m$ for all M basis functions $\phi_m$;
(4) Select a candidate basis vector $\phi_i$ from the set of all M basis functions $\phi_m$;
(5) Compute $\theta_i=Q_i^2-S_i$;
(6) If $\theta_i>0$ and $\alpha_i<\infty$, re-estimate $\alpha_i$;
(7) If $\theta_i>0$ and $\alpha_i=\infty$, add $\phi_i$ to the model and re-estimate $\alpha_i$;
(8) If $\theta_i\le 0$ and $\alpha_i<\infty$, delete $\phi_i$ and set $\alpha_i=\infty$;
(9) Recompute the covariance matrix Σ, the weight matrix μ, and the corresponding $S_m$ and $Q_m$ of the iteration using the Laplace approximation;
(10) If the procedure has converged or the maximum number of iterations is reached, terminate; otherwise go to step (4). The termination condition is: for every basis function in the model, the corresponding $\alpha_i$ satisfies $\alpha_i<10^{12}$ and [second condition missing in the source text]
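The add, re-estimate, and delete decisions in steps (5) to (8) can be sketched as a single function. This is a hedged illustration only: it assumes the usual fast marginal likelihood rule in which $\theta_i=Q_i^2-S_i$ and, when $\theta_i>0$, the re-estimated value is $\alpha_i=S_i^2/\theta_i$. The function name `update_alpha` and the "skip" outcome (a basis function that is outside the model and stays outside) are introduced here for the example and are not named in the text.

```python
import math


def update_alpha(alpha_i, s_i, q_i):
    """Decide what to do with candidate basis function i.

    alpha_i is math.inf when the basis function is not in the model.
    Returns (action, new_alpha). Hypothetical helper for illustration.
    """
    theta = q_i ** 2 - s_i                 # step (5)
    in_model = math.isfinite(alpha_i)
    if theta > 0:
        new_alpha = s_i ** 2 / theta       # assumed form of the maximiser
        # step (6): already in the model; step (7): newly added
        return ("re-estimate" if in_model else "add", new_alpha)
    if in_model:
        return ("delete", math.inf)        # step (8)
    return ("skip", math.inf)              # outside the model and stays out
```

A full implementation would recompute Σ, μ, and all $S_m$, $Q_m$ after each such decision, as in step (9); that part is omitted here.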
After the fast relevance vector machine binary classification model has been established, multiple binary classifiers are combined by the "one-to-one" method to build a multi-class classifier. If the samples to be classified belong to k classes, any two of the k classes form one basic fast relevance vector machine binary classifier, and all training samples are classified pairwise, so the k classes yield $k(k-1)/2$ fast relevance vector machine binary classifiers in total. Each classifier is trained only on the sample set of its own two classes. When an unknown sample is classified, voting is used: every test sample is evaluated by all $k(k-1)/2$ classifiers. For example, when a sample is classified between classes i and j and the classifier decides that it belongs to class i, class i receives one vote; otherwise class j receives one vote. After all classifiers have decided, the class with the most votes is the class of the test sample.
Let the classification function $f_{ij}(x)$ discriminate between samples of classes i and j. If $f_{ij}(x)<0$, x is judged to belong to class i and class i receives one vote; otherwise x is judged to belong to class j and class j receives one vote. In the final decision, the class with the most votes is taken as the class of the test sample.
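The voting scheme described above can be sketched as follows. The pairwise decision functions are passed in as plain callables, so the sketch does not depend on any particular RVM implementation; the name `ovo_predict`, the dictionary layout, and the tie-breaking rule (the class listed first wins) are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def ovo_predict(x, classes, pairwise):
    """One-versus-one voting over k(k-1)/2 binary decision functions.

    `pairwise[(i, j)]` is a callable f_ij; following the text, a negative
    value is a vote for class i and any other value a vote for class j.
    Returns (winning_class, votes_per_class).
    """
    votes = Counter({c: 0 for c in classes})
    for i, j in combinations(classes, 2):
        if pairwise[(i, j)](x) < 0:
            votes[i] += 1
        else:
            votes[j] += 1
    # assumption: ties go to the class that appears first in `classes`
    winner = max(classes, key=lambda c: votes[c])
    return winner, dict(votes)
```

With four classes the loop visits six pairs, which matches the count $k(k-1)/2$ for k = 4.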
Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. The present invention establishes an online fault diagnosis model for sewage treatment based on Fast RVM with clustering of imbalanced data. The majority-class data are compressed by the clustering-based fast relevance vector machine method and the minority-class data are expanded by SMOTE, which reduces the imbalance of the sewage data. A multi-class model of the biochemical sewage treatment process is built with Fast RVM, which speeds up online updating. Data are then added according to the operating conditions, diagnosis is performed in real time, the model is updated, and the system waits for the next fault diagnosis, which yields the online fault diagnosis model. The online model improves the fault diagnosis accuracy for the biochemical sewage treatment system, performs well online, and has a significant effect.
2. The model of the present invention compresses the majority-class data with the clustering-based fast relevance vector machine and expands the minority-class data with SMOTE, reducing the imbalance of the sewage data. It obtains good results on balanced data and also achieves comparatively good classification on imbalanced data. The multi-class classifier is built on this basis with Fast RVM, whose key point is fast estimation of the hyperparameters of the training samples: the non-relevance vectors of the training samples are removed, which guarantees the sparsity of the model and reduces the training time. Therefore, online fault diagnosis modelling of the sewage treatment process with the Fast RVM method based on imbalanced-data clustering ensures the accuracy and real-time performance of online fault diagnosis for the sewage treatment process.
3. In the online simulation tests of the present invention, each new group of data is tested and then added to the model for updating. The historical data set keeps its capacity fixed by using a limited-memory scheme, so the training data always consist of a limited number of groups: whenever the newest group of observations is added, the earliest group is discarded at once. This ensures that the model always contains the information of the new data and prevents the information in the historical data from swamping the information contained in the new data.
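The limited-memory update described in point 3 can be sketched with a fixed-length queue. This is an illustration under stated assumptions: samples are treated as opaque items, the name `online_batches` is hypothetical, and the retraining and testing calls that would sit inside the loop are left out.

```python
from collections import deque


def online_batches(history, stream, k):
    """Yield (batch, training_window) pairs for a fixed-capacity history.

    Before each batch of up to k new samples is tested, the current
    window is what the model would be trained on. Afterwards the batch
    is appended and the same number of oldest samples is dropped.
    """
    window = deque(history, maxlen=len(history))
    for start in range(0, len(stream), k):
        batch = stream[start:start + k]
        yield batch, list(window)
        window.extend(batch)  # a full deque discards items from the left
```

For example, `list(online_batches([1, 2, 3, 4], [5, 6, 7], 2))` gives `[([5, 6], [1, 2, 3, 4]), ([7], [3, 4, 5, 6])]`: the history never grows beyond four items, and the oldest items leave first.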
Description of the drawings
Fig. 1 is a flow chart of the Fast RVM sewage treatment online fault diagnosis method based on imbalanced-data clustering according to the present invention.
Fig. 2 is a flow chart of the fast relevance vector machine classification algorithm of the model of the present invention.
Fig. 3 is a schematic diagram of the "one-to-one" fast relevance vector machine multi-class model of the present invention.
Detailed description
The present invention is described in further detail below with reference to a specific embodiment.
As shown in Fig. 1, the Fast RVM sewage treatment online fault diagnosis method provided by the present invention is based on clustering of imbalanced data and proceeds as follows:
S1. Remove the samples with incomplete attributes from the sewage data. Because the input variables have different units, normalize them into the interval [0,1], and determine the historical data set $x_{old}$ and the update test set $x_{new}$;
S2. Compress the majority-class samples in the historical data using the clustering-based fast relevance vector machine method;
S3. Expand the minority-class samples in the historical data using the synthetic minority over-sampling technique (SMOTE);
S4. Recombine the processed sample data of all classes in the historical data into a new historical training set, and build the "one-to-one" (one-versus-one) fast relevance vector machine multi-class training model;
S5. Take k new samples from the update test set $x_{new}$ and test them in the model, save the classification results, add the samples to the historical data set, and remove the first k samples of the historical data set;
S6. Return to step S2, process the imbalanced historical data again, and retrain the model. Repeat this process until all online update data have been tested, which gives the final online test result and thereby identifies the online operating state of the sewage treatment process.
Step S2 is specifically:
S201. Let the majority-class sample set $X=\{x_1,x_2,\dots,x_i,\dots,x_n\}$ consist of n data points in $R^d$, where d is the dimension of the sample attributes. Randomly select k objects from the n data objects as the initial cluster centers;
S202. Assign each remaining sample to the nearest cluster center according to its distance to every cluster center. Let $c_j$ be the center of the j-th class; the distance between $x_i$ and $c_j$ is:
$$d(x_i,c_j)=\sqrt{\sum_{m=1}^{d}(x_{im}-c_{jm})^2} \quad (11)$$
S203. Update the center of each class from the points assigned to it. Let the samples of the j-th class be $\{x_{j1},x_{j2},\dots,x_{jn_j}\}$, i.e. $n_j$ samples; the cluster center of the class is $c_j=(c_{j1},c_{j2},\dots,c_{jd})$, where $c_{jm}$ is the m-th attribute of the class center $c_j$, computed as:
$$c_{jm}=\frac{1}{n_j}\sum_{l=1}^{n_j}x_{jl,m} \quad (12)$$
S204. Repeat steps S202 and S203 until the criterion function converges. The mean square error is used as the criterion function, in the form:
$$E=\sum_{j=1}^{k}\sum_{l=1}^{n_j}\lVert x_{jl}-c_j\rVert^2 \quad (13)$$
S205. Build a fast relevance vector machine classification model on the clustered majority-class samples to obtain a certain number of relevance vectors. These relevance vectors are far fewer than the original majority-class data yet remain representative. The selected relevance vectors then replace the original majority-class samples, which compresses the majority class.
Step S3 is specifically:
S301. For each sample x in the minority class, compute its Euclidean distance to every sample in the minority-class sample set, obtain its k nearest neighbours, and record the indices of the neighbour samples; here k is 5;
S302. According to the up-sampling rate N, for each minority-class sample x randomly select N samples from its k nearest neighbours, denoted $y_1,y_2,\dots,y_N$;
S303. Perform random linear interpolation between the original sample x and $y_j$ (j=1,2,…,N) to construct new minority-class samples $p_j$:
$$p_j=x+\mathrm{rand}(0,1)\cdot(y_j-x),\quad j=1,2,\dots,N \quad (14)$$
where rand(0,1) denotes a random number in the interval (0,1).
In step S4, the "one-to-one" (one-versus-one) fast relevance vector machine multi-class training model, shown in Fig. 3, is built as follows:
The processed historical data can be defined as $\{z_n,t_n\}_{n=1}^{N}$ with $z_n\in R^d$, where N is the number of samples in the data set, n is the sample index, d is the dimension of the sample attributes, $z_n$ is the input of a sample, and $t_n$ is its target value. The prediction function is given by formula (15):
$$t_n=y(z_n;w)+\varepsilon_n \quad (15)$$
where y(z) is defined by formula (16):
$$y(z;w)=\sum_{i=1}^{N}w_iK(z,z_i)+w_0 \quad (16)$$
Here $K(z,z_i)$ is the kernel function, $w_i$ is the weight of the corresponding basis function, $w=[w_0,w_1,\dots,w_N]^T$, and $\varepsilon_n$ is noise with $\varepsilon_n\sim N(0,\sigma^2)$, so that $t_n\sim N(y(z_n;w),\sigma^2)$. Assuming the prediction targets $t_n$ are mutually independent, it follows that:
$$p(t\mid w,\sigma^2)=(2\pi\sigma^2)^{-N/2}\exp\left(-\frac{\lVert t-\Phi w\rVert^2}{2\sigma^2}\right) \quad (17)$$
where Φ is the N×(N+1) design matrix. To avoid over-fitting, the weights w in the model must be constrained; they are assumed to follow the Gaussian distribution $p(w\mid\alpha)=\prod_{i=0}^{N}N(w_i\mid 0,\alpha_i^{-1})$, where α is the hyperparameter. When a new input is given, the distribution $p(t_*\mid t)$ of the corresponding target value $t_*$ depends on the posterior $p(w,\alpha,\sigma^2\mid t)$. From the prior distribution and the likelihood, the posterior distribution of the weights is obtained:
$$p(w,\alpha,\sigma^2\mid t)=p(w\mid t,\alpha,\sigma^2)\,p(\alpha,\sigma^2\mid t) \quad (18)$$
After an approximation of the above formula, the problem finally becomes maximizing $p(\alpha,\sigma^2\mid t)\propto p(t\mid\alpha,\sigma^2)\,p(\alpha)\,p(\sigma^2)$, that is, finding the most probable values $\alpha_{MP}$ and $\sigma^2_{MP}$ of the parameters α and σ².
During training, the fast relevance vector machine starts from an empty set and dynamically adds columns to the basis matrix Φ to increase the marginal likelihood, or removes redundant columns of Φ to increase the objective function. Taking the logarithm of the marginal likelihood $p(t\mid\alpha,\sigma^2)$ and writing $L(\alpha)=\log p(t\mid\alpha,\sigma^2)$ gives, after rearrangement:
$$L(\alpha)=L(\alpha_{-i})+\frac{1}{2}\left[\log\alpha_i-\log(\alpha_i+S_i)+\frac{Q_i^2}{\alpha_i+S_i}\right]=L(\alpha_{-i})+l(\alpha_i) \quad (19)$$
where $L(\alpha_{-i})$ is the log marginal likelihood with the basis vector $\phi_i$ removed, i.e. with $\alpha_i=\infty$, and $l(\alpha_i)$ is the part of the log marginal likelihood that depends only on $\alpha_i$. $S_i$ is defined as the sparsity factor and $Q_i$ as the quality factor. With respect to $\alpha_i$, L(α) has a unique maximum at:
$$\alpha_i=\begin{cases}\dfrac{S_i^2}{Q_i^2-S_i}, & Q_i^2>S_i\\ \infty, & Q_i^2\le S_i\end{cases} \quad (20)$$
To maximize L(α), iterate according to formula (20) to find suitable weights; the hyperparameter α is updated continually along with the weights w. Through continual updating, the final training model $y(z;w)=\sum_{i=1}^{N}w_iK(z,z_i)+w_0$ is obtained. The weights of some sample points are zero, and the points whose weights are non-zero are the relevance vectors. As shown in Fig. 2, the basic steps of the fast relevance vector machine classification algorithm are as follows:
(1) Initialize $\sigma^2=0$;
(2) Initialize $\alpha_i$ with a single basis vector $\phi_i$; analysis of formula (20) gives $\alpha_i=\dfrac{\lVert\phi_i\rVert^2}{\lVert\phi_i^Tt\rVert^2/\lVert\phi_i\rVert^2-\sigma^2}$, and all other $\alpha_m$ (m≠i) are set to infinity;
(3) Compute the covariance matrix Σ and the weight matrix μ, and initialize $S_m$ and $Q_m$ for all M basis functions $\phi_m$;
(4) Select a candidate basis vector $\phi_i$ from the set of all M basis functions $\phi_m$;
(5) Compute $\theta_i=Q_i^2-S_i$;
(6) If $\theta_i>0$ and $\alpha_i<\infty$, re-estimate $\alpha_i$;
(7) If $\theta_i>0$ and $\alpha_i=\infty$, add $\phi_i$ to the model and re-estimate $\alpha_i$;
(8) If $\theta_i\le 0$ and $\alpha_i<\infty$, delete $\phi_i$ and set $\alpha_i=\infty$;
(9) Recompute the covariance matrix Σ, the weight matrix μ, and the corresponding $S_m$ and $Q_m$ of the iteration using the Laplace approximation;
(10) If the procedure has converged or the maximum number of iterations is reached, terminate; otherwise go to step (4). The termination condition is: for every basis function in the model, the corresponding $\alpha_i$ satisfies $\alpha_i<10^{12}$ and [second condition missing in the source text]
After the fast relevance vector machine binary classification model has been established, multiple binary classifiers are combined by the "one-to-one" method to build a multi-class classifier. If the samples to be classified belong to k classes, any two of the k classes form one basic fast relevance vector machine binary classifier, and all training samples are classified pairwise, so the k classes yield $k(k-1)/2$ fast relevance vector machine binary classifiers in total. Each classifier is trained only on the sample set of its own two classes. When an unknown sample is classified, voting is used: every test sample is evaluated by all $k(k-1)/2$ classifiers. For example, when a sample is classified between classes i and j and the classifier decides that it belongs to class i, class i receives one vote; otherwise class j receives one vote. After all classifiers have decided, the class with the most votes is the class of the test sample.
Let the classification function $f_{ij}(x)$ discriminate between samples of classes i and j. If $f_{ij}(x)<0$, x is judged to belong to class i and class i receives one vote; otherwise x is judged to belong to class j and class j receives one vote. In the final decision, the class with the most votes is taken as the class of the test sample.
The online fault diagnosis method of the present invention described above is illustrated below with concrete data:
The experimental simulation data come from the UCI database and are the daily monitoring data of a sewage treatment plant over two years. The whole data set contains 527 records, including incomplete ones. Each sample has 38 dimensions (38 measured variables, each with a corresponding index value), and 380 records have all attribute values complete. The monitored water body has 13 states in total, each denoted by a number for brevity. The distribution of the 527 records over the 13 states is shown in Table 1.
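Table 1. Distribution of the 527 records over the 13 states
| Class  | 1   | 2 | 3 | 4 | 5   | 6 | 7 | 8 | 9  | 10 | 11 | 12 | 13 |
|--------|-----|---|---|---|-----|---|---|---|----|----|----|----|----|
| Number | 279 | 1 | 1 | 4 | 116 | 3 | 1 | 1 | 65 | 1  | 53 | 1  | 1  |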
To reduce the complexity of the classification, the samples are grouped into 4 broad classes according to the nature of the sample classes, as shown in Table 2.
Table 2. Distribution of the 527 records over the 4 states
| Class  | 1   | 2   | 3  | 4  |
|--------|-----|-----|----|----|
| Number | 332 | 116 | 65 | 14 |
Class 1 is the normal condition; class 2 is the normal condition with above-average performance; class 3 is the normal condition with low influent flow; class 4 covers fault conditions caused by secondary settler faults, abnormal conditions caused by storms, solids overload, and similar causes.
The online fault diagnosis method for Fast RVM sewage treatment based on imbalanced-data clustering of this embodiment comprises the following steps in order:
S1. First, the 147 records with incomplete attributes are removed from the 527 sewage records, leaving 380 records with complete attributes. The data are then normalized by the normalization formula, and the processed data set is divided in a 2:1 ratio by stratified random sampling with optimum allocation into a historical data set $x_{old}$ and an online-update test set $x_{new}$;
S2. The majority-class samples (class 1) in the historical data set are extracted and grouped into two clusters with the K-means method. The clustered class-1 data are then modeled with the fast relevance vector machine method to obtain a suitable number of relevance vectors, and the majority-class samples are replaced by the selected relevance vectors;
S3. According to the oversampling rate, the minority-class samples in the historical data (class 3 and class 4) are expanded with the synthetic minority oversampling method;
S4. The processed historical samples of all classes are recombined into a new historical training set, as shown in Table 3, and a "one-versus-one" multi-class fast relevance vector machine training model is built. The model uses the RBF kernel, whose width parameter is determined by a grid search with 5-fold cross-validation on the new training set. Since there are four classes in total, 6 binary classifiers are built;
S5. k new samples are taken from the online-update test set $x_{new}$ and tested in the multi-class model: the samples are fed to each of the 6 classifiers, which vote on the class. The classification results are saved, the samples are added to the historical data set, and the first k samples of the historical data set are removed;
S6. Return to step S2 and retrain the model. This process is repeated until all online-update data have been tested, giving the final online test result and thereby identifying the online running state of the sewage treatment process. The cluster-based Fast RVM online fault diagnosis model adopted by the invention meets these requirements well, enabling real-time monitoring and control of the running state of the sewage treatment process, and merits wider adoption.
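The voting in S5 and the sliding-window update in S5 and S6 can be sketched as follows. This is a minimal illustration, not the patented implementation: `train_centroid_pair` is a hypothetical nearest-class-mean stand-in for one binary Fast RVM, all function names are assumptions, and the rebalancing of S2 to S4 is omitted. With four classes, `combinations` yields the 6 pairwise classifiers mentioned in S4.

```python
from collections import Counter
from itertools import combinations


def train_centroid_pair(samples, labels, a, b):
    """Hypothetical stand-in for one binary Fast RVM: nearest class mean."""
    def mean(cls):
        rows = [x for x, y in zip(samples, labels) if y == cls]
        return [sum(col) / len(rows) for col in zip(*rows)]

    mean_a, mean_b = mean(a), mean(b)

    def predict(x):
        dist_a = sum((xi - mi) ** 2 for xi, mi in zip(x, mean_a))
        dist_b = sum((xi - mi) ** 2 for xi, mi in zip(x, mean_b))
        return a if dist_a <= dist_b else b

    return predict


def train_one_vs_one(samples, labels):
    """Build one binary classifier per pair of classes present in the data."""
    classes = sorted(set(labels))
    if len(classes) == 1:  # degenerate window: only one class left
        return [lambda x, only=classes[0]: only]
    return [train_centroid_pair(samples, labels, a, b)
            for a, b in combinations(classes, 2)]


def vote(classifiers, x):
    """Each pairwise classifier casts one vote; the most voted class wins."""
    return Counter(clf(x) for clf in classifiers).most_common(1)[0][0]


def online_diagnosis(hist_x, hist_y, new_x, k):
    """S5/S6: test k new samples, append them, drop the oldest, retrain."""
    hist_x, hist_y = list(hist_x), list(hist_y)
    results = []
    for start in range(0, len(new_x), k):
        batch = new_x[start:start + k]
        classifiers = train_one_vs_one(hist_x, hist_y)  # S2-S4 omitted here
        predictions = [vote(classifiers, x) for x in batch]
        results.extend(predictions)
        # Slide the window: remove the oldest samples, add the tested ones.
        hist_x = hist_x[len(batch):] + batch
        hist_y = hist_y[len(batch):] + predictions
    return results
```

Note that, as in S5, the predicted labels (not ground-truth labels) are what get added back to the historical set, so classification errors can propagate into later retraining rounds.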
Table 2: Distribution of the 527 records across the 4 states
The embodiments described above are only preferred embodiments of the invention and do not limit its scope of implementation; any change made according to the shape and principle of the invention shall fall within the scope of protection of the invention.

Claims (4)

1. An online fault diagnosis method for Fast RVM sewage treatment, characterized by comprising the following steps:
S1. Remove the samples with incomplete attributes from the sewage data; because the input variables have different dimensions, normalize them to the interval [0, 1]; and determine the historical data set $x_{old}$ and the update test set $x_{new}$;
S2. Compress the majority-class samples in the historical data using the cluster-based fast relevance vector machine method;
S3. Expand the minority-class samples in the historical data using the synthetic minority oversampling method;
S4. Recombine the processed sample data of all classes in the historical data into a new historical training set, and build a "one-versus-one" multi-class fast relevance vector machine training model;
S5. Take k new samples from the update test set $x_{new}$ and test them in the model, save the classification results, add the samples to the historical data set, and remove the first k samples of the historical data set;
S6. Return to step S2, process the imbalanced historical data again, and retrain the model; repeat this process until all online-update data have been tested, giving the final online test result and thereby identifying the online running state of the sewage treatment process.
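The preprocessing in S1 can be sketched as below. The claim only states that the data are normalized to [0, 1], so the min-max formula used here is an assumption, as are all function names. The split uses proportional allocation per class as a simplification of the stratified sampling described in the embodiment, with `hist_fraction=2/3` standing for the 2:1 ratio.

```python
import random
from collections import defaultdict


def drop_incomplete(rows, labels):
    """Remove records that have any missing attribute (None)."""
    kept = [(r, y) for r, y in zip(rows, labels)
            if all(v is not None for v in r)]
    return [r for r, _ in kept], [y for _, y in kept]


def min_max_normalize(rows):
    """Scale every attribute to [0, 1] (assumed min-max normalization)."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def stratified_split(rows, labels, hist_fraction=2 / 3, seed=0):
    """Split each class separately so both sets keep the class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    hist, new = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * hist_fraction)
        hist.extend(indices[:cut])
        new.extend(indices[cut:])

    def pick(idx):
        return [rows[i] for i in idx], [labels[i] for i in idx]

    return pick(hist), pick(new)
```

The first returned pair plays the role of $x_{old}$ and the second the role of $x_{new}$.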
2. The online fault diagnosis method for Fast RVM sewage treatment according to claim 1, characterized in that step S2 is specifically:
S201. Let the majority-class sample set $X=\{x_1,x_2,\dots,x_i,\dots,x_n\}$ be n data points in the space $R^d$, where d is the dimension of the sample attributes; randomly select k objects from the n data objects as the initial cluster centers;
S202. Assign each remaining sample object to the nearest cluster center according to its distance to each cluster center. The distance is computed as follows: let $c_j$ be the center of the j-th class; then the distance between $x_i$ and $c_j$ is:
$$d(x_i,c_j)=\sqrt{(x_i^1-c_j^1)^2+\dots+(x_i^m-c_j^m)^2+\dots+(x_i^d-c_j^d)^2} \tag{1}$$
S203. Update the cluster center of each class from the points assigned to it. Let the samples of the j-th class be $\{x_{j1},x_{j2},\dots,x_{jn_j}\}$, containing $n_j$ samples; then the cluster center of this class is $c_j=(c_j^1,\dots,c_j^m,\dots,c_j^d)$, where $c_j^m$ is the m-th attribute of the class center $c_j$, computed as:
$$c_j^m=\frac{x_{j1}^m+x_{j2}^m+\dots+x_{jn_j}^m}{n_j} \tag{2}$$
S204. Repeat steps S202 and S203 until the criterion function converges. The mean squared deviation is used as the criterion function, in the form:
$$J=\frac{\sum_{j=1}^{k}\sum_{q=1}^{n_j}\left(d(x_{jq},c_j)\right)^2}{n-1} \tag{3}$$
S205. Build a fast relevance vector machine classification model on the clustered majority-class samples to obtain a set number of relevance vectors. These relevance vectors are far fewer than the original majority-class data while remaining representative of them; the selected relevance vectors then replace the original majority-class samples, thereby compressing the majority-class samples.
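Steps S201 to S204 can be sketched as below, under the assumption that formula (1) is the Euclidean distance and that convergence means the criterion J of formula (3) stops changing by more than a small tolerance. The names `k_means`, `tol`, and `max_iter` are illustrative, and the Fast RVM compression of S205 is not shown.

```python
import math
import random


def distance(x, c):
    """Formula (1): Euclidean distance between sample x and center c."""
    return math.sqrt(sum((xm - cm) ** 2 for xm, cm in zip(x, c)))


def k_means(samples, k, seed=0, tol=1e-9, max_iter=100):
    rng = random.Random(seed)
    # S201: pick k samples at random as the initial cluster centers.
    centers = [list(s) for s in rng.sample(samples, k)]
    previous_j = None
    for _ in range(max_iter):
        # S202: assign every sample to its nearest center.
        assign = [min(range(k), key=lambda j: distance(x, centers[j]))
                  for x in samples]
        # S203: formula (2), each center becomes the mean of its members.
        for j in range(k):
            members = [x for x, a in zip(samples, assign) if a == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
        # S204: formula (3), stop once the criterion no longer changes.
        j_value = sum(distance(x, centers[a]) ** 2
                      for x, a in zip(samples, assign)) / (len(samples) - 1)
        if previous_j is not None and abs(previous_j - j_value) < tol:
            break
        previous_j = j_value
    return centers, assign, j_value
```

In the embodiment k is 2, since the majority class is grouped into two clusters before the relevance vectors are selected.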
3. The online fault diagnosis method for Fast RVM sewage treatment according to claim 1, characterized in that step S3 is specifically:
S301. For each sample x in the minority class, compute its Euclidean distance to every sample in the minority-class sample set, obtain its k nearest neighbors, and record the indices of the neighbor samples;
S302. According to the oversampling rate N, for each minority-class sample x, randomly select N samples from its k nearest neighbors, denoted $y_1,y_2,\dots,y_N$;
S303. Perform random linear interpolation between the original sample x and each $y_j$ ($j=1,2,\dots,N$) to construct the new minority-class samples $p_j$, that is:
$$p_j=x+\mathrm{rand}(0,1)\cdot(y_j-x),\quad j=1,2,\dots,N \tag{4}$$
where rand(0,1) denotes a random number in the interval (0,1).
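A minimal sketch of S301 to S303 follows. Two points are assumptions rather than statements of the claim: the N neighbors in S302 are drawn with replacement, and `random.random()` samples from [0, 1) instead of the open interval (0, 1). The function and parameter names are illustrative.

```python
import math
import random


def smote(minority, n_per_sample, k, seed=0):
    """Create n_per_sample synthetic points for every minority sample."""
    rng = random.Random(seed)
    synthetic = []
    for index, x in enumerate(minority):
        # S301: k nearest neighbours of x among the other minority samples.
        others = [y for j, y in enumerate(minority) if j != index]
        others.sort(key=lambda y: math.sqrt(
            sum((a - b) ** 2 for a, b in zip(x, y))))
        neighbours = others[:k]
        for _ in range(n_per_sample):
            # S302: pick one neighbour at random (with replacement).
            y = rng.choice(neighbours)
            # S303: formula (4), random point on the segment from x to y.
            gap = rng.random()
            synthetic.append([a + gap * (b - a) for a, b in zip(x, y)])
    return synthetic
```

Because every synthetic point lies on a segment between two real minority samples, the new samples stay inside the region already occupied by the minority class.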
4. The online fault diagnosis method for Fast RVM sewage treatment according to claim 1, characterized in that in step S4 the "one-versus-one" multi-class fast relevance vector machine training model is built as follows:
The processed historical data are defined as $\{z_n,t_n\}_{n=1}^{N}$, $z_n\in R^d$, $t_n\in R$, where N is the number of samples in the data set, n is the sample index, d is the dimension of the sample attributes, $z_n$ is the input of the sample and $t_n$ is the target value of the sample. The prediction function is:
$$t_n=y(z_n;w)+\varepsilon_n \tag{5}$$
where $y(z;w)$ is defined as:
$$y(z;w)=\sum_{i=1}^{N}w_iK(z,z_i)+w_0 \tag{6}$$
where $K(z,z_i)$ is the kernel function, $w_i$ is the weight of the corresponding basis function, and $w=[w_0,w_1,\dots,w_N]^T$; $\varepsilon_n$ is noise obeying $\varepsilon_n\sim N(0,\sigma^2)$, so that $t_n\sim N(y(z_n;w),\sigma^2)$. Assuming the prediction targets $t_n$ are mutually independent, we have:
$$p(t\mid\sigma^2,w)=\prod_{i=1}^{N}N\left(t_i\mid y(z_i;w),\sigma^2\right)=(2\pi\sigma^2)^{-\frac{N}{2}}\exp\left(-\frac{\lVert t-\Phi w\rVert^2}{2\sigma^2}\right) \tag{7}$$
In this formula $\Phi$ is an $N\times(N+1)$ design matrix. To avoid over-fitting, the weights $w$ in the model must be constrained: they are assumed to obey a Gaussian prior distribution with hyperparameter $\alpha$. When a new set of input variables is given, the distribution of the corresponding target value $t^*$, $p(t^*\mid t)$, depends on $p(w,\alpha,\sigma^2\mid t)$. From the prior distribution and the likelihood, the posterior distribution of the weights is obtained:
$$p(w,\alpha,\sigma^2\mid t)=p(w\mid t,\alpha,\sigma^2)\,p(\alpha,\sigma^2\mid t) \tag{8}$$
The above formula is treated approximately, which finally reduces to maximizing $p(\alpha,\sigma^2\mid t)\propto p(t\mid\alpha,\sigma^2)\,p(\alpha)\,p(\sigma^2)$, that is, finding the most probable values $\alpha_{MP}$ and $\sigma^2_{MP}$ of the parameters $\alpha$ and $\sigma^2$.
During training, the fast relevance vector machine starts from an empty set and dynamically expands the basis matrix $\Phi$ to increase the marginal likelihood, or removes redundant columns of $\Phi$ to increase the objective function. Taking the logarithm of the marginal likelihood $p(t\mid\alpha,\sigma^2)$ and writing $L(\alpha)=\log p(t\mid\alpha,\sigma^2)$, rearrangement gives:
$$L(\alpha)=L(\alpha_{-i})+\frac{1}{2}\left[\log\alpha_i-\log(\alpha_i+S_i)+\frac{Q_i^2}{\alpha_i+S_i}\right]=L(\alpha_{-i})+l(\alpha_i) \tag{9}$$
where $L(\alpha_{-i})$ is the logarithm of the marginal likelihood when $\alpha_i=\infty$, that is, after the corresponding basis vector $\varphi_i$ has been removed, and $l(\alpha_i)$ is the independent part of the log marginal likelihood that depends only on $\alpha_i$. $S_i$ is defined as the sparsity factor and $Q_i$ as the quality factor. $L(\alpha)$ has a unique maximum at:
$$\alpha_i=\begin{cases}\dfrac{S_i^2}{Q_i^2-S_i}, & Q_i^2>S_i\\[2mm]\infty, & Q_i^2\le S_i\end{cases} \tag{10}$$
To maximize $L(\alpha)$, the hyperparameter $\alpha$ is updated iteratively according to formula (10) to find suitable weights; the weights $w$ are updated along with it, and repeated updating yields the final trained model. The weights of some sample points are zero, and the points whose weights are non-zero are the relevance vectors. The basic algorithm of fast relevance vector machine classification is as follows:
(1) Initialize $\sigma^2=0$;
(2) Initialize $\alpha_i$ with a single basis vector $\varphi_i$, its value being obtained from formula (10), and set all other $\alpha_m$ ($m\ne i$) to infinity;
(3) Compute the covariance matrix $\Sigma$ and the weight matrix $\mu$, and initialize $S_m$ and $Q_m$ for all M basis functions $\varphi_m$;
(4) Select a candidate basis vector $\varphi_i$ from the set of all M basis functions $\varphi_m$;
(5) Compute $\theta_i$;
(6) If $\theta_i>0$ and $\alpha_i<\infty$, re-estimate $\alpha_i$;
(7) If $\theta_i>0$ and $\alpha_i=\infty$, add $\varphi_i$ to the model and re-estimate $\alpha_i$;
(8) If $\theta_i\le 0$ and $\alpha_i<\infty$, delete $\varphi_i$ and set $\alpha_i=\infty$;
(9) Recompute the covariance matrix $\Sigma$, the weight matrix $\mu$, and the corresponding $S_m$ and $Q_m$ of the iteration using the Laplace approximation method;
(10) If the algorithm has converged or the maximum number of iterations has been reached, terminate; otherwise go to step (4). The termination condition is: for the $\alpha_i$ corresponding to any basis function in the model, $\alpha_i<$ 1e12 and …
After the basic fast relevance vector machine binary classification model has been built, multiple binary classifiers are combined using the "one-versus-one" method to build a multi-class classifier.
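The per-basis decision in steps (5) to (8) can be illustrated with the short sketch below. It assumes $\theta_i=Q_i^2-S_i$, which is not written out in the text above but matches the case condition of formula (10). The function names are hypothetical, and the computation of $S_i$, $Q_i$, $\Sigma$ and $\mu$ in steps (3) and (9) is not shown.

```python
import math


def optimal_alpha(s_i, q_i):
    """Formula (10): the value of alpha_i that maximizes L(alpha)."""
    theta = q_i ** 2 - s_i  # assumed definition of theta_i
    return s_i ** 2 / theta if theta > 0 else math.inf


def update_action(s_i, q_i, alpha_i):
    """Steps (6)-(8): decide what to do with candidate basis vector i."""
    theta = q_i ** 2 - s_i
    in_model = math.isfinite(alpha_i)
    if theta > 0 and in_model:
        return "re-estimate", optimal_alpha(s_i, q_i)  # step (6)
    if theta > 0:
        return "add", optimal_alpha(s_i, q_i)          # step (7)
    if in_model:
        return "delete", math.inf                      # step (8)
    return "skip", math.inf  # not in the model and not worth adding
```

For example, `update_action(2.0, 2.0, math.inf)` adds the basis vector with $\alpha_i = 2^2/(2^2-2) = 2$, while a candidate with $Q_i^2\le S_i$ that is already in the model is deleted.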

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710000827.6A CN106681305A (en) 2017-01-03 2017-01-03 Online fault diagnosing method for Fast RVM (relevance vector machine) sewage treatment

Publications (1)

Publication Number Publication Date
CN106681305A (en) 2017-05-17

Family

ID=58850054

Country Status (1)

Country Link
CN (1) CN106681305A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108564677A (en) * 2018-03-26 2018-09-21 唐天才 A kind of data intelligence management method for New-energy electric vehicle
CN108875783A (en) * 2018-05-09 2018-11-23 西安工程大学 A kind of extreme learning machine Diagnosis Method of Transformer Faults towards unbalanced dataset
CN109508726A (en) * 2017-09-15 2019-03-22 北京京东尚科信息技术有限公司 Data processing method and its system
CN109558893A (en) * 2018-10-31 2019-04-02 华南理工大学 Fast integration sewage treatment method for diagnosing faults based on resampling pond
CN111400528A (en) * 2020-03-16 2020-07-10 南方科技大学 Image compression method, device, server and storage medium
CN112734129A (en) * 2021-01-21 2021-04-30 中国科学院地理科学与资源研究所 Air pollution space-time trend prediction method based on unsupervised restrictive optimization
CN112863134A (en) * 2020-12-31 2021-05-28 浙江清华长三角研究院 Intelligent diagnosis system and method for rural sewage treatment facility abnormal operation
CN115111717A (en) * 2021-03-08 2022-09-27 佛山市顺德区美的电热电器制造有限公司 Temperature adjusting device, temperature control method and device thereof, electronic device and storage medium
CN116719831A (en) * 2023-08-03 2023-09-08 四川中测仪器科技有限公司 Standard database establishment and update method for health monitoring

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140107977A1 (en) * 2012-10-16 2014-04-17 Mitsubishi Aircraft Corporation Condition diagnosing method and condition diagnosing device
CN104680015A (en) * 2015-03-02 2015-06-03 华南理工大学 Online soft measurement method for sewage treatment based on quick relevance vector machine
CN105487526A (en) * 2016-01-04 2016-04-13 华南理工大学 FastRVM (fast relevance vector machine) wastewater treatment fault diagnosis method
CN105740619A (en) * 2016-01-28 2016-07-06 华南理工大学 On-line fault diagnosis method of weighted extreme learning machine sewage treatment on the basis of kernel function

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170517