CN106681305A - Online fault diagnosing method for Fast RVM (relevance vector machine) sewage treatment - Google Patents
- Publication number
- CN106681305A (CN201710000827.6A)
- Authority
- CN
- China
- Prior art keywords
- sample
- data
- fast
- vector machine
- historical data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B23/00—Testing or monitoring of control systems or parts thereof
- G05B23/02—Electric testing or monitoring
- G05B23/0205—Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults
- G05B23/0218—Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults characterised by the fault detection method dealing with either existing or incipient faults
- G05B23/0243—Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults characterised by the fault detection method dealing with either existing or incipient faults model based detection method, e.g. first-principles knowledge model
- G05B23/0254—Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults characterised by the fault detection method dealing with either existing or incipient faults model based detection method, e.g. first-principles knowledge model based on a quantitative model, e.g. mathematical relationships between inputs and outputs; functions: observer, Kalman filter, residual calculation, Neural Networks
Landscapes
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Mathematical Physics (AREA)
- General Physics & Mathematics (AREA)
- Automation & Control Theory (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses an online fault diagnosis method for sewage treatment based on a Fast RVM (fast relevance vector machine). The method comprises the following steps: first, removing samples with incomplete attributes from the sewage data, normalizing the samples to the interval [0, 1], and determining a historical data set and an update test set; second, compressing the majority-class data of the historical data set with a clustering-based relevance vector machine method; third, expanding the minority-class data of the historical data set with the synthetic minority over-sampling technique (SMOTE); fourth, building a one-versus-one fast relevance vector machine multi-class training model; fifth, feeding new samples from the update test set into the model for testing, and updating the historical data set; sixth, returning to the second step, reprocessing the imbalanced historical data, retraining the model, and repeating the process until all online data have been tested. The method effectively reduces the imbalance of the sewage data, improves classification accuracy, speeds up online updating, diagnoses operating faults in real time, and helps ensure the safe operation of a sewage treatment plant.
Description
Technical field
The present invention relates to the field of sewage treatment, and more particularly to a Fast RVM online fault diagnosis method for sewage treatment.
Background
At present, environmental protection has become an important foundation for China's sustainable economic development. With the rapid growth of China's industrial economy and the accelerating pace of urbanization, industrial wastewater discharge has risen quickly along with industrial water consumption. Most of this wastewater is discharged directly, severely polluting rivers and other water bodies, upsetting the ecological balance, and indirectly affecting people's lives. Sewage treatment plants are a key protective barrier for natural waters, and how well they operate directly affects the safety of the water environment. The biochemical sewage treatment process is complex and subject to many influencing factors, so a plant is difficult to keep in stable operation over long periods. Once an operating fault occurs, it often leads to serious problems such as effluent quality failing to meet standards, increased operating costs, and secondary environmental pollution. It is therefore necessary to monitor the operating state of a sewage treatment plant and to diagnose and handle process faults in time.
Fault diagnosis of the operating state of a sewage treatment process is essentially a pattern classification problem. When classifying actual operating states, the class distribution of the sewage data set is often imbalanced. The prior art has limitations here: when it is applied to imbalanced data, model classification accuracy cannot meet requirements, which makes fault diagnosis of biochemical sewage treatment very difficult. Moreover, in practice fault diagnosis is a continuous learning process. Its salient feature is that learning is not done once offline; data arrive one by one and the model is optimized continually. An online learning method must finish training before the next data arrive, otherwise the next decision is delayed. Because fault information from plant operation is particularly important, an online fault diagnosis system must emphasize both speed and accuracy.
Summary of the invention
The object of the invention is to overcome the shortcomings of the prior art by providing a Fast RVM online fault diagnosis method for sewage treatment based on clustering of imbalanced data. A clustering-based fast relevance vector machine method compresses the majority-class data, and the synthetic minority over-sampling technique (SMOTE) expands the minority-class data, which reduces the imbalance of the sewage data and improves classification accuracy. A multi-class model of the biochemical sewage treatment process is built with Fast RVM, which speeds up online updating and thereby ensures the accuracy and real-time performance of online fault diagnosis of the sewage treatment process.
To achieve the above object, the technical solution provided by the invention is a Fast RVM online fault diagnosis method for sewage treatment, comprising the following steps:
S1. Remove samples with incomplete attributes from the sewage data. Because the input variables have different units and scales, normalize them to the interval [0, 1], and determine the historical data set $x_{old}$ and the update test set $x_{new}$;
S2. Compress the majority-class samples in the historical data with the clustering-based fast relevance vector machine method;
S3. Expand the minority-class samples in the historical data with the synthetic minority over-sampling technique (SMOTE);
S4. Recombine the processed samples of all classes in the historical data into a new historical training set, and build a one-versus-one fast relevance vector machine multi-class training model;
S5. Take k new samples from the update test set $x_{new}$ and test them in the model, save the classification results, add the samples to the historical data set, and remove the first k samples from the historical data set;
S6. Return to step S2, reprocess the imbalanced historical data, retrain the model, and repeat the process until all online update data have been tested. The final online test result is then obtained, which identifies the online operating state of the sewage treatment process.
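The loop formed by steps S2 to S6 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `train` and `predict` are hypothetical callbacks standing in for steps S2 to S4 and for the classifier, and the sketch assumes that each tested sample is stored in the history together with its predicted label.

```python
from collections import deque


def online_diagnosis(history, updates, k, train, predict):
    """Sliding-window online loop for steps S2-S6 (illustrative sketch).

    history : list of (features, label) pairs, oldest first
    updates : list of feature vectors arriving online
    train   : hypothetical callable(list of pairs) -> model (stands in for S2-S4)
    predict : hypothetical callable(model, features) -> label
    """
    window = deque(history)
    results = []
    for start in range(0, len(updates), k):
        model = train(list(window))          # S2-S4: rebalance and retrain
        for x in updates[start:start + k]:   # S5: test k new samples
            label = predict(model, x)
            results.append(label)
            window.append((x, label))        # assumption: store the predicted label
            window.popleft()                 # drop the oldest sample, size stays fixed
    return results, list(window)
```

Retraining once per batch of k samples keeps the history at a fixed size, which matches the fixed-capacity behaviour described for the historical data set.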
Step S2 is specifically as follows:
S201. Let the majority-class sample set $X = \{x_1, x_2, \dots, x_i, \dots, x_n\}$ consist of n data points in $R^d$, where d is the dimension of the sample attributes. Randomly select k of the n data objects as the initial cluster centers;
S202. Assign each remaining sample object to the nearest cluster center according to its distance to each center. Let $c_j$ be the center of the j-th class; the distance between $x_i$ and $c_j$ is:
$$d(x_i, c_j) = \sqrt{\sum_{m=1}^{d} (x_{im} - c_{jm})^2} \quad (1)$$
S203. Update the cluster center of each class from the points assigned to it. Let the samples of the j-th class be $\{x_{j1}, x_{j2}, \dots, x_{jn_j}\}$, containing $n_j$ samples; the cluster center of the class is then $c_j = (c_{j1}, c_{j2}, \dots, c_{jd})$, where $c_{jm}$ is the m-th attribute of the class center $c_j$ and $x_{jl,m}$ is the m-th attribute of $x_{jl}$. It is computed as:
$$c_{jm} = \frac{1}{n_j} \sum_{l=1}^{n_j} x_{jl,m} \quad (2)$$
S204. Repeat steps S202 and S203 until the criterion function converges. The mean squared error is used as the criterion function, of the form:
$$E = \sum_{j=1}^{k} \sum_{l=1}^{n_j} \lVert x_{jl} - c_j \rVert^2 \quad (3)$$
S205. Build a fast relevance vector machine classification model on the clustered majority-class samples to obtain a certain number of relevance vectors. These relevance vectors are far fewer than the original majority-class data yet remain representative, so the original majority-class samples are replaced by the selected relevance vectors, which compresses the majority class.
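Steps S201 to S204 describe standard K-means clustering, which can be sketched as below. The sketch assumes Euclidean distance and stops when the assignments no longer change (at which point the squared-error criterion can no longer decrease); the function name, the `seed` argument, and the rule of keeping an empty cluster's old center are illustrative assumptions.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """K-means sketch for steps S201-S204 (pure Python, Euclidean distance)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]   # S201: random initial centers
    assign = None
    for _ in range(max_iter):
        # S202: assign every point to its nearest center
        new_assign = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_assign == assign:      # S204: assignments stable, criterion converged
            break
        assign = new_assign
        # S203: recompute each center as the attribute-wise mean of its members
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:               # assumption: an empty cluster keeps its old center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    # squared-error criterion used as the convergence measure
    sse = sum(math.dist(p, centers[a]) ** 2 for p, a in zip(points, assign))
    return centers, assign, sse
```

Step S205 would then fit a relevance vector machine on the clustered samples; that part is not shown here.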
Step S3 is specifically as follows:
S301. For each sample x in the minority class, compute its Euclidean distance to every other sample in the minority-class sample set, obtain its k nearest neighbors, and record the indices of the neighbor samples;
S302. According to the over-sampling rate N, for each minority-class sample x, randomly select N samples from its k nearest neighbors, denoted $y_1, y_2, \dots, y_N$;
S303. Perform random linear interpolation between the original sample x and each $y_j$ (j = 1, 2, ..., N) to construct new minority-class samples $p_j$:
$$p_j = x + \mathrm{rand}(0,1) \cdot (y_j - x), \quad j = 1, 2, \dots, N \quad (4)$$
where rand(0,1) denotes a random number in the interval (0, 1).
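The over-sampling steps S301 to S303 can be sketched as follows. This is an illustrative sketch, not the patent's code: the function name and `seed` argument are hypothetical, neighbors are drawn with replacement so that the rate may exceed k, at least two minority samples are assumed, and `random.random()` returns values in [0, 1) rather than the open interval (0, 1).

```python
import math
import random


def smote(minority, k=5, n_new=2, seed=0):
    """SMOTE sketch for steps S301-S303; returns only the synthetic samples."""
    rng = random.Random(seed)
    synthetic = []
    for i, x in enumerate(minority):
        # S301: indices of the k nearest minority neighbors of x (Euclidean)
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbours = others[:k]
        # S302: pick n_new neighbors at random (with replacement, an assumption)
        for _ in range(n_new):
            y = minority[rng.choice(neighbours)]
            gap = rng.random()
            # S303: random point on the segment between x and y
            synthetic.append([xi + gap * (yi - xi) for xi, yi in zip(x, y)])
    return synthetic
```

Each synthetic point lies on a line segment between two real minority samples, so the new samples stay inside the region already occupied by the minority class.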
In step S4, the one-versus-one fast relevance vector machine multi-class training model is built as follows:
The processed historical data can be defined as $\{z_n, t_n\}_{n=1}^{N}$ with $z_n \in R^d$, where N is the number of samples in the data set, n is the sample index, d is the dimension of the sample attributes, $z_n$ is the input of a sample, and $t_n$ is its target value. The prediction function is given by equation (5):
$$t_n = y(z_n; w) + \varepsilon_n \quad (5)$$
where $y(z; w)$ is defined by equation (6):
$$y(z; w) = \sum_{i=1}^{N} w_i K(z, z_i) + w_0 \quad (6)$$
Here $K(z, z_i)$ is the kernel function, $w_i$ is the weight of the corresponding basis function, $w = [w_0, w_1, \dots, w_N]^T$, and $\varepsilon_n$ is noise with $\varepsilon_n \sim N(0, \sigma^2)$, so that $t_n \sim N(y(z_n; w), \sigma^2)$. Assuming the targets $t_n$ are mutually independent, the likelihood is:
$$p(t \mid w, \sigma^2) = (2\pi\sigma^2)^{-N/2} \exp\left\{ -\frac{\lVert t - \Phi w \rVert^2}{2\sigma^2} \right\} \quad (7)$$
where $\Phi$ is the $N \times (N+1)$ design matrix. To avoid over-fitting, the weights w in the model must be constrained; they are assumed to follow the Gaussian prior $p(w \mid \alpha) = \prod_{i=0}^{N} N(w_i \mid 0, \alpha_i^{-1})$, where $\alpha$ is the vector of hyperparameters. When a new input is given, the distribution $p(t^* \mid t)$ of the corresponding target $t^*$ depends on the posterior $p(w, \alpha, \sigma^2 \mid t)$. From the prior distribution and the likelihood, the posterior distribution can be written as:
$$p(w, \alpha, \sigma^2 \mid t) = p(w \mid t, \alpha, \sigma^2)\, p(\alpha, \sigma^2 \mid t) \quad (8)$$
Approximating the above expression reduces the problem to maximizing $p(\alpha, \sigma^2 \mid t) \propto p(t \mid \alpha, \sigma^2)\, p(\alpha)\, p(\sigma^2)$, that is, finding the most probable values $\alpha_{MP}$ and $\sigma^2_{MP}$ of the parameters $\alpha$ and $\sigma^2$.
During training, the fast relevance vector machine starts from an empty set and dynamically adds columns to the basis matrix $\Phi$ to increase the marginal likelihood, or removes redundant columns of $\Phi$ to increase the objective function. Taking the logarithm of the marginal likelihood $p(t \mid \alpha, \sigma^2)$ and writing $L(\alpha) = \log p(t \mid \alpha, \sigma^2)$, rearranging gives:
$$L(\alpha) = L(\alpha_{-i}) + \frac{1}{2}\left[ \log \alpha_i - \log(\alpha_i + S_i) + \frac{Q_i^2}{\alpha_i + S_i} \right] = L(\alpha_{-i}) + l(\alpha_i) \quad (9)$$
where $L(\alpha_{-i})$ is the log marginal likelihood after the basis vector $\phi_i$ has been removed (that is, when $\alpha_i = \infty$), and $l(\alpha_i)$ is the separate part of the log marginal likelihood that depends only on $\alpha_i$. $S_i$ is defined as the sparsity factor and $Q_i$ as the quality factor. With respect to $\alpha_i$, $L(\alpha)$ has a unique maximum at:
$$\alpha_i = \begin{cases} \dfrac{S_i^2}{Q_i^2 - S_i}, & Q_i^2 > S_i \\ \infty, & Q_i^2 \le S_i \end{cases} \quad (10)$$
To maximize $L(\alpha)$, iterate according to equation (10) to find suitable weights; the hyperparameters $\alpha$ are updated continually along with the weights w. Repeated updating yields the final trained model, in which the weights of some sample points are zero; the sample points with nonzero weights are the relevance vectors. In summary, the basic steps of the fast relevance vector machine classification algorithm are:
(1) Initialize $\sigma^2 = 0$;
(2) Initialize $\alpha_i$ with a single basis vector $\phi_i$; from equation (10), $\alpha_i = \dfrac{\lVert \phi_i \rVert^2}{\lVert \phi_i^T t \rVert^2 / \lVert \phi_i \rVert^2 - \sigma^2}$, and set all other $\alpha_m$ ($m \ne i$) to infinity;
(3) Compute the covariance matrix $\Sigma$ and the weight mean $\mu$, and initialize $S_m$ and $Q_m$ for all M basis functions $\phi_m$;
(4) Select a candidate basis vector $\phi_i$ from the set of all M basis functions $\phi_m$;
(5) Compute $\theta_i = Q_i^2 - S_i$;
(6) If $\theta_i > 0$ and $\alpha_i < \infty$, re-estimate $\alpha_i$;
(7) If $\theta_i > 0$ and $\alpha_i = \infty$, add $\phi_i$ to the model and re-estimate $\alpha_i$;
(8) If $\theta_i \le 0$ and $\alpha_i < \infty$, delete $\phi_i$ and set $\alpha_i = \infty$;
(9) Recompute the covariance matrix $\Sigma$, the weight mean $\mu$, and the corresponding $S_m$ and $Q_m$ with the Laplace approximation;
(10) If the algorithm has converged or the maximum number of iterations has been reached, terminate; otherwise go to step (4). The termination condition is that, for every basis function in the model, the corresponding $\alpha_i$ satisfies $\alpha_i < 10^{12}$ and [formula missing in source].
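The decision made in steps (5) to (8) for a single candidate basis vector can be sketched as below. The sketch assumes the standard fast marginal likelihood forms $\theta_i = q_i^2 - s_i$ and $\alpha_i = s_i^2 / \theta_i$, which is an assumption about the formulas intended here; the function name and the returned action strings are hypothetical, and computing the sparsity and quality factors themselves is not shown.

```python
import math


def update_alpha(alpha_i, s_i, q_i):
    """Choose the action for one candidate basis vector (steps (5)-(8), sketch).

    alpha_i : current hyperparameter, math.inf if the basis is not in the model
    s_i     : sparsity factor of the candidate (assumed already computed)
    q_i     : quality factor of the candidate (assumed already computed)
    """
    theta = q_i ** 2 - s_i                   # step (5), assumed standard form
    if theta > 0:
        new_alpha = s_i ** 2 / theta         # assumed re-estimation formula
        if math.isfinite(alpha_i):
            return "re-estimate", new_alpha  # step (6): basis already in the model
        return "add", new_alpha              # step (7): basis enters the model
    if math.isfinite(alpha_i):
        return "delete", math.inf            # step (8): basis leaves the model
    return "skip", math.inf                  # not in the model and not useful
```

A basis vector therefore only stays in the model while its quality outweighs its sparsity factor, which is what keeps the final model sparse.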
After the fast relevance vector machine binary classification models have been built, the binary classifiers are combined by the one-versus-one method into a multi-class classifier. If the samples to be classified belong to k classes, any two of the k classes can form one basic fast relevance vector machine binary classifier. The training samples are classified pairwise, so the k classes yield $k(k-1)/2$ fast relevance vector machine binary classifiers in total, each trained only on the sample subset of its own two classes. To classify an unknown sample, voting is used: each test sample is judged by all $k(k-1)/2$ classifiers. For example, when a sample is classified between classes i and j and the classifier assigns it to class i, class i gains one vote; otherwise class j gains one vote. After all classifiers have voted, the class with the most votes is taken as the class of the test sample.
Let the classification function $f_{ij}(x)$ discriminate between samples of classes i and j. If $f_{ij}(x) < 0$, x is judged to belong to class i and class i receives one vote; otherwise x is judged to belong to class j and class j receives one vote. In the final decision, the votes are compared and the test sample is assigned to the class with the most votes.
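The one-versus-one voting rule can be sketched as follows. The pairwise decision functions are passed in as a hypothetical dictionary keyed by class pairs, the sign convention ($f_{ij}(x) < 0$ votes for class i) follows the text, and breaking ties in favor of the class listed first is an assumption.

```python
from itertools import combinations


def one_vs_one_predict(x, classes, decision):
    """Vote over all k(k-1)/2 pairwise classifiers (illustrative sketch).

    classes  : list of class labels
    decision : hypothetical dict mapping (i, j) to a function f_ij(x) -> float
    """
    votes = {c: 0 for c in classes}
    for i, j in combinations(classes, 2):
        if decision[(i, j)](x) < 0:    # convention from the text: negative means class i
            votes[i] += 1
        else:
            votes[j] += 1
    # assumption: ties go to the class that appears first in `classes`
    winner = max(classes, key=lambda c: votes[c])
    return winner, votes
```

With four classes this loops over six pairwise classifiers, the same count used in the embodiment.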
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The invention establishes an online fault diagnosis model for sewage treatment using Fast RVM based on clustering of imbalanced data. The clustering-based fast relevance vector machine method compresses the majority-class data and the synthetic minority over-sampling technique (SMOTE) expands the minority-class data, which reduces the imbalance of the sewage data. A multi-class model of the biochemical sewage treatment process is built with Fast RVM, which speeds up online updating. The model then diagnoses in real time and is updated as new operating data are added, and waits for the next diagnosis, thus forming an online fault diagnosis model. This online model improves fault diagnosis accuracy for the biochemical sewage treatment system, has good online performance, and is markedly effective.
2. The model of the invention compresses the majority-class data with the clustering-based fast relevance vector machine and expands the minority-class data with SMOTE, reducing the imbalance of the sewage data. It gives good results on balanced data and also reasonably good classification on imbalanced data. On this basis a multi-class classifier built with Fast RVM is used; its key point is that it quickly estimates the hyperparameters for the training samples and removes the samples that are not relevance vectors, which keeps the model sparse and reduces training time. The method adopted by the invention therefore ensures the accuracy and real-time performance of online fault diagnosis of the sewage treatment process.
3. During online simulation testing, each new group of data must be tested and then added to the model to update it. The historical data set keeps a fixed capacity through a limited-memory scheme, so that the training data always consist of a fixed number of groups: whenever the newest group of observations is added, the earliest group is discarded immediately. This ensures that the model always contains the information of the new data and prevents the information in the historical data from swamping the information carried by the new data.
Brief description of the drawings
Fig. 1 is a flow chart of the Fast RVM online fault diagnosis method for sewage treatment based on clustering of imbalanced data according to the invention.
Fig. 2 is a flow chart of the fast relevance vector machine classification algorithm of the invention.
Fig. 3 is a schematic diagram of the one-versus-one fast relevance vector machine multi-class model of the invention.
Detailed description of the embodiments
The invention is described in further detail below with reference to a specific embodiment.
As shown in Fig. 1, the Fast RVM online fault diagnosis method for sewage treatment provided by the invention is based on clustering of imbalanced data and proceeds as follows:
S1. Remove samples with incomplete attributes from the sewage data. Because the input variables have different units and scales, normalize them to the interval [0, 1], and determine the historical data set $x_{old}$ and the update test set $x_{new}$;
S2. Compress the majority-class samples in the historical data with the clustering-based fast relevance vector machine method;
S3. Expand the minority-class samples in the historical data with the synthetic minority over-sampling technique (SMOTE);
S4. Recombine the processed samples of all classes in the historical data into a new historical training set, and build a one-versus-one fast relevance vector machine multi-class training model;
S5. Take k new samples from the update test set $x_{new}$ and test them in the model, save the classification results, add the samples to the historical data set, and remove the first k samples from the historical data set;
S6. Return to step S2, reprocess the imbalanced historical data, retrain the model, and repeat the process until all online update data have been tested. The final online test result is then obtained, which identifies the online operating state of the sewage treatment process.
Step S2 is specifically as follows:
S201. Let the majority-class sample set $X = \{x_1, x_2, \dots, x_i, \dots, x_n\}$ consist of n data points in $R^d$, where d is the dimension of the sample attributes. Randomly select k of the n data objects as the initial cluster centers;
S202. Assign each remaining sample object to the nearest cluster center according to its distance to each center. Let $c_j$ be the center of the j-th class; the distance between $x_i$ and $c_j$ is:
$$d(x_i, c_j) = \sqrt{\sum_{m=1}^{d} (x_{im} - c_{jm})^2} \quad (11)$$
S203. Update the cluster center of each class from the points assigned to it. Let the samples of the j-th class be $\{x_{j1}, x_{j2}, \dots, x_{jn_j}\}$, containing $n_j$ samples; the cluster center of the class is then $c_j = (c_{j1}, c_{j2}, \dots, c_{jd})$, where $c_{jm}$ is the m-th attribute of the class center $c_j$ and $x_{jl,m}$ is the m-th attribute of $x_{jl}$. It is computed as:
$$c_{jm} = \frac{1}{n_j} \sum_{l=1}^{n_j} x_{jl,m} \quad (12)$$
S204. Repeat steps S202 and S203 until the criterion function converges. The mean squared error is used as the criterion function, of the form:
$$E = \sum_{j=1}^{k} \sum_{l=1}^{n_j} \lVert x_{jl} - c_j \rVert^2 \quad (13)$$
S205. Build a fast relevance vector machine classification model on the clustered majority-class samples to obtain a certain number of relevance vectors. These relevance vectors are far fewer than the original majority-class data yet remain representative, so the original majority-class samples are replaced by the selected relevance vectors, which compresses the majority class.
Step S3 is specifically as follows:
S301. For each sample x in the minority class, compute its Euclidean distance to every other sample in the minority-class sample set, obtain its k nearest neighbors, and record the indices of the neighbor samples; here k is set to 5;
S302. According to the over-sampling rate N, for each minority-class sample x, randomly select N samples from its k nearest neighbors, denoted $y_1, y_2, \dots, y_N$;
S303. Perform random linear interpolation between the original sample x and each $y_j$ (j = 1, 2, ..., N) to construct new minority-class samples $p_j$:
$$p_j = x + \mathrm{rand}(0,1) \cdot (y_j - x), \quad j = 1, 2, \dots, N \quad (14)$$
where rand(0,1) denotes a random number in the interval (0, 1).
In step S4, the one-versus-one fast relevance vector machine multi-class training model, shown in Fig. 3, is built as follows:
The processed historical data can be defined as $\{z_n, t_n\}_{n=1}^{N}$ with $z_n \in R^d$, where N is the number of samples in the data set, n is the sample index, d is the dimension of the sample attributes, $z_n$ is the input of a sample, and $t_n$ is its target value. The prediction function is given by equation (15):
$$t_n = y(z_n; w) + \varepsilon_n \quad (15)$$
where $y(z; w)$ is defined by equation (16):
$$y(z; w) = \sum_{i=1}^{N} w_i K(z, z_i) + w_0 \quad (16)$$
Here $K(z, z_i)$ is the kernel function, $w_i$ is the weight of the corresponding basis function, $w = [w_0, w_1, \dots, w_N]^T$, and $\varepsilon_n$ is noise with $\varepsilon_n \sim N(0, \sigma^2)$, so that $t_n \sim N(y(z_n; w), \sigma^2)$. Assuming the targets $t_n$ are mutually independent, the likelihood is:
$$p(t \mid w, \sigma^2) = (2\pi\sigma^2)^{-N/2} \exp\left\{ -\frac{\lVert t - \Phi w \rVert^2}{2\sigma^2} \right\} \quad (17)$$
where $\Phi$ is the $N \times (N+1)$ design matrix. To avoid over-fitting, the weights w in the model must be constrained; they are assumed to follow the Gaussian prior $p(w \mid \alpha) = \prod_{i=0}^{N} N(w_i \mid 0, \alpha_i^{-1})$, where $\alpha$ is the vector of hyperparameters. When a new input is given, the distribution $p(t^* \mid t)$ of the corresponding target $t^*$ depends on the posterior $p(w, \alpha, \sigma^2 \mid t)$. From the prior distribution and the likelihood, the posterior distribution can be written as:
$$p(w, \alpha, \sigma^2 \mid t) = p(w \mid t, \alpha, \sigma^2)\, p(\alpha, \sigma^2 \mid t) \quad (18)$$
Approximating the above expression reduces the problem to maximizing $p(\alpha, \sigma^2 \mid t) \propto p(t \mid \alpha, \sigma^2)\, p(\alpha)\, p(\sigma^2)$, that is, finding the most probable values $\alpha_{MP}$ and $\sigma^2_{MP}$ of the parameters $\alpha$ and $\sigma^2$.
During training, the fast relevance vector machine starts from an empty set and dynamically adds columns to the basis matrix $\Phi$ to increase the marginal likelihood, or removes redundant columns of $\Phi$ to increase the objective function. Taking the logarithm of the marginal likelihood $p(t \mid \alpha, \sigma^2)$ and writing $L(\alpha) = \log p(t \mid \alpha, \sigma^2)$, rearranging gives:
$$L(\alpha) = L(\alpha_{-i}) + \frac{1}{2}\left[ \log \alpha_i - \log(\alpha_i + S_i) + \frac{Q_i^2}{\alpha_i + S_i} \right] = L(\alpha_{-i}) + l(\alpha_i) \quad (19)$$
where $L(\alpha_{-i})$ is the log marginal likelihood after the basis vector $\phi_i$ has been removed (that is, when $\alpha_i = \infty$), and $l(\alpha_i)$ is the separate part of the log marginal likelihood that depends only on $\alpha_i$. $S_i$ is defined as the sparsity factor and $Q_i$ as the quality factor. With respect to $\alpha_i$, $L(\alpha)$ has a unique maximum at:
$$\alpha_i = \begin{cases} \dfrac{S_i^2}{Q_i^2 - S_i}, & Q_i^2 > S_i \\ \infty, & Q_i^2 \le S_i \end{cases} \quad (20)$$
To maximize $L(\alpha)$, iterate according to equation (20) to find suitable weights; the hyperparameters $\alpha$ are updated continually along with the weights w. Repeated updating yields the final trained model, in which the weights of some sample points are zero; the sample points with nonzero weights are the relevance vectors. As shown in Fig. 2, the basic steps of the fast relevance vector machine classification algorithm are:
(1) Initialize $\sigma^2 = 0$;
(2) Initialize $\alpha_i$ with a single basis vector $\phi_i$; from equation (20), $\alpha_i = \dfrac{\lVert \phi_i \rVert^2}{\lVert \phi_i^T t \rVert^2 / \lVert \phi_i \rVert^2 - \sigma^2}$, and set all other $\alpha_m$ ($m \ne i$) to infinity;
(3) Compute the covariance matrix $\Sigma$ and the weight mean $\mu$, and initialize $S_m$ and $Q_m$ for all M basis functions $\phi_m$;
(4) Select a candidate basis vector $\phi_i$ from the set of all M basis functions $\phi_m$;
(5) Compute $\theta_i = Q_i^2 - S_i$;
(6) If $\theta_i > 0$ and $\alpha_i < \infty$, re-estimate $\alpha_i$;
(7) If $\theta_i > 0$ and $\alpha_i = \infty$, add $\phi_i$ to the model and re-estimate $\alpha_i$;
(8) If $\theta_i \le 0$ and $\alpha_i < \infty$, delete $\phi_i$ and set $\alpha_i = \infty$;
(9) Recompute the covariance matrix $\Sigma$, the weight mean $\mu$, and the corresponding $S_m$ and $Q_m$ with the Laplace approximation;
(10) If the algorithm has converged or the maximum number of iterations has been reached, terminate; otherwise go to step (4). The termination condition is that, for every basis function in the model, the corresponding $\alpha_i$ satisfies $\alpha_i < 10^{12}$ and [formula missing in source].
After the fast relevance vector machine binary classification models have been built, the binary classifiers are combined by the one-versus-one method into a multi-class classifier. If the samples to be classified belong to k classes, any two of the k classes can form one basic fast relevance vector machine binary classifier. The training samples are classified pairwise, so the k classes yield $k(k-1)/2$ fast relevance vector machine binary classifiers in total, each trained only on the sample subset of its own two classes. To classify an unknown sample, voting is used: each test sample is judged by all $k(k-1)/2$ classifiers. For example, when a sample is classified between classes i and j and the classifier assigns it to class i, class i gains one vote; otherwise class j gains one vote. After all classifiers have voted, the class with the most votes is taken as the class of the test sample.
Let the classification function $f_{ij}(x)$ discriminate between samples of classes i and j. If $f_{ij}(x) < 0$, x is judged to belong to class i and class i receives one vote; otherwise x is judged to belong to class j and class j receives one vote. In the final decision, the votes are compared and the test sample is assigned to the class with the most votes.
Below, the Fast RVM online fault diagnosis method for sewage treatment described above is illustrated with concrete data:
The experimental data come from the UCI repository and are two years of daily monitoring data from a sewage treatment plant. The whole data set has 527 records, including incomplete ones. Each sample has 38 dimensions (that is, 38 measured variables, each corresponding to one indicator value), and 380 records have all attribute values complete. The monitored water body has 13 states in total; for brevity each state is denoted by a number instead of its name. The distribution of the 527 records over the 13 states is shown in Table 1.
Table 1 - Distribution of the 527 records over the 13 states
Class | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
Count | 279 | 1 | 1 | 4 | 116 | 3 | 1 | 1 | 65 | 1 | 53 | 1 | 1 |
To simplify the classification, the samples are merged into 4 major classes according to the nature of the sample classes, as shown in Table 2.
Table 2 - Distribution of the 527 records over the 4 states
Class | 1 | 2 | 3 | 4 |
Count | 332 | 116 | 65 | 14 |
Class 1 is the normal state; class 2 is a normal state with above-average performance; class 3 is a normal state with low influent flow; class 4 covers fault conditions caused by secondary settling tank failure, abnormal conditions due to heavy rain, solids overload, and similar causes.
The on-line fault diagnosis method of the above-mentioned Fast RVM sewage disposals clustered based on unbalanced data of the present embodiment,
The step of comprising following order:
S1. the incomplete data of 147 attributes are weeded out in 527 sewage data first, are obtained 380 attributes and are completely counted
According to then by data by formulaNormalized, by the data set after process 2 are pressed:1 ratio is carried out
Optimum allocation random stratified sampling survey, obtains history data set xoldWith online updating test set xnew。
S2. many several classes of samples (first kind) that historical data is concentrated are extracted, is polymerized to using K-means methods
Two classes, are then modeled the primary sources after cluster using fast correlation vector machine method, obtain appropriate number of phase
Vector is closed, many several classes of samples are replaced with selected associated vector;
S3. according to the multiplying power to up-sampling, using method from virtual minority class to up-sampling by the minority in historical sample
Class sample (the 3rd class and the 4th class) is expanded;
S4. by process after the historical sample data of all classes reconfigure and constitute new history training set, such as the institute of table 3
Show, set up many classification based training models of fast correlation vector machine of " one-to-one ".Many classification based training model selection RBF kernel functions, core
Width parameter by being determined using the trellis search method of 5 folding cross validations to new training set, then according to a total of four
Individual classification, sets up altogether 6 two graders;
S5. from online updating test set xnewIn take k new samples and tested in multi-categorizer model, 6 are classified
Device distinguishes input test collection xnew, voted, class test result is preserved, historical data concentration is added to, remove history
Front k sample in data set;
S6. Return to step S2 and retrain the model, repeating the above process until all online update data have been tested. This gives the final online test result and thereby identifies the online operating state of the sewage treatment process. The cluster-based Fast RVM online fault diagnosis model adopted by the present invention meets these requirements well, enables real-time monitoring and control of the operating state of the sewage treatment process, and is worth popularizing.
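The data preparation of step S1 can be sketched as follows. This is a minimal illustration rather than the patented procedure: it assumes min-max scaling as the normalization to [0, 1], and it uses proportional allocation per class as a stand-in for the "optimal allocation" stratified sampling, whose exact rule is not given here. The function names `min_max_normalize` and `stratified_split` are hypothetical.

```python
import numpy as np

def min_max_normalize(X):
    """Scale every attribute (column) of X to [0, 1]; constant columns map to 0."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for constant attributes
    return (X - lo) / span

def stratified_split(y, ratio=2 / 3, seed=0):
    """Split sample indices class by class (assumed proportional allocation).

    Returns (history_idx, update_idx); `ratio` is the share kept as history,
    so ratio=2/3 gives the 2:1 split of step S1.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    history, update = [], []
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        cut = int(round(ratio * len(idx)))
        history.extend(idx[:cut])
        update.extend(idx[cut:])
    return np.sort(history), np.sort(update)
```

Splitting inside each class keeps the rare fault classes represented in both the historical set and the update set, which a plain random split of imbalanced data would not guarantee.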
Table 2. Distribution of the 527 records across the 4 states
The embodiments described above are only preferred embodiments of the present invention and do not limit its scope of implementation; any change made according to the shape or principle of the present invention shall therefore fall within the scope of protection of the present invention.
Claims (4)
1. An online fault diagnosis method for Fast RVM sewage treatment, characterized by comprising the following steps:
S1. Remove the samples with incomplete attributes from the sewage data; because the input variables have different dimensions (units), normalize them to the interval [0, 1]; and determine the historical data set $x_{old}$ and the update test set $x_{new}$;
S2. Compress the majority-class samples in the historical data using the cluster-based fast relevance vector machine method;
S3. Expand the minority-class samples in the historical data using the synthetic (virtual) minority oversampling method;
S4. Recombine the processed samples of all classes in the historical data into a new historical training set, and build a "one-versus-one" multi-class fast relevance vector machine training model;
S5. Take k new samples from the update test set $x_{new}$, test them in the model, save the classification results, add them to the historical data set, and remove the first k samples of the historical data set;
S6. Return to step S2, process the imbalanced historical data again and retrain the model, repeating the above process until all online update data have been tested; this gives the final online test result and thereby identifies the online operating state of the sewage treatment process.
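The sliding-window loop of steps S5 and S6 can be sketched as below. This is only an illustration under stated assumptions: a nearest-centroid classifier stands in for the multi-class Fast RVM, the compression and oversampling of steps S2 and S3 are omitted, and the predicted labels are assumed to be what is stored with the newly added samples. The names `NearestCentroid` and `online_diagnosis` are hypothetical.

```python
import numpy as np

class NearestCentroid:
    """Stand-in classifier used only for illustration (not the Fast RVM)."""

    def fit(self, X, y):
        self.labels_ = np.unique(y)
        self.centers_ = np.array([X[y == c].mean(axis=0) for c in self.labels_])
        return self

    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.centers_[None, :, :], axis=2)
        return self.labels_[d.argmin(axis=1)]

def online_diagnosis(X_old, y_old, X_new, k, make_model=NearestCentroid):
    """Test X_new in batches of k while sliding the historical window forward."""
    X_hist = np.asarray(X_old, dtype=float).copy()
    y_hist = np.asarray(y_old).copy()
    X_new = np.asarray(X_new, dtype=float)
    predictions = []
    for start in range(0, len(X_new), k):
        batch = X_new[start:start + k]
        # S6: retrain on the current window (S2-S4 preprocessing would go here)
        model = make_model().fit(X_hist, y_hist)
        y_batch = model.predict(batch)  # S5: test the k new samples
        predictions.append(y_batch)
        # S5: append the tested batch and drop the same number of oldest samples
        X_hist = np.vstack([X_hist, batch])[len(batch):]
        y_hist = np.concatenate([y_hist, y_batch])[len(batch):]
    return np.concatenate(predictions)
```

Because the window length stays constant, the cost of each retraining step does not grow as more online data arrive.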
2. The online fault diagnosis method for Fast RVM sewage treatment according to claim 1, characterized in that step S2 is specifically:
S201. Let the majority-class sample set $X = \{x_1, x_2, \ldots, x_i, \ldots, x_n\}$ be n data points in the space $R^d$, where d is the dimension of the sample attributes; randomly select k objects from the n data objects as the initial cluster centers;
S202. Assign each remaining sample object to the nearest cluster center according to its distance to every cluster center. With $c_j$ the center of the j-th class, the distance between $x_i$ and $c_j$ is:
$$\operatorname{dist}(x_i, c_j) = \sqrt{\sum_{m=1}^{d} (x_{im} - c_{jm})^2} \tag{1}$$
S203. Update the cluster center of each class from the points assigned to it. Let the samples of the j-th class be $\{x_1^{(j)}, x_2^{(j)}, \ldots, x_{n_j}^{(j)}\}$, containing $n_j$ samples; the cluster center of that class is then $c_j = (c_{j1}, c_{j2}, \ldots, c_{jd})$, where $c_{jm}$ is the m-th attribute of the class center $c_j$, computed as:
$$c_{jm} = \frac{1}{n_j} \sum_{i=1}^{n_j} x_{im}^{(j)} \tag{2}$$
S204. Repeat steps S202 and S203 until the criterion function converges; the mean squared error is used as the criterion function, in the form:
$$E = \sum_{j=1}^{k} \sum_{i=1}^{n_j} \lVert x_i^{(j)} - c_j \rVert^2 \tag{3}$$
S205. Model the clustered majority-class samples with fast relevance vector machine classification to obtain a set number of relevance vectors. These relevance vectors are far fewer than the original majority-class data yet remain representative of them, so the selected relevance vectors replace the original majority-class samples, thereby compressing the majority class.
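Steps S201 to S204 describe the usual K-means iteration, which can be sketched as follows. The sketch assumes Euclidean distance and the sum of squared errors as the convergence criterion; the function name `kmeans` and the parameters `max_iter`, `tol` and `seed` are illustrative and not taken from the claims.

```python
import numpy as np

def kmeans(X, k, max_iter=100, tol=1e-9, seed=0):
    """Plain K-means; returns (centers, labels, sum of squared errors)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # S201: pick k distinct samples as the initial cluster centers
    centers = X[rng.choice(len(X), size=k, replace=False)]
    prev_sse = np.inf
    for _ in range(max_iter):
        # S202: assign every sample to its nearest center
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # S203: move each center to the mean of its members
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
        # S204: stop once the squared-error criterion no longer decreases
        sse = ((X - centers[labels]) ** 2).sum()
        if prev_sse - sse <= tol:
            break
        prev_sse = sse
    return centers, labels, sse
```

In the method of claim 2 this clustering would be run with k = 2 on the majority class only, before the relevance vector compression of step S205.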
3. The online fault diagnosis method for Fast RVM sewage treatment according to claim 1, characterized in that step S3 is specifically:
S301. For each sample x in the minority class, compute its Euclidean distance to every sample in the minority-class sample set, obtain its k nearest neighbors, and record the indices of the neighbor samples;
S302. According to the oversampling rate N, for each minority-class sample x, randomly select N samples from its k nearest neighbors, denoted $y_1, y_2, \ldots, y_N$;
S303. Perform random linear interpolation between the original sample x and each $y_j$ (j = 1, 2, …, N) to construct the new minority-class samples $p_j$, that is:
$$p_j = x + \mathrm{rand}(0,1) \cdot (y_j - x), \quad j = 1, 2, \ldots, N \tag{4}$$
where rand(0,1) denotes a random number in the interval (0, 1).
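Steps S301 to S303 and formula (4) can be sketched as follows. The sketch assumes that neighbors are drawn with replacement (so that N may exceed k) and uses NumPy's generator, whose random numbers lie in [0, 1) rather than the open interval (0, 1); the function name `smote` and its parameter names are illustrative.

```python
import numpy as np

def smote(X_min, n_per_sample, k=5, seed=0):
    """Create n_per_sample synthetic points for every minority-class sample."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # cannot have more neighbors than other samples
    # S301: pairwise Euclidean distances and the k nearest neighbors
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]
    synthetic = []
    for i in range(n):
        # S302: pick N neighbors at random (with replacement, an assumption)
        chosen = rng.choice(neighbors[i], size=n_per_sample, replace=True)
        # S303 / formula (4): p_j = x + rand(0,1) * (y_j - x)
        gaps = rng.random((n_per_sample, 1))
        synthetic.append(X_min[i] + gaps * (X_min[chosen] - X_min[i]))
    return np.vstack(synthetic)
```

Every synthetic point lies on the line segment between a minority sample and one of its neighbors, so the new samples stay inside the region already occupied by the minority class.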
4. The online fault diagnosis method for Fast RVM sewage treatment according to claim 1, characterized in that, in step S4, the "one-versus-one" multi-class fast relevance vector machine training model is built as follows:
The processed historical data are defined as $\{z_n, t_n\}_{n=1}^{N}$, $z_n \in R^d$, $t_n \in R$, where N is the number of samples in the data set, n is the sample index, d is the dimension of the sample attributes, $z_n$ is the input of a sample and $t_n$ is its target value. The prediction function is:
$$t_n = y(z_n; w) + \varepsilon_n \tag{5}$$
where y(z) is defined as:
$$y(z; w) = \sum_{i=1}^{N} w_i K(z, z_i) + w_0 \tag{6}$$
where $K(z, z_i)$ is the kernel function, $w_i$ is the weight of the corresponding basis function, $w = [w_0, w_1, \ldots, w_N]^T$, and $\varepsilon_n$ is noise obeying $\varepsilon_n \sim N(0, \sigma^2)$, so that $t_n \sim N(y(z_n; w), \sigma^2)$. Assuming the prediction targets $t_n$ are mutually independent:
$$p(t \mid w, \sigma^2) = (2\pi\sigma^2)^{-N/2} \exp\left\{-\frac{\lVert t - \Phi w \rVert^2}{2\sigma^2}\right\} \tag{7}$$
where Φ is the N × (N+1) design matrix. To avoid overfitting, the weights w in the model must be constrained; they are assumed to obey the Gaussian distribution $p(w \mid \alpha) = \prod_{i=0}^{N} N(w_i \mid 0, \alpha_i^{-1})$, where α is the hyperparameter. When a new set of variables is input, the distribution $p(t^* \mid t)$ of the corresponding target value $t^*$ is determined by the posterior $p(w, \alpha, \sigma^2 \mid t)$. From the prior distribution and the likelihood, the posterior distribution of the weights is obtained:
$$p(w, \alpha, \sigma^2 \mid t) = p(w \mid t, \alpha, \sigma^2)\, p(\alpha, \sigma^2 \mid t) \tag{8}$$
This expression is approximated, and the problem finally becomes that of maximizing $p(\alpha, \sigma^2 \mid t) \propto p(t \mid \alpha, \sigma^2)\, p(\alpha)\, p(\sigma^2)$, that is, finding the most probable values $\alpha_{MP}$ and $\sigma^2_{MP}$ of the parameters α and $\sigma^2$.
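The N × (N+1) design matrix Φ used above can be sketched as follows: one bias column for $w_0$ followed by one kernel column per training sample. The RBF form exp(−‖z − z_i‖² / (2·width²)) is an assumption, since the exact parameterization of the kernel width is not stated here, and `rbf_design_matrix` is a hypothetical name.

```python
import numpy as np

def rbf_design_matrix(Z, width):
    """Return Phi with shape (N, N + 1): a column of ones, then K(z_n, z_i)."""
    Z = np.asarray(Z, dtype=float)
    # squared Euclidean distance between every pair of training inputs
    sq_dist = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-sq_dist / (2.0 * width ** 2))  # assumed RBF convention
    bias = np.ones((len(Z), 1))                # basis function paired with w_0
    return np.hstack([bias, K])
```

With this layout the model output for all training inputs is simply `Phi @ w`, with `w` holding $w_0$ first and the kernel weights after it.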
During training, the fast relevance vector machine starts from an empty set and dynamically adds basis functions to the basis matrix Φ to increase the marginal likelihood, or removes redundant columns of Φ to increase the objective function. Taking the logarithm of the marginal likelihood $p(t \mid \alpha, \sigma^2)$ and writing $L(\alpha) = \log p(t \mid \alpha, \sigma^2)$, rearrangement gives:
$$L(\alpha) = L(\alpha_{-i}) + \frac{1}{2}\left[\log \alpha_i - \log(\alpha_i + S_i) + \frac{Q_i^2}{\alpha_i + S_i}\right] = L(\alpha_{-i}) + l(\alpha_i) \tag{9}$$
where $L(\alpha_{-i})$ is the logarithm of the marginal likelihood when $\alpha_i = \infty$, that is, after the corresponding basis vector $\phi_i$ has been removed, and $l(\alpha_i)$ is the part of the log marginal likelihood that depends only on $\alpha_i$. $S_i$ is defined as the sparsity factor and $Q_i$ as the quality factor. L(α) has a unique maximum at:
$$\alpha_i = \begin{cases} \dfrac{S_i^2}{Q_i^2 - S_i}, & Q_i^2 > S_i \\ \infty, & Q_i^2 \le S_i \end{cases} \tag{10}$$
To maximize L(α), the hyperparameter α is updated iteratively according to formula (10) to find suitable weights, and the weights w are updated along with it. Repeated updating gives the final training model $y(z; w) = \sum_{i=1}^{N} w_i K(z, z_i) + w_0$. The weights of some sample points are zero; the points whose weights are nonzero are the relevance vectors. The basic steps of the fast relevance vector machine classification algorithm are as follows:
(1) Initialize $\sigma^2 = 0$;
(2) Initialize $\alpha_i$ with a single basis vector $\phi_i$; rearranging formula (10) gives $\alpha_i = \dfrac{\lVert \phi_i \rVert^2}{\lVert \phi_i^T t \rVert^2 / \lVert \phi_i \rVert^2 - \sigma^2}$, and all other $\alpha_m$ (m ≠ i) are set to infinity;
(3) Compute the covariance matrix Σ and the weight matrix μ, and initialize $S_m$ and $Q_m$ for all M basis functions $\phi_m$;
(4) Select a candidate basis vector $\phi_i$ from the set of all M basis functions $\phi_m$;
(5) Compute $\theta_i = Q_i^2 - S_i$;
(6) If $\theta_i > 0$ and $\alpha_i < \infty$, re-estimate $\alpha_i$;
(7) If $\theta_i > 0$ and $\alpha_i = \infty$, add $\phi_i$ to the model and re-estimate $\alpha_i$;
(8) If $\theta_i \le 0$ and $\alpha_i < \infty$, delete $\phi_i$ and set $\alpha_i = \infty$;
(9) Recompute the covariance matrix Σ, the weight matrix μ and the $S_m$ and $Q_m$ of the current iteration using the Laplace approximation;
(10) If the algorithm has converged or the maximum number of iterations has been reached, terminate; otherwise go to step (4). The termination condition is: for every basis function in the model, the corresponding $\alpha_i$ satisfies $\alpha_i < 10^{12}$ and [second condition not reproduced in this text].
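The decision made for one candidate basis function in steps (5) to (8) can be sketched as below. The sketch takes the quantity of step (5) to be θ_i = Q_i² − S_i and the re-estimate to be α_i = S_i² / θ_i, as in the published sequential sparse Bayesian learning algorithm; this is an assumption about expressions that are not legible in this text. The function name `update_alpha` and the returned action strings are illustrative.

```python
import math

def update_alpha(s_i, q_i, alpha_i):
    """Return (action, new_alpha) for one candidate basis function.

    s_i is the sparsity factor, q_i the quality factor, and
    alpha_i = math.inf means the basis function is not in the model.
    """
    theta = q_i ** 2 - s_i             # step (5), assumed form
    in_model = math.isfinite(alpha_i)
    if theta > 0:
        new_alpha = s_i ** 2 / theta   # assumed re-estimate, formula (10)
        # step (6) if already in the model, step (7) otherwise
        return ("re-estimate" if in_model else "add"), new_alpha
    if in_model:
        return "delete", math.inf      # step (8)
    return "skip", math.inf            # not in the model and not useful
```

Only one basis function is touched per iteration, which is why the model can start from an empty set and grow to a small number of relevance vectors.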
After the basic fast relevance vector machine binary classification model has been built, the multiple binary classifiers are combined using the "one-versus-one" method to build a multi-class classifier.
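The "one-versus-one" combination can be sketched as follows: one binary classifier per pair of classes, with the final class decided by majority vote. The callable `binary_predict` is a hypothetical placeholder for a trained binary Fast RVM, and resolving ties in favor of the class listed first is an assumption.

```python
from itertools import combinations

import numpy as np

def one_vs_one_predict(binary_predict, classes, X):
    """Combine pairwise binary decisions by voting.

    binary_predict(a, b, X) must return a boolean array that is True where
    a sample is assigned to class a and False where it is assigned to class b.
    """
    classes = list(classes)
    X = np.asarray(X, dtype=float)
    votes = np.zeros((len(X), len(classes)), dtype=int)
    # 4 classes give 4 * 3 / 2 = 6 pairwise classifiers
    for a, b in combinations(range(len(classes)), 2):
        a_wins = np.asarray(binary_predict(classes[a], classes[b], X), dtype=bool)
        votes[a_wins, a] += 1
        votes[~a_wins, b] += 1
    # argmax returns the first maximum, so ties go to the earlier class
    return np.array(classes)[votes.argmax(axis=1)]
```

With the four operating states of the method, the loop runs over six class pairs, matching the six binary classifiers described in the embodiment.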
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710000827.6A CN106681305A (en) | 2017-01-03 | 2017-01-03 | Online fault diagnosing method for Fast RVM (relevance vector machine) sewage treatment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106681305A (en) | 2017-05-17 |
Family
ID=58850054
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710000827.6A Pending CN106681305A (en) | 2017-01-03 | 2017-01-03 | Online fault diagnosing method for Fast RVM (relevance vector machine) sewage treatment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106681305A (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108564677A (en) * | 2018-03-26 | 2018-09-21 | 唐天才 | A kind of data intelligence management method for New-energy electric vehicle |
CN108875783A (en) * | 2018-05-09 | 2018-11-23 | 西安工程大学 | A kind of extreme learning machine Diagnosis Method of Transformer Faults towards unbalanced dataset |
CN109508726A (en) * | 2017-09-15 | 2019-03-22 | 北京京东尚科信息技术有限公司 | Data processing method and its system |
CN109558893A (en) * | 2018-10-31 | 2019-04-02 | 华南理工大学 | Fast integration sewage treatment method for diagnosing faults based on resampling pond |
CN111400528A (en) * | 2020-03-16 | 2020-07-10 | 南方科技大学 | Image compression method, device, server and storage medium |
CN112734129A (en) * | 2021-01-21 | 2021-04-30 | 中国科学院地理科学与资源研究所 | Air pollution space-time trend prediction method based on unsupervised restrictive optimization |
CN112863134A (en) * | 2020-12-31 | 2021-05-28 | 浙江清华长三角研究院 | Intelligent diagnosis system and method for rural sewage treatment facility abnormal operation |
CN115111717A (en) * | 2021-03-08 | 2022-09-27 | 佛山市顺德区美的电热电器制造有限公司 | Temperature adjusting device, temperature control method and device thereof, electronic device and storage medium |
CN116719831A (en) * | 2023-08-03 | 2023-09-08 | 四川中测仪器科技有限公司 | Standard database establishment and update method for health monitoring |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140107977A1 (en) * | 2012-10-16 | 2014-04-17 | Mitsubishi Aircraft Corporation | Condition diagnosing method and condition diagnosing device |
CN104680015A (en) * | 2015-03-02 | 2015-06-03 | 华南理工大学 | Online soft measurement method for sewage treatment based on quick relevance vector machine |
CN105487526A (en) * | 2016-01-04 | 2016-04-13 | 华南理工大学 | FastRVM (fast relevance vector machine) wastewater treatment fault diagnosis method |
CN105740619A (en) * | 2016-01-28 | 2016-07-06 | 华南理工大学 | On-line fault diagnosis method of weighted extreme learning machine sewage treatment on the basis of kernel function |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170517 |