CN107688825B - Improved integrated weighted extreme learning machine sewage treatment fault diagnosis method
- Publication number: CN107688825B
- Authority: CN (China)
- Legal status: Expired - Fee Related
Classifications
- G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/24 - Classification techniques
Abstract
The invention discloses an improved integrated weighted extreme learning machine sewage treatment fault diagnosis method, which comprises the following steps: S1, assign the initial weights of the weighted extreme learning machine base classifier using an assignment formula biased toward minority-class samples; S2, train the base classifiers; S3, propose a novel base-classifier weight update formula for the ensemble algorithm, integrate a plurality of base classifiers with the weighted extreme learning machine as base classifier using the Adaboost iteration method, and establish an improved sewage fault diagnosis model; S4, input sample data generated in the sewage treatment process, set the number T of base classifiers of the ensemble algorithm, the optimal kernel width γ of the base classifiers and the corresponding optimal regularization coefficient C, establish the fault diagnosis model of the sewage treatment system and carry out a performance test. The invention can classify multi-class imbalanced data, improves the classification performance on imbalanced data, in particular the classification accuracy on minority classes, and effectively improves the accuracy of fault diagnosis in the sewage treatment process.
Description
Technical Field
The invention relates to the technical field of sewage treatment fault diagnosis, in particular to an improved sewage treatment fault diagnosis method of an integrated weighted extreme learning machine.
Background
Sewage treatment is a complex biochemical process with a great number of influencing factors, and a sewage treatment plant is difficult to keep in long-term stable operation. Faults easily cause serious problems such as substandard effluent quality, increased operating cost and secondary environmental pollution, so the operating state of the plant must be monitored and operating faults diagnosed and handled in time.
Fault diagnosis of the sewage treatment process is in essence a pattern recognition problem, and imbalanced class distributions are frequently encountered when classifying sewage data sets. Traditional machine learning methods tend to bias classification accuracy toward the majority classes, whereas in practice the accuracy on the minority classes, i.e. the fault classes, matters more. Finding faults promptly and accurately can greatly reduce the losses of a sewage treatment plant and improve its working efficiency.
Disclosure of Invention
Aiming at the fault diagnosis problem of sewage treatment plants, the invention provides an improved integrated weighted extreme learning machine sewage treatment fault diagnosis method. The method introduces the imbalanced-classification evaluation index G-mean into an Adaboost ensemble classification algorithm that uses a weighted extreme learning machine as base classifier. Applied to fault diagnosis of the sewage treatment process, it can classify multi-class imbalanced data, improves classification performance on imbalanced data, in particular the classification accuracy on minority classes, and effectively improves fault diagnosis accuracy.
In order to achieve the purpose, the technical scheme provided by the invention is as follows: an improved integrated weighted extreme learning machine sewage treatment fault diagnosis method comprises the following steps:
s1, assign the initial weights of the weighted extreme learning machine base classifier using an assignment formula biased toward minority-class samples;
s2, train the base classifiers: compute the recall and the performance evaluation index G-mean of the previous base classifier, and, using a G-mean-based initial weight matrix update formula, adjust the weight matrix of the next weighted extreme learning machine base classifier and establish the base classifier model, as follows:
s2.1, given a sewage sample set {(x_1,y_1),(x_2,y_2),…,(x_i,y_i),…,(x_N,y_N)}, where x_i ∈ X denotes the attribute vector of the i-th sample, y_i denotes the class label of the i-th sample, N is the total number of samples, and y_i ∈ Y = {1,2,…,k,…,K}, where k denotes the k-th class and K the total number of classes; set the number of base classifiers of the ensemble algorithm, denoted T;
s2.2, train on the training samples with a weighted kernel extreme learning machine as base classifier to obtain the training model h_t. For the t-th base classifier h_t, first obtain the recall R_1, R_2, …, R_k, …, R_K of each class (k is the k-th class, K is the total number of classes); then compute the sample count n_k of each class and the classification result A(x_i) of each sample: if sample x_i is classified correctly, A(x_i) = 1, and if it is misclassified, A(x_i) = -1; finally obtain G_mean = (R_1·R_2·…·R_K)^{1/K};
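The per-class recall and G-mean computation of step S2.2 can be sketched as follows (a minimal illustration; the function name and the 1..K label convention are assumptions, not part of the patent text):

```python
import numpy as np

def per_class_recall_and_gmean(y_true, y_pred, num_classes):
    """Compute per-class recalls R_1..R_K and the geometric mean
    G_mean = (R_1 * R_2 * ... * R_K)^(1/K), as in step S2.2.
    Class labels are assumed to be 1..K."""
    recalls = []
    for k in range(1, num_classes + 1):
        mask = (y_true == k)
        n_k = mask.sum()                      # number of samples of class k
        correct = (y_pred[mask] == k).sum()   # correctly classified samples
        recalls.append(correct / n_k if n_k > 0 else 0.0)
    g_mean = float(np.prod(recalls) ** (1.0 / num_classes))
    return recalls, g_mean
```

For example, with true labels [1,1,1,2,2] and predictions [1,1,2,2,2], the recalls are 2/3 and 1, giving G_mean = sqrt(2/3).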
S2.3, if G_mean is less than or equal to 0.5, exit the iteration;
s2.4, compute the weight λ_t of the t-th base classifier according to the weight calculation formula of base classifier h_t; the smaller G_mean is (i.e. the larger the training error), the smaller λ_t and the smaller the proportion of the t-th base classifier in the whole ensemble algorithm, and vice versa;
s2.5, adjust the sample weight distribution D_{t+1} for the next iteration; the adjustment rule of D_{t+1} is as follows:
s2.6, let t = t + 1; if t < T, return to S2.2, otherwise end;
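The loop S2.1-S2.6 can be sketched as below. The patent gives the λ_t and D_{t+1} formulas only as images, so the Adaboost-style expressions used here are illustrative assumptions, as is the user-supplied `fit_classifier` hook:

```python
import numpy as np

def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls."""
    classes = np.unique(y_true)
    recalls = [(y_pred[y_true == k] == k).mean() for k in classes]
    return float(np.prod(recalls) ** (1.0 / len(classes)))

def train_ensemble(fit_classifier, X, y, T):
    """Sketch of steps S2.1-S2.6. `fit_classifier(X, y, D)` trains one
    weighted-ELM base classifier under sample weight distribution D and
    returns an object with a .predict method; the exact lambda_t and
    D_{t+1} update rules are assumptions (Adaboost-style stand-ins)."""
    N = len(y)
    D = np.full(N, 1.0 / N)                 # initial weight distribution D_1
    models, lambdas = [], []
    for t in range(T):
        h_t = fit_classifier(X, y, D)       # step S2.2: train base classifier
        y_pred = h_t.predict(X)
        g = g_mean(y, y_pred)
        if g <= 0.5:                        # step S2.3: exit the iteration
            break
        lam = 0.5 * np.log(g / (1.0 - g))   # step S2.4 (assumed form)
        A = np.where(y_pred == y, 1.0, -1.0)  # A(x_i) = +1 correct, -1 wrong
        D = D * np.exp(-lam * A)            # step S2.5 (assumed form)
        D = D / D.sum()                     # renormalize to a distribution
        models.append(h_t)
        lambdas.append(lam)                 # lambda_t: classifier weight
    return models, lambdas
```

A classifier with G_mean close to 1 thus receives a large λ_t, and misclassified samples gain weight for the next round.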
s3, propose a novel base-classifier weight update formula for the ensemble algorithm, integrate a plurality of base classifiers with the weighted extreme learning machine as base classifier using the Adaboost iteration method, and establish the improved sewage fault diagnosis model, where the weight update proceeds as follows:
s3.1, setting the number of the base classifiers of the integration algorithm and recording as T;
s3.2, determine the initial weight distribution D_1(i), i = 1,2,…,N, of the samples x_i according to the weight initialization method;
S3.3, train T base classifiers according to the method of S2, and compute the weight of each base classifier according to the base-classifier weight update formula;
s4, input sample data generated in the sewage treatment process, set the number T of base classifiers of the ensemble algorithm, the optimal kernel width γ of the base classifiers and the corresponding optimal regularization coefficient C, establish the fault diagnosis model of the sewage treatment system and carry out a performance test.
In step S1, one of two weight initialization schemes is selected. The first is an automatic weighting scheme: W1_ii = 1/n_k, where W_1 denotes the first weighting scheme and n_k is the number of training samples belonging to the class k of sample i.
The other weight initialization idea is to push the weight ratio of minority classes to majority classes toward 0.618:1, which essentially trades some classification accuracy on the majority classes for higher recognition accuracy on the minority classes: W2_ii = 0.618/n_k when class k is a majority class (n_k larger than the average class size) and W2_ii = 1/n_k otherwise, where W_2 denotes the second weighting scheme.
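The two initialization schemes can be sketched as follows. The patent's exact formulas appear only as images, so the expressions below follow the standard weighted-ELM W1/W2 golden-ratio schemes, which match the 0.618:1 description; treat them as an assumption:

```python
import numpy as np

def initial_sample_weights(y, scheme="W1"):
    """Diagonal of the N x N weighting matrix W (step S1).
    W1: automatic weighting, W_ii = 1 / n_k (n_k = size of sample i's class).
    W2: golden-ratio weighting, down-weighting majority classes so the
    minority/majority weight ratio moves toward 0.618:1 (assumed form)."""
    classes, counts = np.unique(y, return_counts=True)
    count_of = dict(zip(classes, counts))
    avg = counts.mean()
    w = np.empty(len(y))
    for i, label in enumerate(y):
        n_k = count_of[label]
        if scheme == "W1":
            w[i] = 1.0 / n_k
        else:  # W2: classes larger than average are further discounted
            w[i] = (0.618 / n_k) if n_k > avg else (1.0 / n_k)
    return w
```

For labels [1,1,1,2], W1 gives weights [1/3, 1/3, 1/3, 1], while W2 discounts the majority class to [0.618/3, 0.618/3, 0.618/3, 1].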
In step S2.2, the modeling of the weighted kernel extreme learning machine is specifically as follows:
the extreme learning machine adopts the framework of a single-hidden-layer feedforward neural network (SLFN). Given N sewage treatment fault diagnosis training samples (x_1,y_1),(x_2,y_2),…,(x_N,y_N), the standard SLFN output model with L hidden nodes is represented as:
o_j = Σ_{i=1}^{L} β_i G(w_i · x_j + b_i), j = 1,2,…,N
where β_i denotes the output weight connecting the i-th hidden neuron to the output neurons, G is the hidden-layer neuron activation function, w_i denotes the input weights between the input layer and the i-th hidden neuron, b_i denotes the bias of the i-th hidden neuron, and o_j is the actual output at the output neurons for the j-th sample;
for the N sewage treatment fault diagnosis samples there exist (w_i,b_i) and β_i such that Σ_{i=1}^{L} β_i G(w_i · x_j + b_i) = y_j, j = 1,2,…,N, i.e. the single-hidden-layer feedforward network can fit the sample set with zero error; in matrix form this is written H β = T, where H is the hidden layer output matrix, β is the output weight matrix, and T is the target output matrix;
when the activation function G is differentiable, the SLFN parameters need not all be adjusted: the input weights w_i and hidden-layer biases b_i are selected randomly during network initialization and kept fixed during training, so training the SLFN is equivalent to finding the least-squares solution of the linear system H β = T, which can be cast as the optimization problem:
Minimize: ||H β - T||^2 and ||β||
The optimization problem is mathematically expressed as:
Minimize: (1/2)||β||^2 + (C/2) Σ_{i=1}^{N} ||ξ_i||^2, subject to h(x_i) β = y_i^T - ξ_i^T, i = 1,2,…,N
where ξ_i = [ξ_{i,1},…,ξ_{i,K}]^T is the error vector between the output values of the output nodes for sewage treatment fault diagnosis training sample x_i and the true values; using the Moore-Penrose generalized inverse H^+ of the hidden layer output matrix, the least-squares solution is obtained as β = H^+ T:
the orthogonal projection method can be used to solve H^+ efficiently: when H^T H is nonsingular, H^+ = (H^T H)^{-1} H^T, and when H H^T is nonsingular, H^+ = H^T (H H^T)^{-1}. To give the resulting model better stability and generalization performance, a positive value 1/C is added to the diagonal of H^T H or H H^T when solving, obtaining:
β = H^T (I/C + H H^T)^{-1} T
where I denotes the identity matrix, and the corresponding output function is:
f(x) = h(x) H^T (I/C + H H^T)^{-1} T
or, when the diagonal term is added to H^T H:
β = (I/C + H^T H)^{-1} H^T T
and the corresponding ELM output function is:
f(x) = h(x) (I/C + H^T H)^{-1} H^T T
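The randomized hidden layer and the regularized solve β = (I/C + H^T H)^{-1} H^T T can be illustrated by a minimal sketch; the sigmoid activation, one-hot targets, and function names are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_train(X, T_targets, L, C):
    """Minimal regularized ELM: random input weights and biases stay
    fixed, and the output weights solve
    beta = (I/C + H^T H)^{-1} H^T T."""
    n_features = X.shape[1]
    w = rng.standard_normal((n_features, L))   # random input weights w_i
    b = rng.standard_normal(L)                 # random hidden biases b_i
    H = 1.0 / (1.0 + np.exp(-(X @ w + b)))     # hidden layer output matrix
    beta = np.linalg.solve(np.eye(L) / C + H.T @ H, H.T @ T_targets)
    return w, b, beta

def elm_predict(X, w, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    return H @ beta                            # network outputs o_j
```

Using `np.linalg.solve` on the regularized normal equations avoids forming an explicit inverse and reflects the stability role of the 1/C diagonal term.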
for better handling of imbalanced data, each sample is weighted so that samples belonging to different classes receive different weights, and the mathematical form of the above optimization problem is rewritten as:
Minimize: (1/2)||β||^2 + (C/2) Σ_{i=1}^{N} W_ii ||ξ_i||^2
Subject to: h(x_i) β = y_i^T - ξ_i^T, i = 1,2,…,N
where W is an N x N diagonal matrix in which each main diagonal element W_ii corresponds to one sample x_i, so samples of different classes are automatically assigned different weights, and C is the regularization coefficient;
a Lagrange function is defined to solve this quadratic programming problem under the KKT optimality conditions, which is equivalent to solving:
L = (1/2)||β||^2 + (C/2) Σ_{i=1}^{N} W_ii ||ξ_i||^2 - Σ_{i=1}^{N} α_i (h(x_i) β - y_i^T + ξ_i^T)
where the α_i are Lagrange multipliers, all non-negative;
the corresponding KKT optimality conditions follow by setting the partial derivatives of L with respect to β, ξ_i and α_i to zero;
the algorithm then solves the hidden-layer output weights as:
β = H^T (I/C + W H H^T)^{-1} W T
the weighting scheme employs the sample weight distribution D_t from step S2.5;
When the hidden-layer feature mapping h(x) is unknown, the kernel matrix is defined as:
Ω_ELM = H H^T, with Ω_ELM(i,j) = h(x_i) · h(x_j) = K(x_i, x_j), i = 1,2,…,N; j = 1,2,…,N
this kernel function K(·,·) must satisfy Mercer's condition; the output expression is then written in terms of the kernel, replacing h(x) H^T by the row vector [K(x,x_1),…,K(x,x_N)] and H H^T by Ω_ELM:
therefore the hidden-layer feature mapping of the ELM can remain unknown, and the number L of hidden-layer neurons need not be set;
the final output equation of the kernel-based weighted extreme learning machine is:
f(x) = [K(x,x_1),…,K(x,x_N)] (I/C + W Ω_ELM)^{-1} W T
where I is the identity matrix, C is the regularization coefficient, W is the weighting matrix, T is the target output matrix, and Ω_ELM is the kernel matrix;
in summary, the kernel-based weighted extreme learning machine training algorithm proceeds as follows:
S2.2.1, assign each sample its weight according to the weighting scheme and form the weighting matrix W;
s2.2.2, compute the kernel matrix Ω_ELM from the kernel function;
S2.2.3, compute the output result f(x) of the network.
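The three steps S2.2.1-S2.2.3 can be sketched with an RBF kernel (the kernel choice, one-hot targets, and function names are assumptions); the solve implements f(x) = [K(x,x_1),…,K(x,x_N)] (I/C + W Ω_ELM)^{-1} W T:

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """K(a, b) = exp(-gamma * ||a - b||^2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def wkelm_train(X, T_onehot, W_diag, C, gamma):
    """Weighted kernel ELM: solve alpha = (I/C + W Omega)^{-1} W T so
    that f(x) = [K(x, x_1), ..., K(x, x_N)] alpha."""
    N = len(X)
    Omega = rbf_kernel(X, X, gamma)         # kernel matrix Omega_ELM
    W = np.diag(W_diag)                     # N x N weighting matrix
    alpha = np.linalg.solve(np.eye(N) / C + W @ Omega, W @ T_onehot)
    return alpha

def wkelm_predict(X_new, X_train, alpha, gamma):
    return rbf_kernel(X_new, X_train, gamma) @ alpha  # class scores f(x)
```

The predicted class of a new sample is the argmax of its score vector, matching the one-output-node-per-class convention used above.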
In step S4, the number T of base classifiers of the ensemble classifier is set to 20, and the kernel width γ and regularization coefficient C of the base classifiers that give the best algorithm performance are found by grid parameter search: γ is searched over {2^{-18}, 2^{-18+step}, …, 2^{20}} with step = 0.5, and C over {2^{-18}, 2^{-18+step}, …, 2^{50}} with step = 0.5.
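The grid search over the stated γ and C ranges can be sketched as follows; `train_eval`, which should return the validation G-mean for one (γ, C) pair, is a hypothetical user-supplied hook:

```python
import numpy as np
from itertools import product

def grid_search(train_eval, step=0.5):
    """Exhaustive grid search: gamma in {2^-18, ..., 2^20} and
    C in {2^-18, ..., 2^50}, stepping the exponent by 0.5.
    `train_eval(gamma, C)` returns the score (e.g. G-mean) to maximize."""
    gammas = [2.0 ** e for e in np.arange(-18, 20 + step, step)]
    Cs = [2.0 ** e for e in np.arange(-18, 50 + step, step)]
    best = (None, None, -np.inf)
    for gamma, C in product(gammas, Cs):
        score = train_eval(gamma, C)
        if score > best[2]:
            best = (gamma, C, score)
    return best  # (best gamma, best C, best score)
```

With step = 0.5 this evaluates 77 x 137 parameter pairs, so the base-classifier training inside `train_eval` dominates the cost.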
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. the method firstly introduces the unbalanced classification evaluation index G-mean into an Adaboost integrated classification algorithm which takes a weighted extreme learning machine as a base classifier, and provides a novel integrated algorithm base classifier weight value updating formula.
2. The method of the invention firstly provides an initial weight matrix updating formula based on G-mean, which is used for modeling a weighted extreme learning machine.
3. The invention adopts the classifier of the weighted extreme learning machine as the base classifier of the integrated learning algorithm, and can improve the learning speed of the classifier, thereby realizing the real-time and accurate monitoring of the running state of the sewage treatment plant.
4. The method can improve the overall classification accuracy of the sewage treatment fault diagnosis system, and especially the identification accuracy of the fault classes, which is of great significance for fault early warning and timely handling in the sewage treatment system.
5. The method can effectively ensure the stable operation of the sewage treatment plant and the sewage treatment quality, and reduce secondary pollution.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The present invention will be further described with reference to the following specific examples.
Referring to fig. 1, the integrated weighted extreme learning machine sewage treatment fault diagnosis method provided by this embodiment includes the following steps:
step S1, assign the initial weights of the weighted extreme learning machine base classifier. There are two weight initialization schemes; the first is an automatic weighting scheme: W1_ii = 1/n_k, where W_1 denotes the first weighting scheme and n_k is the number of training samples belonging to the class k of sample i.
The other weight initialization idea is to push the weight ratio of minority classes to majority classes toward 0.618:1, which essentially trades some classification accuracy on the majority classes for higher recognition accuracy on the minority classes: W2_ii = 0.618/n_k when class k is a majority class (n_k larger than the average class size) and W2_ii = 1/n_k otherwise, where W_2 denotes the second weighting scheme.
Step S2, training a base classifier:
s2.1, given a sewage sample set {(x_1,y_1),(x_2,y_2),…,(x_i,y_i),…,(x_N,y_N)}, where x_i ∈ X denotes the attribute vector of the i-th sample, y_i denotes the class label of the i-th sample, N is the total number of samples, and y_i ∈ Y = {1,2,…,k,…,K}, where k denotes the k-th class and K the total number of classes; set the number of base classifiers of the ensemble algorithm, denoted T;
s2.2, train on the training samples with a weighted kernel extreme learning machine as base classifier to obtain the training model h_t. For the t-th base classifier h_t, first obtain the recall R_1, R_2, …, R_k, …, R_K of each class (k is the k-th class, K is the total number of classes); then compute the sample count n_k of each class and the classification result A(x_i) of each sample: if sample x_i is classified correctly, A(x_i) = 1, and if it is misclassified, A(x_i) = -1; finally obtain G_mean = (R_1·R_2·…·R_K)^{1/K};
S2.3, if G_mean is less than or equal to 0.5, exit the iteration;
s2.4, compute the weight λ_t of the t-th base classifier according to the weight calculation formula of base classifier h_t; the smaller G_mean is (i.e. the larger the training error), the smaller λ_t and the smaller the proportion of the t-th base classifier in the whole ensemble algorithm, and vice versa;
s2.5, adjust the sample weight distribution D_{t+1} for the next iteration; the adjustment rule of D_{t+1} is as follows:
s2.6, let t = t + 1; if t < T, return to S2.2, otherwise end;
and finishing training the base classifier.
In the step S2.2, the modeling of the weighted kernel extreme learning machine specifically includes the following:
the extreme learning machine adopts the framework of a single-hidden-layer feedforward neural network (SLFN). Given N sewage treatment fault diagnosis training samples (x_1,y_1),(x_2,y_2),…,(x_N,y_N), the standard SLFN output model with L hidden nodes is represented as:
o_j = Σ_{i=1}^{L} β_i G(w_i · x_j + b_i), j = 1,2,…,N
where β_i denotes the output weight connecting the i-th hidden neuron to the output neurons, G is the hidden-layer neuron activation function, w_i denotes the input weights between the input layer and the i-th hidden neuron, b_i denotes the bias of the i-th hidden neuron, and o_j is the actual output at the output neurons for the j-th sample;
for the N sewage treatment fault diagnosis samples there exist (w_i,b_i) and β_i such that Σ_{i=1}^{L} β_i G(w_i · x_j + b_i) = y_j, j = 1,2,…,N, i.e. the single-hidden-layer feedforward network can fit the sample set with zero error; in matrix form this is written H β = T, where H is the hidden layer output matrix, β is the output weight matrix, and T is the target output matrix;
when the activation function G is differentiable, the SLFN parameters need not all be adjusted: the input weights w_i and hidden-layer biases b_i are selected randomly during network initialization and kept fixed during training, so training the SLFN is equivalent to finding the least-squares solution of the linear system H β = T, which can be cast as the optimization problem:
Minimize: ||H β - T||^2 and ||β||
The optimization problem is mathematically expressed as:
wherein, ξi=[ξi,1,…ξi,K]TIs a sewage treatment fault diagnosis training sample xiError between output value and true value of its corresponding output nodeDifference vector, Moore-Penrose generalized inverse matrix H output by hidden layer neurons+Can be solved to obtain:
the orthogonal projection method can be used to solve H^+ efficiently: when H^T H is nonsingular, H^+ = (H^T H)^{-1} H^T, and when H H^T is nonsingular, H^+ = H^T (H H^T)^{-1}. To give the resulting model better stability and generalization performance, a positive value 1/C is added to the diagonal of H^T H or H H^T when solving, obtaining:
β = H^T (I/C + H H^T)^{-1} T
where I denotes the identity matrix, and the corresponding output function is:
f(x) = h(x) H^T (I/C + H H^T)^{-1} T
or, when the diagonal term is added to H^T H:
β = (I/C + H^T H)^{-1} H^T T
and the corresponding ELM output function is:
f(x) = h(x) (I/C + H^T H)^{-1} H^T T
for better handling of imbalanced data, each sample is weighted so that samples belonging to different classes receive different weights, and the mathematical form of the above optimization problem is rewritten as:
Minimize: (1/2)||β||^2 + (C/2) Σ_{i=1}^{N} W_ii ||ξ_i||^2
Subject to: h(x_i) β = y_i^T - ξ_i^T, i = 1,2,…,N
where W is an N x N diagonal matrix in which each main diagonal element W_ii corresponds to one sample x_i, so samples of different classes are automatically assigned different weights, and C is the regularization coefficient;
a Lagrange function is defined to solve this quadratic programming problem under the KKT optimality conditions, which is equivalent to solving:
L = (1/2)||β||^2 + (C/2) Σ_{i=1}^{N} W_ii ||ξ_i||^2 - Σ_{i=1}^{N} α_i (h(x_i) β - y_i^T + ξ_i^T)
where the α_i are Lagrange multipliers, all non-negative;
the corresponding KKT optimality conditions follow by setting the partial derivatives of L with respect to β, ξ_i and α_i to zero;
the algorithm then solves the hidden-layer output weights as:
β = H^T (I/C + W H H^T)^{-1} W T
the weighting scheme employs the sample weight distribution D_t from step S2.5;
When the hidden-layer feature mapping h(x) is unknown, the kernel matrix is defined as:
Ω_ELM = H H^T, with Ω_ELM(i,j) = h(x_i) · h(x_j) = K(x_i, x_j), i = 1,2,…,N; j = 1,2,…,N
this kernel function K(·,·) must satisfy Mercer's condition; the output expression is then written in terms of the kernel, replacing h(x) H^T by the row vector [K(x,x_1),…,K(x,x_N)] and H H^T by Ω_ELM:
therefore the hidden-layer feature mapping of the ELM can remain unknown, and the number L of hidden-layer neurons need not be set;
the final output equation of the kernel-based weighted extreme learning machine is:
f(x) = [K(x,x_1),…,K(x,x_N)] (I/C + W Ω_ELM)^{-1} W T
where I is the identity matrix, C is the regularization coefficient, W is the weighting matrix, T is the target output matrix, and Ω_ELM is the kernel matrix;
in summary, the kernel-based weighted extreme learning machine training algorithm proceeds as follows:
s2.2.1, assign each sample its weight according to the weighting scheme and form the weighting matrix W;
s2.2.2, compute the kernel matrix Ω_ELM from the kernel function;
S2.2.3, compute the output result f(x) of the network.
Step S3, propose a novel base-classifier weight update formula for the ensemble algorithm, integrate a plurality of base classifiers with the weighted extreme learning machine as base classifier using the Adaboost iteration method, and establish the improved sewage fault diagnosis model; the steps and process are as follows:
s3.1, setting the number of the base classifiers of the integration algorithm and recording as T;
s3.2, determine the initial weight distribution D_1(i), i = 1,2,…,N, of the samples x_i according to the weight initialization method;
S3.3, train T base classifiers according to the method of S2, and compute the weight of each base classifier according to the base-classifier weight update formula;
s3.4, integrate the T base classifiers to obtain the sewage fault diagnosis model:
and finishing modeling of the sewage fault diagnosis model.
Step S4, set the number T of base classifiers of the ensemble classifier to 20, and find the kernel width γ and regularization coefficient C of the base classifiers that give the best algorithm performance by grid parameter search: γ is searched over {2^{-18}, 2^{-18+step}, …, 2^{20}} with step = 0.5, and C over {2^{-18}, 2^{-18+step}, …, 2^{50}} with step = 0.5.
The experimental simulation data come from the University of California Irvine (UCI) repository and are daily monitoring data of a sewage treatment plant. Each sample of the data set has 38 dimensions, 380 samples have all attribute values completely recorded, and 13 states of the monitored water body are distinguished in total, each state denoted by a number. To simplify the classification task, the samples are grouped into 4 broad classes according to their nature, as shown in table 1 below. In table 1, class 1 is the normal case, class 2 is the normal case with performance above average, class 3 is the normal case with low inflow, and class 4 is the fault case, covering secondary sedimentation tank failure, abnormal states caused by heavy rain, and solids overload. Class 1, the normal condition, has the most samples and constitutes the majority class, while classes 3 and 4, with few samples, are minority classes; after this simplification of the data categories the four classes are distributed in the ratio 39.6:14.6:8:1. Parameter optimization shows that the optimal parameters of the two weight initialization schemes adopted in this embodiment are: W1: (C = 2^{26.5}, γ = 2^{13}); W2: (C = 2^{27.5}, γ = 2^{13.5}).
Following the steps above, 3/4 of the sewage sample set, i.e. 285 groups of samples in total, is used as the training set in the simulation experiment; with the different weight initialization schemes, the final classification model is generated by ensemble iteration, and the remaining samples are used as the test set and fed into the model to obtain the final classification result, i.e. the sewage treatment fault diagnosis result. AdaG1WKELM denotes the algorithm with the W1 initial weight scheme, and AdaG2WKELM the algorithm with the W2 initial weight scheme.
TABLE 1. Sample category number distribution
TABLE 2. Comparison with traditional classification algorithms
TABLE 3. Comparison with current similar algorithms
Tables 2 and 3 compare the algorithms of the present invention (AdaG1WKELM and AdaG2WKELM) with traditional classification algorithms and with current similar research algorithms, respectively. The traditional classification algorithms comprise a back propagation neural network (BPNN), a support vector machine (SVM), a relevance vector machine (RVM), a fast relevance vector machine (Fast RVM), an extreme learning machine (ELM), and a kernel-based weighted extreme learning machine (K-WELM); the current similar research algorithms include B-PCA-CBPNN, WELM, and pre-processed Fast RVM. R1-acc, R2-acc, R3-acc and R4-acc denote the classification accuracy of each class, Total acc the overall classification accuracy, G-mean = (R1 × R2 × R3 × R4)^{1/4}, and Training time the model training time. The tables show that although AdaG1WKELM and AdaG2WKELM classify the majority-class samples somewhat less accurately than the other algorithms, their accuracy on the minority classes is higher, especially on the fourth class, i.e. the fault class, and their overall G-mean and overall accuracy are the highest. The algorithms are therefore well suited to classifying imbalanced data sets. In conclusion, the G-mean-based integrated extreme learning machine fault diagnosis method can accurately identify faults that may occur in the sewage treatment process and strengthens the fault handling capacity of the sewage treatment plant.
The above-mentioned embodiments are merely preferred embodiments of the present invention, and the scope of the invention is not limited thereto; changes made according to the shape and principle of the present invention shall be covered by the protection scope of the invention.
Claims (3)
1. An improved integrated weighted extreme learning machine sewage treatment fault diagnosis method, characterized by comprising the following steps:
s1, assign the initial weights of the weighted extreme learning machine base classifier using an assignment formula biased toward minority-class samples;
there are two weight initialization schemes; the first is an automatic weighting scheme: W1_ii = 1/n_k, where W_1 denotes the first weighting scheme and n_k is the number of training samples belonging to the class k of sample i;
the other weight initialization idea is to push the weight ratio of minority classes to majority classes toward 0.618:1, which essentially trades some classification accuracy on the majority classes for higher recognition accuracy on the minority classes: W2_ii = 0.618/n_k when class k is a majority class (n_k larger than the average class size) and W2_ii = 1/n_k otherwise, where W_2 denotes the second weighting scheme;
s2, train the base classifiers: compute the recall and the performance evaluation index G-mean of the previous base classifier, and, using a G-mean-based initial weight matrix update formula, adjust the weight matrix of the next weighted extreme learning machine base classifier and establish the base classifier model, as follows:
s2.1, given a sewage sample set {(x_1,y_1),(x_2,y_2),…,(x_i,y_i),…,(x_N,y_N)}, where x_i ∈ X denotes the attribute vector of the i-th sample, y_i denotes the class label of the i-th sample, N is the total number of samples, and y_i ∈ Y = {1,2,…,k,…,K}, where k denotes the k-th class and K the total number of classes; set the number of base classifiers of the ensemble algorithm, denoted T;
S2.2, training the training samples by using a weighted kernel extreme learning machine as the base classifier to obtain a training model ht(x). For the t-th base classifier ht(x), first obtain the recall rate of each class, R1, R2, …, Rk, …, RK, where k is the k-th class and K is the total number of classes; then record the number of samples nk of each class and the classification result A(xi) of each sample: if sample xi is classified correctly, A(xi) = 1; if incorrectly, A(xi) = -1; finally, compute G_mean = (R1·R2···RK)^(1/K);
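The per-class recall and G_mean computation of S2.2 can be sketched as follows — a minimal NumPy sketch; `g_mean` is a hypothetical helper name, not from the patent:

```python
import numpy as np

def g_mean(y_true, y_pred):
    """Per-class recall R_k, then G_mean = (R_1 * R_2 * ... * R_K) ** (1/K)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes = np.unique(y_true)
    # R_k = fraction of class-k samples that were predicted as class k
    recalls = [np.mean(y_pred[y_true == k] == k) for k in classes]
    return float(np.prod(recalls) ** (1.0 / len(classes)))
```

Because G_mean is a geometric mean, a single class with zero recall drives it to zero — which is why it is a stricter criterion than overall accuracy on unbalanced data.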
S2.3, if G _ mean is less than or equal to 0.5, exiting iteration;
S2.4, computing the weight λt of the t-th base classifier according to the weight calculation formula of base classifier ht(x): the smaller G_mean is (i.e., the larger the training error), the smaller λt is, meaning the smaller the weight of the t-th base classifier in the whole integrated algorithm, and vice versa;
S2.5, adjusting weight distribution D of next iteration of samplet+1,Dt+1The adjustment rule of (2) is as follows:
S2.6, letting t = t + 1; if t < T, returning to S2.2; otherwise, ending;
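The exit test of S2.3 and the monotonic behaviour of λt described in S2.4 can be sketched together. The patent's exact formula for λt is not reproduced in this text, so the AdaBoost-style expression 0.5·ln(G/(1−G)) below is an assumed stand-in that has the stated behaviour (larger G_mean gives a larger classifier weight); `classifier_weight` is a hypothetical helper name:

```python
import math

def classifier_weight(g_mean_t):
    """Weight of the t-th base classifier, or None if iteration should stop."""
    if g_mean_t <= 0.5:
        return None                      # S2.3: exit the iteration
    # Assumed AdaBoost-style stand-in, NOT the patent's exact formula:
    # monotonically increasing in G_mean, zero at G_mean = 0.5.
    return 0.5 * math.log(g_mean_t / (1.0 - g_mean_t))
```

Any formula with the same monotonicity would serve the role described in S2.4; the ensemble's final decision is then a λ-weighted vote of the T base classifiers.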
S3, providing a classifier weight updating formula based on the integration algorithm, integrating a plurality of base classifiers by using the weighted extreme learning machine as the base classifier and the Adaboost iteration method, and establishing an improved sewage fault diagnosis model; the steps are as follows:
s3.1, setting the number of the base classifiers of the integration algorithm and recording as T;
S3.2, determining the initial weight distribution D1(i), i = 1, 2, …, N, of the samples xi according to the weight initialization method;
S3.3, training T base classifiers according to the method of S2, and calculating the weight of each base classifier according to the base classifier weight updating formula;
S4, inputting sample data generated in the sewage treatment process, setting the number T of base classifiers of the integrated algorithm, setting the optimal kernel width γ of the base classifiers and the corresponding optimal regularization coefficient C, establishing the fault diagnosis model of the sewage treatment system, and carrying out a performance test.
2. The improved integrated weighted extreme learning machine sewage treatment fault diagnosis method as claimed in claim 1, wherein in step S2.2 the modeling of the weighted kernel extreme learning machine is specifically as follows:
the extreme learning machine adopts the framework of a single-hidden-layer feedforward neural network (SLFN). Given N sewage treatment fault diagnosis training samples (x1,y1),(x2,y2),…,(xN,yN), the standard SLFN output model with L hidden nodes is represented as follows:

Σ_{i=1}^{L} βi G(wi·xj + bi) = oj, j = 1, 2, …, N
where βi represents the output weight connecting the i-th hidden neuron and the output neurons, G is the hidden layer neuron activation function, wi represents the input weight between the input layer and the i-th hidden neuron, bi represents the bias of the i-th hidden neuron, and oj is the actual output value at the output neurons for the j-th sample;
for the N sewage treatment fault diagnosis samples, there exist (wi, bi) and βi such that the SLFN model approximates the sample set with zero error, i.e., Σ_{j=1}^{N} ||oj − yj|| = 0; in other words, the single-hidden-layer feedforward neural network can fit the samples without error:

Σ_{i=1}^{L} βi G(wi·xj + bi) = yj, j = 1, 2, …, N

which is expressed compactly as Hβ = T, where:
where H is the hidden layer output matrix, β is the output weight matrix, and T is the output layer output matrix;
when the activation function G is a differentiable function, the SLFN parameters need not all be adjusted: the input weights wi and hidden layer biases bi are randomly selected during network parameter initialization and kept unchanged during training. Training the SLFN is then equivalent to finding the least-squares solution of the linear system Hβ = T, which can be converted into the following optimization problem:
Minimize: ||Hβ − T||² and ||β||
The optimization problem is mathematically expressed as:

Minimize: L = (1/2)||β||² + (C/2) Σ_{i=1}^{N} ||ξi||²

Subject to: h(xi)β = yiᵀ − ξiᵀ, i = 1, 2, …, N
where ξi = [ξi,1, …, ξi,K]ᵀ is the error vector between the output values of the corresponding output nodes and the true values for sewage treatment fault diagnosis training sample xi; the least-squares solution can be obtained through the Moore–Penrose generalized inverse H⁺ of the hidden layer output matrix, β = H⁺T:
the orthogonal projection method can be used to effectively solve H⁺: when HᵀH is a nonsingular matrix, H⁺ = (HᵀH)⁻¹Hᵀ; when HHᵀ is nonsingular, H⁺ = Hᵀ(HHᵀ)⁻¹. In order to obtain better stability and generalization performance of the model, a positive value I/C is added to the diagonal elements of HᵀH or HHᵀ when solving, obtaining:
β = Hᵀ(I/C + HHᵀ)⁻¹T, where I denotes the identity matrix, and the corresponding output function is:

f(x) = h(x)β = h(x)Hᵀ(I/C + HHᵀ)⁻¹T

or, when:

β = (I/C + HᵀH)⁻¹HᵀT

the corresponding ELM output function is:

f(x) = h(x)(I/C + HᵀH)⁻¹HᵀT
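The two regularized solutions above are algebraically equivalent (by the push-through identity Hᵀ(I/C + HHᵀ)⁻¹ = (I/C + HᵀH)⁻¹Hᵀ); a minimal numerical check of this, assuming nothing beyond NumPy and arbitrary small test matrices:

```python
import numpy as np

np.random.seed(1)
H = np.random.randn(6, 4)   # N = 6 samples, L = 4 hidden neurons
T = np.random.randn(6, 3)   # 3 output neurons
C = 10.0                    # regularization coefficient

# beta = (I/C + H^T H)^(-1) H^T T   (L x L system)
beta1 = np.linalg.solve(np.eye(4) / C + H.T @ H, H.T @ T)
# beta = H^T (I/C + H H^T)^(-1) T   (N x N system)
beta2 = H.T @ np.linalg.solve(np.eye(6) / C + H @ H.T, T)
```

In practice one solves whichever system is smaller: the L×L form when the hidden layer is narrow, the N×N form when there are few samples (and, as below, when only a kernel matrix is available).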
to better handle unbalanced data, each sample is weighted such that samples belonging to different classes receive different weights, so the mathematical form of the above optimization problem is rewritten as:

Minimize: L = (1/2)||β||² + (C/2) Σ_{i=1}^{N} Wii ||ξi||²

Subject to: h(xi)β = yiᵀ − ξiᵀ, i = 1, 2, …, N
where W is an N×N diagonal matrix; each main diagonal element Wii corresponds to one sample xi, so samples of different classes are automatically assigned different weights; C is the regularization coefficient;
according to the KKT optimality conditions, a Lagrange function is defined to solve the quadratic programming problem, which is equivalent to solving the following formula:

L = (1/2)||β||² + (C/2) Σ_{i=1}^{N} Wii ||ξi||² − Σ_{i=1}^{N} αi (h(xi)β − yiᵀ + ξiᵀ)
where the αi are the Lagrange multipliers, all non-negative;
the corresponding KKT optimality conditions are as follows:

∂L/∂β = 0 → β = Hᵀα; ∂L/∂ξi = 0 → αi = C·Wii·ξi; ∂L/∂αi = 0 → h(xi)β − yiᵀ + ξiᵀ = 0, i = 1, 2, …, N
the algorithm solves the hidden layer output weights as:

β = Hᵀ(I/C + WHHᵀ)⁻¹WT
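This solve can be sketched directly, assuming a dense hidden-layer output matrix H; `welm_beta` is a hypothetical helper name, not from the patent:

```python
import numpy as np

def welm_beta(H, T, W, C):
    """Hidden-layer output weights: beta = H^T (I/C + W H H^T)^(-1) W T."""
    N = H.shape[0]
    # Solve the N x N linear system instead of forming an explicit inverse.
    return H.T @ np.linalg.solve(np.eye(N) / C + W @ H @ H.T, W @ T)
```

As C grows, the regularization term I/C vanishes and the weighted model interpolates the training targets (when HHᵀ is nonsingular); smaller C trades training error for smoother output weights.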
the weighting scheme employs the sample weight distribution Dt from step S2.5;
When the hidden layer feature map h (x) is unknown, the kernel matrix is defined as follows:
Ω_ELM = HHᵀ: Ω_ELM(i,j) = h(xi)·h(xj) = K(xi,xj), i = 1, 2, …, N; j = 1, 2, …, N
this kernel function K(·,·) needs to satisfy the Mercer condition, and the output expression is then written as:

f(x) = [K(x,x1), …, K(x,xN)] (I/C + Ω_ELM)⁻¹T
therefore, the hidden layer feature mapping of the ELM can be kept unknown, and meanwhile, the number L of the hidden layer neurons does not need to be set;
the final output equation of the weighted extreme learning machine based on the kernel function is as follows:

f(x) = [K(x,x1), …, K(x,xN)] (I/C + WΩ_ELM)⁻¹WT
where I is the identity matrix, C is the regularization coefficient, W is the weighting matrix, T is the output layer matrix, and Ω_ELM is the kernel matrix;
in summary, the process of the weighted extreme learning machine training algorithm based on the kernel function is as follows:
S2.2.1, assigning each sample a weight according to the weighting scheme, and calculating the weighting matrix W;
S2.2.2, calculating the kernel matrix Ω_ELM according to the kernel function;
S2.2.3, calculating the output result f(x) of the network.
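The three-step training process above can be sketched end to end. The RBF kernel below is an assumption (suggested by the "kernel width γ" in claims 1 and 3 but not fixed by this passage), and `kwelm_fit_predict` is a hypothetical helper name:

```python
import numpy as np

def rbf_kernel(X1, X2, gamma):
    """K(x, x') = exp(-gamma * ||x - x'||^2)."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kwelm_fit_predict(X, T, w_diag, C, gamma, X_test):
    """f(x) = [K(x,x_1), ..., K(x,x_N)] (I/C + W * Omega_ELM)^(-1) W T."""
    N = X.shape[0]
    W = np.diag(w_diag)                          # S2.2.1: weighting matrix
    omega = rbf_kernel(X, X, gamma)              # S2.2.2: kernel matrix
    alpha = np.linalg.solve(np.eye(N) / C + W @ omega, W @ T)
    return rbf_kernel(X_test, X, gamma) @ alpha  # S2.2.3: network output
```

With one-hot targets T, the predicted class of a test sample is the argmax over the output neurons; note that neither an explicit feature map h(x) nor the hidden-layer size L appears anywhere in the computation, matching the remark above.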
3. The improved integrated weighted extreme learning machine sewage treatment fault diagnosis method according to claim 1, characterized in that: in step S4, the number T of base classifiers of the ensemble classifier is set to 20, and the kernel width γ and regularization coefficient C of the base classifiers that give the optimal algorithm performance are found by grid parameter optimization, where the optimization range of γ is {2^-18, 2^(-18+step), …, 2^20}, step = 0.5, and the optimization range of C is {2^-18, 2^(-18+step), …, 2^50}, step = 0.5.
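The grid parameter optimization of claim 3 can be sketched as follows; `evaluate` is a hypothetical stand-in for training the ensemble with a given (γ, C) and returning its score (e.g. G-mean), and is not part of the patent:

```python
import numpy as np

def grid_search(evaluate):
    """Exhaustive search over gamma in {2^-18 .. 2^20}, C in {2^-18 .. 2^50}, step 0.5."""
    gammas = 2.0 ** np.arange(-18, 20 + 0.5, 0.5)   # 77 candidate kernel widths
    Cs = 2.0 ** np.arange(-18, 50 + 0.5, 0.5)       # 137 candidate coefficients
    best = (-np.inf, None, None)                    # (score, gamma, C)
    for g in gammas:
        for C in Cs:
            score = evaluate(g, C)
            if score > best[0]:
                best = (score, g, C)
    return best
```

In practice each `evaluate` call is a full cross-validated training run, so the 77×137 grid is the dominant cost of model selection; a coarser step can be used for a first pass.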
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710654311.3A CN107688825B (en) | 2017-08-03 | 2017-08-03 | Improved integrated weighted extreme learning machine sewage treatment fault diagnosis method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107688825A CN107688825A (en) | 2018-02-13 |
CN107688825B true CN107688825B (en) | 2020-02-18 |
Family
ID=61153142
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710654311.3A Expired - Fee Related CN107688825B (en) | 2017-08-03 | 2017-08-03 | Improved integrated weighted extreme learning machine sewage treatment fault diagnosis method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107688825B (en) |
Families Citing this family (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109190280A (en) * | 2018-09-18 | 2019-01-11 | 东北农业大学 | A kind of pollution source of groundwater inverting recognition methods based on core extreme learning machine alternative model |
CN109558893B (en) * | 2018-10-31 | 2022-12-16 | 华南理工大学 | Rapid integrated sewage treatment fault diagnosis method based on resampling pool |
CN109492710B (en) * | 2018-12-07 | 2021-07-13 | 天津智行瑞祥汽车科技有限公司 | New energy automobile fault detection auxiliary method |
CN109739209A (en) * | 2018-12-11 | 2019-05-10 | 深圳供电局有限公司 | A kind of electric network failure diagnosis method based on Classification Data Mining |
CN109858564B (en) * | 2019-02-21 | 2023-05-05 | 上海电力学院 | Improved Adaboost-SVM model generation method suitable for wind power converter fault diagnosis |
CN110084291B (en) * | 2019-04-12 | 2021-10-22 | 湖北工业大学 | Student behavior analysis method and device based on big data extreme learning |
CN110363230B (en) * | 2019-06-27 | 2021-07-20 | 华南理工大学 | Stacking integrated sewage treatment fault diagnosis method based on weighted base classifier |
CN111160457B (en) * | 2019-12-27 | 2023-07-11 | 南京航空航天大学 | Scroll engine fault detection method based on soft-class extreme learning machine |
CN112257942B (en) * | 2020-10-29 | 2023-11-14 | 中国特种设备检测研究院 | Stress corrosion cracking prediction method and system |
CN112183676A (en) * | 2020-11-10 | 2021-01-05 | 浙江大学 | Water quality soft measurement method based on mixed dimensionality reduction and kernel function extreme learning machine |
CN113323823B (en) * | 2021-06-08 | 2022-10-25 | 云南大学 | AWKELM-based fan blade icing fault detection method and system |
CN113551904B (en) * | 2021-06-29 | 2023-06-30 | 西北工业大学 | Gear box multi-type concurrent fault diagnosis method based on hierarchical machine learning |
CN113965449B (en) * | 2021-09-28 | 2023-04-18 | 南京航空航天大学 | Method for improving fault diagnosis accuracy rate of self-organizing cellular network based on evolution weighted width learning system |
CN114154701A (en) * | 2021-11-25 | 2022-03-08 | 南方电网数字电网研究院有限公司 | Power failure prediction method and device based on weighted extreme learning machine |
CN114492164A (en) * | 2021-12-24 | 2022-05-13 | 吉林大学 | Organic pollutant migration numerical model substitution method based on multi-core extreme learning machine |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103473598A (en) * | 2013-09-17 | 2013-12-25 | 山东大学 | Extreme learning machine based on length-changing particle swarm optimization algorithm |
KR20140127061A (en) * | 2013-04-24 | 2014-11-03 | 주식회사 지넬릭스 | Oral Hygiene functional composition and a method of manufacturing |
CN105631477A (en) * | 2015-12-25 | 2016-06-01 | 天津大学 | Traffic sign recognition method based on extreme learning machine and self-adaptive lifting |
CN105740619A (en) * | 2016-01-28 | 2016-07-06 | 华南理工大学 | On-line fault diagnosis method of weighted extreme learning machine sewage treatment on the basis of kernel function |
CN106874934A (en) * | 2017-01-12 | 2017-06-20 | 华南理工大学 | Sewage disposal method for diagnosing faults based on weighting extreme learning machine Integrated Algorithm |
Non-Patent Citations (2)
Title |
---|
"Boosting weighted ELM for imbalanced learning"; Li K et al.; Neurocomputing; 2013-10-25; full text *
"Research on imbalanced fuzzy weighted extreme learning machine and its ensemble methods"; Yao Qiaobing; China Master's Theses Full-text Database, Information Science and Technology; 2017-03-15 (No. 03, 2017); full text *
Also Published As
Publication number | Publication date |
---|---|
CN107688825A (en) | 2018-02-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107688825B (en) | Improved integrated weighted extreme learning machine sewage treatment fault diagnosis method | |
CN105740619B (en) | Weighting extreme learning machine sewage disposal on-line fault diagnosis method based on kernel function | |
CN106874581B (en) | Building air conditioner energy consumption prediction method based on BP neural network model | |
CN108228716B (en) | SMOTE _ Bagging integrated sewage treatment fault diagnosis method based on weighted extreme learning machine | |
CN106874934A (en) | Sewage disposal method for diagnosing faults based on weighting extreme learning machine Integrated Algorithm | |
CN108445752B (en) | Random weight neural network integrated modeling method for self-adaptively selecting depth features | |
CN106600059A (en) | Intelligent power grid short-term load predication method based on improved RBF neural network | |
CN111427750B (en) | GPU power consumption estimation method, system and medium of computer platform | |
CN108805193B (en) | Electric power missing data filling method based on hybrid strategy | |
CN110009030B (en) | Sewage treatment fault diagnosis method based on stacking meta-learning strategy | |
CN110363230B (en) | Stacking integrated sewage treatment fault diagnosis method based on weighted base classifier | |
CN106778838A (en) | A kind of method for predicting air quality | |
CN109284662B (en) | Underwater sound signal classification method based on transfer learning | |
CN111985845B (en) | Node priority optimization method of heterogeneous Spark cluster | |
CN106296434B (en) | Grain yield prediction method based on PSO-LSSVM algorithm | |
CN113379116A (en) | Cluster and convolutional neural network-based line loss prediction method for transformer area | |
CN108805206A (en) | A kind of modified LSSVM method for building up for analog circuit fault classification | |
CN108763418A (en) | A kind of sorting technique and device of text | |
CN108074011A (en) | The monitoring method and system of a kind of sludge discharge | |
CN107544447A (en) | A kind of chemical process Fault Classification based on core study | |
Tucci et al. | Adaptive FIR neural model for centroid learning in self-organizing maps | |
CN113296947B (en) | Resource demand prediction method based on improved XGBoost model | |
CN107766887A (en) | A kind of local weighted deficiency of data mixes clustering method | |
CN114814707A (en) | Intelligent ammeter stress error analysis method, equipment, terminal and readable medium | |
Yang et al. | An improved probabilistic neural network with ga optimization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20200218 |