CN107688825B - Improved integrated weighted extreme learning machine sewage treatment fault diagnosis method - Google Patents


Info

Publication number
CN107688825B
CN107688825B · CN201710654311.3A (application)
Authority
CN
China
Prior art keywords
output
weight
matrix
sewage treatment
learning machine
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201710654311.3A
Other languages
Chinese (zh)
Other versions
CN107688825A (en)
Inventor
许玉格
赖春伶
孙称立
陈立定
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201710654311.3A
Publication of CN107688825A
Application granted
Publication of CN107688825B


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention discloses an improved integrated weighted extreme learning machine sewage treatment fault diagnosis method, which comprises the following steps: S1, assigning the initial weights of the weighted extreme learning machine base classifier using an assignment formula biased toward minority-class samples; S2, training the base classifiers; S3, proposing a novel base-classifier weight updating formula for the ensemble algorithm, integrating a plurality of base classifiers with the weighted extreme learning machine as base classifier via the Adaboost iteration method, and establishing an improved sewage fault diagnosis model; S4, inputting sample data generated in the sewage treatment process, setting the number T of base classifiers of the ensemble algorithm, the optimal kernel width γ of the base classifiers and the corresponding optimal regularization coefficient C, establishing the fault diagnosis model of the sewage treatment system and carrying out performance tests. The invention can classify multi-class imbalanced data, improves classification performance on imbalanced data, in particular the classification accuracy of minority classes, and effectively improves the accuracy of fault diagnosis in the sewage treatment process.

Description

Improved integrated weighted extreme learning machine sewage treatment fault diagnosis method
Technical Field
The invention relates to the technical field of sewage treatment fault diagnosis, in particular to an improved sewage treatment fault diagnosis method of an integrated weighted extreme learning machine.
Background
Sewage treatment is a complex biochemical process affected by a great number of factors, and a sewage treatment plant is difficult to keep in long-term stable operation. Faults easily cause serious problems such as substandard effluent quality, increased operating costs and secondary environmental pollution, so the operating state of the plant must be monitored and operating faults must be diagnosed and handled in time.
Fault diagnosis of the sewage treatment process is in essence a pattern recognition problem, and the imbalanced distribution of sewage data sets is frequently encountered during classification. Traditional machine learning methods tend to bias classification accuracy toward the majority classes, whereas in practice the accuracy on the minority classes, i.e. the fault classes, matters more. Finding faults promptly and accurately can greatly reduce the losses of a sewage treatment plant and improve its working efficiency.
Disclosure of Invention
Aiming at the fault diagnosis problem of sewage treatment plants, the invention provides an improved integrated weighted extreme learning machine sewage treatment fault diagnosis method. The method introduces the imbalanced-classification evaluation index G-mean into an Adaboost ensemble classification algorithm that takes the weighted extreme learning machine as its base classifier. Applied to fault diagnosis of the sewage treatment process, it can classify multi-class imbalanced data, improves classification performance on imbalanced data, in particular the classification accuracy of minority classes, and effectively improves fault diagnosis accuracy in the sewage treatment process.
To achieve the above purpose, the technical solution provided by the invention is as follows: an improved integrated weighted extreme learning machine sewage treatment fault diagnosis method, comprising the following steps:
S1, assigning the initial weights of the weighted extreme learning machine base classifier using an assignment formula biased toward minority-class samples;
S2, training the base classifiers: calculate the recall and the performance evaluation index G-mean of the previous base classifier, and use a G-mean-based initial weight matrix updating formula to adjust the weight matrix of the next weighted extreme learning machine base classifier and establish the base classifier model, as follows:
S2.1, given a sewage sample set {(x1, y1), (x2, y2), …, (xi, yi), …, (xN, yN)}, where xi ∈ X denotes the attribute vector of the i-th sample, yi denotes the class label of the i-th sample, N is the total number of samples, and yi ∈ Y = {1, 2, …, k, …, K}, where k denotes the k-th class and K the total number of classes; set the number of base classifiers of the ensemble algorithm, denoted T;
S2.2, using the weighted kernel extreme learning machine as the base classifier, train on the training samples to obtain the training model ht. For the t-th base classifier ht, first obtain the recall of each class, R1, R2, …, Rk, …, RK, where Rk is the recall of the k-th class and K is the total number of classes; also record the number of samples nk of each class and the classification result A(xi) of each sample: if sample xi is classified correctly, A(xi) = 1; if it is misclassified, A(xi) = -1. Finally compute G_mean = (R1·R2···RK)^(1/K);
S2.3, if G _ mean is less than or equal to 0.5, exiting iteration;
S2.4, compute the weight λt of the t-th base classifier according to the base-classifier weight calculation formula

λt = (1/2)·ln( G_mean / (1 - G_mean) )

The smaller G_mean is, the smaller λt is and the smaller the proportion of the t-th base classifier in the whole ensemble, and vice versa;
S2.5, adjust the sample weight distribution Dt+1 for the next iteration; the adjustment rule of Dt+1 is:

Dt+1(i) = Dt(i)·exp( -λt·A(xi) ) / Zt

where Zt is a normalization factor chosen so that Dt+1 sums to 1, so correctly classified samples are down-weighted and misclassified samples are up-weighted;
S2.6, let t = t + 1; if t < T, return to S2.2; otherwise, end;
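The per-class recall, G-mean and sample-weight bookkeeping of steps S2.2 to S2.6 can be sketched as follows. This is a minimal NumPy illustration, not the patent's exact implementation: the helper names are our own, and the (1/2)·ln(g/(1-g)) form of the classifier weight is an assumption (the patent gives the formula only as a figure), chosen to be consistent with the exit condition G_mean ≤ 0.5.

```python
import numpy as np

def g_mean(y_true, y_pred, n_classes):
    # Per-class recall R_k, then the geometric mean G = (R_1 * ... * R_K)^(1/K)
    recalls = []
    for k in range(n_classes):
        mask = (y_true == k)
        recalls.append(float(np.mean(y_pred[mask] == k)) if mask.any() else 0.0)
    return float(np.prod(recalls) ** (1.0 / n_classes)), recalls

def update_sample_weights(D, y_true, y_pred, lam):
    # A(x_i) = +1 if sample i is classified correctly, -1 otherwise;
    # correct samples are down-weighted, wrong ones up-weighted, then renormalized.
    A = np.where(y_pred == y_true, 1.0, -1.0)
    D_new = D * np.exp(-lam * A)
    return D_new / D_new.sum()

# Toy iteration: 6 samples, 3 classes, one misclassified sample (index 2).
y_true = np.array([0, 0, 0, 1, 1, 2])
y_pred = np.array([0, 0, 1, 1, 1, 2])
g, recalls = g_mean(y_true, y_pred, 3)
lam = 0.5 * np.log(g / (1.0 - g))      # assumed AdaBoost-style weight; exit if g <= 0.5
D = update_sample_weights(np.full(6, 1 / 6), y_true, y_pred, lam)
```

With these toy labels, only the misclassified sample gains weight in D, mirroring the adjustment rule of S2.5.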
S3, proposing a novel base-classifier weight updating formula for the ensemble algorithm, integrating a plurality of base classifiers with the weighted extreme learning machine as base classifier via the Adaboost iteration method, and establishing the improved sewage fault diagnosis model, as follows:
S3.1, set the number of base classifiers of the ensemble algorithm, denoted T;
S3.2, determine the initial weight distribution D1(i), i = 1, 2, …, N, of the samples xi according to the weight initialization method;
S3.3, train T base classifiers according to the method of S2 and compute the weight of each base classifier according to the base-classifier weight updating formula of S2.4;
S3.4, integrate the T base classifiers to obtain the sewage fault diagnosis model:

H(x) = arg max over k of Σ (t = 1 … T) λt·I( ht(x) = k )

where I(·) is the indicator function, so each base classifier casts a vote weighted by λt;
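The weighted-vote combination of S3.4 can be sketched as follows; a minimal illustration assuming the arg-max voting form above (the classifier objects and names are hypothetical):

```python
import numpy as np

def ensemble_predict(classifiers, lambdas, X, n_classes):
    # H(x) = argmax_k sum_t lambda_t * 1[h_t(x) = k]
    votes = np.zeros((len(X), n_classes))
    for h_t, lam_t in zip(classifiers, lambdas):
        preds = h_t(X)
        for i, p in enumerate(preds):
            votes[i, p] += lam_t        # each base classifier casts a weighted vote
    return votes.argmax(axis=1)

# Toy base classifiers: constant predictors with different ensemble weights.
h1 = lambda X: np.zeros(len(X), dtype=int)   # always predicts class 0
h2 = lambda X: np.ones(len(X), dtype=int)    # always predicts class 1
X = np.arange(4).reshape(-1, 1)
pred = ensemble_predict([h1, h2], [0.4, 0.9], X, n_classes=2)
```

Here the second classifier's larger weight (0.9 vs 0.4) decides every vote, so the ensemble follows h2.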
S4, input the sample data generated in the sewage treatment process, set the number T of base classifiers of the ensemble algorithm, the optimal kernel width γ of the base classifiers and the corresponding optimal regularization coefficient C, establish the fault diagnosis model of the sewage treatment system, and carry out performance tests.
In step S1, there are two weight initialization schemes to choose from. One is an automatic weighting scheme:

W1: Wii = 1 / nk

where W1 denotes the first weighting scheme and nk is the number of training samples belonging to class k;
The other weight initialization idea is to push the weight ratio between minority and majority classes toward 0.618:1 (the golden ratio), which in essence trades some majority-class classification accuracy for better minority-class recognition:

W2: Wii = 0.618 / nk if nk > AVG(nk); Wii = 1 / nk if nk ≤ AVG(nk)

where W2 denotes the second weighting scheme and AVG(nk) is the average number of samples per class.
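The two initialization schemes can be sketched as follows. A minimal NumPy illustration: the patent shows both formulas only as figures, so the W2 condition used here (discount a class when its size nk exceeds the average class size, as in the weighted-ELM literature) is an assumption.

```python
import numpy as np

def init_weights(y, scheme="W1"):
    # W is diagonal: each sample's weight depends only on its class size n_k.
    classes, counts = np.unique(y, return_counts=True)
    n_k = dict(zip(classes, counts))
    avg = counts.mean()
    w = np.empty(len(y))
    for i, yi in enumerate(y):
        if scheme == "W1" or n_k[yi] <= avg:
            w[i] = 1.0 / n_k[yi]        # W1: automatic weighting 1/n_k
        else:
            w[i] = 0.618 / n_k[yi]      # W2: golden-ratio discount for majority classes
    return np.diag(w)

y = np.array([0, 0, 0, 0, 1, 1])        # class 0 is the majority class
W1 = init_weights(y, "W1")
W2 = init_weights(y, "W2")
```

Under W2, only the majority class (4 samples, above the average class size of 3) is discounted by 0.618, tilting the ratio toward the minority class.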
In step S2.2, the modeling of the weighted kernel extreme learning machine is specifically as follows:
The extreme learning machine adopts the framework of a single-hidden-layer feedforward neural network (SLFN). Given N sewage treatment fault diagnosis training samples (x1, y1), (x2, y2), …, (xN, yN), the standard SLFN output model with L hidden nodes is:

oj = Σ (i = 1 … L) βi·G(wi·xj + bi), j = 1, 2, …, N

where βi denotes the output weight connecting the i-th hidden neuron to the output neurons, G is the hidden-layer activation function, wi denotes the input weights between the input layer and the i-th hidden neuron, bi is the bias of the i-th hidden neuron, and oj is the actual output value for the j-th sample;
For the N sewage treatment fault diagnosis samples, there exist (wi, bi) and βi such that the SLFN approximates the sample set with zero error, i.e. the single-hidden-layer feedforward network can fit it without error:

Σ (i = 1 … L) βi·G(wi·xj + bi) = yj, j = 1, 2, …, N

This is expressed as Hβ = T, where H is the N×L hidden-layer output matrix with entries H(j, i) = G(wi·xj + bi), β = [β1, …, βL]^T is the output weight matrix, and T = [y1, …, yN]^T is the output-layer target matrix;
When the activation function G is differentiable, the SLFN parameters need not all be adjusted: the input weights wi and hidden-layer biases bi are selected randomly during network initialization and kept fixed during training. Training the SLFN is then equivalent to finding a least-squares solution of the linear system Hβ = T, which can be converted into the optimization problem:

Minimize: ||Hβ - T||² and ||β||
The optimization problem is mathematically expressed as:
Minimize:
Subject to:
Figure GDA0002237216500000051
wherein, ξi=[ξi,1,…ξi,K]TIs a sewage treatment fault diagnosis training sample xiThe Moore-Penrose generalized inverse matrix H output by the neuron of the hidden layer is the error vector between the output value of the corresponding output node and the real value+Can be solved to obtain:
Figure GDA0002237216500000052
The orthogonal projection method can be used to compute H⁺ efficiently: when H^T·H is nonsingular, H⁺ = (H^T·H)^(-1)·H^T; when H·H^T is nonsingular, H⁺ = H^T·(H·H^T)^(-1). To give the resulting model better stability and generalization performance, a positive value I/C is added to the diagonal of H^T·H or H·H^T when solving, obtaining:

β = H^T·(I/C + H·H^T)^(-1)·T

where I denotes the identity matrix, and the corresponding output function is:

f(x) = h(x)·H^T·(I/C + H·H^T)^(-1)·T
or, when:

β = (I/C + H^T·H)^(-1)·H^T·T

the corresponding ELM output function is:

f(x) = h(x)·(I/C + H^T·H)^(-1)·H^T·T
To handle imbalanced data better, each sample is weighted so that samples belonging to different classes receive different weights, and the mathematical form of the above optimization problem is rewritten as:

Minimize: (1/2)·||β||² + (C/2)·Σ (i = 1 … N) Wii·||ξi||²
Subject to: h(xi)·β = yi^T - ξi^T, i = 1, 2, …, N

where W is an N×N diagonal matrix in which each main diagonal element Wii corresponds to one sample xi, so samples of different classes are automatically assigned different weights, and C is the regularization coefficient;
To solve this quadratic programming problem, define the Lagrange function according to the KKT optimality conditions; solving it is then equivalent to minimizing:

L = (1/2)·||β||² + (C/2)·Σ (i = 1 … N) Wii·||ξi||² - Σ (i = 1 … N) Σ (j = 1 … K) αi,j·( h(xi)·βj - yi,j + ξi,j )

where the αi,j are Lagrange multipliers, all non-negative;
The corresponding KKT optimality conditions are:

∂L/∂β = 0 → β = H^T·α
∂L/∂ξi = 0 → αi = C·Wii·ξi, i = 1, 2, …, N
∂L/∂αi = 0 → h(xi)·β - yi^T + ξi^T = 0, i = 1, 2, …, N

from which the algorithm solves the hidden-layer output weights as:

β = H^T·(I/C + W·H·H^T)^(-1)·W·T
In the ensemble, the weighting scheme adopts the sample weight distribution Dt of step S2.5;
When the hidden layer feature map h (x) is unknown, the kernel matrix is defined as follows:
ΩELM=HHTELMi,j=h(xi)·h(xj)=K(xi,xj)i=1,2,…,N;j=1,2,…,N
The kernel function K(·,·) must satisfy the Mercer condition; the output can then be expressed purely in terms of kernel values, so the hidden-layer feature mapping of the ELM can remain unknown and the number L of hidden-layer neurons need not be set;
The final output equation of the kernel-based weighted extreme learning machine is:

f(x) = [K(x, x1), …, K(x, xN)]·(I/C + W·Ω_ELM)^(-1)·W·T

where I is the identity matrix, C is the regularization coefficient, W is the weighting matrix, T is the output-layer matrix, and Ω_ELM is the kernel matrix;
In summary, the kernel-based weighted extreme learning machine training algorithm proceeds as follows:
S2.2.1, assign each sample its weight according to the weighting scheme and form the weighting matrix W;
S2.2.2, compute the kernel matrix Ω_ELM from the kernel function;
S2.2.3, compute the output result f(x) of the network.
In step S4, the number T of base classifiers of the ensemble classifier is set to 20, and the kernel width γ and regularization coefficient C of the base classifiers that give the algorithm its best performance are found by grid search. The search range of γ is {2^(-18), 2^(-18+step), …, 2^20} with step = 0.5; the search range of C is {2^(-18), 2^(-18+step), …, 2^50} with step = 0.5.
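The grid search over (γ, C) can be sketched as follows; the scoring function here is purely hypothetical (in the patent's setting it would be a cross-validated G-mean), and only the grids match the ranges stated above.

```python
import numpy as np
from itertools import product

# Candidate grids: exponents from -18 upward in steps of 0.5.
gammas = 2.0 ** np.arange(-18, 20 + 0.5, 0.5)   # up to 2^20
Cs = 2.0 ** np.arange(-18, 50 + 0.5, 0.5)       # up to 2^50

def grid_search(score_fn):
    # Exhaustive search over (gamma, C); score_fn returns e.g. a validation G-mean.
    return max(product(gammas, Cs), key=lambda gc: score_fn(*gc))

# Hypothetical score peaking at gamma = 2^13, C = 2^26.5, just for illustration.
best_gamma, best_C = grid_search(
    lambda g, c: -(np.log2(g) - 13) ** 2 - (np.log2(c) - 26.5) ** 2
)
```

An exhaustive grid of this size is roughly 77 × 137 ≈ 10,500 (γ, C) pairs per scheme, which is feasible because each kernel WELM fit is a single linear solve.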
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The method is the first to introduce the imbalanced-classification evaluation index G-mean into an Adaboost ensemble classification algorithm that takes the weighted extreme learning machine as base classifier, and proposes a novel base-classifier weight updating formula for the ensemble algorithm.
2. The method is the first to propose a G-mean-based initial weight matrix updating formula for modeling the weighted extreme learning machine.
3. The invention adopts the weighted extreme learning machine as the base classifier of the ensemble learning algorithm, which improves the learning speed of the classifier and thus enables real-time and accurate monitoring of the operating state of a sewage treatment plant.
4. The method can improve the overall classification accuracy of the sewage treatment fault diagnosis system, and in particular the recognition accuracy of fault classes, which is of great significance for fault early warning and timely handling in sewage treatment systems.
5. The method can effectively ensure the stable operation of a sewage treatment plant and the quality of sewage treatment, and reduce secondary pollution.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The present invention will be further described with reference to the following specific examples.
Referring to FIG. 1, the integrated weighted extreme learning machine sewage treatment fault diagnosis method provided by this embodiment comprises the following steps:
Step S1, assign the initial weights of the weighted extreme learning machine base classifier. There are two weight initialization schemes, one of which is an automatic weighting scheme:
W1: Wii = 1 / nk

where W1 denotes the first weighting scheme and nk is the number of training samples belonging to class k;
The other weight initialization idea is to push the weight ratio between minority and majority classes toward 0.618:1 (the golden ratio), which in essence trades some majority-class classification accuracy for better minority-class recognition:

W2: Wii = 0.618 / nk if nk > AVG(nk); Wii = 1 / nk if nk ≤ AVG(nk)

where W2 denotes the second weighting scheme and AVG(nk) is the average number of samples per class.
Step S2, training a base classifier:
S2.1, given a sewage sample set {(x1, y1), (x2, y2), …, (xi, yi), …, (xN, yN)}, where xi ∈ X denotes the attribute vector of the i-th sample, yi denotes the class label of the i-th sample, N is the total number of samples, and yi ∈ Y = {1, 2, …, k, …, K}, where k denotes the k-th class and K the total number of classes; set the number of base classifiers of the ensemble algorithm, denoted T;
S2.2, using the weighted kernel extreme learning machine as the base classifier, train on the training samples to obtain the training model ht(x). For the t-th base classifier ht(x), first obtain the recall of each class, R1, R2, …, Rk, …, RK, where Rk is the recall of the k-th class and K is the total number of classes; also record the number of samples nk of each class and the classification result A(xi) of each sample: if sample xi is classified correctly, A(xi) = 1; if it is misclassified, A(xi) = -1. Finally compute G_mean = (R1·R2···RK)^(1/K);
S2.3, if G _ mean is less than or equal to 0.5, exiting iteration;
S2.4, compute the weight λt of the t-th base classifier according to the base-classifier weight calculation formula

λt = (1/2)·ln( G_mean / (1 - G_mean) )

The smaller G_mean is, the smaller λt is and the smaller the proportion of the t-th base classifier in the whole ensemble, and vice versa;
S2.5, adjust the sample weight distribution Dt+1 for the next iteration; the adjustment rule of Dt+1 is:

Dt+1(i) = Dt(i)·exp( -λt·A(xi) ) / Zt

where Zt is a normalization factor chosen so that Dt+1 sums to 1, so correctly classified samples are down-weighted and misclassified samples are up-weighted;
S2.6, let t = t + 1; if t < T, return to S2.2; otherwise, end;
This completes the training of the base classifiers.
In step S2.2, the modeling of the weighted kernel extreme learning machine specifically comprises the following steps:
The extreme learning machine adopts the framework of a single-hidden-layer feedforward neural network (SLFN). Given N sewage treatment fault diagnosis training samples (x1, y1), (x2, y2), …, (xN, yN), the standard SLFN output model with L hidden nodes is:

oj = Σ (i = 1 … L) βi·G(wi·xj + bi), j = 1, 2, …, N

where βi denotes the output weight connecting the i-th hidden neuron to the output neurons, G is the hidden-layer activation function, wi denotes the input weights between the input layer and the i-th hidden neuron, bi is the bias of the i-th hidden neuron, and oj is the actual output value for the j-th sample;
For the N sewage treatment fault diagnosis samples, there exist (wi, bi) and βi such that the SLFN approximates the sample set with zero error, i.e. the single-hidden-layer feedforward network can fit it without error:

Σ (i = 1 … L) βi·G(wi·xj + bi) = yj, j = 1, 2, …, N

This is expressed as Hβ = T, where H is the N×L hidden-layer output matrix with entries H(j, i) = G(wi·xj + bi), β = [β1, …, βL]^T is the output weight matrix, and T = [y1, …, yN]^T is the output-layer target matrix;
When the activation function G is differentiable, the SLFN parameters need not all be adjusted: the input weights wi and hidden-layer biases bi are selected randomly during network initialization and kept fixed during training. Training the SLFN is then equivalent to finding a least-squares solution of the linear system Hβ = T, which can be converted into the optimization problem:

Minimize: ||Hβ - T||² and ||β||
The optimization problem is mathematically expressed as:
Minimize:
Figure GDA0002237216500000111
Subject to:
Figure GDA0002237216500000112
wherein, ξi=[ξi,1,…ξi,K]TIs a sewage treatment fault diagnosis training sample xiError between output value and true value of its corresponding output nodeDifference vector, Moore-Penrose generalized inverse matrix H output by hidden layer neurons+Can be solved to obtain:
Figure GDA0002237216500000113
The orthogonal projection method can be used to compute H⁺ efficiently: when H^T·H is nonsingular, H⁺ = (H^T·H)^(-1)·H^T; when H·H^T is nonsingular, H⁺ = H^T·(H·H^T)^(-1). To give the resulting model better stability and generalization performance, a positive value I/C is added to the diagonal of H^T·H or H·H^T when solving, obtaining:

β = H^T·(I/C + H·H^T)^(-1)·T

where I denotes the identity matrix, and the corresponding output function is:

f(x) = h(x)·H^T·(I/C + H·H^T)^(-1)·T
or, when:

β = (I/C + H^T·H)^(-1)·H^T·T

the corresponding ELM output function is:

f(x) = h(x)·(I/C + H^T·H)^(-1)·H^T·T
To handle imbalanced data better, each sample is weighted so that samples belonging to different classes receive different weights, and the mathematical form of the above optimization problem is rewritten as:

Minimize: (1/2)·||β||² + (C/2)·Σ (i = 1 … N) Wii·||ξi||²
Subject to: h(xi)·β = yi^T - ξi^T, i = 1, 2, …, N

where W is an N×N diagonal matrix in which each main diagonal element Wii corresponds to one sample xi, so samples of different classes are automatically assigned different weights, and C is the regularization coefficient;
To solve this quadratic programming problem, define the Lagrange function according to the KKT optimality conditions; solving it is then equivalent to minimizing:

L = (1/2)·||β||² + (C/2)·Σ (i = 1 … N) Wii·||ξi||² - Σ (i = 1 … N) Σ (j = 1 … K) αi,j·( h(xi)·βj - yi,j + ξi,j )

where the αi,j are Lagrange multipliers, all non-negative;
The corresponding KKT optimality conditions are:

∂L/∂β = 0 → β = H^T·α
∂L/∂ξi = 0 → αi = C·Wii·ξi, i = 1, 2, …, N
∂L/∂αi = 0 → h(xi)·β - yi^T + ξi^T = 0, i = 1, 2, …, N

from which the algorithm solves the hidden-layer output weights as:

β = H^T·(I/C + W·H·H^T)^(-1)·W·T
In the ensemble, the weighting scheme adopts the sample weight distribution Dt of step S2.5;
When the hidden layer feature map h (x) is unknown, the kernel matrix is defined as follows:
ΩELM=HHTELMi,j=h(xi)·h(xj)=K(xi,xj)i=1,2,…,N;j=1,2,…,N
The kernel function K(·,·) must satisfy the Mercer condition; the output expression is then written as:

f(x) = [K(x, x1), …, K(x, xN)]·(I/C + W·Ω_ELM)^(-1)·W·T

Thus the hidden-layer feature mapping of the ELM can remain unknown, and the number L of hidden-layer neurons need not be set;
The final output equation of the kernel-based weighted extreme learning machine is:

f(x) = [K(x, x1), …, K(x, xN)]·(I/C + W·Ω_ELM)^(-1)·W·T

where I is the identity matrix, C is the regularization coefficient, W is the weighting matrix, T is the output-layer matrix, and Ω_ELM is the kernel matrix;
In summary, the kernel-based weighted extreme learning machine training algorithm proceeds as follows:
S2.2.1, assign each sample its weight according to the weighting scheme and form the weighting matrix W;
S2.2.2, compute the kernel matrix Ω_ELM from the kernel function;
S2.2.3, compute the output result f(x) of the network.
Step S3, propose a novel base-classifier weight updating formula for the ensemble algorithm, use the weighted extreme learning machine as the base classifier, integrate a plurality of base classifiers via the Adaboost iteration method, and establish the improved sewage fault diagnosis model, as follows:
S3.1, set the number of base classifiers of the ensemble algorithm, denoted T;
S3.2, determine the initial weight distribution D1(i), i = 1, 2, …, N, of the samples xi according to the weight initialization method;
S3.3, train T base classifiers according to the method of S2 and compute the weight of each base classifier according to the base-classifier weight updating formula of S2.4;
S3.4, integrate the T base classifiers to obtain the sewage fault diagnosis model:

H(x) = arg max over k of Σ (t = 1 … T) λt·I( ht(x) = k )

where I(·) is the indicator function, so each base classifier casts a vote weighted by λt;
and finishing modeling of the sewage fault diagnosis model.
Step S4, set the number T of base classifiers of the ensemble classifier to 20, and find the kernel width γ and regularization coefficient C of the base classifiers that give the algorithm its best performance by grid search. The search range of γ is {2^(-18), 2^(-18+step), …, 2^20} with step = 0.5; the search range of C is {2^(-18), 2^(-18+step), …, 2^50} with step = 0.5.
The experimental simulation data come from the University of California, Irvine (UCI) machine learning repository and are daily monitoring data of a sewage treatment plant. Each sample of the data set has 38 attributes, 380 samples have all attribute values completely recorded, and 13 states of the monitored water body are distinguished in total, each state denoted by a number. To reduce the complexity of classification, the samples are merged into 4 broad classes according to the nature of their states, as shown in Table 1 below. In Table 1, class 1 is the normal situation, class 2 is the normal situation with performance above average, class 3 is the normal situation with low influent flow, and class 4 covers the fault situations caused by secondary sedimentation tank failure, abnormal states due to heavy rain, and solids overload. Class 1, the normal situation, has the most samples and constitutes the majority class, while classes 3 and 4 are minority classes due to their small sample numbers; after this simplification of the data classes, the distribution ratio of the four classes is 39.6 : 14.6 : 8 : 1. Parameter optimization shows that the optimal parameters for the two weight initialization schemes adopted in this embodiment are: W1: (C = 2^26.5, γ = 2^13); W2: (C = 2^27.5, γ = 2^13.5).
Following the above steps, 3/4 of the sewage sample set, i.e. 285 samples, is used as the training set in the simulation experiment; with the different weight initialization schemes, the final classification models are generated through ensemble iteration, and the remaining samples are used as the test set and fed into the classification models to obtain the final classification results, i.e. the sewage treatment fault diagnosis results. AdaG1WKELM denotes the algorithm with the W1 initial weighting scheme, and AdaG2WKELM denotes the algorithm with the W2 initial weighting scheme.
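The 3/4 train, 1/4 test split described above can be sketched as follows. This is only an illustration of the split arithmetic: the class labels here are randomly generated with the stated imbalance ratio, standing in for the real UCI monitoring data.

```python
import numpy as np

rng = np.random.default_rng(0)

# 380 fully recorded samples, 38 attributes, merged into 4 classes with an
# approximate 39.6 : 14.6 : 8 : 1 imbalance ratio (synthetic stand-in labels).
n = 380
y = rng.choice(4, size=n, p=np.array([39.6, 14.6, 8.0, 1.0]) / 63.2)

idx = rng.permutation(n)
n_train = int(n * 3 / 4)                 # 285 training samples, 95 test samples
train_idx, test_idx = idx[:n_train], idx[n_train:]
```

A stratified split would preserve the class ratio in both halves; with this imbalance, a plain random split can leave very few class-4 samples in the test set.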
TABLE 1 sample Category number distribution
TABLE 2 results compared to conventional Classification Algorithm
TABLE 3 comparison of results with current similar algorithms
Tables 2 and 3 compare the experimental results of the algorithms used in the present invention (AdaG1WKELM and AdaG2WKELM) with traditional classification algorithms and with current similar research algorithms, respectively. The traditional classification algorithms comprise the back-propagation neural network (BPNN), support vector machine (SVM), relevance vector machine (RVM), fast relevance vector machine (Fast RVM), extreme learning machine (ELM), and kernel-based weighted extreme learning machine (K-WELM); the current similar research algorithms include B-PCA-CBPNN, WELM, and Pre-processed Fast RVM. R1-acc, R2-acc, R3-acc and R4-acc denote the classification accuracy of each class, Total acc denotes the overall classification accuracy, G-mean = (R1×R2×R3×R4)^(1/4), and Training time denotes the model training time. The tables show that although AdaG1WKELM and AdaG2WKELM have lower classification accuracy on the majority-class samples than the other algorithms, their accuracy on the minority-class samples is higher, especially on the fourth class, i.e. the fault class, and their G-mean values and overall accuracy are the highest. The proposed algorithm is therefore well suited to classifying imbalanced data sets. In conclusion, the fault diagnosis method based on the G-mean ensemble extreme learning machine can accurately identify faults that may occur in the sewage treatment process and strengthens the fault handling capability of sewage treatment plants.
The above-described embodiment is merely a preferred embodiment of the present invention, and the scope of the present invention is not limited thereto; changes made according to the shape and principle of the present invention shall be covered within its protection scope.

Claims (3)

1. An improved integrated weighted extreme learning machine sewage treatment fault diagnosis method, characterized by comprising the following steps:
S1, assigning the initial weights of the weighted extreme learning machine base classifier using an assignment formula biased toward minority-class samples;
There are two weight initialization schemes, one of which is an automatic weighting scheme:

W1: Wii = 1 / nk

where W1 denotes the first weighting scheme and nk is the number of training samples belonging to class k;
The other weight initialization idea is to push the weight ratio between minority and majority classes toward 0.618:1 (the golden ratio), which in essence trades some majority-class classification accuracy for better minority-class recognition:

W2: Wii = 0.618 / nk if nk > AVG(nk); Wii = 1 / nk if nk ≤ AVG(nk)

where W2 denotes the second weighting scheme and AVG(nk) is the average number of samples per class;
S2, training the base classifiers: calculate the recall and the performance evaluation index G-mean of the previous base classifier, and use a G-mean-based initial weight matrix updating formula to adjust the weight matrix of the next weighted extreme learning machine base classifier and establish the base classifier model, as follows:
s2.1, giving a sewage sample set { (x)1,y1),(x2,y2),…,(xi,yi),…,(xN,yN) In which xiE X denotes the attribute value of the ith sample, yiIndicates the category label corresponding to the ith sample, N is the total number of samples, yiE.y {1,2, …, K, …, K }, where K denotes the kth class and K denotes a total of K classes; setting the number of base classifiers of the integration algorithm and recording as T;
S2.2, training on the training samples by using the weighted kernel extreme learning machine as the base classifier to obtain the training model ht(x),

ht(x) = [K(x, x1), …, K(x, xN)] (I/C + W ΩELM)^(-1) W T

for the tth base classifier ht(x), first the recall of each class R1, R2, …, Rk, …, RK is obtained, where k is the kth class and K is the total number of classes; then the number of samples nk of each class and the classification result A(xi) of each sample are computed: if sample xi is classified correctly, A(xi) = 1; if it is misclassified, A(xi) = −1; finally G_mean = (R1·R2···RK)^(1/K) is obtained;
S2.3, if G _ mean is less than or equal to 0.5, exiting iteration;
S2.4, computing the weight λt of the tth base classifier according to the base classifier weight calculation formula

λt = (1/2)·ln(G_mean_t / (1 − G_mean_t))

the smaller G_mean is, the smaller λt is, i.e. the smaller the weight of the tth base classifier in the whole integrated algorithm, and vice versa;
S2.5, adjusting the weight distribution Dt+1 of the samples for the next iteration; the adjustment rule of Dt+1 is as follows:

Dt+1(i) = Dt(i)·exp(−λt·A(xi)) / Zt, i = 1, 2, …, N

wherein Zt is a normalization factor chosen so that Dt+1 sums to one;
S2.6, letting t = t + 1; if t < T, returning to S2.2, otherwise ending;
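The G-mean computation and the weight adjustment of S2.4 and S2.5 can be sketched as below; the exact formulas for λt and Dt+1 are images in the source, so the standard AdaBoost-style forms used here are assumptions, consistent with the described monotonicity (smaller G-mean → smaller λt, and λt ≤ 0 exactly when G-mean ≤ 0.5, matching the exit condition in S2.3):

```python
import numpy as np

def gmean(y_true, y_pred, n_classes):
    # per-class recall R_k, then geometric mean (R_1 * ... * R_K)^(1/K)
    recalls = []
    for k in range(n_classes):
        mask = (y_true == k)
        recalls.append((y_pred[mask] == k).mean())
    return float(np.prod(recalls) ** (1.0 / n_classes))

def classifier_weight(g):
    # assumed lambda_t: grows with G-mean, non-positive once G-mean <= 0.5
    return 0.5 * np.log(g / (1.0 - g))

def update_distribution(D, y_true, y_pred, lam):
    # A(x_i) = +1 if correct, -1 if wrong; misclassified samples
    # gain weight for the next base classifier
    A = np.where(y_true == y_pred, 1.0, -1.0)
    D_next = D * np.exp(-lam * A)
    return D_next / D_next.sum()   # Z_t normalization
```

For example, with y_true = [0,0,1,1] and y_pred = [0,0,1,0] the recalls are 1.0 and 0.5, so G-mean = 0.5^0.5 ≈ 0.707, and the single misclassified sample receives a larger share of the updated distribution.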
S3, proposing a base classifier weight updating formula for the integration algorithm, integrating a plurality of base classifiers by using the weighted extreme learning machine as the base classifier and the Adaboost iteration method, and establishing the improved sewage fault diagnosis model; the steps and process are as follows:
s3.1, setting the number of the base classifiers of the integration algorithm and recording as T;
S3.2, determining the initial weight distribution D1(i), i = 1, 2, …, N, of the samples xi according to the weight initialization method;
S3.3, training T base classifiers according to the method of S2, and calculating the weight of each base classifier according to the base classifier weight updating formula

λt = (1/2)·ln(G_mean_t / (1 − G_mean_t));
S3.4, integrating the T base classifiers to obtain the sewage fault diagnosis model:

H(x) = argmax_{k∈Y} Σ_{t=1..T} λt·I(ht(x) = k)

wherein I(·) is the indicator function;
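The integration of S3.4 is a λt-weighted vote over the T trained base classifiers; a minimal illustrative sketch (the argmax-of-weighted-votes form is assumed, as the source formula is an image):

```python
import numpy as np

def ensemble_predict(classifiers, lambdas, X, n_classes):
    """Weighted vote: H(x) = argmax_k sum_t lambda_t * [h_t(x) == k]."""
    votes = np.zeros((len(X), n_classes))
    for h, lam in zip(classifiers, lambdas):
        pred = h(X)                          # class labels predicted by h_t
        votes[np.arange(len(X)), pred] += lam
    return votes.argmax(axis=1)

# two toy base classifiers that always disagree; the one with the
# larger lambda_t dominates the vote
h1 = lambda X: np.zeros(len(X), dtype=int)
h2 = lambda X: np.ones(len(X), dtype=int)
pred = ensemble_predict([h1, h2], [0.3, 0.7], np.zeros((4, 2)), n_classes=2)
```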
S4, inputting sample data generated in the sewage treatment process, setting the number T of base classifiers of the integration algorithm, the optimal kernel width γ of the base classifiers and the corresponding optimal regularization coefficient C, establishing the fault diagnosis model of the sewage treatment system and carrying out the performance test.
2. The improved integrated weighted extreme learning machine sewage treatment fault diagnosis method as claimed in claim 1, wherein in step S2.2 the modeling of the weighted kernel extreme learning machine is specifically as follows:
the extreme learning machine adopts the framework of a single hidden layer feedforward neural network (SLFN); given N sewage treatment fault diagnosis training samples (x1, y1), (x2, y2), …, (xN, yN), the standard SLFN output model with L hidden nodes is expressed as follows:

Σ_{i=1..L} βi·G(wi·xj + bi) = oj, j = 1, 2, …, N

wherein βi denotes the output weight connecting the ith hidden neuron with the output neurons, G is the hidden layer activation function, wi denotes the input weight between the input layer and the ith hidden neuron, bi denotes the bias of the ith hidden neuron, and oj is the actual output value for the jth sample;
for the N sewage treatment fault diagnosis samples, there exist (wi, bi) and βi such that

Σ_{j=1..N} ||oj − yj|| = 0

that is, the SLFN model approximates the sample set with zero error; in other words, the single hidden layer feedforward network can fit the samples without error, namely:

Σ_{i=1..L} βi·G(wi·xj + bi) = yj, j = 1, 2, …, N
which is written compactly as Hβ = T, where:

H = [ G(w1·x1+b1) ⋯ G(wL·x1+bL) ; ⋮ ; G(w1·xN+b1) ⋯ G(wL·xN+bL) ]  (N×L)

β = [ β1^T ; ⋯ ; βL^T ]  (L×K),  T = [ y1^T ; ⋯ ; yN^T ]  (N×K)

wherein H is the hidden layer output matrix, β is the output weight matrix, and T is the target output matrix;
when the activation function G is differentiable, the SLFN parameters do not all need to be adjusted: the input weights wi and hidden layer biases bi are randomly selected during network parameter initialization and kept unchanged during training; training the SLFN is then equivalent to finding the least-squares solution of the linear system Hβ = T, which can be converted into the following optimization problem:

Minimize: ||Hβ − T||^2 and ||β||
the optimization problem is mathematically expressed as:

Minimize: L_ELM = (1/2)||β||^2 + (C/2) Σ_{i=1..N} ||ξi||^2

Subject to: h(xi)β = yi^T − ξi^T, i = 1, 2, …, N
wherein ξi = [ξi,1, …, ξi,K]^T is the error vector between the outputs of the output nodes and the true values for the sewage treatment fault diagnosis training sample xi; the solution can be obtained through the Moore-Penrose generalized inverse H^+ of the hidden layer output matrix:

β = H^+ T
the orthogonal projection method KKT can be used for effectively aligning H+Solve when HTH or HHTIn the case of a non-singular matrix H+=(HTH)-1HTOr H+=HT(HTH)-1In order to obtain better stability and generalization performance of the obtained model, the solution is carried outWhen it is needed to HTH or HHTDiagonal element plus a positive value
Figure FDA0002282416790000045
Obtaining:
Figure FDA0002282416790000046
wherein I denotes the identity matrix; the corresponding output function is:

f(x) = h(x)β = h(x) H^T (I/C + H H^T)^(-1) T
or, alternatively, when:

β = (I/C + H^T H)^(-1) H^T T

the corresponding ELM output function is:

f(x) = h(x)β = h(x) (I/C + H^T H)^(-1) H^T T
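The two equivalent closed-form solutions above (the N×N and L×L forms) can be checked numerically; this is an illustrative sketch with random data, a sigmoid activation, and arbitrary sizes chosen for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, L, K, C = 8, 3, 10, 2, 100.0       # samples, features, hidden nodes, classes, C
X = rng.normal(size=(N, d))
T = np.eye(K)[rng.integers(0, K, size=N)]  # one-hot target matrix (N x K)

Wi = rng.normal(size=(d, L))              # random input weights (fixed, not trained)
b = rng.normal(size=L)                    # random hidden biases
H = 1.0 / (1.0 + np.exp(-(X @ Wi + b)))   # sigmoid hidden layer output matrix (N x L)

# beta = H^T (I/C + H H^T)^(-1) T         -- the N x N form
beta = H.T @ np.linalg.solve(np.eye(N) / C + H @ H.T, T)
# beta = (I/C + H^T H)^(-1) H^T T         -- the equivalent L x L form
beta2 = np.linalg.solve(np.eye(L) / C + H.T @ H, H.T @ T)
f = H @ beta                              # network outputs on the training set
```

The equivalence of the two forms follows from the push-through identity H^T (I/C + H H^T)^(-1) = (I/C + H^T H)^(-1) H^T; in practice one chooses whichever inverse is smaller (N×N or L×L).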
to better handle unbalanced data, each sample is weighted so that samples belonging to different classes obtain different weights; the mathematical form of the above optimization problem is therefore rewritten as:

Minimize: L_WELM = (1/2)||β||^2 + (C/2) Σ_{i=1..N} Wii·||ξi||^2

Subject to: h(xi)β = yi^T − ξi^T, i = 1, 2, …, N

wherein W is an N×N diagonal weighting matrix, each main diagonal element Wii corresponding to one sample xi; samples of different classes are automatically assigned different weights, and C is the regularization coefficient;
according to the KKT optimization conditions, a Lagrangian function is defined to solve this quadratic programming problem, which is equivalent to solving the following formula:

Minimize: L = (1/2)||β||^2 + (C/2) Σ_{i=1..N} Wii·||ξi||^2 − Σ_{i=1..N} αi·(h(xi)β − yi^T + ξi^T)

wherein the αi are Lagrange multipliers, all of which are non-negative;
the corresponding KKT optimality conditions are as follows:

∂L/∂β = 0 → β = Σ_{i=1..N} αi·h(xi)^T = H^T α

∂L/∂ξi = 0 → αi = C·Wii·ξi, i = 1, 2, …, N

∂L/∂αi = 0 → h(xi)β − yi^T + ξi^T = 0, i = 1, 2, …, N
from which the algorithm solves the hidden layer output weight as:

β = H^T (I/C + W H H^T)^(-1) W T

the weighting scheme adopts the sample weight distribution Dt in step S2.5;
when the hidden layer feature map h(x) is unknown, the kernel matrix is defined as follows:

ΩELM = H H^T, with ΩELM(i, j) = h(xi)·h(xj) = K(xi, xj), i, j = 1, 2, …, N
where the kernel function K(·,·) needs to satisfy the Mercer condition; the output expression is then written as:

f(x) = [K(x, x1), …, K(x, xN)] (I/C + ΩELM)^(-1) T
in this way the hidden layer feature mapping of the ELM can remain unknown, and the number L of hidden layer neurons does not need to be set;
the final output equation of the weighted extreme learning machine based on the kernel function is as follows:

f(x) = [K(x, x1), …, K(x, xN)] (I/C + W ΩELM)^(-1) W T
wherein I is the identity matrix, C is the regularization coefficient, W is the weighting matrix, T is the target output matrix, and ΩELM is the kernel matrix;
in summary, the process of the weighted extreme learning machine training algorithm based on the kernel function is as follows:
s2.2.1, giving each sample weight according to the weighting scheme, and calculating a weighting matrix W;
S2.2.2, calculating the kernel matrix ΩELM according to the kernel function;
S2.2.3, calculating the output result f (x) of the network.
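The three-step training procedure S2.2.1–S2.2.3 can be sketched as follows; the RBF kernel with width γ is an assumed kernel choice (any Mercer kernel would do), and the coefficient formula implements f(x) = [K(x, x1), …, K(x, xN)] (I/C + W Ω)^(-1) W T from the description above:

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    # K(x, y) = exp(-gamma * ||x - y||^2)  (assumed kernel choice)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def train_wkelm(X, T, sample_weights, C, gamma):
    # S2.2.1: weighting matrix W;  S2.2.2: kernel matrix Omega_ELM
    N = len(X)
    W = np.diag(sample_weights)
    Omega = rbf_kernel(X, X, gamma)
    # coefficients of f(x) = [K(x,x_1),...,K(x,x_N)] (I/C + W Omega)^(-1) W T
    return np.linalg.solve(np.eye(N) / C + W @ Omega, W @ T)

def predict_wkelm(X_train, coef, X_new, gamma):
    # S2.2.3: evaluate the network output and take the largest component
    return (rbf_kernel(X_new, X_train, gamma) @ coef).argmax(axis=1)

# toy two-class data: two clusters on a line
X = np.array([[0.0], [0.1], [1.0], [1.1]])
y = np.array([0, 0, 1, 1])
T = np.eye(2)[y]                          # one-hot targets
coef = train_wkelm(X, T, np.ones(4), C=1000.0, gamma=1.0)
pred = predict_wkelm(X, coef, X, gamma=1.0)
```

With a large C (weak regularization) the model near-interpolates the training targets, so the toy predictions match the labels.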
3. The improved integrated weighted extreme learning machine sewage treatment fault diagnosis method according to claim 1, characterized in that: in step S4, the number T of base classifiers of the integration algorithm is set to 20, and the kernel width γ and regularization coefficient C of the base classifiers giving the optimal performance of the algorithm are found by grid search, wherein the search range of γ is {2^(−18), 2^(−18+step), …, 2^20} with step = 0.5, and the search range of C is {2^(−18), 2^(−18+step), …, 2^50} with step = 0.5.
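The parameter grids of claim 3 can be enumerated directly; this sketch assumes the caller supplies an `evaluate(gamma, C)` scoring function (e.g. validation G-mean of the base classifier), which is not specified in the claim:

```python
import numpy as np
from itertools import product

# search grids from claim 3: gamma in {2^-18, ..., 2^20}, C in {2^-18, ..., 2^50},
# both with exponent step 0.5
gamma_grid = 2.0 ** np.arange(-18.0, 20.0 + 0.5, 0.5)
C_grid = 2.0 ** np.arange(-18.0, 50.0 + 0.5, 0.5)

def grid_search(evaluate):
    """Return the (gamma, C) pair maximizing evaluate(gamma, C)."""
    return max(product(gamma_grid, C_grid), key=lambda gc: evaluate(*gc))
```

The γ grid contains 77 values and the C grid 137 values, i.e. 10,549 candidate pairs per model selection run.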
CN201710654311.3A 2017-08-03 2017-08-03 Improved integrated weighted extreme learning machine sewage treatment fault diagnosis method Expired - Fee Related CN107688825B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710654311.3A CN107688825B (en) 2017-08-03 2017-08-03 Improved integrated weighted extreme learning machine sewage treatment fault diagnosis method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710654311.3A CN107688825B (en) 2017-08-03 2017-08-03 Improved integrated weighted extreme learning machine sewage treatment fault diagnosis method

Publications (2)

Publication Number Publication Date
CN107688825A CN107688825A (en) 2018-02-13
CN107688825B true CN107688825B (en) 2020-02-18

Family

ID=61153142

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710654311.3A Expired - Fee Related CN107688825B (en) 2017-08-03 2017-08-03 Improved integrated weighted extreme learning machine sewage treatment fault diagnosis method

Country Status (1)

Country Link
CN (1) CN107688825B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109190280A (en) * 2018-09-18 2019-01-11 东北农业大学 A kind of pollution source of groundwater inverting recognition methods based on core extreme learning machine alternative model
CN109558893B (en) * 2018-10-31 2022-12-16 华南理工大学 Rapid integrated sewage treatment fault diagnosis method based on resampling pool
CN109492710B (en) * 2018-12-07 2021-07-13 天津智行瑞祥汽车科技有限公司 New energy automobile fault detection auxiliary method
CN109739209A (en) * 2018-12-11 2019-05-10 深圳供电局有限公司 A kind of electric network failure diagnosis method based on Classification Data Mining
CN109858564B (en) * 2019-02-21 2023-05-05 上海电力学院 Improved Adaboost-SVM model generation method suitable for wind power converter fault diagnosis
CN110084291B (en) * 2019-04-12 2021-10-22 湖北工业大学 Student behavior analysis method and device based on big data extreme learning
CN110363230B (en) * 2019-06-27 2021-07-20 华南理工大学 Stacking integrated sewage treatment fault diagnosis method based on weighted base classifier
CN111160457B (en) * 2019-12-27 2023-07-11 南京航空航天大学 Scroll engine fault detection method based on soft-class extreme learning machine
CN112257942B (en) * 2020-10-29 2023-11-14 中国特种设备检测研究院 Stress corrosion cracking prediction method and system
CN112183676A (en) * 2020-11-10 2021-01-05 浙江大学 Water quality soft measurement method based on mixed dimensionality reduction and kernel function extreme learning machine
CN113323823B (en) * 2021-06-08 2022-10-25 云南大学 AWKELM-based fan blade icing fault detection method and system
CN113551904B (en) * 2021-06-29 2023-06-30 西北工业大学 Gear box multi-type concurrent fault diagnosis method based on hierarchical machine learning
CN113965449B (en) * 2021-09-28 2023-04-18 南京航空航天大学 Method for improving fault diagnosis accuracy rate of self-organizing cellular network based on evolution weighted width learning system
CN114154701A (en) * 2021-11-25 2022-03-08 南方电网数字电网研究院有限公司 Power failure prediction method and device based on weighted extreme learning machine
CN114492164A (en) * 2021-12-24 2022-05-13 吉林大学 Organic pollutant migration numerical model substitution method based on multi-core extreme learning machine

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103473598A (en) * 2013-09-17 2013-12-25 山东大学 Extreme learning machine based on length-changing particle swarm optimization algorithm
KR20140127061A (en) * 2013-04-24 2014-11-03 주식회사 지넬릭스 Oral Hygiene functional composition and a method of manufacturing
CN105631477A (en) * 2015-12-25 2016-06-01 天津大学 Traffic sign recognition method based on extreme learning machine and self-adaptive lifting
CN105740619A (en) * 2016-01-28 2016-07-06 华南理工大学 On-line fault diagnosis method of weighted extreme learning machine sewage treatment on the basis of kernel function
CN106874934A (en) * 2017-01-12 2017-06-20 华南理工大学 Sewage disposal method for diagnosing faults based on weighting extreme learning machine Integrated Algorithm

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20140127061A (en) * 2013-04-24 2014-11-03 주식회사 지넬릭스 Oral Hygiene functional composition and a method of manufacturing
CN103473598A (en) * 2013-09-17 2013-12-25 山东大学 Extreme learning machine based on length-changing particle swarm optimization algorithm
CN105631477A (en) * 2015-12-25 2016-06-01 天津大学 Traffic sign recognition method based on extreme learning machine and self-adaptive lifting
CN105740619A (en) * 2016-01-28 2016-07-06 华南理工大学 On-line fault diagnosis method of weighted extreme learning machine sewage treatment on the basis of kernel function
CN106874934A (en) * 2017-01-12 2017-06-20 华南理工大学 Sewage disposal method for diagnosing faults based on weighting extreme learning machine Integrated Algorithm

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Boosting weighted ELM for imbalanced learning"; Li K et al.; Neurocomputing; 2013-10-25 *
"Research on Imbalanced Fuzzy Weighted Extreme Learning Machine and Its Ensemble Methods" (《不平衡模糊加权极限学习机及其集成方法研究》); Yao Qiaobing; China Master's Theses Full-text Database, Information Science and Technology; 2017-03-15 (No. 03, 2017) *

Also Published As

Publication number Publication date
CN107688825A (en) 2018-02-13

Similar Documents

Publication Publication Date Title
CN107688825B (en) Improved integrated weighted extreme learning machine sewage treatment fault diagnosis method
CN105740619B (en) Weighting extreme learning machine sewage disposal on-line fault diagnosis method based on kernel function
CN106874581B (en) Building air conditioner energy consumption prediction method based on BP neural network model
CN108228716B (en) SMOTE _ Bagging integrated sewage treatment fault diagnosis method based on weighted extreme learning machine
CN106874934A (en) Sewage disposal method for diagnosing faults based on weighting extreme learning machine Integrated Algorithm
CN108445752B (en) Random weight neural network integrated modeling method for self-adaptively selecting depth features
CN106600059A (en) Intelligent power grid short-term load predication method based on improved RBF neural network
CN111427750B (en) GPU power consumption estimation method, system and medium of computer platform
CN108805193B (en) Electric power missing data filling method based on hybrid strategy
CN110009030B (en) Sewage treatment fault diagnosis method based on stacking meta-learning strategy
CN110363230B (en) Stacking integrated sewage treatment fault diagnosis method based on weighted base classifier
CN106778838A (en) A kind of method for predicting air quality
CN109284662B (en) Underwater sound signal classification method based on transfer learning
CN111985845B (en) Node priority optimization method of heterogeneous Spark cluster
CN106296434B (en) Grain yield prediction method based on PSO-LSSVM algorithm
CN113379116A (en) Cluster and convolutional neural network-based line loss prediction method for transformer area
CN108805206A (en) A kind of modified LSSVM method for building up for analog circuit fault classification
CN108763418A (en) A kind of sorting technique and device of text
CN108074011A (en) The monitoring method and system of a kind of sludge discharge
CN107544447A (en) A kind of chemical process Fault Classification based on core study
Tucci et al. Adaptive FIR neural model for centroid learning in self-organizing maps
CN113296947B (en) Resource demand prediction method based on improved XGBoost model
CN107766887A (en) A kind of local weighted deficiency of data mixes clustering method
CN114814707A (en) Intelligent ammeter stress error analysis method, equipment, terminal and readable medium
Yang et al. An improved probabilistic neural network with ga optimization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200218