CN115719033A - Coal mill fault diagnosis method and system based on multi-model fusion


Info

Publication number
CN115719033A
Authority
CN
China
Prior art keywords
model
training
data
coal mill
layer
Prior art date
Legal status
Pending
Application number
CN202211287127.7A
Other languages
Chinese (zh)
Inventor
魏勇
孙胡彬
江学文
周晓亮
李楠
叶君辉
赵敏
寿志杰
詹港明
卢子轩
李锋
Current Assignee
Hangzhou Jiyi Technology Co ltd
Original Assignee
Hangzhou Jiyi Technology Co ltd
Application filed by Hangzhou Jiyi Technology Co ltd filed Critical Hangzhou Jiyi Technology Co ltd
Priority to CN202211287127.7A
Publication of CN115719033A


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30: Computing systems specially adapted for manufacturing

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a coal mill fault diagnosis method and system based on multi-model fusion, solving the problem that the prior art cannot identify the category of a coal mill fault and therefore cannot take corresponding countermeasures in time. The method collects, denoises, and normalizes historical operating data to construct training and validation sets, trains first-layer models with several prediction algorithms, reconstructs their output data, and trains a second-layer model on that data as the coal mill fault diagnosis model. By fusing and integrating multiple fault prediction models through multi-model fusion, the resulting diagnosis model accurately identifies both the occurrence and the category of coal mill faults. The established model effectively prevents over-fitting, generalizes well, is simple to implement and computationally efficient, and offers a high degree of model-correction redundancy. Combining it with a grid search to find the optimal hyper-parameter combination further improves accuracy, yielding a high-performance coal mill fault diagnosis model.

Description

Coal mill fault diagnosis method and system based on multi-model fusion
Technical Field
The invention relates to the technical field of coal mill fault diagnosis, in particular to a coal mill fault diagnosis method and system based on multi-model fusion.
Background
The coal mill is a key piece of auxiliary equipment for boiler combustion in a coal-fired power plant and the core equipment of the pulverizing system; its operating state directly affects the safety and economy of boiler operation. The working environment of the coal mill is harsh: it runs under high load for long periods, the quality of the coal burned in the plant is complex and varied, and the mill's internal structure and working process are complicated. As a result, coal mill faults occur frequently and pose a high risk in daily operation. Once the coal mill fails, boiler combustion is directly affected, and in severe cases the boiler may even shut down.
Coal mill faults are of many types, including but not limited to abnormal vibration, internal ignition, coal blockage, and coal interruption, and different fault risks call for timely, targeted intervention. At present, however, coal-fired power plants mainly rely on conventional periodic inspection or data monitoring, which makes it difficult to diagnose fault risks promptly and effectively. Realizing online fault diagnosis of the coal mill is therefore an urgent problem for coal-fired power plants, with significant engineering value for the safe and economic operation of the unit.
In recent years, with the development of artificial intelligence, machine-learning-based coal mill fault models have gradually been proposed. Current coal mill fault diagnosis mainly monitors operating parameters such as mill current, outlet air temperature, and outlet air pressure, compares the monitored real-time data with the data predicted by an established fault model, and declares a fault when their difference exceeds a set threshold. Such methods, however, cannot determine the fault type and hence cannot trigger the corresponding countermeasures in time.
Disclosure of Invention
The invention mainly addresses the inability of the prior art to determine the fault type of a coal mill and to take corresponding countermeasures in time, and provides a coal mill fault diagnosis method and system based on multi-model fusion.
The technical problem of the invention is mainly solved by the following technical scheme: a coal mill fault diagnosis method based on multi-model fusion, comprising the following steps:
s1, collecting historical operating data of a coal mill, and constructing a cross validation training set and a validation set;
s2, setting multiple algorithms to respectively construct training models, and respectively performing hyper-parameter optimization training on each training model according to a training set to obtain multiple first-layer prediction models;
s3, outputting prediction data according to the first layer prediction model, and performing combined reconstruction on the prediction data to obtain second layer training data;
s4, constructing a second-layer training model, and performing regression training according to second-layer training data to obtain a second-layer prediction model;
and S5, collecting the operation data of the coal mill to carry out fault diagnosis.
The invention adopts a multi-model fusion method to fuse and integrate a plurality of fault prediction models, establishes the fault diagnosis model of the coal mill and realizes the accurate diagnosis of the faults and the categories of the coal mill. The coal mill fault diagnosis model established by the invention can effectively prevent the over-fitting phenomenon, has excellent model generalization capability, is simple to realize and efficient in calculation, and has high model correction redundancy degree. And an optimal hyper-parameter combination is found by combining a grid search method, so that the accuracy of the model is further improved, and a high-performance coal mill fault diagnosis model is obtained.
The collected coal mill historical operating data comprise input data and output data. The input data include but are not limited to the coal feed rate, mill current, hot primary air damper opening, cold primary air damper opening, mill inlet air temperature, mill inlet air flow, mill outlet air pressure, mill outlet temperature, differential pressure across the grinding bowl, seal air to bowl-underside differential pressure, grinding bowl speed, rotary separator speed, rotary separator rotor speed, rotary separator current, rotary separator upper bearing temperature, rotary separator lower bearing temperature, separator reducer lubricating oil temperature, and separator reducer output shaft temperature.
The output data are fault states, namely no fault, coal blockage, coal interruption, abnormal vibration, internal ignition, increased inlet air pressure, and abnormal pebble coal discharge.
The input data of the operating data form the matrix

$$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} \end{bmatrix},$$

and the output data, the fault states, form the vector

$$Y = (y_1, y_2, \ldots, y_n)^T,$$

where n is the size of the data sample and m is the number of input parameters.
As a preferred scheme, the method further comprises preprocessing the data after the data are acquired, wherein the preprocessing comprises data denoising and data normalization.
Data denoising: a Hankel filtering algorithm is used to remove the noise component from the raw data, as follows:

(1) For each one-dimensional signal $x_i = (x_{1i}, x_{2i}, \ldots, x_{ni}) = X_i^T$, $i = 1, 2, \ldots, m$, construct the Hankel matrix $H_i$:

$$H_i = \begin{bmatrix} x_{1i} & x_{2i} & \cdots & x_{ki} \\ x_{2i} & x_{3i} & \cdots & x_{(k+1)i} \\ \vdots & \vdots & & \vdots \\ x_{(n-k+1)i} & x_{(n-k+2)i} & \cdots & x_{ni} \end{bmatrix}.$$

The matrix $H_i$ is a Hankel matrix: the elements on each anti-diagonal are identical, so the original one-dimensional signal $x_i$ can be recovered by cyclically shifting the columns or rows of $H_i$.

(2) Perform a singular value decomposition (SVD) of $H_i$:

$$H_i = U_i \Sigma_i V_i^T = \sum_{j=1}^{n} \sigma_{ij} H_{ij},$$

where $U_i$ and $V_i$ are unitary matrices satisfying $U_i^T U_i = U_i U_i^T = I$ and $V_i^T V_i = V_i V_i^T = I$, with $I$ the identity matrix; $\Sigma_i = [\sigma_{i1}, \sigma_{i2}, \ldots, \sigma_{in}]$ holds the $n$ singular values of the Hankel matrix $H_i$, which satisfy $\sigma_{i1} > \sigma_{i2} > \cdots > \sigma_{in}$; and $H_{ij}$ is the matrix reconstructed from the $j$-th singular value $\sigma_{ij}$.

(3) Reconstruct the filtered matrix $H_i^{*}$.

Because the noise signal is only weakly correlated along the row and column directions, its singular values are small. The first $r$ singular values are therefore retained for reconstruction, which removes the noise and yields the new filtered matrix

$$H_i^{*} = \sum_{j=1}^{r} \sigma_{ij} H_{ij}.$$

(4) Reconstruct the filtered Hankel matrix $\tilde{H}_i$.

The elements of $H_i^{*}$ on each anti-diagonal are summed and averaged to obtain the new Hankel matrix $\tilde{H}_i$ with entries

$$\tilde{x}_{ji} = \frac{1}{|S_j|} \sum_{(p,q) \in S_j} (H_i^{*})_{pq}, \qquad S_j = \{(p, q) : p + q - 1 = j\},$$

and the resulting $\tilde{x}_i = (\tilde{x}_{1i}, \tilde{x}_{2i}, \ldots, \tilde{x}_{ni})$ is the new, denoised version of the original data. Applying this to each one-dimensional signal finally gives the denoised input data matrix

$$\tilde{X} = [\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_m].$$
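For illustration only, the following is a minimal Python sketch of this truncated-SVD Hankel filtering; the window length k and retained rank r are assumptions of the example, since the text does not fix them.

```python
import numpy as np

def hankel_denoise(x, k=None, r=3):
    """Denoise a 1-D signal by truncated SVD of its Hankel matrix.
    k: number of Hankel columns (assumed n // 2 when not given).
    r: number of singular values retained, as in step (3)."""
    n = len(x)
    k = k or n // 2
    rows = n - k + 1
    # Step (1): build the Hankel matrix, H[p, q] = x[p + q]
    H = np.empty((rows, k))
    for p in range(rows):
        H[p, :] = x[p:p + k]
    # Steps (2)-(3): SVD, keep the r largest singular values
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    H_star = (U[:, :r] * s[:r]) @ Vt[:r, :]
    # Step (4): average over anti-diagonals to recover a signal
    x_hat = np.zeros(n)
    counts = np.zeros(n)
    for p in range(rows):
        x_hat[p:p + k] += H_star[p, :]
        counts[p:p + k] += 1
    return x_hat / counts

# Example on a synthetic noisy channel
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
noisy = np.sin(2 * np.pi * 5 * t) + 0.3 * rng.standard_normal(200)
clean = hankel_denoise(noisy, r=2)
```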
Data normalization:

The z-score method is applied to the denoised data $\tilde{X}$:

$$r_{ij} = \frac{\tilde{x}_{ij} - \mu_j}{\sigma_j}, \qquad 1 \leq i \leq n, \quad 1 \leq j \leq m,$$

where $\tilde{x}_{ij}$ is the denoised data, $\mu_j$ is the mean of the denoised column $\tilde{x}_j$, and $\sigma_j$ is its standard deviation.

After standardization, the data matrix R is obtained:

$$R = \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1m} \\ r_{21} & r_{22} & \cdots & r_{2m} \\ \vdots & \vdots & & \vdots \\ r_{n1} & r_{n2} & \cdots & r_{nm} \end{bmatrix},$$

which together with the fault-state outputs Y forms the preprocessed data set.
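Equivalently, a short sketch of the column-wise z-score in Python, assuming the denoised signals are stacked as the columns of an array X_denoised:

```python
import numpy as np

def z_score(X):
    """Column-wise z-score: r_ij = (x_ij - mu_j) / sigma_j."""
    mu = X.mean(axis=0)       # mu_j for each input parameter
    sigma = X.std(axis=0)     # sigma_j for each input parameter
    return (X - mu) / sigma

# R = z_score(X_denoised)    # the n x m standardized data matrix
```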
As a preferred scheme, the specific process of constructing the cross-validation training and validation sets in step S1 includes:

randomly dividing the data into training data and test data in a set ratio;

dividing the training data equally into K folds, taking K-1 folds as the training set and 1 fold as the validation set, and letting each fold serve once as the validation set so that K groups of training and validation sets are generated.
Specifically, the preprocessed data matrix R is randomly divided into training data (Train) and test data (Test) at a data-volume ratio of 8:2:

$$R = R_{train} \cup R_{test}.$$

The training data are subdivided equally into K folds $R_{train\_i}$ ($i = 1, 2, 3, \ldots, K$); in each group, K-1 folds form the training set $R_{part\_train\_i}$ and the remaining fold forms the validation set $R_{part\_valid\_i}$, and with each fold serving once as the validation set, K groups of training and validation sets are generated. Specifically:

$R_{train\_i}$ is the $i$-th fold of $R_{train}$, $i = 1, 2, \ldots, K$;

the training sets $R_{part\_train\_i}$ are

$$R_{part\_train\_i} = \bigcup_{j \neq i} R_{train\_j}, \qquad i = 1, 2, \ldots, K;$$

and the validation sets $R_{part\_valid\_i}$ are

$$R_{part\_valid\_i} = R_{train\_i}, \qquad i = 1, 2, \ldots, K.$$
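As a sketch, this split can be built with scikit-learn utilities; the 8:2 ratio comes from the text, while K and the random seed are assumed values:

```python
import numpy as np
from sklearn.model_selection import train_test_split, KFold

def make_splits(R, Y, k=5, seed=0):
    """Return K (train, validation) fold tuples plus the held-out test set."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        R, Y, test_size=0.2, random_state=seed)    # 8:2 split
    folds = []
    for tr_idx, va_idx in KFold(n_splits=k, shuffle=True,
                                random_state=seed).split(X_tr):
        folds.append((X_tr[tr_idx], y_tr[tr_idx],   # R_part_train_i
                      X_tr[va_idx], y_tr[va_idx]))  # R_part_valid_i
    return folds, (X_te, y_te)
```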
As a preferred scheme, an improved random forest algorithm, the CatBoost algorithm, and a deep neural network algorithm are used for training to obtain the first-layer prediction models. In what follows, algorithm denotes any one of these, abbreviated RFC / CATB / DNN.
As a preferred scheme, the process of performing hyper-parameter optimization training on each training model in step S2 includes:
randomly acquiring a hyper-parameter combination of a set algorithm, training according to the set algorithm and a training set to obtain an initial model, outputting a plurality of initial prediction output values by the initial model, calculating a MacroF1 value of the initial prediction output value and a corresponding training data output value, and obtaining an average MacroF1 value;
setting the search times, repeating the steps, and selecting the maximum value from the obtained average MacroF1 values after the search is finished to obtain a corresponding hyper-parameter combination as an optimal hyper-parameter combination;
setting the optimal hyper-parameter combination as the final hyper-parameter of the set algorithm, training according to different training sets to obtain a plurality of first-layer prediction models, predicting the corresponding verification set according to the first-layer prediction models to obtain sub-prediction data, and merging the sub-prediction data to obtain prediction data of each algorithm.
The specific process is as follows:

(1) Before the search begins, the hyper-parameters to be optimized and their ranges are set according to the algorithm, hyper-parameter combinations are sampled randomly within those ranges, and the number of searches N of the grid search method is set;

(2) In the n-th search ($1 \leq n \leq N$), the algorithm, the sampled hyper-parameter combination, and the training sets $R_{part\_train\_i}$ ($i = 1, 2, 3, \ldots, K$) are used to train the n-th initial models, and each n-th initial model predicts on the inputs of its validation set $R_{part\_valid\_i}$, outputting K n-th initial prediction values $Y_{train\_i\_n}$. The MacroF1 value between the K outputs $Y_{train\_i\_n}$ and the validation outputs $Y_{train\_i}$ is computed as follows:

$$Precision_j = \frac{TP_j}{TP_j + FP_j},$$

$$Recall_j = \frac{TP_j}{TP_j + FN_j},$$

$$Precision_{macro} = \frac{1}{|L|} \sum_{j=1}^{L} Precision_j,$$

$$Recall_{macro} = \frac{1}{|L|} \sum_{j=1}^{L} Recall_j,$$

$$MacroF1 = \frac{2 \cdot Precision_{macro} \cdot Recall_{macro}}{Precision_{macro} + Recall_{macro}}.$$

The multi-class problem is decomposed into several binary classification problems: each class in turn is treated as the positive class, all other classes as negative, and the binary metrics are computed. Here $TP_j$ is the number of samples actually positive and predicted positive when class j is the positive class; $FP_j$ the number actually negative but predicted positive; $FN_j$ the number actually positive but predicted negative; L the number of classes; $Precision_j$ and $Recall_j$ the model precision and recall with class j positive; $Precision_{macro}$ and $Recall_{macro}$ the average precision and recall of the model; and MacroF1 the harmonic mean of the two. The larger the MacroF1, the higher the prediction accuracy and the better the model performance.

(3) The K MacroF1 values of $Y_{train\_i\_n}$ against $Y_{train\_i}$ are averaged to obtain $MacroF1_{mean\_n}$.

(4) Set n = n + 1, randomly select a new hyper-parameter combination within the search range, continue the search, and repeat steps (2) and (3) until n > N, then proceed to the next step.

(5) After the N searches are complete, N values $MacroF1_{mean\_n}$ have been obtained; the maximum among them is found, and the hyper-parameter combination corresponding to that maximum is taken as the optimal combination Param_best_algorithm.

(6) With the algorithm's final hyper-parameters set to Param_best_algorithm, training on the K training sets $R_{part\_train\_i}$ yields K first-layer prediction models Model_algorithm_i.

(7) Each first-layer model Model_algorithm_i predicts on the inputs of its validation set $R_{part\_valid\_i}$, giving K sub-prediction values $Y_{validpredict\_model\_algorithm\_i}$; at the same time, each Model_algorithm_i predicts on the test-data inputs, giving sub-test values $Y_{test\_model\_algorithm\_i}$.

(8) The K sub-prediction values $Y_{validpredict\_model\_algorithm\_i}$ are merged to obtain the prediction value of the corresponding algorithm:

$$Y_{validpredict\_model\_algorithm} = \begin{bmatrix} Y_{validpredict\_model\_algorithm\_1} \\ \vdots \\ Y_{validpredict\_model\_algorithm\_K} \end{bmatrix}.$$

Different weights are assigned to the K first-layer prediction models according to the discrepancy between predicted and true values: the smaller the MacroF1, the lower the weight. This raises the weight of high-MacroF1 models, lowers that of low-MacroF1 models, and improves prediction accuracy. The weight formula is

$$W_i = \frac{(MacroF1)_i}{\sum_i (MacroF1)_i},$$

and the test value is obtained as

$$Y_{test\_model\_algorithm} = \sum_{i=1}^{K} W_i \, Y_{test\_model\_algorithm\_i}.$$
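Steps (1) to (8) can be sketched in Python with scikit-learn-style estimators; the random sampling over a grid, the estimator objects, and the fold tuples from the split above are assumptions of the sketch, and the weighted sum over fold predictions mirrors the formula for Y_test_model_algorithm:

```python
import numpy as np
from sklearn.base import clone
from sklearn.metrics import f1_score

def random_grid_search(estimator, param_grid, folds, n_iter, rng):
    """Steps (1)-(5): sample hyper-parameter combinations, score each by
    the mean MacroF1 over the K folds, and keep the best combination."""
    best_score, best_params = -1.0, None
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in param_grid.items()}
        scores = []
        for X_tr, y_tr, X_va, y_va in folds:
            model = clone(estimator).set_params(**params)
            model.fit(X_tr, y_tr)
            scores.append(f1_score(y_va, model.predict(X_va), average='macro'))
        if np.mean(scores) > best_score:
            best_score, best_params = float(np.mean(scores)), params
    return best_params

def fit_first_layer(estimator, best_params, folds, X_test):
    """Steps (6)-(8): train K models with the best hyper-parameters, collect
    out-of-fold predictions, and weight the K test predictions by MacroF1."""
    oof_pred, oof_true, test_parts, weights = [], [], [], []
    for X_tr, y_tr, X_va, y_va in folds:
        model = clone(estimator).set_params(**best_params)
        model.fit(X_tr, y_tr)
        pred = model.predict(X_va)
        oof_pred.append(pred)
        oof_true.append(y_va)            # keep labels aligned with fold order
        test_parts.append(model.predict(X_test))
        weights.append(f1_score(y_va, pred, average='macro'))
    w = np.asarray(weights) / np.sum(weights)                  # W_i
    y_test_pred = np.tensordot(w, np.asarray(test_parts, float), axes=1)
    return np.concatenate(oof_pred), np.concatenate(oof_true), y_test_pred
```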
As a preferred scheme, the process of constructing the training model with the improved random forest algorithm includes:

S211. Using the training set, with the hyper-parameters at their standard random forest values, set the initial feature set $Q_0$ and the initial number of decision trees $T_0$, and initialize the random forest $\Omega_0$.

S212. Compute the global weight W of each feature and sort the features in descending order. The global weight W is computed as follows:

a1. Compute the information entropy IE:

$$IE = -\sum_l P(l) \log P(l),$$

where l is the class and P(l) the class probability;

b1. For a node m split on feature k, compute the information gain RE(m, k) of the node split from IE and the conditional information entropies of the left and right child nodes, and from it the local weight $W_\Psi(k)$ of feature k in decision tree Ψ:

$$W_\Psi(k) = \frac{\sum_{m=1}^{M} RE(m, k)}{M - 1},$$

where M is the number of nodes in the decision tree and Ψ denotes a single decision tree;

c1. Let $e_\Psi$ be the out-of-bag error of decision tree Ψ, and normalize $1/e_\Psi$ to obtain the weight of each decision tree:

$$W_\Psi = \frac{1/e_\Psi - \min_\Psi (1/e_\Psi)}{\max_\Psi (1/e_\Psi) - \min_\Psi (1/e_\Psi)};$$

d1. Compute the global weight W(k) of feature k:

$$W(k) = \frac{\sum_\Psi W_\Psi(k) W_\Psi}{\max_k \sum_\Psi W_\Psi(k) W_\Psi}.$$

S213. Let $V_0$ be the number of top-ranked features to keep, where $|Q_0|$ denotes the number of elements of $Q_0$, and put the first $V_0$ features of the ranked $Q_0$ into the set CV.

S214. Let $I_0 = |Q_0| - V_0$ and put the remaining $I_0$ features into the set CI.

S215. Initialize n = 0.

S216. Let r be the size of the feature subset drawn at each node split.

S217. While $I_n > r$, perform the following procedure:

a2. Compute the mean μ and standard deviation σ of the global weights W(k) of all features in $CI_n$, and set the threshold t_value = μ - σ;

b2. Compare the global weight W(k) of every feature in $CI_n$ with t_value; if W(k) < t_value, move the feature out of $CI_n$ into the set $S_n$;

c2. At the same time, compare the global weights W(k) of the features in $CI_n$ with the minimum global weight $\min_{CV_n} W(k')$ of the features in $CV_n$; if $W(k) > \min_{CV_n} W(k')$, move the feature from $CI_n$ into the set $Z_n$;

d2. Remove $S_n$ from $Q_n$: $Q_{n+1} = Q_n - S_n$;

e2. Move $Z_n$ into $CV_n$: $CV_{n+1} = CV_n + Z_n$, $CI_{n+1} = CI_n - S_n - Z_n$;

f2. Let $V_{n+1} = |CV_{n+1}|$ and $I_{n+1} = |CI_{n+1}|$, and compute $\Delta V = V_{n+1} - V_n$, $\Delta I = I_{n+1} - I_n$;

g2. Let p be the probability that at least one significant feature is available to a decision tree for node splitting, and q the probability that no significant feature takes part in any node split:

$$p = 1 - q = \frac{r/(V + I)}{r/I};$$

h2. Compute the partial derivatives $P_V$ and $P_I$ of p with respect to V and I:

$$P_V = -\left(\frac{\Delta q}{\Delta V}\right)^T = \frac{V! \,(V + I - 1 - r)! \, r}{(V - r)! \,(V + I)!},$$

$$P_I = -\left(\frac{\Delta q}{\Delta I}\right)^T = \frac{(I - 1)! \,(V + I - 1 - r)! \, V r}{(I - r)! \,(V + I - 1)! \,(V + I)};$$

i2. Compute the degree of correlation between different decision trees:

$$\rho = 1 - \frac{r/(V + I)}{r/(V + I - r)};$$

j2. Compute ΔT and round up:

$$|\Delta T| = \left| \rho \cdot (P_V \Delta V + P_I \Delta I)/I - 1 \right|;$$

k2. Let $T_{n+1} = T_n + \Delta T$;

l2. Build the random forest $\Omega_{n+1}$ from $T_{n+1}$ decision trees and the feature set $Q_{n+1}$;

m2. Recompute the global weights W and re-rank the features in $Q_{n+1}$;

n2. Increment n; once $I_n > r$ no longer holds, end the iteration.

S218. Output the number of decision trees $T_{end}$ and the feature set $Q_{end}$ at the end of the iteration.

S219. Set the hyper-parameters to be optimized and their ranges:

maximum tree depth max_depth: [3, 16], step 1; feature sampling ratio max_features: [0.5, 1.0], step 0.1; minimum samples per leaf min_samples_leaf: [1, 5], step 1; minimum samples to split a node min_samples_split: [2, 10], step 1.
The traditional random forest algorithm generates its model from preset, fixed features and a preset number of decision trees, even though feature importance and the optimal tree count matter greatly for model performance. This scheme therefore adopts an improved random forest algorithm: instead of the traditional fixed feature-selection mode and conventional hyper-parameter optimization, it removes unimportant features through iterative correction and progressively corrects the number of decision trees, gradually building an optimal random forest model. The improved random forest algorithm has excellent robustness and accuracy, outperforming the traditional random forest algorithm.
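As a rough illustration of the iterative correction, the following Python skeleton implements only the feature bookkeeping of steps a2 to e2; the global weights W, V0 and r are assumed inputs, and the tree-count update g2 to k2 is omitted for brevity:

```python
import numpy as np

def iterative_feature_selection(W, V0, r):
    """W: dict feature -> global weight W(k), assumed precomputed as in
    a1-d1. Splits the ranked features into CV (kept) and CI (candidates),
    then repeatedly drops weak candidates and promotes strong ones."""
    order = sorted(W, key=W.get, reverse=True)
    CV, CI = set(order[:V0]), set(order[V0:])
    while len(CI) > r:
        weights = np.array([W[k] for k in CI])
        t_value = weights.mean() - weights.std()          # a2
        S = {k for k in CI if W[k] < t_value}             # b2: remove weak
        Z = {k for k in CI - S
             if W[k] > min(W[k2] for k2 in CV)}           # c2: promote strong
        CV |= Z                                           # e2
        CI -= S | Z
        if not (S or Z):      # no movement: stop to avoid an endless loop
            break
    return CV, CI             # the retained feature set is CV | CI
```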
After the training model is constructed with the improved random forest algorithm, the hyper-parameter optimization training procedure yields the optimal hyper-parameter combination

Param_best_RFC = [T_end, max_depth_best, max_features_best, min_samples_leaf_best, min_samples_split_best],

K improved random forest first-layer prediction models Model_RFC_i (i = 1, 2, 3, …, K), the improved random forest prediction value Y_validpredict_model_RFC, and the test value Y_test_model_RFC.
Similarly, a training model is constructed with the CatBoost algorithm as follows:

Set the hyper-parameters to be optimized and their ranges:

number of decision trees iterations: [50, 500], step 1; learning rate learning_rate: [0.01, 0.1], step 0.01; regularization coefficient l2_leaf_reg: [1, 5], step 0.5; tree depth depth: [3, 10], step 1; sample sampling ratio subsample: [0.5, 1.0], step 0.1; column sampling ratio rsm: [0.5, 1.0], step 0.1.

The hyper-parameter optimization training procedure yields the optimal combination

Param_best_CATB = [iterations_best, learning_rate_best, l2_leaf_reg_best, subsample_best, rsm_best],

K CatBoost first-layer prediction models Model_CATB_i (i = 1, 2, 3, …, K), the CatBoost prediction value Y_validpredict_model_CATB, and the test value Y_test_model_CATB.
The deep neural network algorithm is adopted to construct a training model, and the process is as follows:
the hidden layer is set to be 2 layers, and the hyper-parameters and the range thereof to be optimized are set as follows:
batch size batch _ size: [32,64,128]; the first hidden layer neuron number, first _ hidden _ layer: [7,40] step size 1; the second hidden layer neuron number second _ hidden _ layer: [7,20] step size 1; optimizer: [ 'Adam', 'SGD', 'LBFGS', 'Rprop' ].
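For concreteness, one candidate network in this search space might be sketched in PyTorch as follows; the hidden sizes 32 and 14 are sample values from the ranges above, and ReLU is an assumed activation since the text does not name one:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(19, 32),   # 19 input variables -> first hidden layer
    nn.ReLU(),
    nn.Linear(32, 14),   # second hidden layer
    nn.ReLU(),
    nn.Linear(14, 7),    # 7 fault-state classes
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())  # 'Adam' is one searched option
```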
The hyper-parameter optimization training procedure yields the optimal combination

Param_best_DNN = [batch_size_best, first_hidden_layer_best, second_hidden_layer_best, optimizer_best],

K deep neural network first-layer prediction models Model_DNN_i (i = 1, 2, 3, …, K), the deep neural network prediction value Y_validpredict_model_DNN, and the test value Y_test_model_DNN.
The specific process of step S3 includes:

The prediction values and test values produced by the three algorithms' first-layer prediction models are reconstructed and combined to obtain the second-layer training data, consisting of a second-layer training set and a second-layer test set. The second-layer training set is

$$R_{train\_s2} = [Y_{validpredict\_model\_RFC}, \; Y_{validpredict\_model\_CATB}, \; Y_{validpredict\_model\_DNN}, \; Y_{train}],$$

where $Y_{validpredict\_model\_RFC}$, $Y_{validpredict\_model\_CATB}$ and $Y_{validpredict\_model\_DNN}$ are the second-layer training inputs and $Y_{train}$, taken from the training data, is the second-layer training output.

The second-layer test set is

$$R_{test\_s2} = [Y_{test\_model\_RFC}, \; Y_{test\_model\_CATB}, \; Y_{test\_model\_DNN}, \; Y_{test}],$$

where $Y_{test\_model\_RFC}$, $Y_{test\_model\_CATB}$ and $Y_{test\_model\_DNN}$ are the second-layer test inputs and $Y_{test}$, taken from the test data, is the second-layer test output.
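In code the reconstruction is a simple column stack; the dictionaries below, holding each algorithm's out-of-fold and weighted test predictions from the earlier sketches, are illustrative:

```python
import numpy as np

def build_second_layer(oof, y_oof, tests, y_test):
    """oof / tests: dicts mapping 'rfc', 'catb', 'dnn' to first-layer
    prediction vectors (out-of-fold and weighted test, respectively)."""
    X_train_s2 = np.column_stack([oof['rfc'], oof['catb'], oof['dnn']])
    X_test_s2 = np.column_stack([tests['rfc'], tests['catb'], tests['dnn']])
    return (X_train_s2, y_oof), (X_test_s2, y_test)
```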
As a preferable scheme, the specific process of step S4 includes:
and (3) taking the prediction data of each algorithm as input, adopting a multi-classification logistic regression model to construct a second-layer prediction model for training, wherein the loss function is a cross entropy loss function, outputting the probability of each fault category by utilizing softmax regression, and training to obtain an optimal multi-classification logistic regression model as a second-layer prediction model according to a training target of the minimized cross entropy loss function. The method comprises the following specific steps:
will Y validpredict_model_RFC ,Y validpredict_model_CATB And Y validpredict_model_DNN And as input, constructing a second-layer prediction model by adopting a multi-classification logistic regression model, wherein the loss function is a cross entropy loss function, and training the multi-classification logistic regression model.
The cross entropy loss function is:
LOSS=-1/n∑ iL-1 c=0 y ic log(p ic ),
wherein n is the number of samples in the second training set, L is the number of categories, y ic Is a symbolic function, if the class Y of the sample i train_i Is equal to c, then y ic Taking 1, otherwise, taking 0; p is a radical of ic Calculating the prediction probability of the sample i belonging to the category c by adopting softmax, and outputting the probability of each category by utilizing softmax regression, wherein the probability of the fault of the sample i belonging to the c-th category is calculated as follows:
Figure BDA0003899869470000141
where θ is a parameter matrix of a multi-class logistic regression function, x i Is input, namely Y validpredict_model_RFC ,Y validpredict_model_CATB And Y validpredict_model_DNN
Training to obtain an optimal multi-classification Logistic regression Model _ Logistic according to a training target of the minimum cross entropy loss function, wherein the optimal parameter matrix is theta best
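A minimal sketch of the second layer with scikit-learn; multinomial logistic regression minimizes exactly the softmax cross-entropy above, and the solver choice is an assumption:

```python
from sklearn.linear_model import LogisticRegression

meta = LogisticRegression(multi_class='multinomial', solver='lbfgs',
                          max_iter=1000)
meta.fit(X_train_s2, y_train_s2)        # minimizes the cross-entropy LOSS
proba = meta.predict_proba(X_test_s2)   # p_ic for each fault category c
```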
As a preferable scheme, the specific process of step S5 includes:
collecting actual data samples;
inputting the samples into a plurality of first-layer prediction models of each algorithm, predicting to obtain a plurality of sub-prediction values, and obtaining a comprehensive prediction value through weighted calculation;
and combining the comprehensive predicted values of the algorithms to form second-layer prediction model input data, inputting the second-layer prediction model input data into the second-layer prediction model, and outputting the class with the maximum probability, namely the fault type. The method specifically comprises the following steps:
Collect the actual data sample X_new = (x_1, x_2, …, x_m);

after denoising by Hankel filtering and standardization it becomes X'_new = (x'_1, x'_2, …, x'_m);

feed X'_new into the first-layer prediction models of each algorithm:

Model_RFC_i (i = 1, 2, 3, …, K) predicts K sub-prediction values Y_RFC_i, and the comprehensive prediction Y_RFC is obtained by weighted calculation:

$$Y_{RFC} = \sum_{i=1}^{K} w_{RFC\_i} \, Y_{RFC_i};$$

Model_CATB_i predicts K sub-prediction values Y_CATB_i, combined by weighted calculation into

$$Y_{CATB} = \sum_{i=1}^{K} w_{CATB\_i} \, Y_{CATB_i};$$

Model_DNN_i predicts K sub-prediction values Y_DNN_i, combined by weighted calculation into

$$Y_{DNN} = \sum_{i=1}^{K} w_{DNN\_i} \, Y_{DNN_i}.$$

The input data of the second-layer prediction model are constructed as

$$X_{second} = [Y_{RFC}, \; Y_{CATB}, \; Y_{DNN}],$$

and X_second is input into the second-layer prediction model Model_Logistic, whose output class with the highest probability is the fault type, realizing fault diagnosis.
In addition, the method includes an early-warning step after diagnosis, whose process is:

setting a probability-difference threshold ΔP_min and a time threshold Δt_max; when the state predicted by the second-layer prediction model is no fault, let the difference between the largest probability (no fault) and the second-largest probability (fault i) be

$$\Delta P_{0\text{-}i} = P_{0\_first} - P_{i\_second},$$

where P_0_first is the probability of the no-fault state and P_i_second is the probability of fault i, the state with the second-largest probability. If ΔP_0-i ≤ ΔP_min holds continuously within the interval Δt_max, an early warning of fault i is issued and fed back in time so that the fault risk can be removed.
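The early-warning rule can be sketched as follows; the threshold values and the history representation are assumptions of the example:

```python
DP_MIN = 0.10    # assumed probability-difference threshold (delta P_min)
DT_MAX = 600.0   # assumed time window in seconds (delta t_max)

def check_warning(history, now):
    """history: chronological (timestamp, proba) pairs, class 0 = no fault.
    Returns the fault index i to warn about if, throughout the last DT_MAX
    seconds, the prediction stayed 'no fault' while its margin over the
    same runner-up fault i stayed within DP_MIN; otherwise None."""
    if not history or now - history[0][0] < DT_MAX:
        return None                        # not enough history yet
    window = [(t, p) for t, p in history if now - t <= DT_MAX]
    runner_up = None
    for _, p in window:
        ranked = sorted(range(len(p)), key=lambda c: p[c], reverse=True)
        top, second = ranked[0], ranked[1]
        if top != 0 or p[top] - p[second] > DP_MIN:
            return None                    # condition broken in the window
        if runner_up is None:
            runner_up = second
        elif runner_up != second:
            return None                    # fault candidate not consistent
    return runner_up
```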
A coal mill fault diagnosis system based on multi-model fusion, applied to the above coal mill fault diagnosis method based on multi-model fusion, comprising:
a data acquisition module: collecting historical operating data of a coal mill;
a data preprocessing module: preprocessing collected historical operating data;
a model training module: constructing a training set and a verification set of cross verification by utilizing the preprocessed data; respectively constructing training models by adopting various algorithms, and respectively carrying out hyper-parameter optimization training on each training model according to a training set to obtain a plurality of first-layer prediction models; outputting prediction data according to the first layer prediction model, and performing combined reconstruction on the prediction data to obtain second layer training data; constructing a second layer training model, and performing regression training according to second layer training data to obtain a second layer prediction model;
a model deployment module: deploying the trained second-layer prediction model to the server side;
a result output module: and inputting the real-time coal mill operation data into the deployed second-layer prediction model to obtain the real-time state of the coal mill.
Therefore, the invention has the following advantages:

1. Multiple fault prediction models are fused and integrated through multi-model fusion to establish the coal mill fault diagnosis model, realizing accurate diagnosis of coal mill faults and their categories.

2. The established coal mill fault diagnosis model effectively prevents over-fitting, generalizes well, is simple to implement and computationally efficient, and has a high degree of model-correction redundancy. Combined with a grid search to find the optimal hyper-parameter combination, the accuracy of the model is further improved, yielding a high-performance coal mill fault diagnosis model.

3. Based on the predicted real-time state of the coal mill, countermeasures matched to the specific fault type can be implemented rapidly, minimizing the safety hazards and economic losses caused by faults. The model also provides an early-warning function: with a threshold set, whenever the difference between the no-fault probability and the probability of some other fault category stays below the threshold for a continuous period, fault early-warning feedback is given in time so that the fault risk can be removed.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a schematic diagram of the construction process of each algorithm's first-layer prediction model in the present invention.
Detailed Description
The technical scheme of the invention is further specifically described by the following embodiments and the accompanying drawings.
Example 1:
The coal mill fault diagnosis method based on multi-model fusion comprises the following steps, as shown in FIG. 1:
s1, collecting historical operating data of a coal mill, and constructing a cross validation training set and a validation set; the method specifically comprises the following steps:
S11. The collected coal mill historical operating data comprise input data and output data. The input data include but are not limited to the coal feed rate, mill current, hot primary air damper opening, cold primary air damper opening, mill inlet air temperature, mill inlet air flow, mill inlet air pressure, mill outlet temperature, differential pressure across the grinding bowl, seal air to bowl-underside differential pressure, grinding bowl speed, rotary separator rotor speed, rotary separator current, rotary separator upper bearing temperature, rotary separator lower bearing temperature, separator reducer lubricating oil temperature, and separator reducer output shaft temperature. The output data are fault states, coded as no fault: 0, coal blockage: 1, coal interruption: 2, abnormal vibration: 3, internal ignition: 4, increased inlet air pressure: 5, abnormal pebble coal discharge: 6, as shown in Table 1:

Table 1 (the input variables and fault-state codes listed above).
The input data of the operating data form the matrix

$$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} \end{bmatrix},$$

and the output data, the fault states, form the vector

$$Y = (y_1, y_2, \ldots, y_n)^T,$$

where n is the size of the data sample and m is the number of input parameters.
S12. The acquired data are preprocessed, the preprocessing comprising data denoising and data standardization: the noise in the original data is removed with the Hankel filtering algorithm, steps (1) to (4) above, and the denoised data are standardized with the z-score method, yielding the standardized data matrix R as described above.
S13. The preprocessed data matrix R is randomly divided into training data (Train) and test data (Test) at a data-volume ratio of 8:2; the training data are then subdivided equally into K folds R_train_i (i = 1, 2, 3, …, K), with K-1 folds as the training set R_part_train_i and 1 fold as the validation set R_part_valid_i, and with each fold serving once as the validation set, K groups of training and validation sets are generated, exactly as described above.
S2. Several algorithms are set to construct training models, and hyper-parameter optimization training is performed on each according to the training sets to obtain multiple first-layer prediction models. In this embodiment, the first-layer prediction models are trained with the improved random forest algorithm, the CatBoost algorithm, and the deep neural network algorithm, denoted RFC / CATB / DNN. Fusing three different model algorithms greatly improves the precision of the fused model and transfers the knowledge learned by each model to the simple classifier after fusion.
The hyper-parameter optimization training applied to each algorithm's training model is the general procedure described above: cross-validation combined with a grid search, following steps (1) to (8), including the MacroF1 evaluation of each sampled hyper-parameter combination, the training of the K first-layer models Model_algorithm_i with the optimal combination, and the MacroF1-weighted combination of their test predictions.
As shown in FIG. 2, the specific process of step S2 includes:

Training the improved random forest model: a training model is constructed with the improved random forest algorithm and optimized exactly as in steps S211 to S219 above. Whereas the traditional random forest algorithm generates its model from preset, fixed features and a preset number of decision trees, even though feature importance and the optimal tree count matter greatly for performance, the improved algorithm removes unimportant features by iterative correction and progressively corrects the number of decision trees, gradually building an optimal random forest model with robustness and accuracy superior to the traditional algorithm. The hyper-parameter optimization yields the optimal combination Param_best_RFC = [T_end, max_depth_best, max_features_best, min_samples_leaf_best, min_samples_split_best], K improved random forest first-layer prediction models Model_RFC_i (i = 1, 2, 3, …, K), the improved random forest prediction value Y_validpredict_model_RFC, and the test value Y_test_model_RFC.
Training the CatBoost model: a training model is constructed with the CatBoost algorithm and optimized over the hyper-parameters and ranges given above, yielding the optimal combination Param_best_CATB = [iterations_best, learning_rate_best, l2_leaf_reg_best, subsample_best, rsm_best], K CatBoost first-layer prediction models Model_CATB_i (i = 1, 2, 3, …, K), the CatBoost prediction value Y_validpredict_model_CATB, and the test value Y_test_model_CATB.
Training the deep neural network model: a training model is constructed with the deep neural network algorithm (2 hidden layers) and optimized over the hyper-parameters and ranges given above, yielding the optimal combination Param_best_DNN = [batch_size_best, first_hidden_layer_best, second_hidden_layer_best, optimizer_best], K deep neural network first-layer prediction models Model_DNN_i (i = 1, 2, 3, …, K), the deep neural network prediction value Y_validpredict_model_DNN, and the test value Y_test_model_DNN.
S3. Prediction data output by the first-layer prediction models are combined and reconstructed into the second-layer training data, consisting of the second-layer training set R_train_s2 = [Y_validpredict_model_RFC, Y_validpredict_model_CATB, Y_validpredict_model_DNN, Y_train] and the second-layer test set R_test_s2 = [Y_test_model_RFC, Y_test_model_CATB, Y_test_model_DNN, Y_test], exactly as described above.
S4. A second-layer training model is constructed and regression-trained on the second-layer training data to obtain the second-layer prediction model. Specifically, Y_validpredict_model_RFC, Y_validpredict_model_CATB and Y_validpredict_model_DNN are taken as inputs to a multi-class logistic regression model with the cross-entropy loss defined above; in this example the number of classes L is 7, and softmax regression outputs the probability that the fault of sample i belongs to class c (c = 0, 1, 2, 3, 4, 5, 6). Training toward the objective of minimizing the cross-entropy loss yields the optimal multi-class logistic regression model Model_Logistic with optimal parameter matrix θ_best.
The effect of the model is then tested with R_test_s2 by computing the MacroF1 value on the second-layer test set and confirming that the value lies within the preset range.
And S5, collecting the operation data of the coal mill to carry out fault diagnosis. The specific process comprises the following steps:
collecting actual data samples;
inputting the samples into a plurality of first-layer prediction models of each algorithm, predicting to obtain a plurality of sub-prediction values, and obtaining a comprehensive prediction value through weighted calculation;
and combining the comprehensive predicted values of the algorithms to form second-layer prediction model input data, inputting the second-layer prediction model input data into the second-layer prediction model, and outputting the class with the maximum probability, namely the fault type. The method specifically comprises the following steps:
collecting an actual data sample X_new = (x_1, x_2, …, x_m);
after denoising through Hankel filtering and standardization, the sample becomes X_new′ = (x′_1, x′_2, …, x′_m);
X_new′ is input into the first-layer prediction models of each algorithm:
the Model_RFC_i (i = 1, 2, 3, …, K) models predict K sub-prediction values Y_RFC_i, and the comprehensive predicted value Y_RFC is obtained by weighted calculation:
Y_RFC = Σ_{i=1}^{K} w_RFC_i · Y_RFC_i;
the Model_CATB_i models predict K sub-prediction values Y_CATB_i, and the comprehensive predicted value is obtained by weighted calculation: Y_CATB = Σ_{i=1}^{K} w_CATB_i · Y_CATB_i;
the Model_DNN_i models predict K sub-prediction values Y_DNN_i, and the comprehensive predicted value is obtained by weighted calculation: Y_DNN = Σ_{i=1}^{K} w_DNN_i · Y_DNN_i.
The input data of the second-layer prediction model is constructed as:
X_second = [Y_RFC, Y_CATB, Y_DNN].
X_second is input into the second-layer prediction model Model_Logistic, which outputs the class with the highest probability, namely the fault type, thereby realizing fault diagnosis.
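A minimal sketch of this fused inference follows, under the assumption that the lists of fitted sub-models, the fold weights w_*, and the fitted Model_Logistic already exist from the training stage; all names in the commented usage are hypothetical.

```python
import numpy as np

def fused_value(sub_models, weights, x_new):
    # Weighted sum of the K sub-model predictions, mirroring
    # Y_RFC = sum_i w_RFC_i * Y_RFC_i in the text
    preds = np.array([m.predict(x_new) for m in sub_models])  # shape (K, n_samples)
    return np.average(preds, axis=0, weights=weights)

# Hypothetical usage with fitted models and weights from training:
# y_rfc = fused_value(models_rfc, w_rfc, X_new_prime)
# y_catb = fused_value(models_catb, w_catb, X_new_prime)
# y_dnn = fused_value(models_dnn, w_dnn, X_new_prime)
# X_second = np.column_stack([y_rfc, y_catb, y_dnn])
# fault_type = model_logistic.predict(X_second)  # class with highest probability
```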
S6, early warning is carried out according to fault diagnosis, and the process comprises the following steps:
setting a probability difference threshold ΔP_min and a time threshold Δt_max; when the state predicted by the second-layer prediction model is no fault, the difference between the maximum probability (no fault) and the second-largest probability (fault i) is defined as ΔP_0-i:
ΔP_0-i = P_0_first − P_i_second,
where P_0_first is the probability value of the predicted no-fault state and P_i_second is the probability value of the fault i with the second-largest probability of occurrence. If ΔP_0-i ≤ ΔP_min holds throughout the interval Δt_max, an early warning of the occurrence of fault i is issued and fed back in time so that the fault risk can be removed.
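The warning rule can be sketched as below, assuming the second-layer model exposes a per-class probability vector each prediction cycle and that class 0 denotes the no-fault state; the default threshold values follow the worked example later in the text.

```python
import numpy as np

def early_warning(prob_history, delta_p_min=0.05, n_consecutive=7):
    """Return the fault index to warn about, or None.

    prob_history: per-cycle probability vectors; index 0 is the no-fault class.
    """
    hits = 0
    for p in prob_history:
        if np.argmax(p) != 0:   # rule only applies while "no fault" is predicted
            hits = 0
            continue
        gap = p[0] - np.partition(p, -2)[-2]   # delta_P_0-i = P_0_first - P_i_second
        hits = hits + 1 if gap <= delta_p_min else 0
        if hits >= n_consecutive:              # held throughout delta_t_max
            return int(np.argsort(p)[-2])      # fault i with second-largest probability
    return None
```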
The present embodiment will be described below with reference to specific examples.
S1, collecting historical operating data of the coal mill, comprising fault-free data and fault data; fault category labels are attached to the fault data, and the original sample size is set to 10000. The details of the collected operating data and fault conditions are shown in Table 1 above.
The operating data are the input data, with 19 input variables in total; the output data are the fault states, with 7 classes in total.
The collected data are:
X = [X_1, X_2, …, X_19] and Y,
where X_i represents one group of input parameter data (19 groups in total, each group containing 10000 data points), so X is a 10000 × 19 matrix; Y represents the fault state data and is a 10000 × 1 matrix.
S12, preprocessing the acquired data, wherein the preprocessing comprises denoising the original data by using a Hankel filtering algorithm and standardizing the denoised data by using a z-score method.
The data matrix R obtained after denoising and standardization is:
R = [X′, Y], a 10000 × 20 matrix combining the 19 standardized input columns with the fault state column.
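A minimal sketch of the preprocessing follows. The z-score step is standard; the Hankel filtering step is hedged here as a singular-spectrum-style low-rank reconstruction of a Hankel trajectory matrix, which is one common reading of the term, with the window and rank as assumed parameters.

```python
import numpy as np

def zscore(X):
    # (x - mean) / std per column, matching the z-score method in the text
    return (X - X.mean(axis=0)) / X.std(axis=0)

def hankel_denoise(signal, window=50, rank=3):
    # Assumed variant: build a Hankel matrix, keep the top singular components,
    # then average the anti-diagonals back into a 1-D signal (SSA-style)
    n = len(signal)
    H = np.array([signal[i:i + window] for i in range(n - window + 1)])
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    H_low = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    out = np.zeros(n)
    counts = np.zeros(n)
    for i in range(H_low.shape[0]):
        out[i:i + window] += H_low[i]
        counts[i:i + window] += 1
    return out / counts
```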
S13, randomly dividing the preprocessed data matrix R into training data (Train) and test data (Test), with a data quantity ratio of 4:1 between the training data and the test data:
R = [R_train; R_test],
where R_train is an 8000 × 20 matrix and R_test is a 2000 × 20 matrix.
The training set is equally subdivided into 5 folds (R_train_i, i = 1, 2, 3, 4, 5); in each group, 4 folds are taken as the sub-training set (R_part_train_i, i = 1, 2, 3, 4, 5) and 1 fold as the sub-validation set (R_part_valid_i, i = 1, 2, 3, 4, 5). Taking each fold in turn as the validation set generates 5 groups of training sets and validation sets, specifically:
R_part_train_1 = [R_train_2; R_train_3; R_train_4; R_train_5], R_part_valid_1 = R_train_1;
R_part_train_2 = [R_train_1; R_train_3; R_train_4; R_train_5], R_part_valid_2 = R_train_2;
……
R_part_train_5 = [R_train_1; R_train_2; R_train_3; R_train_4], R_part_valid_5 = R_train_5.
Each R_train_i is a 1600 × 20 matrix, each R_part_train_i a 6400 × 20 matrix, and each R_part_valid_i a 1600 × 20 matrix.
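The split and fold construction can be sketched with scikit-learn utilities; the random matrix is a stand-in for the real preprocessed data R.

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

rng = np.random.default_rng(0)
R = rng.normal(size=(10000, 20))                      # stand-in for the real R

R_train, R_test = train_test_split(R, test_size=0.2)  # 8000 / 2000, a 4:1 ratio

folds = []
for train_idx, valid_idx in KFold(n_splits=5, shuffle=True).split(R_train):
    # each pair is (R_part_train_i, 6400 x 20) and (R_part_valid_i, 1600 x 20)
    folds.append((R_train[train_idx], R_train[valid_idx]))
```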
S2, setting multiple algorithms to respectively construct training models, and respectively performing hyper-parameter optimization training on each training model according to a training set to obtain multiple first-layer prediction models; in the embodiment, the first-layer prediction model is obtained by training an improved random forest algorithm, a Catboost algorithm and a deep neural network algorithm respectively.
The process of carrying out the hyperparametric optimization training on each algorithm training model is a general training process, and specifically comprises the following steps:
before the initial search, the hyper-parameters to be optimized and their ranges are set according to the algorithm, a hyper-parameter combination is selected at random within those ranges, and the number of searches N of the grid search method is set;
in the nth search (1 ≤ n ≤ N), the algorithm, the hyper-parameter combination and the training sets R_part_train_i (i = 1, 2, 3, …, K) are used to train initial models, and each initial model predicts the input portion of the corresponding validation set R_part_valid_i, outputting K initial predicted output values Y_train_i_n.
The initial models are trained with cross validation: R_part_train_i is subjected to 5-fold cross validation, in which the training set is divided into 5 parts, 4 parts are taken as training data and the remaining 1 part as test data, and the cycle is repeated 5 times so that each part serves once as test data. Hyper-parameter optimization is performed with the grid search method, with the objective of maximizing the average MacroF1 value over the 5-fold cross-validation test sets.
The grid search method is implemented with GridSearchCV from the Sklearn library, with the number of iterations set to 200; when the iterations are complete, the optimal hyper-parameter combination is output according to the objective of maximizing the average MacroF1 value of the 5-fold cross-validation test sets, and the optimal prediction model, i.e. the first-layer prediction model, is obtained by training.
This operation is repeated to obtain 5 first-layer prediction models and the corresponding Y_validpredict_model_algorithm_i (matrix size 1600 × 1) and Y_test_model_algorithm_i (matrix size 2000 × 1); the 5 Y_validpredict_model_algorithm_i are combined by column to obtain the corresponding algorithm predicted value Y_validpredict_model_algorithm (matrix size 8000 × 1), while the 5 Y_test_model_algorithm_i are weighted and summed to obtain the test value Y_test_model_algorithm (matrix size 2000 × 1).
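A minimal sketch of one such search, using GridSearchCV with 5-fold cross validation and a macro-F1 objective; the random forest estimator, the reduced grid, and the stand-in data are illustrative only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X, y = rng.normal(size=(500, 19)), rng.integers(0, 7, 500)  # stand-in data

param_grid = {               # reduced version of the ranges given in the text
    "max_depth": [3, 8, 16],
    "max_features": [0.5, 0.75, 1.0],
    "min_samples_leaf": [1, 3, 5],
}
search = GridSearchCV(
    RandomForestClassifier(n_estimators=50),             # small forest for speed
    param_grid,
    scoring=make_scorer(f1_score, average="macro"),      # maximize average MacroF1
    cv=5,
)
search.fit(X, y)
print(search.best_params_)   # the optimal hyper-parameter combination
```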
Training improved random forest algorithm model
Constructing a training model according to an improved random forest algorithm, and setting the hyper-parameters to be optimized and the range thereof:
maximum tree depth max_depth: [3,16], step size 1; feature sampling ratio max_features: [0.5,1.0], step size 0.1; minimum number of samples per leaf node min_samples_leaf: [1,5], step size 1; minimum number of samples for node splitting min_samples_split: [2,10], step size 1.
The hyper-parameter optimization training process yields the optimal hyper-parameter combination:
Param_best_RFC = [T_end, max_depth_best, max_features_best, min_samples_leaf_best, min_samples_split_best],
from which 5 improved random forest first-layer prediction models Model_RFC_i (i = 1, 2, 3, …, 5) are obtained, together with the improved random forest predicted value Y_validpredict_model_RFC and test value Y_test_model_RFC.
Training the Catboost model
A training model is constructed by adopting a Catboost algorithm, and the process is as follows:
setting the over-parameters and the range thereof to be optimized:
number of decision trees iterations: [50,500], step size 1; learning rate learning_rate: [0.01,0.1], step size 0.01; regularization coefficient l2_leaf_reg: [1,5], step size 0.5; tree depth depth: [3,10], step size 1; sample sampling ratio subsample: [0.5,1.0], step size 0.1; column sampling ratio rsm: [0.5,1.0], step size 0.1.
The hyper-parameter optimization training process yields the optimal hyper-parameter combination:
Param_best_CATB = [iterations_best, learning_rate_best, l2_leaf_reg_best, subsample_best, rsm_best],
from which 5 Catboost first-layer prediction models Model_CATB_i (i = 1, 2, 3, …, 5) are obtained, together with the Catboost predicted value Y_validpredict_model_CATB and test value Y_test_model_CATB.
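A minimal sketch of a CatBoost first-layer model with the hyper-parameters named above; the concrete values are illustrative, and bootstrap_type='Bernoulli' is set because CatBoost requires a compatible bootstrap for the subsample option.

```python
from catboost import CatBoostClassifier

model_catb = CatBoostClassifier(
    iterations=200,        # number of decision trees
    learning_rate=0.05,
    l2_leaf_reg=3,         # regularization coefficient
    depth=6,               # depth of each tree
    subsample=0.8,         # sample sampling ratio
    rsm=0.8,               # column sampling ratio
    bootstrap_type="Bernoulli",
    loss_function="MultiClass",
    verbose=False,
)
# model_catb.fit(X_part_train, y_part_train)   # hypothetical fold data
```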
Training deep neural network model
The deep neural network algorithm is adopted to construct a training model, and the process is as follows:
the hidden layer is set to be 2 layers, and the hyper-parameters and the range thereof to be optimized are set:
batch size batch_size: [32,64,128]; number of first-hidden-layer neurons first_hidden_layer: [7,40], step size 1; number of second-hidden-layer neurons second_hidden_layer: [7,20], step size 1; optimizer optimizer: ['Adam', 'SGD', 'LBFGS', 'Rprop'].
The hyper-parameter optimization training process yields the optimal hyper-parameter combination:
Param_best_DNN = [batch_size_best, first_hidden_layer_best, second_hidden_layer_best, optimizer_best],
from which 5 deep neural network first-layer prediction models Model_DNN_i (i = 1, 2, 3, …, 5) are obtained, together with the deep neural network predicted value Y_validpredict_model_DNN and test value Y_test_model_DNN.
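A minimal sketch of the two-hidden-layer network, using scikit-learn's MLPClassifier as a stand-in; sklearn offers 'adam', 'sgd' and 'lbfgs' solvers, so the 'Rprop' option listed above would require a framework such as PyTorch instead. The layer sizes and batch size are illustrative values from within the stated ranges.

```python
from sklearn.neural_network import MLPClassifier

model_dnn = MLPClassifier(
    hidden_layer_sizes=(20, 10),  # first_hidden_layer, second_hidden_layer
    batch_size=64,
    solver="adam",                # the optimizer hyper-parameter
    max_iter=500,
)
# model_dnn.fit(X_part_train, y_part_train)   # hypothetical fold data
```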
S3, outputting prediction data according to the first layer of prediction model, and performing combined reconstruction on the prediction data to obtain second layer of training data; the specific process comprises the following steps:
The predicted values and test values produced by the first-layer prediction models of the three algorithms are reconstructed and combined to obtain the second-layer training data, comprising a second-layer training set and a second-layer test set. The second-layer training set is:
R_train_s2 = [Y_validpredict_model_RFC, Y_validpredict_model_CATB, Y_validpredict_model_DNN, Y_train],
where R_train_s2 is an 8000 × 4 matrix.
The second-layer test set is:
R_test_s2 = [Y_test_model_RFC, Y_test_model_CATB, Y_test_model_DNN, Y_test],
where R_test_s2 is a 2000 × 4 matrix.
S4, Y_validpredict_model_RFC, Y_validpredict_model_CATB and Y_validpredict_model_DNN are taken as input to construct the second-layer prediction model with a multi-classification logistic regression model; the second-layer test set is used to feed back the model hyper-parameters, and training on the principle of minimizing the cross-entropy loss function yields the optimal multi-classification logistic regression model Model_Logistic, with optimal parameter matrix θ_best. The cross-entropy loss function is:
LOSS = -(1/8000) Σ_i Σ_{c=0}^{6} y_ic log(p_ic).
Softmax regression is used to output the probability of each class; the probability that the fault of sample i belongs to the c-th class (c = 0, 1, 2, 3, 4, 5, 6) is calculated as:
p_ic = exp(θ_c^T x_i) / Σ_{j=0}^{6} exp(θ_j^T x_i).
and finally, the category with the highest output probability is the diagnosed fault type.
S6, early warning is carried out according to fault diagnosis, and the process comprises the following steps:
setting the probability difference threshold ΔP_min = 0.05 and the time threshold Δt_max = 30 min; when the state predicted by the second-layer prediction model is no fault, the difference between the maximum probability (no fault) and the second-largest probability (fault i) is ΔP_0-i:
ΔP_0-i = P_0_first − P_i_second,
where P_0_first is the probability value of the predicted no-fault state and P_i_second is the probability value of the fault i with the second-largest probability of occurrence. The fault state of the coal mill is predicted once every 5 minutes; when ΔP_0-i ≤ ΔP_min = 0.05 holds for 7 consecutive predictions within 30 minutes, an early warning of the occurrence of fault i is issued and fed back in time so that the fault risk can be removed.
Example 2:
The coal mill fault diagnosis system based on multi-model fusion is dedicated to the method of embodiment 1. Its structure comprises a data acquisition module, a data preprocessing module, a model training module, a model deployment module and a result output module connected in sequence.
A data acquisition module: collecting historical operating data of the coal mill;
a data preprocessing module: preprocessing collected historical operating data;
a model training module: constructing a training set and a verification set of cross verification by using the preprocessed data; respectively constructing training models by adopting various algorithms, and respectively carrying out hyper-parameter optimization training on each training model according to a training set to obtain a plurality of first-layer prediction models; outputting prediction data according to the first layer prediction model, and performing combined reconstruction on the prediction data to obtain second layer training data; constructing a second layer training model, and performing regression training according to second layer training data to obtain a second layer prediction model;
a model deployment module: the computer deploys the trained second-layer prediction model to the server side;
a result output module: and inputting the real-time coal mill operation data into the deployed second-layer prediction model to obtain the real-time state of the coal mill.
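How the modules could be wired together is sketched below; the class and attribute names are illustrative, not taken from the patent.

```python
class CoalMillDiagnosisSystem:
    """Illustrative wiring of the preprocessing, deployment and output modules."""

    def __init__(self, deployed_model, preprocess):
        self.model = deployed_model   # second-layer model from the training module
        self.preprocess = preprocess  # denoising + z-score from the preprocessing module

    def diagnose(self, raw_sample):
        x = self.preprocess(raw_sample)       # data preprocessing module
        return self.model.predict([x])[0]     # result output module: real-time state
```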
The specific embodiments described herein are merely illustrative of the spirit of the invention. Those skilled in the art may make various modifications or additions to the described embodiments, or substitute them in similar ways, without departing from the spirit of the invention or exceeding the scope defined in the appended claims.

Claims (8)

1. A coal mill fault diagnosis method based on multi-model fusion is characterized in that: the method comprises the following steps:
s1, collecting historical operating data of a coal mill, and constructing a cross validation training set and a validation set;
s2, setting multiple algorithms to respectively construct training models, and respectively carrying out super-parameter optimization training on each training model according to a training set to obtain multiple first-layer prediction models;
s3, outputting prediction data according to the first layer prediction model, and performing combined reconstruction on the prediction data to obtain second layer training data;
s4, constructing a second-layer training model, and performing regression training according to second-layer training data to obtain a second-layer prediction model;
and S5, collecting the operation data of the coal mill to carry out fault diagnosis.
2. The method for diagnosing the fault of the coal mill based on the multi-model fusion as claimed in claim 1, wherein the step S2 of performing the hyper-parameter optimization training process on each training model comprises:
randomly acquiring a hyper-parameter combination of a set algorithm, training according to the set algorithm and a training set to obtain an initial model, outputting a plurality of initial prediction output values by the initial model, calculating a MacroF1 value of the initial prediction output value and a corresponding training data output value, and obtaining an average MacroF1 value;
setting the search times, repeating the steps, and selecting the maximum value from the obtained average MacroF1 values after the search is finished to obtain a corresponding hyper-parameter combination as an optimal hyper-parameter combination;
setting an optimal hyper-parameter combination as a final hyper-parameter of a set algorithm, training according to different training sets to obtain a plurality of first-layer prediction models, predicting a corresponding verification set according to the first-layer prediction models to obtain sub-prediction values, and merging the sub-prediction data to obtain each algorithm prediction value.
3. The coal mill fault diagnosis method based on multi-model fusion as claimed in claim 2, wherein the first layer prediction model is obtained by training respectively with an improved random forest algorithm, a Catboost algorithm and a deep neural network algorithm.
4. The method for diagnosing the fault of the coal mill based on the multi-model fusion as claimed in any one of claims 1 to 3, wherein the specific process of the step S4 comprises the following steps:
the prediction data of each algorithm are taken as input, a multi-classification logistic regression model is adopted to construct the second-layer prediction model for training, the loss function is the cross-entropy loss function, softmax regression outputs the probability of each fault class, and the optimal multi-classification logistic regression model is obtained by training toward the target of minimizing the cross-entropy loss function, serving as the second-layer prediction model.
5. The coal mill fault diagnosis method based on multi-model fusion as claimed in claim 4, wherein the specific process of step S5 comprises:
collecting actual data samples;
inputting a sample into a plurality of first-layer prediction models of each algorithm, predicting to obtain a plurality of sub-prediction values, and obtaining a comprehensive prediction value through weighted calculation;
and combining the comprehensive predicted values of the algorithms to form second-layer prediction model input data, inputting the second-layer prediction model input data into the second-layer prediction model, and outputting the class with the maximum probability, namely the fault type.
6. The method for diagnosing the fault of the coal mill based on the multi-model fusion as claimed in claim 1, wherein the specific process of constructing the training set and the validation set of the cross validation in the step S1 comprises:
randomly dividing data into training data and verification data according to a proportion;
equally dividing training data into K folds, taking K-1 fold as a training set, taking 1 fold as a verification set, and taking each fold as a verification set to generate K groups of training sets and verification sets.
7. The method for diagnosing the fault of the coal mill based on the multi-model fusion as claimed in claim 1 or 5, wherein the method further comprises preprocessing the data after the data acquisition, and the preprocessing comprises data de-noising and data standardization.
8. A coal mill fault diagnosis system based on multi-model fusion, applied to the method of any one of claims 1 to 7, characterized in that it comprises:
a data acquisition module: collecting historical operating data of a coal mill;
a data preprocessing module: preprocessing collected historical operating data;
a model training module: constructing a training set and a verification set of cross verification by utilizing the preprocessed data; respectively constructing training models by adopting various algorithms, and respectively carrying out hyper-parameter optimization training on each training model according to a training set to obtain a plurality of first-layer prediction models;
outputting prediction data according to the first layer prediction model, and performing combined reconstruction on the prediction data to obtain second layer training data; constructing a second layer training model, and performing regression training according to second layer training data to obtain a second layer prediction model;
a model deployment module: deploying the trained second-layer prediction model to a server side by a computer;
a result output module: and inputting the real-time coal mill operation data into the deployed second-layer prediction model to obtain the real-time state of the coal mill.
CN202211287127.7A 2022-10-20 2022-10-20 Coal mill fault diagnosis method and system based on multi-model fusion Pending CN115719033A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211287127.7A CN115719033A (en) 2022-10-20 2022-10-20 Coal mill fault diagnosis method and system based on multi-model fusion


Publications (1)

Publication Number Publication Date
CN115719033A true CN115719033A (en) 2023-02-28

Family

ID=85254216

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211287127.7A Pending CN115719033A (en) 2022-10-20 2022-10-20 Coal mill fault diagnosis method and system based on multi-model fusion

Country Status (1)

Country Link
CN (1) CN115719033A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117168608A (en) * 2023-11-02 2023-12-05 默拓(江苏)电气驱动技术有限公司 Operation early warning method and system of brushless motor
CN117168608B (en) * 2023-11-02 2024-03-08 默拓(江苏)电气驱动技术有限公司 Operation early warning method and system of brushless motor


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination