CN115719033A - Coal mill fault diagnosis method and system based on multi-model fusion


Info

Publication number
CN115719033A
Authority
CN
China
Prior art keywords
model
training
data
coal mill
layer
Prior art date
Legal status
Pending
Application number
CN202211287127.7A
Other languages
Chinese (zh)
Inventor
魏勇
孙胡彬
江学文
周晓亮
李楠
叶君辉
赵敏
寿志杰
詹港明
卢子轩
李锋
Current Assignee
Hangzhou Jiyi Technology Co ltd
Original Assignee
Hangzhou Jiyi Technology Co ltd
Application filed by Hangzhou Jiyi Technology Co ltd filed Critical Hangzhou Jiyi Technology Co ltd
Priority to CN202211287127.7A
Publication of CN115719033A


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30: Computing systems specially adapted for manufacturing

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a coal mill fault diagnosis method and system based on multi-model fusion, solving the problem that the prior art cannot identify the category of a coal mill fault and therefore cannot take corresponding countermeasures in time. The method collects, denoises, and normalizes historical operating data to construct training and validation sets, trains first-layer models with several prediction algorithms, reconstructs their output data, and trains a second-layer model on that data as the coal mill fault diagnosis model. By fusing and integrating multiple fault prediction models through multi-model fusion, the resulting diagnosis model accurately identifies both the occurrence and the category of coal mill faults. The established model effectively prevents over-fitting, generalizes well, is simple to implement and computationally efficient, and offers a high degree of model-correction redundancy. Combining it with a grid search to find the optimal hyper-parameter combination further improves accuracy, yielding a high-performance coal mill fault diagnosis model.

Description

Coal mill fault diagnosis method and system based on multi-model fusion
Technical Field
The invention relates to the technical field of coal mill fault diagnosis, in particular to a coal mill fault diagnosis method and system based on multi-model fusion.
Background
The coal mill is a key piece of auxiliary equipment for boiler combustion in a coal-fired power plant and the core equipment of the pulverizing system; its operating state directly affects the safety and economy of boiler operation. The working environment of the coal mill is harsh: it runs under high load for long periods, the quality of the coal burned in the plant is complex and varied, and the mill's internal structure and working process are complicated. As a result, coal mill faults occur frequently and pose a high risk in daily operation. Once the coal mill fails, boiler combustion is directly affected, and in severe cases the boiler may even shut down.
Coal mill faults are of many types, including but not limited to abnormal vibration, internal ignition, coal blockage, and coal interruption, and different fault risks call for timely, targeted intervention. At present, however, coal-fired power plants mainly rely on conventional periodic inspection or data monitoring, which makes it difficult to diagnose fault risks promptly and effectively. Realizing online fault diagnosis of the coal mill is therefore an urgent problem for coal-fired power plants, with significant engineering value for the safe and economic operation of the unit.
In recent years, with the development of artificial intelligence, machine-learning-based coal mill fault models have gradually been proposed. Current coal mill fault diagnosis mainly monitors operating parameters such as mill current, outlet air temperature, and outlet air pressure, compares the monitored real-time data with the data predicted by an established fault model, and declares a fault when their difference exceeds a set threshold. Such methods, however, cannot determine the fault type and hence cannot trigger the corresponding countermeasures in time.
Disclosure of Invention
The invention mainly addresses the inability of the prior art to determine the fault type of a coal mill and to take corresponding countermeasures in time, and provides a coal mill fault diagnosis method and system based on multi-model fusion.
The technical problem of the invention is mainly solved by the following technical scheme: a coal mill fault diagnosis method based on multi-model fusion, comprising the following steps:
s1, collecting historical operating data of a coal mill, and constructing a cross validation training set and a validation set;
s2, setting multiple algorithms to respectively construct training models, and respectively performing hyper-parameter optimization training on each training model according to a training set to obtain multiple first-layer prediction models;
s3, outputting prediction data according to the first layer prediction model, and performing combined reconstruction on the prediction data to obtain second layer training data;
s4, constructing a second-layer training model, and performing regression training according to second-layer training data to obtain a second-layer prediction model;
and S5, collecting the operation data of the coal mill to carry out fault diagnosis.
The invention adopts a multi-model fusion method to fuse and integrate a plurality of fault prediction models, establishes the fault diagnosis model of the coal mill and realizes the accurate diagnosis of the faults and the categories of the coal mill. The coal mill fault diagnosis model established by the invention can effectively prevent the over-fitting phenomenon, has excellent model generalization capability, is simple to realize and efficient in calculation, and has high model correction redundancy degree. And an optimal hyper-parameter combination is found by combining a grid search method, so that the accuracy of the model is further improved, and a high-performance coal mill fault diagnosis model is obtained.
The collected coal mill historical operating data comprise input data and output data. The input data include but are not limited to the coal feed rate, mill current, hot primary air damper opening, cold primary air damper opening, mill inlet air temperature, mill inlet air flow, mill outlet air pressure, mill outlet temperature, differential pressure across the grinding bowl, seal air to bowl-underside differential pressure, grinding bowl speed, rotary separator speed, rotary separator rotor speed, rotary separator current, rotary separator upper bearing temperature, rotary separator lower bearing temperature, separator reducer lubricating oil temperature, and separator reducer output shaft temperature.
The output data are fault states, namely no fault, coal blockage, coal interruption, abnormal vibration, internal ignition, increased inlet air pressure, and abnormal pebble coal discharge.
The input data of the operating data form the matrix

$$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} \end{bmatrix},$$

and the output data, the fault states, form the vector

$$Y = (y_1, y_2, \ldots, y_n)^T,$$

where n is the size of the data sample and m is the number of input parameters.
As a preferred scheme, the method further comprises preprocessing the data after the data are acquired, wherein the preprocessing comprises data denoising and data normalization.
Data denoising: a Hankel filtering algorithm is used to remove the noise component from the raw data, as follows:

(1) For each one-dimensional signal $x_i = (x_{1i}, x_{2i}, \ldots, x_{ni}) = X_i^T$, $i = 1, 2, \ldots, m$, construct the Hankel matrix $H_i$:

$$H_i = \begin{bmatrix} x_{1i} & x_{2i} & \cdots & x_{ki} \\ x_{2i} & x_{3i} & \cdots & x_{(k+1)i} \\ \vdots & \vdots & & \vdots \\ x_{(n-k+1)i} & x_{(n-k+2)i} & \cdots & x_{ni} \end{bmatrix}.$$

The matrix $H_i$ is a Hankel matrix: the elements on each anti-diagonal are identical, so the original one-dimensional signal $x_i$ can be recovered by cyclically shifting the columns or rows of $H_i$.

(2) Perform a singular value decomposition (SVD) of $H_i$:

$$H_i = U_i \Sigma_i V_i^T = \sum_{j=1}^{n} \sigma_{ij} H_{ij},$$

where $U_i$ and $V_i$ are unitary matrices satisfying $U_i^T U_i = U_i U_i^T = I$ and $V_i^T V_i = V_i V_i^T = I$, with $I$ the identity matrix; $\Sigma_i = [\sigma_{i1}, \sigma_{i2}, \ldots, \sigma_{in}]$ holds the $n$ singular values of the Hankel matrix $H_i$, which satisfy $\sigma_{i1} > \sigma_{i2} > \cdots > \sigma_{in}$; and $H_{ij}$ is the matrix reconstructed from the $j$-th singular value $\sigma_{ij}$.

(3) Reconstruct the filtered matrix $H_i^{*}$.

Because the noise signal is only weakly correlated along the row and column directions, its singular values are small. The first $r$ singular values are therefore retained for reconstruction, which removes the noise and yields the new filtered matrix

$$H_i^{*} = \sum_{j=1}^{r} \sigma_{ij} H_{ij}.$$

(4) Reconstruct the filtered Hankel matrix $\tilde{H}_i$.

The elements of $H_i^{*}$ on each anti-diagonal are summed and averaged to obtain the new Hankel matrix $\tilde{H}_i$ with entries

$$\tilde{x}_{ji} = \frac{1}{|S_j|} \sum_{(p,q) \in S_j} (H_i^{*})_{pq}, \qquad S_j = \{(p, q) : p + q - 1 = j\},$$

and the resulting $\tilde{x}_i = (\tilde{x}_{1i}, \tilde{x}_{2i}, \ldots, \tilde{x}_{ni})$ is the new, denoised version of the original data. Applying this to each one-dimensional signal finally gives the denoised input data matrix

$$\tilde{X} = [\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_m].$$
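For illustration only, the following is a minimal Python sketch of this truncated-SVD Hankel filtering; the window length k and retained rank r are assumptions of the example, since the text does not fix them.

```python
import numpy as np

def hankel_denoise(x, k=None, r=3):
    """Denoise a 1-D signal by truncated SVD of its Hankel matrix.
    k: number of Hankel columns (assumed n // 2 when not given).
    r: number of singular values retained, as in step (3)."""
    n = len(x)
    k = k or n // 2
    rows = n - k + 1
    # Step (1): build the Hankel matrix, H[p, q] = x[p + q]
    H = np.empty((rows, k))
    for p in range(rows):
        H[p, :] = x[p:p + k]
    # Steps (2)-(3): SVD, keep the r largest singular values
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    H_star = (U[:, :r] * s[:r]) @ Vt[:r, :]
    # Step (4): average over anti-diagonals to recover a signal
    x_hat = np.zeros(n)
    counts = np.zeros(n)
    for p in range(rows):
        x_hat[p:p + k] += H_star[p, :]
        counts[p:p + k] += 1
    return x_hat / counts

# Example on a synthetic noisy channel
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
noisy = np.sin(2 * np.pi * 5 * t) + 0.3 * rng.standard_normal(200)
clean = hankel_denoise(noisy, r=2)
```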
Data normalization:

The z-score method is applied to the denoised data $\tilde{X}$:

$$r_{ij} = \frac{\tilde{x}_{ij} - \mu_j}{\sigma_j}, \qquad 1 \leq i \leq n, \quad 1 \leq j \leq m,$$

where $\tilde{x}_{ij}$ is the denoised data, $\mu_j$ is the mean of the denoised column $\tilde{x}_j$, and $\sigma_j$ is its standard deviation.

After standardization, the data matrix R is obtained:

$$R = \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1m} \\ r_{21} & r_{22} & \cdots & r_{2m} \\ \vdots & \vdots & & \vdots \\ r_{n1} & r_{n2} & \cdots & r_{nm} \end{bmatrix},$$

which together with the fault-state outputs Y forms the preprocessed data set.
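Equivalently, a short sketch of the column-wise z-score in Python, assuming the denoised signals are stacked as the columns of an array X_denoised:

```python
import numpy as np

def z_score(X):
    """Column-wise z-score: r_ij = (x_ij - mu_j) / sigma_j."""
    mu = X.mean(axis=0)       # mu_j for each input parameter
    sigma = X.std(axis=0)     # sigma_j for each input parameter
    return (X - mu) / sigma

# R = z_score(X_denoised)    # the n x m standardized data matrix
```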
As a preferred scheme, the specific process of constructing the cross-validation training and validation sets in step S1 includes:

randomly dividing the data into training data and test data in a set ratio;

dividing the training data equally into K folds, taking K-1 folds as the training set and 1 fold as the validation set, and letting each fold serve once as the validation set so that K groups of training and validation sets are generated.
Specifically, the preprocessed data matrix R is randomly divided into training data (Train) and test data (Test) at a data-volume ratio of 8:2:

$$R = R_{train} \cup R_{test}.$$

The training data are subdivided equally into K folds $R_{train\_i}$ ($i = 1, 2, 3, \ldots, K$); in each group, K-1 folds form the training set $R_{part\_train\_i}$ and the remaining fold forms the validation set $R_{part\_valid\_i}$, and with each fold serving once as the validation set, K groups of training and validation sets are generated. Specifically:

$R_{train\_i}$ is the $i$-th fold of $R_{train}$, $i = 1, 2, \ldots, K$;

the training sets $R_{part\_train\_i}$ are

$$R_{part\_train\_i} = \bigcup_{j \neq i} R_{train\_j}, \qquad i = 1, 2, \ldots, K;$$

and the validation sets $R_{part\_valid\_i}$ are

$$R_{part\_valid\_i} = R_{train\_i}, \qquad i = 1, 2, \ldots, K.$$
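As a sketch, this split can be built with scikit-learn utilities; the 8:2 ratio comes from the text, while K and the random seed are assumed values:

```python
import numpy as np
from sklearn.model_selection import train_test_split, KFold

def make_splits(R, Y, k=5, seed=0):
    """Return K (train, validation) fold tuples plus the held-out test set."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        R, Y, test_size=0.2, random_state=seed)    # 8:2 split
    folds = []
    for tr_idx, va_idx in KFold(n_splits=k, shuffle=True,
                                random_state=seed).split(X_tr):
        folds.append((X_tr[tr_idx], y_tr[tr_idx],   # R_part_train_i
                      X_tr[va_idx], y_tr[va_idx]))  # R_part_valid_i
    return folds, (X_te, y_te)
```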
As a preferred scheme, an improved random forest algorithm, the CatBoost algorithm, and a deep neural network algorithm are used for training to obtain the first-layer prediction models. In what follows, algorithm denotes any one of these, abbreviated RFC / CATB / DNN.
As a preferred scheme, the process of performing hyper-parameter optimization training on each training model in step S2 includes:
randomly acquiring a hyper-parameter combination of a set algorithm, training according to the set algorithm and a training set to obtain an initial model, outputting a plurality of initial prediction output values by the initial model, calculating a MacroF1 value of the initial prediction output value and a corresponding training data output value, and obtaining an average MacroF1 value;
setting the search times, repeating the steps, and selecting the maximum value from the obtained average MacroF1 values after the search is finished to obtain a corresponding hyper-parameter combination as an optimal hyper-parameter combination;
setting the optimal hyper-parameter combination as the final hyper-parameter of the set algorithm, training according to different training sets to obtain a plurality of first-layer prediction models, predicting the corresponding verification set according to the first-layer prediction models to obtain sub-prediction data, and merging the sub-prediction data to obtain prediction data of each algorithm.
The specific process is as follows:

(1) Before the search begins, the hyper-parameters to be optimized and their ranges are set according to the algorithm, hyper-parameter combinations are sampled randomly within those ranges, and the number of searches N of the grid search method is set;

(2) In the n-th search ($1 \leq n \leq N$), the algorithm, the sampled hyper-parameter combination, and the training sets $R_{part\_train\_i}$ ($i = 1, 2, 3, \ldots, K$) are used to train the n-th initial models, and each n-th initial model predicts on the inputs of its validation set $R_{part\_valid\_i}$, outputting K n-th initial prediction values $Y_{train\_i\_n}$. The MacroF1 value between the K outputs $Y_{train\_i\_n}$ and the validation outputs $Y_{train\_i}$ is computed as follows:

$$Precision_j = \frac{TP_j}{TP_j + FP_j},$$

$$Recall_j = \frac{TP_j}{TP_j + FN_j},$$

$$Precision_{macro} = \frac{1}{|L|} \sum_{j=1}^{L} Precision_j,$$

$$Recall_{macro} = \frac{1}{|L|} \sum_{j=1}^{L} Recall_j,$$

$$MacroF1 = \frac{2 \cdot Precision_{macro} \cdot Recall_{macro}}{Precision_{macro} + Recall_{macro}}.$$

The multi-class problem is decomposed into several binary classification problems: each class in turn is treated as the positive class, all other classes as negative, and the binary metrics are computed. Here $TP_j$ is the number of samples actually positive and predicted positive when class j is the positive class; $FP_j$ the number actually negative but predicted positive; $FN_j$ the number actually positive but predicted negative; L the number of classes; $Precision_j$ and $Recall_j$ the model precision and recall with class j positive; $Precision_{macro}$ and $Recall_{macro}$ the average precision and recall of the model; and MacroF1 the harmonic mean of the two. The larger the MacroF1, the higher the prediction accuracy and the better the model performance.

(3) The K MacroF1 values of $Y_{train\_i\_n}$ against $Y_{train\_i}$ are averaged to obtain $MacroF1_{mean\_n}$.

(4) Set n = n + 1, randomly select a new hyper-parameter combination within the search range, continue the search, and repeat steps (2) and (3) until n > N, then proceed to the next step.

(5) After the N searches are complete, N values $MacroF1_{mean\_n}$ have been obtained; the maximum among them is found, and the hyper-parameter combination corresponding to that maximum is taken as the optimal combination Param_best_algorithm.

(6) With the algorithm's final hyper-parameters set to Param_best_algorithm, training on the K training sets $R_{part\_train\_i}$ yields K first-layer prediction models Model_algorithm_i.

(7) Each first-layer model Model_algorithm_i predicts on the inputs of its validation set $R_{part\_valid\_i}$, giving K sub-prediction values $Y_{validpredict\_model\_algorithm\_i}$; at the same time, each Model_algorithm_i predicts on the test-data inputs, giving sub-test values $Y_{test\_model\_algorithm\_i}$.

(8) The K sub-prediction values $Y_{validpredict\_model\_algorithm\_i}$ are merged to obtain the prediction value of the corresponding algorithm:

$$Y_{validpredict\_model\_algorithm} = \begin{bmatrix} Y_{validpredict\_model\_algorithm\_1} \\ \vdots \\ Y_{validpredict\_model\_algorithm\_K} \end{bmatrix}.$$

Different weights are assigned to the K first-layer prediction models according to the discrepancy between predicted and true values: the smaller the MacroF1, the lower the weight. This raises the weight of high-MacroF1 models, lowers that of low-MacroF1 models, and improves prediction accuracy. The weight formula is

$$W_i = \frac{(MacroF1)_i}{\sum_i (MacroF1)_i},$$

and the test value is obtained as

$$Y_{test\_model\_algorithm} = \sum_{i=1}^{K} W_i \, Y_{test\_model\_algorithm\_i}.$$
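Steps (1) to (8) can be sketched in Python with scikit-learn-style estimators; the random sampling over a grid, the estimator objects, and the fold tuples from the split above are assumptions of the sketch, and the weighted sum over fold predictions mirrors the formula for Y_test_model_algorithm:

```python
import numpy as np
from sklearn.base import clone
from sklearn.metrics import f1_score

def random_grid_search(estimator, param_grid, folds, n_iter, rng):
    """Steps (1)-(5): sample hyper-parameter combinations, score each by
    the mean MacroF1 over the K folds, and keep the best combination."""
    best_score, best_params = -1.0, None
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in param_grid.items()}
        scores = []
        for X_tr, y_tr, X_va, y_va in folds:
            model = clone(estimator).set_params(**params)
            model.fit(X_tr, y_tr)
            scores.append(f1_score(y_va, model.predict(X_va), average='macro'))
        if np.mean(scores) > best_score:
            best_score, best_params = float(np.mean(scores)), params
    return best_params

def fit_first_layer(estimator, best_params, folds, X_test):
    """Steps (6)-(8): train K models with the best hyper-parameters, collect
    out-of-fold predictions, and weight the K test predictions by MacroF1."""
    oof_pred, oof_true, test_parts, weights = [], [], [], []
    for X_tr, y_tr, X_va, y_va in folds:
        model = clone(estimator).set_params(**best_params)
        model.fit(X_tr, y_tr)
        pred = model.predict(X_va)
        oof_pred.append(pred)
        oof_true.append(y_va)            # keep labels aligned with fold order
        test_parts.append(model.predict(X_test))
        weights.append(f1_score(y_va, pred, average='macro'))
    w = np.asarray(weights) / np.sum(weights)                  # W_i
    y_test_pred = np.tensordot(w, np.asarray(test_parts, float), axes=1)
    return np.concatenate(oof_pred), np.concatenate(oof_true), y_test_pred
```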
As a preferred scheme, the process of constructing the training model with the improved random forest algorithm includes:

S211. Using the training set, with the hyper-parameters at their standard random forest values, set the initial feature set $Q_0$ and the initial number of decision trees $T_0$, and initialize the random forest $\Omega_0$.

S212. Compute the global weight W of each feature and sort the features in descending order. The global weight W is computed as follows:

a1. Compute the information entropy IE:

$$IE = -\sum_l P(l) \log P(l),$$

where l is the class and P(l) the class probability;

b1. For a node m split on feature k, compute the information gain RE(m, k) of the node split from IE and the conditional information entropies of the left and right child nodes, and from it the local weight $W_\Psi(k)$ of feature k in decision tree Ψ:

$$W_\Psi(k) = \frac{\sum_{m=1}^{M} RE(m, k)}{M - 1},$$

where M is the number of nodes in the decision tree and Ψ denotes a single decision tree;

c1. Let $e_\Psi$ be the out-of-bag error of decision tree Ψ, and normalize $1/e_\Psi$ to obtain the weight of each decision tree:

$$W_\Psi = \frac{1/e_\Psi - \min_\Psi (1/e_\Psi)}{\max_\Psi (1/e_\Psi) - \min_\Psi (1/e_\Psi)};$$

d1. Compute the global weight W(k) of feature k:

$$W(k) = \frac{\sum_\Psi W_\Psi(k) W_\Psi}{\max_k \sum_\Psi W_\Psi(k) W_\Psi}.$$

S213. Let $V_0$ be the number of top-ranked features to keep, where $|Q_0|$ denotes the number of elements of $Q_0$, and put the first $V_0$ features of the ranked $Q_0$ into the set CV.

S214. Let $I_0 = |Q_0| - V_0$ and put the remaining $I_0$ features into the set CI.

S215. Initialize n = 0.

S216. Let r be the size of the feature subset drawn at each node split.

S217. While $I_n > r$, perform the following procedure:

a2. Compute the mean μ and standard deviation σ of the global weights W(k) of all features in $CI_n$, and set the threshold t_value = μ - σ;

b2. Compare the global weight W(k) of every feature in $CI_n$ with t_value; if W(k) < t_value, move the feature out of $CI_n$ into the set $S_n$;

c2. At the same time, compare the global weights W(k) of the features in $CI_n$ with the minimum global weight $\min_{CV_n} W(k')$ of the features in $CV_n$; if $W(k) > \min_{CV_n} W(k')$, move the feature from $CI_n$ into the set $Z_n$;

d2. Remove $S_n$ from $Q_n$: $Q_{n+1} = Q_n - S_n$;

e2. Move $Z_n$ into $CV_n$: $CV_{n+1} = CV_n + Z_n$, $CI_{n+1} = CI_n - S_n - Z_n$;

f2. Let $V_{n+1} = |CV_{n+1}|$ and $I_{n+1} = |CI_{n+1}|$, and compute $\Delta V = V_{n+1} - V_n$, $\Delta I = I_{n+1} - I_n$;

g2. Let p be the probability that at least one significant feature is available to a decision tree for node splitting, and q the probability that no significant feature takes part in any node split:

$$p = 1 - q = \frac{r/(V + I)}{r/I};$$

h2. Compute the partial derivatives $P_V$ and $P_I$ of p with respect to V and I:

$$P_V = -\left(\frac{\Delta q}{\Delta V}\right)^T = \frac{V! \,(V + I - 1 - r)! \, r}{(V - r)! \,(V + I)!},$$

$$P_I = -\left(\frac{\Delta q}{\Delta I}\right)^T = \frac{(I - 1)! \,(V + I - 1 - r)! \, V r}{(I - r)! \,(V + I - 1)! \,(V + I)};$$

i2. Compute the degree of correlation between different decision trees:

$$\rho = 1 - \frac{r/(V + I)}{r/(V + I - r)};$$

j2. Compute ΔT and round up:

$$|\Delta T| = \left| \rho \cdot (P_V \Delta V + P_I \Delta I)/I - 1 \right|;$$

k2. Let $T_{n+1} = T_n + \Delta T$;

l2. Build the random forest $\Omega_{n+1}$ from $T_{n+1}$ decision trees and the feature set $Q_{n+1}$;

m2. Recompute the global weights W and re-rank the features in $Q_{n+1}$;

n2. Increment n; once $I_n > r$ no longer holds, end the iteration.

S218. Output the number of decision trees $T_{end}$ and the feature set $Q_{end}$ at the end of the iteration.

S219. Set the hyper-parameters to be optimized and their ranges:

maximum tree depth max_depth: [3, 16], step 1; feature sampling ratio max_features: [0.5, 1.0], step 0.1; minimum samples per leaf min_samples_leaf: [1, 5], step 1; minimum samples to split a node min_samples_split: [2, 10], step 1.
The traditional random forest algorithm generates its model from preset, fixed features and a preset number of decision trees, even though feature importance and the optimal tree count matter greatly for model performance. This scheme therefore adopts an improved random forest algorithm: instead of the traditional fixed feature-selection mode and conventional hyper-parameter optimization, it removes unimportant features through iterative correction and progressively corrects the number of decision trees, gradually building an optimal random forest model. The improved random forest algorithm has excellent robustness and accuracy, outperforming the traditional random forest algorithm.
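As a rough illustration of the iterative correction, the following Python skeleton implements only the feature bookkeeping of steps a2 to e2; the global weights W, V0 and r are assumed inputs, and the tree-count update g2 to k2 is omitted for brevity:

```python
import numpy as np

def iterative_feature_selection(W, V0, r):
    """W: dict feature -> global weight W(k), assumed precomputed as in
    a1-d1. Splits the ranked features into CV (kept) and CI (candidates),
    then repeatedly drops weak candidates and promotes strong ones."""
    order = sorted(W, key=W.get, reverse=True)
    CV, CI = set(order[:V0]), set(order[V0:])
    while len(CI) > r:
        weights = np.array([W[k] for k in CI])
        t_value = weights.mean() - weights.std()          # a2
        S = {k for k in CI if W[k] < t_value}             # b2: remove weak
        Z = {k for k in CI - S
             if W[k] > min(W[k2] for k2 in CV)}           # c2: promote strong
        CV |= Z                                           # e2
        CI -= S | Z
        if not (S or Z):      # no movement: stop to avoid an endless loop
            break
    return CV, CI             # the retained feature set is CV | CI
```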
After the training model is constructed with the improved random forest algorithm, the hyper-parameter optimization training procedure yields the optimal hyper-parameter combination

Param_best_RFC = [T_end, max_depth_best, max_features_best, min_samples_leaf_best, min_samples_split_best],

K improved random forest first-layer prediction models Model_RFC_i (i = 1, 2, 3, …, K), the improved random forest prediction value Y_validpredict_model_RFC, and the test value Y_test_model_RFC.
Similarly, a training model is constructed with the CatBoost algorithm as follows:

Set the hyper-parameters to be optimized and their ranges:

number of decision trees iterations: [50, 500], step 1; learning rate learning_rate: [0.01, 0.1], step 0.01; regularization coefficient l2_leaf_reg: [1, 5], step 0.5; tree depth depth: [3, 10], step 1; sample sampling ratio subsample: [0.5, 1.0], step 0.1; column sampling ratio rsm: [0.5, 1.0], step 0.1.

The hyper-parameter optimization training procedure yields the optimal combination

Param_best_CATB = [iterations_best, learning_rate_best, l2_leaf_reg_best, subsample_best, rsm_best],

K CatBoost first-layer prediction models Model_CATB_i (i = 1, 2, 3, …, K), the CatBoost prediction value Y_validpredict_model_CATB, and the test value Y_test_model_CATB.
The deep neural network algorithm is adopted to construct a training model, and the process is as follows:
the hidden layer is set to be 2 layers, and the hyper-parameters and the range thereof to be optimized are set as follows:
batch size batch _ size: [32,64,128]; the first hidden layer neuron number, first _ hidden _ layer: [7,40] step size 1; the second hidden layer neuron number second _ hidden _ layer: [7,20] step size 1; optimizer: [ 'Adam', 'SGD', 'LBFGS', 'Rprop' ].
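For concreteness, one candidate network in this search space might be sketched in PyTorch as follows; the hidden sizes 32 and 14 are sample values from the ranges above, and ReLU is an assumed activation since the text does not name one:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(19, 32),   # 19 input variables -> first hidden layer
    nn.ReLU(),
    nn.Linear(32, 14),   # second hidden layer
    nn.ReLU(),
    nn.Linear(14, 7),    # 7 fault-state classes
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())  # 'Adam' is one searched option
```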
The hyper-parameter optimization training procedure yields the optimal combination

Param_best_DNN = [batch_size_best, first_hidden_layer_best, second_hidden_layer_best, optimizer_best],

K deep neural network first-layer prediction models Model_DNN_i (i = 1, 2, 3, …, K), the deep neural network prediction value Y_validpredict_model_DNN, and the test value Y_test_model_DNN.
The specific process of step S3 includes:

The prediction values and test values produced by the three algorithms' first-layer prediction models are reconstructed and combined to obtain the second-layer training data, consisting of a second-layer training set and a second-layer test set. The second-layer training set is

$$R_{train\_s2} = [Y_{validpredict\_model\_RFC}, \; Y_{validpredict\_model\_CATB}, \; Y_{validpredict\_model\_DNN}, \; Y_{train}],$$

where $Y_{validpredict\_model\_RFC}$, $Y_{validpredict\_model\_CATB}$ and $Y_{validpredict\_model\_DNN}$ are the second-layer training inputs and $Y_{train}$, taken from the training data, is the second-layer training output.

The second-layer test set is

$$R_{test\_s2} = [Y_{test\_model\_RFC}, \; Y_{test\_model\_CATB}, \; Y_{test\_model\_DNN}, \; Y_{test}],$$

where $Y_{test\_model\_RFC}$, $Y_{test\_model\_CATB}$ and $Y_{test\_model\_DNN}$ are the second-layer test inputs and $Y_{test}$, taken from the test data, is the second-layer test output.
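In code the reconstruction is a simple column stack; the dictionaries below, holding each algorithm's out-of-fold and weighted test predictions from the earlier sketches, are illustrative:

```python
import numpy as np

def build_second_layer(oof, y_oof, tests, y_test):
    """oof / tests: dicts mapping 'rfc', 'catb', 'dnn' to first-layer
    prediction vectors (out-of-fold and weighted test, respectively)."""
    X_train_s2 = np.column_stack([oof['rfc'], oof['catb'], oof['dnn']])
    X_test_s2 = np.column_stack([tests['rfc'], tests['catb'], tests['dnn']])
    return (X_train_s2, y_oof), (X_test_s2, y_test)
```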
As a preferable scheme, the specific process of step S4 includes:
and (3) taking the prediction data of each algorithm as input, adopting a multi-classification logistic regression model to construct a second-layer prediction model for training, wherein the loss function is a cross entropy loss function, outputting the probability of each fault category by utilizing softmax regression, and training to obtain an optimal multi-classification logistic regression model as a second-layer prediction model according to a training target of the minimized cross entropy loss function. The method comprises the following specific steps:
will Y validpredict_model_RFC ,Y validpredict_model_CATB And Y validpredict_model_DNN And as input, constructing a second-layer prediction model by adopting a multi-classification logistic regression model, wherein the loss function is a cross entropy loss function, and training the multi-classification logistic regression model.
The cross entropy loss function is:
LOSS=-1/n∑ iL-1 c=0 y ic log(p ic ),
wherein n is the number of samples in the second training set, L is the number of categories, y ic Is a symbolic function, if the class Y of the sample i train_i Is equal to c, then y ic Taking 1, otherwise, taking 0; p is a radical of ic Calculating the prediction probability of the sample i belonging to the category c by adopting softmax, and outputting the probability of each category by utilizing softmax regression, wherein the probability of the fault of the sample i belonging to the c-th category is calculated as follows:
Figure BDA0003899869470000141
where θ is a parameter matrix of a multi-class logistic regression function, x i Is input, namely Y validpredict_model_RFC ,Y validpredict_model_CATB And Y validpredict_model_DNN
Training to obtain an optimal multi-classification Logistic regression Model _ Logistic according to a training target of the minimum cross entropy loss function, wherein the optimal parameter matrix is theta best
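A minimal sketch of the second layer with scikit-learn; multinomial logistic regression minimizes exactly the softmax cross-entropy above, and the solver choice is an assumption:

```python
from sklearn.linear_model import LogisticRegression

meta = LogisticRegression(multi_class='multinomial', solver='lbfgs',
                          max_iter=1000)
meta.fit(X_train_s2, y_train_s2)        # minimizes the cross-entropy LOSS
proba = meta.predict_proba(X_test_s2)   # p_ic for each fault category c
```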
As a preferable scheme, the specific process of step S5 includes:
collecting actual data samples;
inputting the samples into a plurality of first-layer prediction models of each algorithm, predicting to obtain a plurality of sub-prediction values, and obtaining a comprehensive prediction value through weighted calculation;
and combining the comprehensive predicted values of the algorithms to form second-layer prediction model input data, inputting the second-layer prediction model input data into the second-layer prediction model, and outputting the class with the maximum probability, namely the fault type. The method specifically comprises the following steps:
Collect the actual data sample X_new = (x_1, x_2, …, x_m);

after denoising by Hankel filtering and standardization it becomes X'_new = (x'_1, x'_2, …, x'_m);

feed X'_new into the first-layer prediction models of each algorithm:

Model_RFC_i (i = 1, 2, 3, …, K) predicts K sub-prediction values Y_RFC_i, and the comprehensive prediction Y_RFC is obtained by weighted calculation:

$$Y_{RFC} = \sum_{i=1}^{K} w_{RFC\_i} \, Y_{RFC_i};$$

Model_CATB_i predicts K sub-prediction values Y_CATB_i, combined by weighted calculation into

$$Y_{CATB} = \sum_{i=1}^{K} w_{CATB\_i} \, Y_{CATB_i};$$

Model_DNN_i predicts K sub-prediction values Y_DNN_i, combined by weighted calculation into

$$Y_{DNN} = \sum_{i=1}^{K} w_{DNN\_i} \, Y_{DNN_i}.$$

The input data of the second-layer prediction model are constructed as

$$X_{second} = [Y_{RFC}, \; Y_{CATB}, \; Y_{DNN}],$$

and X_second is input into the second-layer prediction model Model_Logistic, whose output class with the highest probability is the fault type, realizing fault diagnosis.
In addition, the method includes an early-warning step after diagnosis, whose process is:

setting a probability-difference threshold ΔP_min and a time threshold Δt_max; when the state predicted by the second-layer prediction model is no fault, let the difference between the largest probability (no fault) and the second-largest probability (fault i) be

$$\Delta P_{0\text{-}i} = P_{0\_first} - P_{i\_second},$$

where P_0_first is the probability of the no-fault state and P_i_second is the probability of fault i, the state with the second-largest probability. If ΔP_0-i ≤ ΔP_min holds continuously within the interval Δt_max, an early warning of fault i is issued and fed back in time so that the fault risk can be removed.
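The early-warning rule can be sketched as follows; the threshold values and the history representation are assumptions of the example:

```python
DP_MIN = 0.10    # assumed probability-difference threshold (delta P_min)
DT_MAX = 600.0   # assumed time window in seconds (delta t_max)

def check_warning(history, now):
    """history: chronological (timestamp, proba) pairs, class 0 = no fault.
    Returns the fault index i to warn about if, throughout the last DT_MAX
    seconds, the prediction stayed 'no fault' while its margin over the
    same runner-up fault i stayed within DP_MIN; otherwise None."""
    if not history or now - history[0][0] < DT_MAX:
        return None                        # not enough history yet
    window = [(t, p) for t, p in history if now - t <= DT_MAX]
    runner_up = None
    for _, p in window:
        ranked = sorted(range(len(p)), key=lambda c: p[c], reverse=True)
        top, second = ranked[0], ranked[1]
        if top != 0 or p[top] - p[second] > DP_MIN:
            return None                    # condition broken in the window
        if runner_up is None:
            runner_up = second
        elif runner_up != second:
            return None                    # fault candidate not consistent
    return runner_up
```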
A coal mill fault diagnosis system based on multi-model fusion, applied to the above coal mill fault diagnosis method based on multi-model fusion, comprising:
a data acquisition module: collecting historical operating data of a coal mill;
a data preprocessing module: preprocessing collected historical operating data;
a model training module: constructing a training set and a verification set of cross verification by utilizing the preprocessed data; respectively constructing training models by adopting various algorithms, and respectively carrying out hyper-parameter optimization training on each training model according to a training set to obtain a plurality of first-layer prediction models; outputting prediction data according to the first layer prediction model, and performing combined reconstruction on the prediction data to obtain second layer training data; constructing a second layer training model, and performing regression training according to second layer training data to obtain a second layer prediction model;
a model deployment module: deploying the trained second-layer prediction model to the server side;
a result output module: and inputting the real-time coal mill operation data into the deployed second-layer prediction model to obtain the real-time state of the coal mill.
Therefore, the invention has the following advantages:

1. Multiple fault prediction models are fused and integrated through multi-model fusion to establish the coal mill fault diagnosis model, realizing accurate diagnosis of coal mill faults and their categories.

2. The established coal mill fault diagnosis model effectively prevents over-fitting, generalizes well, is simple to implement and computationally efficient, and has a high degree of model-correction redundancy. Combined with a grid search to find the optimal hyper-parameter combination, the accuracy of the model is further improved, yielding a high-performance coal mill fault diagnosis model.

3. Based on the predicted real-time state of the coal mill, countermeasures matched to the specific fault type can be implemented rapidly, minimizing the safety hazards and economic losses caused by faults. The model also provides an early-warning function: with a threshold set, whenever the difference between the no-fault probability and the probability of some other fault category stays below the threshold for a continuous period, fault early-warning feedback is given in time so that the fault risk can be removed.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a schematic diagram of the construction process of each algorithm's first-layer prediction model in the present invention.
Detailed Description
The technical scheme of the invention is further specifically described by the following embodiments and the accompanying drawings.
Example 1:
The coal mill fault diagnosis method based on multi-model fusion comprises the following steps, as shown in FIG. 1:
s1, collecting historical operating data of a coal mill, and constructing a cross validation training set and a validation set; the method specifically comprises the following steps:
S11. The collected coal mill historical operating data comprise input data and output data. The input data include but are not limited to the coal feed rate, mill current, hot primary air damper opening, cold primary air damper opening, mill inlet air temperature, mill inlet air flow, mill inlet air pressure, mill outlet temperature, differential pressure across the grinding bowl, seal air to bowl-underside differential pressure, grinding bowl speed, rotary separator rotor speed, rotary separator current, rotary separator upper bearing temperature, rotary separator lower bearing temperature, separator reducer lubricating oil temperature, and separator reducer output shaft temperature. The output data are fault states, coded as no fault: 0, coal blockage: 1, coal interruption: 2, abnormal vibration: 3, internal ignition: 4, increased inlet air pressure: 5, abnormal pebble coal discharge: 6, as shown in Table 1:

Table 1 (the input variables and fault-state codes listed above).
The input data of the operating data form the matrix

$$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} \end{bmatrix},$$

and the output data, the fault states, form the vector

$$Y = (y_1, y_2, \ldots, y_n)^T,$$

where n is the size of the data sample and m is the number of input parameters.
S12. The acquired data are preprocessed, the preprocessing comprising data denoising and data standardization: the noise in the original data is removed with the Hankel filtering algorithm, steps (1) to (4) above, and the denoised data are standardized with the z-score method, yielding the standardized data matrix R as described above.
S13. The preprocessed data matrix R is randomly divided into training data (Train) and test data (Test) at a data-volume ratio of 8:2; the training data are then subdivided equally into K folds R_train_i (i = 1, 2, 3, …, K), with K-1 folds as the training set R_part_train_i and 1 fold as the validation set R_part_valid_i, and with each fold serving once as the validation set, K groups of training and validation sets are generated, exactly as described above.
S2. Several algorithms are set to construct training models, and hyper-parameter optimization training is performed on each according to the training sets to obtain multiple first-layer prediction models. In this embodiment, the first-layer prediction models are trained with the improved random forest algorithm, the CatBoost algorithm, and the deep neural network algorithm, denoted RFC / CATB / DNN. Fusing three different model algorithms greatly improves the precision of the fused model and transfers the knowledge learned by each model to the simple classifier after fusion.
The hyper-parameter optimization training applied to each algorithm's training model is the general procedure described above: cross-validation combined with a grid search, following steps (1) to (8), including the MacroF1 evaluation of each sampled hyper-parameter combination, the training of the K first-layer models Model_algorithm_i with the optimal combination, and the MacroF1-weighted combination of their test predictions.
As shown in FIG. 2, the specific process of step S2 includes:

Training the improved random forest model: a training model is constructed with the improved random forest algorithm and optimized exactly as in steps S211 to S219 above. Whereas the traditional random forest algorithm generates its model from preset, fixed features and a preset number of decision trees, even though feature importance and the optimal tree count matter greatly for performance, the improved algorithm removes unimportant features by iterative correction and progressively corrects the number of decision trees, gradually building an optimal random forest model with robustness and accuracy superior to the traditional algorithm. The hyper-parameter optimization yields the optimal combination Param_best_RFC = [T_end, max_depth_best, max_features_best, min_samples_leaf_best, min_samples_split_best], K improved random forest first-layer prediction models Model_RFC_i (i = 1, 2, 3, …, K), the improved random forest prediction value Y_validpredict_model_RFC, and the test value Y_test_model_RFC.
Training the CatBoost model: a training model is constructed with the CatBoost algorithm and optimized over the hyper-parameters and ranges given above, yielding the optimal combination Param_best_CATB = [iterations_best, learning_rate_best, l2_leaf_reg_best, subsample_best, rsm_best], K CatBoost first-layer prediction models Model_CATB_i (i = 1, 2, 3, …, K), the CatBoost prediction value Y_validpredict_model_CATB, and the test value Y_test_model_CATB.
Training the deep neural network model: a training model is constructed with the deep neural network algorithm (2 hidden layers) and optimized over the hyper-parameters and ranges given above, yielding the optimal combination Param_best_DNN = [batch_size_best, first_hidden_layer_best, second_hidden_layer_best, optimizer_best], K deep neural network first-layer prediction models Model_DNN_i (i = 1, 2, 3, …, K), the deep neural network prediction value Y_validpredict_model_DNN, and the test value Y_test_model_DNN.
S3. Prediction data output by the first-layer prediction models are combined and reconstructed into the second-layer training data, consisting of the second-layer training set R_train_s2 = [Y_validpredict_model_RFC, Y_validpredict_model_CATB, Y_validpredict_model_DNN, Y_train] and the second-layer test set R_test_s2 = [Y_test_model_RFC, Y_test_model_CATB, Y_test_model_DNN, Y_test], exactly as described above.
S4. A second-layer training model is constructed and regression-trained on the second-layer training data to obtain the second-layer prediction model. Specifically, Y_validpredict_model_RFC, Y_validpredict_model_CATB and Y_validpredict_model_DNN are taken as inputs to a multi-class logistic regression model with the cross-entropy loss defined above; in this example the number of classes L is 7, and softmax regression outputs the probability that the fault of sample i belongs to class c (c = 0, 1, 2, 3, 4, 5, 6). Training toward the objective of minimizing the cross-entropy loss yields the optimal multi-class logistic regression model Model_Logistic with optimal parameter matrix θ_best.
The effect of the model is then tested with R_test_s2 by computing the MacroF1 value on the second-layer test set and confirming that the value lies within the preset range.
And S5, collecting the operation data of the coal mill to carry out fault diagnosis. The specific process comprises the following steps:
collecting actual data samples;
inputting the samples into a plurality of first-layer prediction models of each algorithm, predicting to obtain a plurality of sub-prediction values, and obtaining a comprehensive prediction value through weighted calculation;
and combining the comprehensive predicted values of the algorithms to form second-layer prediction model input data, inputting the second-layer prediction model input data into the second-layer prediction model, and outputting the class with the maximum probability, namely the fault type. The method specifically comprises the following steps:
collecting an actual data sample X_new = (x_1, x_2, …, x_m);
after denoising through Hankel filtering and standardization, the sample becomes X_new′ = (x′_1, x′_2, …, x′_m);
X_new′ is input into the first-layer prediction models of each algorithm:
the Model_RFC_i (i = 1, 2, 3, …, K) models predict K sub-prediction values Y_RFC_i, and the comprehensive predicted value Y_RFC is obtained by weighted calculation:
Y_RFC = Σ_{i=1}^{K} w_RFC_i · Y_RFC_i;
the Model_CATB_i models predict K sub-prediction values Y_CATB_i, and the comprehensive predicted value is obtained by weighted calculation: Y_CATB = Σ_{i=1}^{K} w_CATB_i · Y_CATB_i;
the Model_DNN_i models predict K sub-prediction values Y_DNN_i, and the comprehensive predicted value is obtained by weighted calculation: Y_DNN = Σ_{i=1}^{K} w_DNN_i · Y_DNN_i.
The input data of the second-layer prediction model is constructed as:
X_second = [Y_RFC, Y_CATB, Y_DNN].
X_second is input into the second-layer prediction model Model_Logistic, which outputs the class with the highest probability, namely the fault type, thereby realizing fault diagnosis.
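A minimal sketch of this fused inference follows, under the assumption that the lists of fitted sub-models, the fold weights w_*, and the fitted Model_Logistic already exist from the training stage; all names in the commented usage are hypothetical.

```python
import numpy as np

def fused_value(sub_models, weights, x_new):
    # Weighted sum of the K sub-model predictions, mirroring
    # Y_RFC = sum_i w_RFC_i * Y_RFC_i in the text
    preds = np.array([m.predict(x_new) for m in sub_models])  # shape (K, n_samples)
    return np.average(preds, axis=0, weights=weights)

# Hypothetical usage with fitted models and weights from training:
# y_rfc = fused_value(models_rfc, w_rfc, X_new_prime)
# y_catb = fused_value(models_catb, w_catb, X_new_prime)
# y_dnn = fused_value(models_dnn, w_dnn, X_new_prime)
# X_second = np.column_stack([y_rfc, y_catb, y_dnn])
# fault_type = model_logistic.predict(X_second)  # class with highest probability
```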
S6, early warning is carried out according to fault diagnosis, and the process comprises the following steps:
setting a probability difference threshold ΔP_min and a time threshold Δt_max; when the state predicted by the second-layer prediction model is no fault, the difference between the maximum probability (no fault) and the second-largest probability (fault i) is defined as ΔP_0-i:
ΔP_0-i = P_0_first − P_i_second,
where P_0_first is the probability value of the predicted no-fault state and P_i_second is the probability value of the fault i with the second-largest probability of occurrence. If ΔP_0-i ≤ ΔP_min holds throughout the interval Δt_max, an early warning of the occurrence of fault i is issued and fed back in time so that the fault risk can be removed.
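The warning rule can be sketched as below, assuming the second-layer model exposes a per-class probability vector each prediction cycle and that class 0 denotes the no-fault state; the default threshold values follow the worked example later in the text.

```python
import numpy as np

def early_warning(prob_history, delta_p_min=0.05, n_consecutive=7):
    """Return the fault index to warn about, or None.

    prob_history: per-cycle probability vectors; index 0 is the no-fault class.
    """
    hits = 0
    for p in prob_history:
        if np.argmax(p) != 0:   # rule only applies while "no fault" is predicted
            hits = 0
            continue
        gap = p[0] - np.partition(p, -2)[-2]   # delta_P_0-i = P_0_first - P_i_second
        hits = hits + 1 if gap <= delta_p_min else 0
        if hits >= n_consecutive:              # held throughout delta_t_max
            return int(np.argsort(p)[-2])      # fault i with second-largest probability
    return None
```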
The present embodiment will be described below with reference to specific examples.
S1, collecting historical operating data of the coal mill, comprising fault-free data and fault data; fault category labels are attached to the fault data, and the original sample size is set to 10000. The details of the collected operating data and fault conditions are shown in Table 1 above.
The operating data are the input data, with 19 input variables in total; the output data are the fault states, with 7 classes in total.
The collected data are:
X = [X_1, X_2, …, X_19] and Y,
where X_i represents one group of input parameter data (19 groups in total, each group containing 10000 data points), so X is a 10000 × 19 matrix; Y represents the fault state data and is a 10000 × 1 matrix.
S12, preprocessing the acquired data, wherein the preprocessing comprises denoising the original data by using a Hankel filtering algorithm and standardizing the denoised data by using a z-score method.
The data matrix R obtained after denoising and standardization is:
R = [X′, Y], a 10000 × 20 matrix combining the 19 standardized input columns with the fault state column.
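A minimal sketch of the preprocessing follows. The z-score step is standard; the Hankel filtering step is hedged here as a singular-spectrum-style low-rank reconstruction of a Hankel trajectory matrix, which is one common reading of the term, with the window and rank as assumed parameters.

```python
import numpy as np

def zscore(X):
    # (x - mean) / std per column, matching the z-score method in the text
    return (X - X.mean(axis=0)) / X.std(axis=0)

def hankel_denoise(signal, window=50, rank=3):
    # Assumed variant: build a Hankel matrix, keep the top singular components,
    # then average the anti-diagonals back into a 1-D signal (SSA-style)
    n = len(signal)
    H = np.array([signal[i:i + window] for i in range(n - window + 1)])
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    H_low = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    out = np.zeros(n)
    counts = np.zeros(n)
    for i in range(H_low.shape[0]):
        out[i:i + window] += H_low[i]
        counts[i:i + window] += 1
    return out / counts
```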
S13, randomly dividing the preprocessed data matrix R into training data (Train) and test data (Test), with a data quantity ratio of 4:1 between the training data and the test data:
R = [R_train; R_test],
where R_train is an 8000 × 20 matrix and R_test is a 2000 × 20 matrix.
The training set is equally subdivided into 5 folds (R_train_i, i = 1, 2, 3, 4, 5); in each group, 4 folds are taken as the sub-training set (R_part_train_i, i = 1, 2, 3, 4, 5) and 1 fold as the sub-validation set (R_part_valid_i, i = 1, 2, 3, 4, 5). Taking each fold in turn as the validation set generates 5 groups of training sets and validation sets, specifically:
R_part_train_1 = [R_train_2; R_train_3; R_train_4; R_train_5], R_part_valid_1 = R_train_1;
R_part_train_2 = [R_train_1; R_train_3; R_train_4; R_train_5], R_part_valid_2 = R_train_2;
……
R_part_train_5 = [R_train_1; R_train_2; R_train_3; R_train_4], R_part_valid_5 = R_train_5.
Each R_train_i is a 1600 × 20 matrix, each R_part_train_i a 6400 × 20 matrix, and each R_part_valid_i a 1600 × 20 matrix.
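The split and fold construction can be sketched with scikit-learn utilities; the random matrix is a stand-in for the real preprocessed data R.

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

rng = np.random.default_rng(0)
R = rng.normal(size=(10000, 20))                      # stand-in for the real R

R_train, R_test = train_test_split(R, test_size=0.2)  # 8000 / 2000, a 4:1 ratio

folds = []
for train_idx, valid_idx in KFold(n_splits=5, shuffle=True).split(R_train):
    # each pair is (R_part_train_i, 6400 x 20) and (R_part_valid_i, 1600 x 20)
    folds.append((R_train[train_idx], R_train[valid_idx]))
```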
S2, setting multiple algorithms to respectively construct training models, and respectively performing hyper-parameter optimization training on each training model according to a training set to obtain multiple first-layer prediction models; in the embodiment, the first-layer prediction model is obtained by training an improved random forest algorithm, a Catboost algorithm and a deep neural network algorithm respectively.
The process of carrying out the hyperparametric optimization training on each algorithm training model is a general training process, and specifically comprises the following steps:
before the initial search, the hyper-parameters to be optimized and their ranges are set according to the algorithm, a hyper-parameter combination is selected at random within those ranges, and the number of searches N of the grid search method is set;
in the nth search (1 ≤ n ≤ N), the algorithm, the hyper-parameter combination and the training sets R_part_train_i (i = 1, 2, 3, …, K) are used to train initial models, and each initial model predicts the input portion of the corresponding validation set R_part_valid_i, outputting K initial predicted output values Y_train_i_n.
The initial models are trained with cross validation: R_part_train_i is subjected to 5-fold cross validation, in which the training set is divided into 5 parts, 4 parts are taken as training data and the remaining 1 part as test data, and the cycle is repeated 5 times so that each part serves once as test data. Hyper-parameter optimization is performed with the grid search method, with the objective of maximizing the average MacroF1 value over the 5-fold cross-validation test sets.
The grid search method is implemented with GridSearchCV from the Sklearn library, with the number of iterations set to 200; when the iterations are complete, the optimal hyper-parameter combination is output according to the objective of maximizing the average MacroF1 value of the 5-fold cross-validation test sets, and the optimal prediction model, i.e. the first-layer prediction model, is obtained by training.
This operation is repeated to obtain 5 first-layer prediction models and the corresponding Y_validpredict_model_algorithm_i (matrix size 1600 × 1) and Y_test_model_algorithm_i (matrix size 2000 × 1); the 5 Y_validpredict_model_algorithm_i are combined by column to obtain the corresponding algorithm predicted value Y_validpredict_model_algorithm (matrix size 8000 × 1), while the 5 Y_test_model_algorithm_i are weighted and summed to obtain the test value Y_test_model_algorithm (matrix size 2000 × 1).
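A minimal sketch of one such search, using GridSearchCV with 5-fold cross validation and a macro-F1 objective; the random forest estimator, the reduced grid, and the stand-in data are illustrative only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X, y = rng.normal(size=(500, 19)), rng.integers(0, 7, 500)  # stand-in data

param_grid = {               # reduced version of the ranges given in the text
    "max_depth": [3, 8, 16],
    "max_features": [0.5, 0.75, 1.0],
    "min_samples_leaf": [1, 3, 5],
}
search = GridSearchCV(
    RandomForestClassifier(n_estimators=50),             # small forest for speed
    param_grid,
    scoring=make_scorer(f1_score, average="macro"),      # maximize average MacroF1
    cv=5,
)
search.fit(X, y)
print(search.best_params_)   # the optimal hyper-parameter combination
```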
Training improved random forest algorithm model
Constructing a training model according to an improved random forest algorithm, and setting the hyper-parameters to be optimized and the range thereof:
maximum tree depth max_depth: [3,16], step size 1; feature sampling ratio max_features: [0.5,1.0], step size 0.1; minimum number of samples per leaf node min_samples_leaf: [1,5], step size 1; minimum number of samples for node splitting min_samples_split: [2,10], step size 1.
The hyper-parameter optimization training process yields the optimal hyper-parameter combination:
Param_best_RFC = [T_end, max_depth_best, max_features_best, min_samples_leaf_best, min_samples_split_best],
from which 5 improved random forest first-layer prediction models Model_RFC_i (i = 1, 2, 3, …, 5) are obtained, together with the improved random forest predicted value Y_validpredict_model_RFC and test value Y_test_model_RFC.
Training the Catboost model
A training model is constructed by adopting a Catboost algorithm, and the process is as follows:
setting the over-parameters and the range thereof to be optimized:
number of decision trees iterations: [50,500], step size 1; learning rate learning_rate: [0.01,0.1], step size 0.01; regularization coefficient l2_leaf_reg: [1,5], step size 0.5; tree depth depth: [3,10], step size 1; sample sampling ratio subsample: [0.5,1.0], step size 0.1; column sampling ratio rsm: [0.5,1.0], step size 0.1.
The hyper-parameter optimization training process yields the optimal hyper-parameter combination:
Param_best_CATB = [iterations_best, learning_rate_best, l2_leaf_reg_best, subsample_best, rsm_best],
from which 5 Catboost first-layer prediction models Model_CATB_i (i = 1, 2, 3, …, 5) are obtained, together with the Catboost predicted value Y_validpredict_model_CATB and test value Y_test_model_CATB.
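A minimal sketch of a CatBoost first-layer model with the hyper-parameters named above; the concrete values are illustrative, and bootstrap_type='Bernoulli' is set because CatBoost requires a compatible bootstrap for the subsample option.

```python
from catboost import CatBoostClassifier

model_catb = CatBoostClassifier(
    iterations=200,        # number of decision trees
    learning_rate=0.05,
    l2_leaf_reg=3,         # regularization coefficient
    depth=6,               # depth of each tree
    subsample=0.8,         # sample sampling ratio
    rsm=0.8,               # column sampling ratio
    bootstrap_type="Bernoulli",
    loss_function="MultiClass",
    verbose=False,
)
# model_catb.fit(X_part_train, y_part_train)   # hypothetical fold data
```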
Training deep neural network model
The deep neural network algorithm is adopted to construct a training model, and the process is as follows:
the hidden layer is set to be 2 layers, and the hyper-parameters and the range thereof to be optimized are set:
batch size batch_size: [32,64,128]; number of first-hidden-layer neurons first_hidden_layer: [7,40], step size 1; number of second-hidden-layer neurons second_hidden_layer: [7,20], step size 1; optimizer optimizer: ['Adam', 'SGD', 'LBFGS', 'Rprop'].
The hyper-parameter optimization training process yields the optimal hyper-parameter combination:
Param_best_DNN = [batch_size_best, first_hidden_layer_best, second_hidden_layer_best, optimizer_best],
from which 5 deep neural network first-layer prediction models Model_DNN_i (i = 1, 2, 3, …, 5) are obtained, together with the deep neural network predicted value Y_validpredict_model_DNN and test value Y_test_model_DNN.
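A minimal sketch of the two-hidden-layer network, using scikit-learn's MLPClassifier as a stand-in; sklearn offers 'adam', 'sgd' and 'lbfgs' solvers, so the 'Rprop' option listed above would require a framework such as PyTorch instead. The layer sizes and batch size are illustrative values from within the stated ranges.

```python
from sklearn.neural_network import MLPClassifier

model_dnn = MLPClassifier(
    hidden_layer_sizes=(20, 10),  # first_hidden_layer, second_hidden_layer
    batch_size=64,
    solver="adam",                # the optimizer hyper-parameter
    max_iter=500,
)
# model_dnn.fit(X_part_train, y_part_train)   # hypothetical fold data
```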
S3, outputting prediction data according to the first layer of prediction model, and performing combined reconstruction on the prediction data to obtain second layer of training data; the specific process comprises the following steps:
The predicted values and test values produced by the first-layer prediction models of the three algorithms are reconstructed and combined to obtain the second-layer training data, comprising a second-layer training set and a second-layer test set. The second-layer training set is:
R_train_s2 = [Y_validpredict_model_RFC, Y_validpredict_model_CATB, Y_validpredict_model_DNN, Y_train],
where R_train_s2 is an 8000 × 4 matrix.
The second-layer test set is:
R_test_s2 = [Y_test_model_RFC, Y_test_model_CATB, Y_test_model_DNN, Y_test],
where R_test_s2 is a 2000 × 4 matrix.
S4, Y_validpredict_model_RFC, Y_validpredict_model_CATB and Y_validpredict_model_DNN are taken as input to construct the second-layer prediction model with a multi-classification logistic regression model; the second-layer test set is used to feed back the model hyper-parameters, and training on the principle of minimizing the cross-entropy loss function yields the optimal multi-classification logistic regression model Model_Logistic, with optimal parameter matrix θ_best. The cross-entropy loss function is:
LOSS = -(1/8000) Σ_i Σ_{c=0}^{6} y_ic log(p_ic).
Softmax regression is used to output the probability of each class; the probability that the fault of sample i belongs to the c-th class (c = 0, 1, 2, 3, 4, 5, 6) is calculated as:
p_ic = exp(θ_c^T x_i) / Σ_{j=0}^{6} exp(θ_j^T x_i).
and finally, the category with the highest output probability is the diagnosed fault type.
S6, early warning is carried out according to fault diagnosis, and the process comprises the following steps:
setting the probability difference threshold ΔP_min = 0.05 and the time threshold Δt_max = 30 min; when the state predicted by the second-layer prediction model is no fault, the difference between the maximum probability (no fault) and the second-largest probability (fault i) is ΔP_0-i:
ΔP_0-i = P_0_first − P_i_second,
where P_0_first is the probability value of the predicted no-fault state and P_i_second is the probability value of the fault i with the second-largest probability of occurrence. The fault state of the coal mill is predicted once every 5 minutes; when ΔP_0-i ≤ ΔP_min = 0.05 holds for 7 consecutive predictions within 30 minutes, an early warning of the occurrence of fault i is issued and fed back in time so that the fault risk can be removed.
Example 2:
The coal mill fault diagnosis system based on multi-model fusion is dedicated to the method of embodiment 1. Its structure comprises a data acquisition module, a data preprocessing module, a model training module, a model deployment module and a result output module connected in sequence.
A data acquisition module: collecting historical operating data of the coal mill;
a data preprocessing module: preprocessing collected historical operating data;
a model training module: constructing a training set and a verification set of cross verification by using the preprocessed data; respectively constructing training models by adopting various algorithms, and respectively carrying out hyper-parameter optimization training on each training model according to a training set to obtain a plurality of first-layer prediction models; outputting prediction data according to the first layer prediction model, and performing combined reconstruction on the prediction data to obtain second layer training data; constructing a second layer training model, and performing regression training according to second layer training data to obtain a second layer prediction model;
a model deployment module: the computer deploys the trained second-layer prediction model to the server side;
a result output module: and inputting the real-time coal mill operation data into the deployed second-layer prediction model to obtain the real-time state of the coal mill.
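How the modules could be wired together is sketched below; the class and attribute names are illustrative, not taken from the patent.

```python
class CoalMillDiagnosisSystem:
    """Illustrative wiring of the preprocessing, deployment and output modules."""

    def __init__(self, deployed_model, preprocess):
        self.model = deployed_model   # second-layer model from the training module
        self.preprocess = preprocess  # denoising + z-score from the preprocessing module

    def diagnose(self, raw_sample):
        x = self.preprocess(raw_sample)       # data preprocessing module
        return self.model.predict([x])[0]     # result output module: real-time state
```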
The specific embodiments described herein are merely illustrative of the spirit of the invention. Those skilled in the art may make various modifications or additions to the described embodiments, or substitute them in similar ways, without departing from the spirit of the invention or exceeding the scope defined in the appended claims.

Claims (8)

1. A coal mill fault diagnosis method based on multi-model fusion is characterized in that: the method comprises the following steps:
s1, collecting historical operating data of a coal mill, and constructing a cross validation training set and a validation set;
s2, setting multiple algorithms to respectively construct training models, and respectively carrying out super-parameter optimization training on each training model according to a training set to obtain multiple first-layer prediction models;
s3, outputting prediction data according to the first layer prediction model, and performing combined reconstruction on the prediction data to obtain second layer training data;
s4, constructing a second-layer training model, and performing regression training according to second-layer training data to obtain a second-layer prediction model;
and S5, collecting the operation data of the coal mill to carry out fault diagnosis.
2. The method for diagnosing the fault of the coal mill based on the multi-model fusion as claimed in claim 1, wherein the step S2 of performing the hyper-parameter optimization training process on each training model comprises:
randomly acquiring a hyper-parameter combination of a set algorithm, training according to the set algorithm and a training set to obtain an initial model, outputting a plurality of initial prediction output values by the initial model, calculating a MacroF1 value of the initial prediction output value and a corresponding training data output value, and obtaining an average MacroF1 value;
setting the search times, repeating the steps, and selecting the maximum value from the obtained average MacroF1 values after the search is finished to obtain a corresponding hyper-parameter combination as an optimal hyper-parameter combination;
setting an optimal hyper-parameter combination as a final hyper-parameter of a set algorithm, training according to different training sets to obtain a plurality of first-layer prediction models, predicting a corresponding verification set according to the first-layer prediction models to obtain sub-prediction values, and merging the sub-prediction data to obtain each algorithm prediction value.
3. The coal mill fault diagnosis method based on multi-model fusion as claimed in claim 2, wherein the first layer prediction model is obtained by training respectively with an improved random forest algorithm, a Catboost algorithm and a deep neural network algorithm.
4. The method for diagnosing the fault of the coal mill based on the multi-model fusion as claimed in any one of claims 1 to 3, wherein the specific process of the step S4 comprises the following steps:
the prediction data of each algorithm are taken as input, a multi-classification logistic regression model is adopted to construct the second-layer prediction model for training, the loss function is the cross-entropy loss function, softmax regression outputs the probability of each fault class, and the optimal multi-classification logistic regression model is obtained by training toward the target of minimizing the cross-entropy loss function, serving as the second-layer prediction model.
5. The coal mill fault diagnosis method based on multi-model fusion as claimed in claim 4, wherein the specific process of step S5 comprises:
collecting actual data samples;
inputting a sample into a plurality of first-layer prediction models of each algorithm, predicting to obtain a plurality of sub-prediction values, and obtaining a comprehensive prediction value through weighted calculation;
and combining the comprehensive predicted values of the algorithms to form second-layer prediction model input data, inputting the second-layer prediction model input data into the second-layer prediction model, and outputting the class with the maximum probability, namely the fault type.
6. The method for diagnosing the fault of the coal mill based on the multi-model fusion as claimed in claim 1, wherein the specific process of constructing the training set and the validation set of the cross validation in the step S1 comprises:
randomly dividing data into training data and verification data according to a proportion;
equally dividing training data into K folds, taking K-1 fold as a training set, taking 1 fold as a verification set, and taking each fold as a verification set to generate K groups of training sets and verification sets.
7. The method for diagnosing the fault of the coal mill based on the multi-model fusion as claimed in claim 1 or 5, wherein the method further comprises preprocessing the data after the data acquisition, and the preprocessing comprises data de-noising and data standardization.
8. A coal mill fault diagnosis system based on multi-model fusion, applied to the method of any one of claims 1 to 7, characterized in that it comprises:
a data acquisition module: collecting historical operating data of a coal mill;
a data preprocessing module: preprocessing collected historical operating data;
a model training module: constructing a training set and a verification set of cross verification by utilizing the preprocessed data; respectively constructing training models by adopting various algorithms, and respectively carrying out hyper-parameter optimization training on each training model according to a training set to obtain a plurality of first-layer prediction models;
outputting prediction data according to the first layer prediction model, and performing combined reconstruction on the prediction data to obtain second layer training data; constructing a second layer training model, and performing regression training according to second layer training data to obtain a second layer prediction model;
a model deployment module: deploying the trained second-layer prediction model to a server side by a computer;
a result output module: and inputting the real-time coal mill operation data into the deployed second-layer prediction model to obtain the real-time state of the coal mill.
CN202211287127.7A 2022-10-20 2022-10-20 Coal mill fault diagnosis method and system based on multi-model fusion Pending CN115719033A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211287127.7A CN115719033A (en) 2022-10-20 2022-10-20 Coal mill fault diagnosis method and system based on multi-model fusion


Publications (1)

Publication Number Publication Date
CN115719033A true CN115719033A (en) 2023-02-28

Family

ID=85254216

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211287127.7A Pending CN115719033A (en) 2022-10-20 2022-10-20 Coal mill fault diagnosis method and system based on multi-model fusion

Country Status (1)

Country Link
CN (1) CN115719033A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117168608A (en) * 2023-11-02 2023-12-05 默拓(江苏)电气驱动技术有限公司 Operation early warning method and system of brushless motor
CN117168608B (en) * 2023-11-02 2024-03-08 默拓(江苏)电气驱动技术有限公司 Operation early warning method and system of brushless motor


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination