CN110309608B

CN110309608B - Blast furnace molten iron silicon content forecasting method aiming at time lag uncertain information

Info

Publication number: CN110309608B
Application number: CN201910605443.6A
Authority: CN
Inventors: 王玉涛; 赵俊哲; 宫喜鹏; 杨钢
Original assignee: Northeastern University China
Current assignee: Northeastern University China
Priority date: 2019-07-05
Filing date: 2019-07-05
Publication date: 2022-12-23
Anticipated expiration: 2039-07-05
Also published as: CN110309608A

Abstract

The invention discloses a method for forecasting the silicon content of blast furnace molten iron under uncertain time-lag information. The method accounts for the influence of fluctuations in each process parameter on the silicon content during ironmaking and for the uncertainty in the lag time with which each process parameter acts, and takes the mean and the variance of each process parameter over its lag range as model input. Because the blast furnace ironmaking process involves many parameters, a stochastic gradient boosting decision tree is used as the base model, which keeps model complexity low without reducing the input dimension. Because plants currently have few samples in the high-silicon and low-silicon regions, a repeatable random down-sampling strategy is adopted to compensate for the sample skew, completing the modeling and prediction of the silicon content of blast furnace molten iron. The method is suited to predicting molten iron silicon content when variable lag times are uncertain, the variable dimension is high, and some sample classes are scarce; compared with prior methods it achieves a higher hit rate and faster training and prediction.

Description

Blast furnace molten iron silicon content forecasting method aiming at time lag uncertain information

Technical Field

The invention relates to the technical field of prediction of silicon content of blast furnace molten iron, in particular to a method for predicting the silicon content of the blast furnace molten iron aiming at uncertain time-lag information.

Background

In blast furnace production, most parameters that influence the silicon content and temperature of the molten iron act with some time lag. Existing models therefore pick a single fixed lag time, namely the one with the highest correlation, using a correlation coefficient or a nonlinear regression method together with operator experience. In actual production, however, a parameter's lag time is uncertain across stages and operating conditions: it varies within a certain range, and the parameter itself fluctuates to varying degrees within that range. Modeling with the instantaneous value at a single fixed lag point therefore yields an inaccurate lag time and discards fluctuation information, which degrades prediction performance. Moreover, the blast furnace has many parameters, so the feature dimension is high; this greatly increases the complexity of existing models such as neural networks and makes them hard to train, so it is necessary to reduce model complexity without losing feature information. In addition, the silicon content of molten iron is concentrated mainly within a certain range and samples outside that range are scarce; existing models are not optimized for this sample skew, so their predictions are inaccurate when the silicon content falls outside that range.

Disclosure of Invention

In view of the problems in the prior art, the invention discloses a method for forecasting the silicon content of blast furnace molten iron under uncertain time-lag information, which specifically comprises the following steps:

S1: Denoise the relevant process parameters collected during actual blast furnace ironmaking to obtain denoised process-parameter time series;

S2: Segment the time series of each process parameter into stages. For each parameter in each stage, compute the maximal information coefficient between the parameter and the silicon content of the blast furnace molten iron at different lag times and use it as the correlation index. For each parameter in each stage, select the lag time with the highest correlation with the silicon content; this is the lag time with which the parameter acts on the silicon content in that stage. Then aggregate over the stages to obtain the lag-time range of each process parameter;

S3: From the lag range t1 to t2 of each process parameter obtained in S2 (different process parameters have different lag ranges), find the time range t − t2 to t − t1 over which the parameter acts on the silicon content at the current time t; compute the mean and the variance of the parameter over that action time range; and take the resulting mean and variance of each process parameter as feature variables;

S4: Analyze the collected silicon content values to find the range in which they are concentrated. Samples inside this range are classed as large-class samples and samples outside it as small-class samples. The sampling frequency and the sampling ratio are determined by grid search;

S5: According to the sampling frequency determined in S4, randomly re-sample the large-class samples after every specified number of iterations, and take the sampled large-class samples together with all the small-class samples as the input samples of the gradient boosting tree at the current stage. Iterate with a gradient descent algorithm, gradually reducing the prediction error of the gradient boosting tree model, until the prediction error falls within a set error range; then stop iterating and output the silicon content prediction model;

S6: In the actual production process, compute the real-time variance and mean of the process parameters acquired online, and perform online silicon content prediction with the gradient boosting tree model trained in S5.

Further, S2 specifically comprises the following steps:

S21: Compute the maximal information coefficient (MIC) between each process parameter and the silicon content for each time-series segment in the blast furnace production state;

S22: Obtain, for each stage, the lag time with which each process parameter acts on the silicon content in the blast furnace production state;

S23: Obtain the maximum lag time, the minimum lag time, and hence the lag-time range with which each process parameter acts on the silicon content in the blast furnace production state.

Further, S4 specifically comprises the following steps:

S41: Compute the upper quartile, the lower quartile and the interquartile range of the silicon content values of the training samples; from these, find the range in which the silicon content is concentrated, set the boundary between large-class and small-class samples, and divide the samples into large-class and small-class samples at that boundary;

S42: Search for the optimal large-class sampling settings by combining cross-validation with grid search, and determine the sampling frequency and the sampling ratio of the large-class samples.

Further, S5 specifically comprises the following steps:

S51: Down-sample according to the classification result obtained in S41 and the sampling ratio obtained in S42: randomly sample the large-class samples at the sampling ratio, and combine the sampled result with all the small-class samples as the training data of the base model;

S52: Fit a gradient boosting tree prediction model;

S53: Determine the loss function of the gradient boosting tree prediction model, compute the negative gradient of the current model's loss function, and take the negative gradient as the residual approximation for the current least-squares tree;

S54: Train and iterate the least-squares tree. Using the leaf-wise leaf growth strategy, each time select the best node in the same layer of the least-squares tree to split and treat the other nodes as final leaf nodes of the decision tree. When the termination condition of a single least-squares tree is met, stop iterating on the current tree to obtain the current gradient boosting tree prediction model;

S55: Judge whether the current iteration count has reached the iteration count corresponding to the sampling frequency obtained in S42. If so, jump to S52 to continue training the gradient boosting tree prediction model; otherwise, jump to S51 to down-sample and update the training data, then continue training until the termination condition is met;

S57: When the termination condition is met, obtain the final gradient boosting tree prediction model.

By adopting the above technical solution, the method for forecasting the silicon content of blast furnace molten iron under uncertain time-lag information takes into account the influence that fluctuations in each process parameter have on the silicon content when furnace conditions are unstable during ironmaking, as well as the uncertainty in the lag time with which each parameter acts on the silicon content. The data are divided into multiple stages, the lag time of each parameter is computed via the maximal information coefficient (MIC), and the lag-time range of each parameter is compiled from the per-stage results. For modeling and prediction, the mean and variance of each variable over its action time range for the silicon content are used as model input, and the prediction mean square error is markedly lower than when a single fixed lag time is selected.

The method also takes into account that many process parameters in blast furnace ironmaking each have some influence on the silicon content. It adopts a gradient boosting tree built on regression trees as the base model and feeds the process parameters to the model without dimensionality reduction or feature selection, which keeps model complexity low while avoiding the information loss caused by dimensionality reduction. In the experiments, both dimension-reduced and non-reduced variables were used as model input; the non-reduced variables gave better results, indicating that the information lost in dimensionality reduction makes the model less accurate.

In addition, to handle the uneven distribution of silicon content values, a repeatable random down-sampling strategy is added to the gradient boosting tree: the large-class samples are randomly re-sampled at a given sampling frequency and sampling ratio and used for model training together with all the small-class samples. A model trained this way predicts silicon content outside the large-class range more accurately. The Welford algorithm is also applied to the online prediction of blast furnace molten iron silicon content, which reduces the time needed to compute the variance and mean.

Drawings

To illustrate the embodiments of the present application and the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below are only some embodiments of the present application; a person skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flow chart of modeling and on-line prediction of the silicon content of molten iron in a blast furnace according to the present invention

FIG. 2 is a flow chart of empirical mode decomposition

FIG. 3 is a schematic diagram of time lag analysis

FIG. 4 is a diagram of the ensemble model of a gradient boosting decision tree incorporating a repeatable random down-sampling strategy

FIG. 5 is a diagram of the effect of model training

FIG. 6 is a graph showing the effect of predicting the silicon content

Detailed Description

To make the technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings:

As shown in FIG. 1, the method for forecasting the silicon content of blast furnace molten iron under uncertain time-lag information specifically comprises the following steps:

Step 1: Denoise each parameter time series by empirical mode decomposition (EMD) and reconstruction of the effective information.

Step 1-1: Following the procedure shown in FIG. 2, decompose each parameter time series into n intrinsic mode functions (IMFs), ordered from high to low frequency, and a residual $r_n(t)$:

$$x(t) = \sum_{i=1}^{n} c_i(t) + r_n(t)$$

where $x(t)$ is the original time series, $r_n(t)$ is the residual component, and $c_i(t)$ is the i-th intrinsic mode function (IMF), which must satisfy the following two conditions:

(1) Over the whole time range, the number of local extrema and the number of zero crossings must be equal or differ by at most one;

(2) At any point in time, the mean of the envelope of the local maxima (upper envelope) and the envelope of the local minima (lower envelope) must be zero.

Step 1-2: Reconstruct the low-frequency part of the decomposition. The reconstruction of the low-frequency part is the estimate of the effective signal in x(t). To obtain the best estimate of the effective signal, the consecutive mean square error (CMSE) is computed between each pair of successive reconstructed signals, indexed by k = 1, 2, ...; the CMSE characterizes the energy density of the k-th order IMF component. Once the CMSE between all successive reconstructed signals has been obtained, the reconstructed signal corresponding to the global minimum is the best estimate of the effective signal.
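The selection rule in step 1-2 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the IMFs have already been computed by some EMD routine, that the k-th partial reconstruction is the sum of IMFs k through n plus the residual, and that the reconstruction at the minimizing k is the one returned. The function name `select_reconstruction` is hypothetical.

```python
def select_reconstruction(imfs, residual):
    """Choose the partial reconstruction with the smallest consecutive MSE.

    imfs: list of IMFs ordered from high to low frequency (lists of floats).
    residual: residual component r_n(t), same length as each IMF.
    Returns (k, reconstruction, cmse) with k as a 0-based IMF index.
    Requires at least two IMFs.
    """
    n = len(imfs)
    length = len(residual)

    def reconstruct(k):
        # Sum of IMFs k..n-1 plus the residual (drops the k highest-frequency IMFs).
        out = list(residual)
        for imf in imfs[k:]:
            for t in range(length):
                out[t] += imf[t]
        return out

    recs = [reconstruct(k) for k in range(n)]

    # CMSE between reconstruction k and reconstruction k+1.
    # Under the assumption above this equals the mean energy of IMF k.
    cmse = [
        sum((a - b) ** 2 for a, b in zip(recs[k], recs[k + 1])) / length
        for k in range(n - 1)
    ]

    best = min(range(n - 1), key=cmse.__getitem__)
    return best, recs[best], cmse
```

Because successive reconstructions differ by exactly one IMF, the CMSE at index k is the mean energy of that IMF, which matches the statement that it characterizes the energy density of the k-th order component.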

Step 2: Examine the time series of each process parameter and divide the data into S time-series stages. Following FIG. 3, compute the time lag between each parameter variable and the silicon content in each stage, and take the maximum $T_{\max}$ and the minimum $T_{\min}$ to obtain the lag range of each process variable:

step 2-1: and (3) solving the maximum information coefficient MIC of each variable and the silicon content of each time sequence:

wherein, X _ki Is the time series of the kth time segment, the ith variable, Y _kj The lag time is the time series of the silicon content of j units (10 min) for the kth time period. And B is empirically selected to be 0.6 power of the total amount of data in the current stage. MIC [ X ] _ki ,Y _kj ]The maximum information coefficient of the ith variable and the silicon content when the time lag of the kth stage is j.
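For orientation, the definition of MIC commonly given in the literature is reproduced below; it is offered as an assumption about the formula intended in step 2-1, not as a quotation from the patent. Here $I[X;Y]$ is the mutual information of the two variables discretized on a grid with $|X|$ columns and $|Y|$ rows, and $n$ (a symbol introduced here) is the number of data points in the current stage.

```latex
\mathrm{MIC}[X;Y] \;=\; \max_{|X|\,|Y| < B} \frac{I[X;Y]}{\log_2\!\bigl(\min(|X|,\,|Y|)\bigr)},
\qquad B = n^{0.6}
```

Under this reading, B bounds the grid resolution searched, which is consistent with the statement that B is the 0.6 power of the amount of data in the stage.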

Step 2-2: Determine, for each stage, the lag time with which each variable acts on the silicon content:

$$J_{ki} = \arg\max_{j}\ \mathrm{MIC}[X_{ki}, Y_{kj}]$$

where $\mathrm{MIC}[X_{ki}, Y_{kj}]$ is the maximal information coefficient of the i-th variable in the k-th stage at time lag j, and $J_{ki}$ is the lag time j at which this coefficient is largest in the k-th stage, that is, the lag time with which the i-th variable acts on the silicon content in the k-th stage.

Step 2-3: Obtain the maximum and minimum lag time with which each variable acts on the silicon content, giving the lag-time range $T_{i\min}$ to $T_{i\max}$:

$$T_{i\max} = \max(J_i)$$

$$T_{i\min} = \min(J_i)$$

where $J_i$ denotes the lag times of the i-th variable in the different stages, $T_{i\max}$ is the maximum lag time of the i-th variable over the S stages, and $T_{i\min}$ is the minimum lag time of the i-th variable over the S stages.
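Steps 2-2 and 2-3 amount to an arg-max over candidate lags per stage followed by a min/max over stages. The sketch below illustrates that flow under stated assumptions: the dependence score is pluggable, and the default `abs_pearson` is only a stand-in for MIC so that the example runs with the standard library alone. All function names are hypothetical.

```python
import math


def abs_pearson(x, y):
    """Stand-in dependence score (absolute Pearson correlation).

    The patent uses MIC here; this substitute only keeps the sketch dependency-free.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def best_lag(x, y, max_lag, score=abs_pearson):
    """Return the lag j in 0..max_lag that maximizes score(x[t - j], y[t])."""
    best_j, best_s = 0, -1.0
    for j in range(max_lag + 1):
        xs = x[: len(x) - j]  # parameter values at times t - j
        ys = y[j:]            # silicon content at times t
        s = score(xs, ys)
        if s > best_s:
            best_j, best_s = j, s
    return best_j


def lag_range(x_stages, y_stages, max_lag, score=abs_pearson):
    """Per-stage best lags J_ki for one variable, reduced to (T_imin, T_imax)."""
    lags = [best_lag(x, y, max_lag, score) for x, y in zip(x_stages, y_stages)]
    return min(lags), max(lags)
```

To follow the patent more closely, a MIC implementation could be passed as `score`; the surrounding logic would stay the same.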

Step 3: Using the lag range of each variable obtained in step 2, compute the mean and the variance of each variable over the time range corresponding to the silicon content at each time point, and arrange them into N groups of model training samples:

$$y_t = y_t$$

where $xmean_{ti}$ and $xvar_{ti}$ are the mean and the variance of the i-th variable over the action time range corresponding to time t, that is, the model input data for time t.
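The feature construction in step 3 can be sketched as below. It assumes time is indexed in the same discrete units as the lags, that the window runs from t − T_imax to t − T_imin inclusive, and that the population variance is intended (the source does not say which variance estimator is used). The function names are hypothetical.

```python
from statistics import fmean, pvariance


def lag_window_features(series, t, t_min, t_max):
    """Mean and variance of `series` over times t - t_max .. t - t_min (inclusive)."""
    start, stop = t - t_max, t - t_min
    if start < 0:
        raise ValueError("not enough history for this lag range")
    window = series[start : stop + 1]
    # Population variance is an assumption; swap in statistics.variance if needed.
    return fmean(window), pvariance(window)


def build_sample(variables, lag_ranges, t):
    """Concatenate (mean, variance) of every variable into one input row for time t.

    variables: list of per-variable time series.
    lag_ranges: list of (t_min, t_max) pairs, one per variable.
    """
    features = []
    for series, (t_min, t_max) in zip(variables, lag_ranges):
        features.extend(lag_window_features(series, t, t_min, t_max))
    return features
```

Each variable contributes two features, so 27 process parameters would give a 54-dimensional input row per time point.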

Step 4: Analyze the distribution of the silicon content values of the training samples, find the range in which the silicon content is concentrated, divide the samples into large-class and small-class samples, repeatedly draw random samples, use the current sampling result as model input, and train the gradient boosting tree model as shown in FIG. 4.

Step 4-1: Compute the lower quartile Q1 and the upper quartile Q2 of the silicon content of the training samples to obtain the interquartile range IQR = Q2 − Q1. The upper limit of the normal values is Q2 + 1.5 × IQR and the lower limit is Q1 − 1.5 × IQR. Samples in the range Q1 − 1.3 × IQR to Q2 + 1.3 × IQR are classed as large-class samples, and samples in the ranges Q1 − 1.5 × IQR to Q1 − 1.3 × IQR and Q2 + 1.3 × IQR to Q2 + 1.5 × IQR are classed as small-class samples (the factor 1.3 is chosen according to the sample distribution and can be adjusted for different furnace conditions).
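The partition in step 4-1 can be sketched as follows. The quartile interpolation method (`"inclusive"`) is an assumption, since the source does not specify one, and the source also does not say how samples beyond the 1.5 × IQR limits are handled, so they are simply returned in a third list here. The function name is hypothetical.

```python
from statistics import quantiles


def split_by_iqr(si_values, inner=1.3, outer=1.5):
    """Partition sample indices into large-class, small-class and out-of-limit.

    large: within [Q1 - inner*IQR, Q_upper + inner*IQR]
    small: between the inner and outer fences on either side
    outside: beyond the outer fences (treatment not specified in the source)
    """
    q_low, _, q_up = quantiles(si_values, n=4, method="inclusive")
    iqr = q_up - q_low
    in_lo, in_hi = q_low - inner * iqr, q_up + inner * iqr
    out_lo, out_hi = q_low - outer * iqr, q_up + outer * iqr

    large, small, outside = [], [], []
    for i, v in enumerate(si_values):
        if in_lo <= v <= in_hi:
            large.append(i)
        elif out_lo <= v <= out_hi:
            small.append(i)
        else:
            outside.append(i)
    return large, small, outside
```

The `inner` factor corresponds to the adjustable value 1.3 mentioned in the step.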

Step 4-2: Set the search range of the sampling frequency to 1 to 60 with a step of 2, and the search range of the sampling ratio to 0.1 to 1 with a step of 0.05. Run 5-fold cross-validation with grid search, using the mean square error on the validation set as the evaluation index, to obtain the sampling frequency and the sampling ratio for the large-class samples.
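The search grid of step 4-2 can be enumerated as in the sketch below. It assumes the frequency grid starts at 1 (so the values are 1, 3, ..., 59) and that the ratio grid includes both endpoints; the cross-validated error is supplied by the caller as a hypothetical `cv_mse(freq, ratio)` callback rather than implemented here.

```python
from itertools import product


def sampling_grid():
    """All (sampling frequency, sampling ratio) pairs in the search ranges."""
    freqs = list(range(1, 61, 2))                            # 1, 3, ..., 59
    ratios = [round(0.10 + 0.05 * i, 2) for i in range(19)]  # 0.10, 0.15, ..., 1.00
    return list(product(freqs, ratios))


def grid_search(cv_mse):
    """Return the grid point with the lowest cross-validated MSE.

    cv_mse(freq, ratio) is expected to train with those settings under
    5-fold cross-validation and return the mean validation MSE.
    """
    return min(sampling_grid(), key=lambda point: cv_mse(*point))
```

With these ranges the grid has 30 × 19 = 570 candidate settings, each requiring five model fits.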

Step 5: According to the sampling frequency determined in step 4, randomly re-sample the large-class samples after every specified number of iterations, and take the sampled large-class samples together with all the small-class samples as the input samples of the gradient boosting tree at the current stage. Iterate with a gradient descent algorithm, gradually reducing the prediction error of the gradient boosting tree model, until the prediction error falls within a set error range; then stop iterating and output the silicon content prediction model.

Step 5-1: Down-sample using the large-class and small-class samples and the sampling ratio obtained in step 4: randomly sample the large-class samples, and combine the sampled result with all the small-class samples as the training data of the base model.
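A single down-sampling draw as described in step 5-1 might look like the sketch below. It assumes that within one draw the large-class samples are taken without replacement and that the number drawn is the sampling ratio times the large-class size, rounded; the source does not spell out either detail. The function name is hypothetical.

```python
import random


def draw_training_set(large, small, ratio, rng):
    """Return one training set: a random subset of `large` plus all of `small`.

    large, small: lists of samples (any objects, e.g. (features, target) pairs).
    ratio: fraction of the large class to keep, 0 < ratio <= 1.
    rng: a random.Random instance, so repeated draws are reproducible.
    """
    k = max(1, round(ratio * len(large)))
    sampled = rng.sample(large, k)  # without replacement within a single draw
    return sampled + list(small)
```

Calling this again later in training yields a different subset of the large class, while the small class is always fully represented.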

Step 5-2: Fit a gradient boosting decision tree silicon content prediction model to the training data obtained in step 5-1, and initialize the model:

Step 5-3: Compute the negative gradient of the current model's loss function:

$$L(y, f(x)) = [y - f(x)]^2$$

where $L(\cdot)$ is the model loss function and $r_{mi}$ is the negative gradient of the loss function at $f(x_i)$ for the current model, used as the residual approximation for the least-squares regression tree.
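For the squared loss stated in step 5-3, the initialization and the negative gradient have simple closed forms. The following worked sketch uses the usual gradient boosting notation and is an assumption about the intended formulas; N (the number of training samples) and c (a constant prediction) are symbols introduced here.

```latex
f_0(x) \;=\; \arg\min_{c} \sum_{i=1}^{N} L(y_i, c) \;=\; \frac{1}{N} \sum_{i=1}^{N} y_i

r_{mi} \;=\; -\left[ \frac{\partial L\bigl(y_i, f(x_i)\bigr)}{\partial f(x_i)} \right]_{f = f_{m-1}}
\;=\; 2\,\bigl( y_i - f_{m-1}(x_i) \bigr)
```

So with this loss the negative gradient is, up to the constant factor 2, the ordinary residual between the observed silicon content and the current model's prediction.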

Step 5-4: Fit the m-th regression tree to $r_{mi}$:

Traverse the splitting variables v and, for each fixed splitting variable v, scan the cut points s; select the pair (v, s) that minimizes the splitting criterion.

Divide the region using the selected pair (v, s) and determine the corresponding output values:

$$R_1(v, s) = \{x \mid x^{(v)} \le s\}$$

$$R_2(v, s) = \{x \mid x^{(v)} > s\}$$

Following the leaf-wise leaf growth strategy, continue to compute the optimal splitting variable and the optimal cut point for each of the two sub-regions; among the nodes in the same layer, select the best node region to split and treat the other nodes as final leaf nodes of the decision tree. Repeat this operation.
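The search for the pair (v, s) can be sketched as an exhaustive scan. This illustration assumes the usual least-squares criterion, in which each side of the split predicts the mean of its residuals and the pair with the smallest total squared error is chosen; it covers a single split only, not the leaf-wise growth of a whole tree. The function name is hypothetical.

```python
def best_split(X, r):
    """Exhaustive least-squares split search.

    X: list of feature rows (lists of numbers); r: residual targets r_mi.
    Returns (v, s, c1, c2, sse), where R1 = {x | x[v] <= s}, R2 = {x | x[v] > s},
    c1 and c2 are the mean residuals in R1 and R2, and sse is the total squared
    error. Returns None if no variable has two distinct values.
    """
    best = None
    for v in range(len(X[0])):
        values = sorted({row[v] for row in X})
        for s in values[:-1]:  # the largest value would leave R2 empty
            left = [ri for row, ri in zip(X, r) if row[v] <= s]
            right = [ri for row, ri in zip(X, r) if row[v] > s]
            c1 = sum(left) / len(left)
            c2 = sum(right) / len(right)
            sse = sum((ri - c1) ** 2 for ri in left)
            sse += sum((ri - c2) ** 2 for ri in right)
            if best is None or sse < best[4]:
                best = (v, s, c1, c2, sse)
    return best
```

A leaf-wise grower would apply this search to every current leaf and split only the leaf whose best split gives the largest error reduction.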

When the stopping condition for a single tree is met, the J leaf node regions $R_1, R_2, \ldots, R_J$ of the m-th regression tree are obtained, and the decision tree parameters that minimize the loss function are solved for.

Step 5-5: The model obtained at the m-th step is:

$$f_m(x) = f_{m-1}(x) + T(x, c_m)$$

Step 5-6: Judge whether the current iteration count has reached the iteration count corresponding to the sampling frequency obtained in step 4. If so, jump to step 5-2 to continue training; otherwise, jump to step 5-1 to down-sample and update the training data, then continue training until the termination condition is met.
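The interplay of boosting and periodic re-sampling in steps 5-1 to 5-6 can be sketched end to end as below. This is a deliberately small illustration under several assumptions: a single input feature, depth-one trees (stumps) in place of leaf-wise trees, a fixed number of trees in place of an error-based stopping rule, a learning rate `lr` that the source does not mention, and re-sampling every `resample_every` iterations, which is one reading of the sampling frequency. All names are hypothetical.

```python
import random
from statistics import fmean


def fit_stump(xs, residuals):
    """Least-squares stump on one feature: (threshold, left_value, right_value)."""
    best = None
    for s in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= s]
        right = [r for x, r in zip(xs, residuals) if x > s]
        cl, cr = fmean(left), fmean(right)
        sse = sum((r - cl) ** 2 for r in left) + sum((r - cr) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, s, cl, cr)
    if best is None:  # all x identical: fall back to a constant
        m = fmean(residuals)
        return (xs[0], m, m)
    return best[1:]


def train_with_resampling(x_large, y_large, x_small, y_small,
                          n_trees, resample_every, ratio, lr=0.1, seed=0):
    """Boost stumps on residuals, redrawing the large-class subset periodically."""
    rng = random.Random(seed)
    f0 = fmean(y_large + y_small)  # initial constant model
    stumps = []

    def predict(x):
        return f0 + lr * sum(cl if x <= s else cr for s, cl, cr in stumps)

    xs, ys = [], []
    for m in range(n_trees):
        if m % resample_every == 0:
            # Redraw a subset of the large class and keep every small-class sample.
            k = max(1, round(ratio * len(x_large)))
            idx = rng.sample(range(len(x_large)), k)
            xs = [x_large[i] for i in idx] + list(x_small)
            ys = [y_large[i] for i in idx] + list(y_small)
        # Residuals: the negative gradient of squared loss up to a constant factor.
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict
```

The returned `predict` function is the additive model, a constant plus a scaled sum of trees. Because every draw contains all small-class samples, those samples influence every tree, which is the intent of the down-sampling strategy.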

Step 5-7: When the termination condition is met, the final ensemble model is obtained. It is expressed as a linear combination of decision trees and is the fitted silicon content prediction model. FIG. 5 shows an example of a model training result.

Step 6: Compute the real-time variance and mean of the online ironmaking data with the Welford algorithm:

$$V_n = V_{n-1} + (x_n - M_{n-1}) \times (x_n - M_n)$$

where $M_n$ and $V_n$ are the running mean and the running variance statistic after the n-th observation $x_n$. They form the current prediction input, which is fed to the model trained in step 5 to predict the silicon content.
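The recursion above can be sketched as a small running-statistics class. The mean update shown in the code is the standard Welford form and is an assumption, since only the V_n update appears in the text. In that recursion V_n accumulates the sum of squared deviations, so the sketch divides by n to report a population variance, which is also an assumption. The class name is hypothetical.

```python
class RunningStats:
    """Welford-style running mean and variance over a stream of values."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0  # M_n
        self.v = 0.0     # V_n: running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean              # x_n - M_{n-1}
        self.mean += delta / self.n        # M_n = M_{n-1} + (x_n - M_{n-1}) / n
        self.v += delta * (x - self.mean)  # V_n = V_{n-1} + (x_n - M_{n-1})(x_n - M_n)

    @property
    def variance(self):
        # Population variance; use self.v / (self.n - 1) for the sample variance.
        return self.v / self.n if self.n else 0.0
```

Each update costs a constant number of operations and needs no stored history, which is why the approach suits online prediction.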

Example:

in the actual blast furnace iron making process, the temperature in the blast furnace needs to be strictly controlled in order to ensure the quality of pig iron. The blast furnace is a closed system and has great difficulty in acquiring the temperature in the furnace. The silicon content in molten iron, which is generally used to reflect the temperature level in the furnace, can reflect not only the thermal state of the production process but also the quality of pig iron. Therefore, future information on the silicon content in the molten iron is very important for a blast furnace operator to judge the internal state of the blast furnace and the quality of pig iron. If the change of the silicon content can be predicted, the blast furnace operator can take accurate control measures in advance to control the production process, and the quality of the blast furnace molten iron can be effectively improved and stabilized.

The accuracy of the proposed model is verified by researching the actual production data of a certain steel mill. The following steps are described with reference to specific procedures:

Step 1: Denoise the time series of the actual blast furnace ironmaking data to obtain the denoised time series of each parameter. The process involves the 27 process parameters listed in the following table:

1. CO/%	10. Top temperature (northeast)/°C	19. Air permeability
2. CO₂/%	11. Top temperature (southwest)/°C	20. Top pressure 1/kPa
3. H₂/%	12. Top temperature (northwest)/°C	21. Top pressure 2/kPa
4. Cold air flow/(m³·min⁻¹)	13. Gas flow/m³	22. Top pressure 3/kPa
5. Oxygen content/(m³·h⁻¹)	14. Gas index/MPa	23. Standard wind speed/(m·s⁻¹)
6. Blast kinetic energy/kJ	15. Cold air pressure/MPa	24. Actual wind speed/(m·s⁻¹)
7. Blast humidity/(g/m³)	16. Oxygen-enriched pressure/MPa	25. Coal injection quantity per hour/(t·h⁻¹)
8. Top temperature (southeast)/°C	17. Hot air pressure/MPa	26. Soft water inflow rate/(t·h⁻¹)
9. Coefficient of drag/cd	18. Total differential pressure/kPa	27. Temperature of soft water inlet/°C

Step 2: Segment each parameter time series into stages. For each stage, compute the maximal information coefficient between each process parameter and the silicon content of the blast furnace molten iron at different lag times, and select the lag time at which each parameter has the highest correlation with the silicon content in that stage. The analysis yields the time-lag range of each parameter.

Step 3: According to the time-lag range of each parameter, compute the mean and the variance over the corresponding action time range as model input.

Step 4: Take the first 80% of a continuous time series as training data. Analyze the value range of the silicon content of the samples, divide them into large-class and small-class samples, repeatedly draw random samples, use the current sampling result as model input, and train the gradient boosting tree model with the leaf-wise leaf growth strategy.

Step 5: Output the silicon content prediction model.

Step 6: Take the last 20% of the same continuous time series as test data to simulate online prediction.

The hit rate and the mean square error are selected as the model evaluation indexes, where $y_i$ and $\hat{y}_i$ denote the true value and the model's predicted value for the i-th sample, respectively. MSE is the mean square error between the true and predicted values, $H_i$ is the Heaviside function, and J is the model hit rate.
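The two evaluation indexes can be computed as in the sketch below. The hit tolerance `tol` is a hypothetical parameter: the text does not state the error threshold within which a prediction counts as a hit, so it must be supplied by the caller. The sketch treats a prediction as a hit when its absolute error is at most `tol`, which is one common convention and an assumption here.

```python
def mean_square_error(y_true, y_pred):
    """Mean of squared differences between true and predicted values."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def hit_rate(y_true, y_pred, tol):
    """Fraction of samples whose absolute prediction error is at most `tol`.

    Each sample contributes H_i = 1 when |y_i - y_hat_i| <= tol, else 0.
    """
    hits = sum(1 for a, b in zip(y_true, y_pred) if abs(a - b) <= tol)
    return hits / len(y_true)
```

Reporting both indexes is useful because the hit rate ignores how large the misses are, while the mean square error is dominated by them.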

With the model of the invention, the hit rate on the training set reaches 98%, the mean square error on the test set is 0.0024, and the hit rate on the test set reaches 94%. Moreover, where the silicon content takes values in the small-class range and fluctuates markedly, the method tracks it better than existing models; the results are shown in FIG. 5 and FIG. 6.

The above is only a preferred embodiment of the present invention, and the scope of protection of the present invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solutions and inventive concept, shall fall within the scope of protection of the present invention.

Claims

1. A method for forecasting the silicon content of blast furnace molten iron aiming at uncertain time-lag information is characterized by comprising the following steps:

S2: segmenting the time series of each process parameter into stages; computing, for each parameter in each stage, the maximal information coefficient between the parameter and the silicon content of the blast furnace molten iron at different lag times, and taking the maximal information coefficient as the correlation index; analyzing each parameter in each stage and selecting the lag time with the highest correlation with the silicon content as the time lag of the current parameter in the current stage; finding, from the time lags of each parameter over the stages, the maximum time lag and the minimum time lag, and taking them as the time-lag range of the current parameter, thereby obtaining the time-lag range of each parameter;

S3: finding, according to the time-lag range of each parameter, the time range over which the process parameter acts on the silicon content at the current moment, computing the mean and the variance of the process parameter over that action time range, and taking the resulting mean and variance of each process parameter as feature variables;

S4: analyzing the collected silicon content values to find the range in which they are concentrated, classing samples inside the concentrated range as large-class samples and samples outside it as small-class samples, and determining by grid search the sampling frequency and the sampling ratio, the sampling ratio being the proportion of the number of samples drawn to the total number of large-class samples, so as to determine the number of large-class samples used after each iteration in the training process;

S5: according to the sampling frequency determined in S4, randomly re-sampling the large-class samples after every specified number of iterations, taking the sampled large-class samples and all the small-class samples as the input samples of the gradient boosting tree model at the current stage, iterating with a gradient descent algorithm until the prediction error falls within a set error range, and then stopping the iteration and outputting the silicon content gradient boosting tree model;

S6: in the actual production process, computing the real-time variance and mean of the process parameters acquired online with the Welford algorithm, and performing online silicon content prediction with the gradient boosting tree model trained in S5.

2. The method for forecasting the silicon content of blast furnace molten iron aiming at uncertain time-lag information as claimed in claim 1, further characterized in that S2 specifically comprises the following steps:

S21: computing the maximal information coefficient (MIC) between each process parameter and the silicon content for each time-series segment in the blast furnace production state;

3. The method for forecasting the silicon content of blast furnace molten iron aiming at uncertain time-lag information as claimed in claim 1, further characterized in that S4 specifically comprises the following steps:

S42: searching for the optimal large-class sampling settings by combining cross-validation with grid search, and determining the sampling frequency and the sampling ratio of the large-class samples.

4. The method for forecasting the silicon content of blast furnace molten iron aiming at uncertain time-lag information as claimed in claim 3, further characterized in that S5 specifically comprises the following steps:

S52: fitting a gradient boosting tree prediction model;

S53: determining the loss function of the gradient boosting tree prediction model, computing the negative gradient of the current model's loss function, and taking the negative gradient as the residual approximation for the current least-squares tree;

S55: judging whether the current iteration count has reached the iteration count corresponding to the sampling frequency obtained in S42; if so, jumping to S52 to continue training the gradient boosting tree prediction model; otherwise, jumping to S51 to down-sample and update the training data, and then continuing training until the termination condition is met;