CN115049123A - Prediction method for silicon content of molten iron in blast furnace based on GA-XGboost model

Info

Publication number: CN115049123A
Application number: CN202210641526.2A
Authority: CN
Inventors: 王德全; 田铁磊; 刘燕军; 李涛; 邓勇; 李丽红; 杨佳毅; 王艾军; 王杰; 张书楼; 张通亮
Original assignee: North China University of Science and Technology; Delong Steel Ltd
Current assignee: North China University of Science and Technology; Delong Steel Ltd
Priority date: 2022-06-07
Filing date: 2022-06-07
Publication date: 2022-09-13

Abstract

A method for predicting the silicon content of blast furnace molten iron based on a GA-XGboost model comprises the following steps: first, collecting historical smelting data of a blast furnace; second, standardizing the data set; third, dividing the data into different clusters; fourth, eliminating the characteristic variables whose correlation coefficients are larger than a set value, and dividing the data into a training set and a test set; fifth, training the GA-XGboost model with the data in the training set; sixth, testing the trained GA-XGboost model with the data in the test set; and seventh, predicting the silicon content of the blast furnace molten iron with a GA-XGboost model that has passed the test. The method uses a genetic algorithm to optimize the XGboost algorithm and, before prediction, divides the prediction data set into several data subsets with the KMeans++ algorithm, so that high prediction accuracy can be obtained under complex and changeable blast furnace conditions and prediction efficiency is greatly improved.

Description

Prediction method for silicon content of molten iron in blast furnace based on GA-XGboost model

Technical Field

The invention relates to a method for predicting the silicon content of blast furnace molten iron, which can improve the prediction efficiency and ensure the accuracy of a prediction result and belongs to the technical field of metal smelting.

Background

At present, the combination of blast furnace smelting and machine learning is developing vigorously. Combining the two helps make blast furnace production parameters more accurate and accelerates the automation of smelting, and these advantages make it a definite trend in the future development of blast furnace smelting. However, China still lags well behind the international level, and prediction technology for the blast furnace smelting process is not yet mature. In predicting the silicon content of the blast furnace, mechanism models do not match the actual situation, and prediction accuracy and generalization capability are poor. In particular, there is no efficient and mature method or model for predicting silicon content under high coal injection smelting, so the silicon content and the furnace temperature state cannot be fed back in time after the burden structure or the blast furnace operating parameters change, which makes blast furnace operation unstable.

Traditional big data prediction models usually use either a neural network or a statistical method to predict the target. With the former, the predicted values approximate the true values well, but the prediction process is not interpretable, and it is difficult to judge how strongly each blast furnace input parameter influences the predicted silicon content. The latter applies mature statistical analysis methods, so its predictions are interpretable, but prediction efficiency drops sharply when the data set is too large, and high prediction accuracy cannot be obtained under complex and variable blast furnace conditions. An efficient and accurate method for predicting the silicon content of blast furnace molten iron is therefore needed.

Disclosure of Invention

The invention aims to provide a blast furnace molten iron silicon content prediction method based on a GA-XGboost model aiming at the defects of the prior art so as to improve the prediction efficiency of the blast furnace molten iron silicon content and ensure the accuracy of the prediction result.

The problem of the invention is solved by the following technical scheme:

A blast furnace molten iron silicon content prediction method based on a GA-XGboost model comprises the following steps:

first, collecting historical smelting data of a blast furnace, and preprocessing the collected data set;

second, standardizing the data set;

third, dividing the data in the data set into different clusters through the KMeans++ clustering algorithm;

fourth, analyzing the correlation among the characteristic parameters in each cluster by using the Pearson correlation coefficient, eliminating characteristic variables whose correlation coefficient is larger than a set value, and dividing the data in each cluster into a training set and a test set;

fifth, training the GA-XGboost model with the data in the training set;

sixth, testing the trained GA-XGboost model with the data in the test set;

seventh, predicting the silicon content of the blast furnace molten iron with a GA-XGboost model that has passed the test.

In the above prediction method, the GA-XGboost model is trained with the data in the training set as follows. Using the population search of the genetic algorithm, each set of XGboost parameter values is treated as an individual of the genetic algorithm. Within the set parameter intervals, the currently optimized parameter values are passed to XGboost for prediction, and the result is fed into the fitness function of the genetic algorithm. After multiple iterations, the optimal parameter combination of XGboost is obtained. The specific steps are:

a. setting the initial parameters and the selectable parameter intervals of the genetic algorithm;

b. calculating the fitness of the current model parameters with the genetic algorithm, where the fitness function is:

[fitness formula not reproduced in this text]

where fitness is the fitness value, m is the number of samples in the sub data set, and the formula also involves the true value of the ith sample in the test set;

c. setting the number of retained parents, selecting the individuals with the highest fitness as the retained parents, and randomly crossing the genes of two parents to generate new offspring;

d. randomly mutating a single gene of each offspring individual to form new individuals, which serve as parents in the next iteration of the genetic algorithm;

e. repeating steps b to d until the specified number of iterations is completed or the fitness meets the specified requirement.

In the above prediction method, when the GA-XGboost model is trained with the data in the training set, the number of retained parents is set to 3 and the parent genes are crossed by the uniform crossover method: since the genes are independent of one another, each gene of an offspring is selected independently from the parents.

In the above prediction method, when the offspring individuals form new individuals by randomly mutating a single gene, the mutation randomly selects a parameter value within the selectable range to replace the original value, and only one gene of an offspring is changed in each mutation.

In the above prediction method, preprocessing the collected data set includes handling missing values and handling abnormal values: samples in which more than half of the characteristic parameters are missing are removed by the deletion method, and for the remaining samples each missing value is filled with the average of the data within one week before and after it; abnormal values are screened and cleaned with a box plot.

In the above prediction method, the data set is standardized with the following formula:

$$X^{*} = \frac{X - \mu}{\sigma}$$

where X is the data before standardization, X* is the data after standardization, μ is the data mean, and σ is the data standard deviation.

In the above prediction method, the data in the data set are divided into different clusters through the KMeans++ clustering algorithm as follows:

A data set $X = \{x_1, x_2, \dots, x_n\}$ ($x_i \in R^t$) containing n t-dimensional data points is divided into several disjoint clusters, where $R^t$ denotes the space of t-dimensional data and $x_i$ is the ith t-dimensional data point. The number of clusters is determined by the silhouette coefficient s, which is calculated as:

$$s = \frac{b - a}{\max(a, b)}$$

where b is the average Euclidean distance between a data point and the data outside its own cluster, and a is the average Euclidean distance between the data point and the other data in its own cluster. If the silhouette coefficient is largest when the data are divided into k clusters, the data set is divided into k clusters:

$$X_j = \{x_{j1}, x_{j2}, \dots, x_{jn_j}\}, \quad j = 1, 2, \dots, k$$

where $X_j$ denotes the jth cluster, $x_{ji}$ denotes the ith t-dimensional data point in the jth cluster, and $n_j$ denotes the number of t-dimensional data points in the jth cluster.

In the above prediction method, when characteristic variables whose correlation coefficient is larger than a set value are eliminated, the set value of the correlation coefficient is 0.9; when the data in each cluster are divided into a training set and a test set, the ratio of the number of data in the training set to the number in the test set is 7:3.

In the above prediction method, the collected historical smelting data of the blast furnace comprise blast temperature, blast volume, oxygen volume, oxygen enrichment rate, coke ratio, coal ratio, pressure difference, top pressure, top temperature, permeability index, gas CO content, gas CO2 content, slag SiO2 content, binary slag basicity, ternary slag basicity, quaternary slag basicity, utilization coefficient, molten iron sulfur content, blast kinetic energy, sinter silicon content, common pellet silicon content, acid pellet silicon content, lump ore 1 silicon content, lump ore 2 silicon content, magnesium acid pellet silicon content and calcium carbonate content. The seven characteristic parameters related to burden proportioning are the sinter silicon content, the common pellet silicon content, the acid pellet silicon content, the lump ore 1 silicon content, the lump ore 2 silicon content, the magnesium acid pellet silicon content and the calcium carbonate content.

In the above prediction method, the characteristic parameters of the collected data set that describe the raw materials charged into the blast furnace form a sparse matrix. After the data in the data set are divided into different clusters, the characteristic parameters related to burden proportioning are compressed by the PCA algorithm into a single one-dimensional feature, the furnace charge silicon content, so as to improve the prediction accuracy of the silicon content of the blast furnace molten iron.

The method uses a genetic algorithm to optimize the XGboost algorithm and, before prediction, divides the prediction data set into several data subsets with the KMeans++ algorithm. This eliminates the influence of furnace conditions on silicon content prediction, gives higher prediction accuracy under complex and variable blast furnace conditions, and greatly improves prediction efficiency.

Drawings

The present invention will be described in further detail with reference to the accompanying drawings.

FIG. 1 is a flow chart of the present invention;

FIG. 2 is a schematic diagram showing random crossing of genes in two parents, wherein FIG. 2(a) shows a state before crossing and FIG. 2(b) shows a state after crossing.

The symbols in the text are: fitness is the fitness value; m is the number of samples in the sub data set; the fitness formula also involves the true value of the ith sample in the test set; X is the data before standardization; X* is the data after standardization; μ is the data mean; σ is the data standard deviation; $R^t$ denotes the space of t-dimensional data; $x_i$ is the ith t-dimensional data point; s is the silhouette coefficient; b is the average Euclidean distance between a data point and the data outside its own cluster; a is the average Euclidean distance between the data point and the other data in its own cluster; $X_j$ denotes the jth cluster; $x_{ji}$ denotes the ith t-dimensional data point in the jth cluster; and $n_j$ denotes the number of t-dimensional data points in the jth cluster.

Detailed Description

The invention provides a method for predicting the silicon content of blast furnace molten iron based on a GA-XGboost model. A genetic algorithm is used to optimize the XGboost algorithm, and before prediction the data set is divided by the KMeans++ algorithm into several data subsets according to similarity, so as to distinguish the blast furnace conditions during smelting and improve the accuracy of the prediction result. Because the environment in the blast furnace hearth is harsh, online measurement of the molten iron temperature is difficult, smelting data are not fed back in time, and control measures cannot be taken in advance to stabilize the thermal state of the blast furnace. Moreover, many mechanism models and data-driven models rely on strong mathematical assumptions: they predict well when the furnace runs stably and smoothly, but their predictions often deviate considerably from reality when furnace conditions are unstable. The invention aims to solve the problem of predicting silicon content in blast furnace smelting and to achieve accurate prediction of the silicon content under various blast furnace conditions.

The method comprises the following steps:

S1. Collect historical smelting data of a blast furnace, where each sample comprises m sample features, and preprocess the collected data set.

Specifically, 26 variables are selected as input features of the prediction model: blast temperature, blast volume, oxygen volume, oxygen enrichment rate, coke ratio, coal ratio, pressure difference, top pressure, top temperature, permeability index, gas CO content, gas CO2 content, slag SiO2 content, binary slag basicity, ternary slag basicity, quaternary slag basicity, utilization coefficient, molten iron sulfur content, blast kinetic energy, sinter silicon content, common pellet silicon content, acid pellet silicon content, lump ore 1 silicon content, lump ore 2 silicon content, magnesium acid pellet silicon content and calcium carbonate content.

Preprocessing covers missing values and abnormal values. Missing values are handled by a deletion method and a filling method: a sample in which more than half of the characteristic parameters are missing is removed (deletion method), and any other missing value is filled with the average of the data within one week before and after it (filling method). Abnormal values are screened and cleaned with a box plot.
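The preprocessing rules above can be sketched in plain Python. This is a minimal illustration, not the patented implementation: it assumes one sample per day (so a window of 7 rows stands in for "one week before and after"), it marks missing values as `None`, and it uses the common 1.5 x IQR box plot rule, which the text does not specify. All function names are hypothetical.

```python
from statistics import mean, quantiles

def drop_sparse_samples(rows):
    """Deletion method: drop samples in which more than half of the features are missing."""
    return [list(r) for r in rows if sum(v is None for v in r) <= len(r) / 2]

def fill_with_neighbour_mean(rows, window=7):
    """Filling method: replace a gap with the mean of known values within +/- window rows."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is None:
                lo, hi = max(0, i - window), min(len(rows), i + window + 1)
                known = [rows[k][j] for k in range(lo, hi) if rows[k][j] is not None]
                if known:  # a gap with no known neighbours is left as None
                    filled[i][j] = mean(known)
    return filled

def boxplot_bounds(values, k=1.5):
    """Whisker limits of a box plot (assumed rule: quartiles +/- k * IQR)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def drop_outliers(rows):
    """Remove samples with any feature outside its box plot limits (rows must be gap-free)."""
    bounds = [boxplot_bounds(col) for col in zip(*rows)]
    return [r for r in rows if all(lo <= v <= hi for v, (lo, hi) in zip(r, bounds))]
```

A typical order would be `drop_sparse_samples`, then `fill_with_neighbour_mean`, then `drop_outliers`, matching the order in which the text lists the operations.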

S2. After preprocessing, the data set is standardized: it is centered on the data mean μ and then scaled by the data standard deviation σ, so that each feature has zero mean and unit standard deviation. The standardization uses the following formula:

$$X^{*} = \frac{X - \mu}{\sigma}$$

where X is the data before standardization and X* is the data after standardization.
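As a small illustration of this standardization step, the sketch below centers one feature column on its mean and scales it by its standard deviation. It assumes the population standard deviation (the text does not say whether the population or sample form is used), and the function name is hypothetical.

```python
from statistics import fmean, pstdev

def standardize(column):
    """Center on the mean and scale by the standard deviation (z-score)."""
    mu = fmean(column)
    sigma = pstdev(column)  # assumption: population standard deviation
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]

# Hypothetical feature values, for illustration only.
scaled = standardize([1150.0, 1180.0, 1210.0])
```

After the call, `scaled` has zero mean and unit standard deviation; the shape of the distribution itself is unchanged.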

S3. Divide the data in the data set into different clusters through the KMeans++ clustering algorithm, which eliminates the influence of different furnace conditions on silicon content prediction and improves the accuracy of the prediction result. A data set $X = \{x_1, x_2, \dots, x_n\}$ ($x_i \in R^t$) containing n t-dimensional data points is divided into several disjoint clusters, where $R^t$ denotes the space of t-dimensional data and $x_i$ is the ith t-dimensional data point. The number of clusters is determined by the silhouette coefficient, which is calculated as:

$$s = \frac{b - a}{\max(a, b)}$$

where b is the average Euclidean distance between a data point and the data outside its own cluster, and a is the average Euclidean distance between the data point and the other data in its own cluster. If the silhouette coefficient is largest when the data are divided into k clusters, the data set is divided into k clusters:

$$X_j = \{x_{j1}, x_{j2}, \dots, x_{jn_j}\}, \quad j = 1, 2, \dots, k$$

where $X_j$ denotes the jth cluster, $x_{ji}$ denotes the ith t-dimensional data point in the jth cluster, and $n_j$ denotes the number of t-dimensional data points in the jth cluster. Each cluster corresponds to one furnace condition.
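The silhouette coefficient can be computed directly from the definitions of a and b given above. The sketch below follows the wording of this text, where b averages over all data outside the point's own cluster; note that common library implementations instead take b from the nearest other cluster, so values may differ when there are more than two clusters. Function names and sample points are hypothetical.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(p, q)))

def silhouette(points, labels):
    """Mean of s = (b - a) / max(a, b) over all points."""
    scores = []
    for i, p in enumerate(points):
        same = [euclidean(p, q) for j, q in enumerate(points)
                if j != i and labels[j] == labels[i]]
        other = [euclidean(p, q) for j, q in enumerate(points)
                 if labels[j] != labels[i]]
        if not same or not other:
            scores.append(0.0)  # convention for singleton clusters or a single cluster
            continue
        a = sum(same) / len(same)    # mean distance inside the point's own cluster
        b = sum(other) / len(other)  # mean distance to data outside that cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical 2-D points forming two tight groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette(points, [0, 0, 1, 1])  # groups match the geometry
bad = silhouette(points, [0, 1, 0, 1])   # groups cut across the geometry
```

To choose k as described, one would cluster the data for each candidate k, compute `silhouette` for each labeling, and keep the k with the largest value.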

S4. The characteristic parameters of the data set that describe the raw materials charged into the blast furnace form a sparse matrix, which affects the accuracy of the prediction model. The seven characteristic parameters related to burden proportioning (sinter silicon content, common pellet silicon content, acid pellet silicon content, lump ore 1 silicon content, lump ore 2 silicon content, magnesium acid pellet silicon content and calcium carbonate content) are therefore compressed by the PCA algorithm into a single one-dimensional feature: the furnace charge silicon content.
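Compressing several features into one with PCA amounts to projecting each sample onto the first principal component. The sketch below finds that component by power iteration on the covariance matrix, which is one simple way to do it and an assumption here; the text does not say how PCA is computed. The function name and the two-feature sample data are hypothetical.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (direction, scores): the first principal axis and each row's projection on it."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[p] * c[q] for c in centered) / (n - 1) for q in range(d)]
           for p in range(d)]
    # Deterministic, non-uniform start vector for the power iteration.
    v = [float(j + 1) for j in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(cov[p][q] * v[q] for q in range(d)) for p in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # degenerate case: no variance along the current direction
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

# Hypothetical burden-related features that vary together.
rows = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]
direction, charge_feature = first_principal_component(rows)
```

Here `charge_feature` plays the role of the single compressed feature; with the seven burden-related parameters the same function would be called on seven-column rows.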

S5. For each cluster, analyze the correlation among the characteristic parameters with the Pearson correlation coefficient, eliminate the characteristic variables whose correlation coefficient is larger than 0.9, and use the remaining characteristic parameters as inputs to the prediction model. The data in each cluster are divided into a training set and a test set at a ratio of 7:3.
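A minimal sketch of this step is shown below. Several details are assumptions, since the text does not state them: the absolute value of the correlation is compared with 0.9, the later feature of a highly correlated pair is the one dropped, and the 7:3 split is made after a seeded random shuffle. Function and feature names are hypothetical.

```python
import math
import random

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sx = math.sqrt(sum((u - mx) ** 2 for u in x))
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    return cov / (sx * sy) if sx and sy else 0.0

def drop_correlated(columns, threshold=0.9):
    """Keep a feature unless it correlates above threshold with one already kept."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

def split_7_3(samples, seed=0):
    """Shuffle and split into 70% training and 30% test samples."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 7 // 10
    return shuffled[:cut], shuffled[cut:]

# Hypothetical columns: "b" is almost a multiple of "a", "c" is not.
columns = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8.1], "c": [4, 1, 3, 2]}
selected = drop_correlated(columns)
train, test = split_7_3(range(10))
```

In the method described here, both functions would be applied separately inside each cluster.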

S6. Train the GA-XGboost model with the training set data of each cluster. Using the population search of the genetic algorithm, each set of XGboost parameter values is treated as an individual of the genetic algorithm. Within the set parameter intervals, the currently optimized parameter values are passed to XGboost for prediction, and the result is fed into the fitness function of the genetic algorithm. After multiple iterations, the optimal parameter combination of XGboost is obtained.

The specific steps are as follows:

a. Set the initial parameters and the selectable parameter intervals of the genetic algorithm.

b. Calculate the fitness of the current model parameters with the genetic algorithm, where the fitness function is:

[fitness formula not reproduced in this text]

where fitness is the fitness value, m is the number of samples in the sub data set, and the formula also involves the true value of the ith sample in the test set.

c. Set the number of retained parents, select the individuals with the highest fitness as the retained parents, and randomly cross the genes (data) of two parents to generate new offspring (see FIG. 2).

d. Randomly mutate a single gene of each offspring individual to form new individuals, which serve as parents in the next iteration of the genetic algorithm.

e. Repeat steps b to d until the specified number of iterations is completed or the fitness meets the specified requirement.

In step a, the initial parameters are set to random values within the set parameter intervals. Seven parameters that mainly affect the XGboost algorithm are selected for tuning and used as the genes of the genetic algorithm population. The optimal parameters are searched through crossover, mutation and similar operations, and the fast convergence of this search shortens the training time of the algorithm.

In step c, the number of retained parents is set to 3. Experiments verified this choice: retaining too many parents reduces the difference between parents and offspring, which slows convergence and lengthens training, while retaining too few makes the difference between parents and offspring too large, which makes it hard to converge to the optimal value.

In step c, the parent genes are crossed by the uniform crossover method: since the genes are independent of one another, each gene of an offspring is selected independently from the parents. In step d, mutation randomly selects a parameter value within its selectable range to replace the original value; this random change introduces diversity among the offspring, and only one gene of an offspring is changed in each mutation. Crossover and mutation act on the offspring and make them differ from the parents, so the individuals with high fitness in the parent generation are retained while the search continues to iterate toward the optimal value, and the mutation helps keep the genetic algorithm from getting stuck in a local optimum during iterative convergence.
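Steps a to e, with three retained parents, uniform crossover and single-gene mutation, can be sketched as below. This is an illustration under stated assumptions, not the patented implementation. The text does not name the seven tuned parameters, so the three parameters and their ranges in `SEARCH_SPACE` are hypothetical choices. The fitness formula is not reproduced in this text, so `toy_fitness` is a stand-in with a known optimum; in practice it would train XGboost with the individual's parameters and score the predictions.

```python
import random

# Hypothetical selectable parameter intervals (step a).
SEARCH_SPACE = {
    "max_depth": [3, 4, 5, 6, 7, 8],
    "learning_rate": [0.01, 0.05, 0.1, 0.2, 0.3],
    "n_estimators": [100, 200, 300, 400, 500],
}

def random_individual(rng):
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}

def uniform_crossover(p1, p2, rng):
    """Each gene is picked independently from one of the two parents."""
    return {name: rng.choice((p1[name], p2[name])) for name in SEARCH_SPACE}

def mutate_one_gene(child, rng):
    """Replace exactly one gene with a random value from its selectable range."""
    mutated = dict(child)
    name = rng.choice(list(SEARCH_SPACE))
    mutated[name] = rng.choice(SEARCH_SPACE[name])
    return mutated

def run_ga(fitness, pop_size=12, n_parents=3, generations=20, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Steps b and c: score individuals and retain the fittest as parents.
        parents = sorted(population, key=fitness, reverse=True)[:n_parents]
        children = []
        while len(children) < pop_size - n_parents:
            p1, p2 = rng.sample(parents, 2)
            # Step d: every child gets one mutated gene after crossover.
            children.append(mutate_one_gene(uniform_crossover(p1, p2, rng), rng))
        population = parents + children
    return max(population, key=fitness)

def toy_fitness(ind):
    """Stand-in fitness (higher is better), peaking at a hypothetical optimum."""
    return -(abs(ind["max_depth"] - 5)
             + abs(ind["learning_rate"] - 0.1) * 10
             + abs(ind["n_estimators"] - 300) / 100)

best = run_ga(toy_fitness)
```

Because the retained parents are carried into the next population, the best fitness never decreases from one generation to the next, which mirrors the remark that high-fitness parents are kept while mutation maintains diversity.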

S7. Input the test set data into the trained GA-XGboost model to obtain a set of predicted values, compare the predicted values with the true values in a plot, and thereby test the trained GA-XGboost model and observe its performance.

S8. Predict the silicon content of the blast furnace molten iron with a GA-XGboost model that has passed the test.

Experiments show that after the sample set is divided into different furnace conditions by the clustering algorithm, the hit rate of the prediction results improves markedly: the hit rate can reach 100% when the allowed error is less than 0.1, and can still exceed 90% on average when the allowed error is less than 0.05. Compared with predictions made without distinguishing furnace conditions, distinguishing them effectively raises the hit rate, and dividing the sample set by similarity also improves the accuracy of the prediction results. In addition, the XGboost model optimized by the genetic algorithm converges quickly to the optimum, which reduces manual parameter tuning and improves the prediction efficiency of the model.
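The hit rate used here is the share of predictions whose absolute error stays below a tolerance. The sketch below shows the calculation; the function name is hypothetical and the numbers are made-up illustration values, not results from the experiments.

```python
def hit_rate(y_true, y_pred, tolerance):
    """Fraction of samples with |true - predicted| below the tolerance."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) < tolerance)
    return hits / len(y_true)

# Made-up silicon contents and predictions, for illustration only.
y_true = [0.40, 0.45, 0.50, 0.55]
y_pred = [0.42, 0.52, 0.49, 0.62]
rate_loose = hit_rate(y_true, y_pred, 0.1)
rate_tight = hit_rate(y_true, y_pred, 0.05)
```

With these values every prediction falls within 0.1 of the true value, while only two of the four fall within 0.05.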

The GA-XGboost prediction method gives the prediction results good interpretability, completes prediction on a huge data set within an acceptable time, achieves high accuracy, and reduces manual parameter tuning.

At present, blast furnace smelting with high coal injection is still under development in China, and the amount of accumulated data cannot compare with that of traditional smelting. Because no mature technical scheme exists for applying this technology, and because the higher coal ratio makes conditions inside the blast furnace more complex and changeable than under traditional smelting, the blast furnace operating parameters also change. These problems make the characteristic parameters of the prediction data set differ greatly, so the data are divided by the KMeans++ algorithm into several sub data sets (clusters) according to similarity in order to distinguish different conditions inside the blast furnace. Distinguishing furnace conditions groups similar characteristic parameters together and greatly improves the accuracy of the prediction results.

Interpretation of terms:

XGboost (eXtreme Gradient Boosting) is a gradient boosting algorithm built on residual decision trees. The basic idea is to add trees to the model one at a time, with each added CART decision tree improving the overall result (reducing the objective function). Multiple decision trees (multiple individual weak classifiers) form a combined classifier, and each leaf node is assigned a weight.
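The idea of adding one tree at a time, each fitted to the current residuals, can be shown with a toy example. The sketch below is plain least-squares boosting with one-split trees (stumps) on a single feature; it is not XGboost itself, which additionally uses second-order gradient information and regularization. Function names and data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Best single-split regression stump on a 1-D feature (needs 2+ distinct x values)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lv, rv = sum(left) / len(left), sum(right) / len(right)  # leaf weights
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lv, rv)
    _, threshold, lv, rv = best
    return lambda x: lv if x <= threshold else rv

def boost(xs, ys, n_trees=20, learning_rate=0.5):
    base = sum(ys) / len(ys)
    trees, preds, losses = [], [base] * len(ys), []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)  # each new tree fits what is still unexplained
        trees.append(tree)
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]
        losses.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    predict = lambda x: base + learning_rate * sum(t(x) for t in trees)
    return predict, losses

# Made-up data with a step between x = 3 and x = 4.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
predict, losses = boost(xs, ys)
```

The list `losses` holds the mean squared error after each added tree and never increases, which is the "objective function is reduced" behaviour described above.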

GA-XGboost is the XGboost algorithm optimized and improved by a genetic algorithm.

KMeans++ is a clustering algorithm that improves on the KMeans algorithm.
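What KMeans++ changes relative to plain KMeans is the choice of initial cluster centers: each new center is drawn with probability proportional to its squared distance from the nearest center already chosen, which spreads the centers out. The sketch below shows only this seeding step; the function name and sample points are hypothetical, and the points are assumed to be distinct with k no larger than their number.

```python
import random

def kmeans_pp_init(points, k, seed=0):
    """Pick k initial centers using distance-squared weighted sampling."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen center.
        d2 = [min(sum((u - v) ** 2 for u, v in zip(p, c)) for c in centers)
              for p in points]
        # Points far from every center are more likely to be picked;
        # already chosen points have weight 0 and cannot be picked again.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers

# Hypothetical 2-D points forming two groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers = kmeans_pp_init(points, 2)
```

The usual KMeans assignment and update iterations would then start from `centers`.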

Claims

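1. A blast furnace molten iron silicon content prediction method based on a GA-XGboost model, characterized by comprising the following steps:

first, collecting historical smelting data of a blast furnace, and preprocessing the collected data set;

second, standardizing the data set;

third, dividing the data in the data set into different clusters through the KMeans++ clustering algorithm;

fourth, analyzing the correlation among the characteristic parameters in each cluster by using the Pearson correlation coefficient, eliminating characteristic variables whose correlation coefficient is larger than a set value, and dividing the data in each cluster into a training set and a test set;

fifth, training the GA-XGboost model with the data in the training set;

sixth, testing the trained GA-XGboost model with the data in the test set;

seventh, predicting the silicon content of the blast furnace molten iron with a GA-XGboost model that has passed the test.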

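2. The method for predicting the silicon content of blast furnace molten iron based on the GA-XGboost model as claimed in claim 1, wherein training the GA-XGboost model with the data in the training set uses the population search of the genetic algorithm: each set of XGboost parameter values is taken as an individual of the genetic algorithm, the currently optimized parameter values are passed to XGboost for prediction within the set parameter intervals, the result is used as a parameter of the fitness function of the genetic algorithm over multiple iterations, and the optimal parameter combination of XGboost is finally obtained, the specific steps being:

a. setting the initial parameters and the selectable parameter intervals of the genetic algorithm;

b. calculating the fitness of the current model parameters with the genetic algorithm, where the fitness function is:

[fitness formula not reproduced in this text]

where fitness is the fitness value, m is the number of samples in the sub data set, and the formula also involves the true value of the ith sample in the test set;

c. setting the number of retained parents, selecting the individuals with the highest fitness as the retained parents, and randomly crossing the genes of two parents to generate new offspring;

d. randomly mutating a single gene of each offspring individual to form new individuals, which serve as parents in the next iteration of the genetic algorithm;

e. repeating steps b to d until the specified number of iterations is completed or the fitness meets the specified requirement.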

3. The method for predicting the silicon content of blast furnace molten iron based on the GA-XGboost model as claimed in claim 2, wherein when the GA-XGboost model is trained with the data in the training set, the number of retained parents is set to 3, the parent genes are crossed by the uniform crossover method, and each gene of an offspring is selected independently from the parents on the basis that the genes are independent of one another.

4. The method for predicting the silicon content of blast furnace molten iron based on the GA-XGboost model as claimed in claim 3, wherein when the offspring individuals form new individuals by randomly mutating a single gene, the mutation randomly selects a parameter value within the selectable range to replace the original value, and only one gene of an offspring is changed in each mutation.

5. The method for predicting the silicon content of blast furnace molten iron based on the GA-XGboost model according to any one of claims 1 to 4, characterized in that preprocessing the collected data set comprises handling missing values and handling abnormal values: samples in which more than half of the characteristic parameters are missing are removed by the deletion method, and for the remaining samples each missing value is filled with the average of the data within one week before and after it; abnormal values are screened and cleaned with a box plot.

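6. The method for predicting the silicon content of blast furnace molten iron based on the GA-XGboost model according to claim 5, characterized in that the data set is standardized by the following formula:

$$X^{*} = \frac{X - \mu}{\sigma}$$

where X is the data before standardization, X* is the data after standardization, μ is the data mean, and σ is the data standard deviation.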

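7. The method for predicting the silicon content of blast furnace molten iron based on the GA-XGboost model as claimed in claim 6, wherein the data in the data set are divided into different clusters by the KMeans++ clustering algorithm as follows:

a data set $X = \{x_1, x_2, \dots, x_n\}$ ($x_i \in R^t$) containing n t-dimensional data points is divided into several disjoint clusters, where $R^t$ denotes the space of t-dimensional data and $x_i$ is the ith t-dimensional data point; the number of clusters is determined by the silhouette coefficient s, which is calculated as:

$$s = \frac{b - a}{\max(a, b)}$$

where b is the average Euclidean distance between a data point and the data outside its own cluster, and a is the average Euclidean distance between the data point and the other data in its own cluster; if the silhouette coefficient is largest when the data are divided into k clusters, the data set is divided into k clusters:

$$X_j = \{x_{j1}, x_{j2}, \dots, x_{jn_j}\}, \quad j = 1, 2, \dots, k$$

where $X_j$ denotes the jth cluster, $x_{ji}$ denotes the ith t-dimensional data point in the jth cluster, and $n_j$ denotes the number of t-dimensional data points in the jth cluster.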

8. The method for predicting the silicon content of blast furnace molten iron based on the GA-XGboost model according to claim 7, wherein when characteristic variables whose correlation coefficient is larger than a set value are eliminated, the set value of the correlation coefficient is 0.9, and when the data in each cluster are divided into a training set and a test set, the ratio of the number of data in the training set to the number in the test set is 7:3.

9. The method for predicting the silicon content of blast furnace molten iron based on the GA-XGboost model as claimed in claim 3, wherein the collected historical smelting data of the blast furnace comprise blast temperature, blast volume, oxygen volume, oxygen enrichment rate, coke ratio, coal ratio, pressure difference, top pressure, top temperature, permeability index, gas CO content, gas CO2 content, slag SiO2 content, binary slag basicity, ternary slag basicity, quaternary slag basicity, utilization coefficient, molten iron sulfur content, blast kinetic energy, sinter silicon content, common pellet silicon content, acid pellet silicon content, lump ore 1 silicon content, lump ore 2 silicon content, magnesium acid pellet silicon content and calcium carbonate content, wherein the seven characteristic parameters related to burden proportioning are the sinter silicon content, the common pellet silicon content, the acid pellet silicon content, the lump ore 1 silicon content, the lump ore 2 silicon content, the magnesium acid pellet silicon content and the calcium carbonate content.

10. The method for predicting the silicon content of blast furnace molten iron based on the GA-XGboost model as claimed in claim 9, wherein the characteristic parameters of the collected data set that describe the raw materials charged into the blast furnace form a sparse matrix, and after the data in the data set are divided into different clusters, the characteristic parameters related to burden proportioning are compressed by the PCA algorithm into a single one-dimensional feature, the furnace charge silicon content, so as to improve the prediction accuracy of the silicon content of the blast furnace molten iron.