Method for predicting the reaction temperature of an MCS synthesis device based on a machine learning model
Technical Field
The invention relates to the field of data mining for industrial devices, and in particular to a method for predicting the reaction temperature of a methyl chlorosilane (MCS) synthesis device based on a machine learning model.
Background
Dimethyldichlorosilane, one of the methyl chlorosilane monomers, is the most important raw material for synthesizing organosilicon materials, which play an important role in cutting-edge fields owing to their special properties. During the operation of a methyl chlorosilane (MCS) synthesis device, the MCS synthesis reaction is strongly exothermic and the reactant silicon powder has poor thermal conductivity. At an excessively high temperature the target product, the methyl chlorosilane monomer, readily decomposes to form high-boiling substances; conversely, at an excessively low reaction temperature the target yield is too low and the amount of fly ash increases, which wastes raw material and raises production cost. Control of the reaction temperature is therefore a key technology in operating an MCS synthesis device. Accordingly, predicting the reaction temperature during operation is of great significance for stabilizing fluctuations of the reaction temperature, stabilizing fluctuations of the target methyl chlorosilane monomer, and extending the overall reaction run period.
The reaction temperature data generated during operation of the MCS synthesis device are time-series data. Little literature currently addresses reaction temperature prediction for MCS synthesis devices; prediction methods for time-series data mainly fall into the following types:
The first type uses a grey prediction model for time-series prediction, usually the GM(1,1) model. The historical time-series data are taken as the original sequence, which is accumulated to generate a more regular new sequence. A first-order single-variable differential equation is fitted to the new sequence to capture its regularity, the data response sequence is obtained by solving the differential equation, and the result is restored by inverse accumulation (differencing) to give fitted values of the original sequence. A final check verifies the prediction accuracy of the model. The method needs little data, is simple to compute and does not require the original data to follow a regular distribution, but it has low fault tolerance, its robustness depends strongly on parameter estimation, and it is unsuitable for long-term analysis.
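The GM(1,1) procedure described above can be illustrated with a minimal Python sketch. It assumes the usual textbook form of the model (accumulation, least-squares estimation of the two parameters, exponential time response, inverse accumulation); the function name `gm11_forecast` and the sample values are hypothetical and are not taken from the prior art discussed here.

```python
import math

def gm11_forecast(series, steps):
    """Fit a textbook GM(1,1) model to `series` and forecast `steps` more values.

    Assumes a positive sequence with a non-zero development coefficient a.
    """
    n = len(series)
    # Accumulate the original sequence into running sums.
    acc, total = [], 0.0
    for value in series:
        total += value
        acc.append(total)
    # Background values: means of neighbouring accumulated values.
    z = [0.5 * (acc[k] + acc[k - 1]) for k in range(1, n)]
    y = series[1:]
    # Least squares for y(k) = -a * z(k) + b.
    m = n - 1
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(u * v for u, v in zip(z, y))
    det = m * szz - sz * sz
    a = (sz * sy - m * szy) / det
    b = (szz * sy - sz * szy) / det

    def acc_hat(k):
        # Time response of the first-order differential equation.
        return (series[0] - b / a) * math.exp(-a * k) + b / a

    # Inverse accumulation restores values of the original sequence.
    return [acc_hat(k) - acc_hat(k - 1) for k in range(n, n + steps)]

history = [100.0, 110.0, 121.0, 133.1, 146.41]  # hypothetical data
forecast = gm11_forecast(history, steps=2)
```

On a sequence that grows by a nearly constant ratio, as in this example, the forecast stays close to the continued geometric trend, which is consistent with the method's suitability for short, regular series.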
The second type predicts time-series data by a time-series method. Such a method generally arranges the historical data in chronological order to obtain an original sequence and extrapolates the sequence to predict its future trend. It can quickly analyze the continuous variation characteristics of time-series data and reflects the one-directional linear relation among data objects.
The third type is data prediction based on a multiple regression analysis model. A group of variable factors is assumed to be associated with a target variable, and a sample data set is established in which the target value is the dependent variable and the other factors are explanatory variables. The parameters of the regression model are estimated from the observed sample values to obtain a regression equation, a significance test is carried out on the regression equation, and data are then predicted from it. The method reflects the linear relation between the target variable and the explanatory variables, but its prediction accuracy is strongly affected by the data quality of the explanatory variables, and it has difficulty reflecting nonlinear relations.
Disclosure of Invention
To solve the above problems, the invention provides a method for predicting the reaction temperature of an MCS synthesis device based on a machine learning model.
The method for predicting the reaction temperature of the MCS synthesis device based on the machine learning model comprises the following steps:
screening variable factors related to reaction temperature from variables related to the operation of the MCS synthesis device;
acquiring historical data corresponding to variable factors related to the reaction temperature and preprocessing the historical data;
establishing an SVR reaction temperature change rate prediction model by adopting an epsilon-SVR machine learning algorithm based on the preprocessed historical data as training data;
acquiring current data corresponding to the variable factors related to the reaction temperature, inputting the current data into the trained SVR reaction temperature change rate prediction model to calculate a predicted reaction temperature change rate for a future period, and adding the predicted reaction temperature change rate to the current reaction temperature value to obtain the predicted reaction temperature of the MCS synthesis device for the future period.
Preferably, the variable factor related to the reaction temperature includes:
heat-conducting oil flow rate, fluidized bed density, silicon powder feeding rate, catalyst feeding rate, fluidized bed reactor top pressure, methyl chloride feeding rate, temperature change rate of the heat-conducting oil entering the fluidized bed, temperature change rate of the heat-conducting oil leaving the fluidized bed, temperature of the heat-conducting oil entering the fluidized bed, temperature of the heat-conducting oil leaving the fluidized bed, and methyl chloride side spray gun pressure.
Preferably, the obtaining and preprocessing the historical data corresponding to the variable factor related to the reaction temperature includes:
performing first-order differencing on the historical data corresponding to the reaction temperature change rate and the silicon powder feeding rate, and smoothing the historical data corresponding to the heat-conducting oil flow rate and the fluidized bed reactor top pressure by a moving average method to remove data noise.
Preferably, the obtaining and preprocessing the historical data corresponding to the variable factor related to the reaction temperature includes:
performing data alignment on the historical data that exhibit a time delay.
Preferably, the data alignment processing for the historical data with time delay includes:
using Pearson correlation coefficient analysis to determine the lag time between each variable factor and the reaction temperature data;
calculating the sequence of correlation coefficients between the reaction temperature and the variable factor over a set range of lag times, and taking the time point with the strongest correlation as the lag time;
aligning the historical data that exhibit a time delay based on the lag time.
Preferably, the establishing the SVR reaction temperature change rate prediction model based on the preprocessed historical data as training data by using an epsilon-SVR machine learning algorithm includes:
setting an ε-insensitive loss function:
$$c(x, y, f(x)) = |y - f(x)|_{\varepsilon},$$
$$|y - f(x)|_{\varepsilon} = \max\{0,\ |y - f(x)| - \varepsilon\},$$
wherein $\varepsilon$ is a preset positive number; when the difference between the observed value y at point x and the predicted value f(x) does not exceed $\varepsilon$, the predicted value f(x) at that point is considered to incur no loss;
establishing a sample training set D as an input vector:
$$D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\},\quad x_i \in X,\ y_i \in R,\ i = 1, 2, \ldots, n,$$
wherein the historical data is defined as a data set X, the data corresponding to the reaction temperature change rate is defined as a data set R, and n is the total number of the data;
obtaining a regression function based on the sample training set D:
wherein $x_i$ are the support vectors in the sample training set D, $K(x, x_i)$ is the kernel function, $\alpha_i^{(*)} \ge 0$ are the Lagrange multipliers, and b is a model parameter.
Preferably, the obtaining the regression function based on the sample training set D includes:
nonlinearly mapping the sample training set D to a high-dimensional Hilbert space H by a nonlinear mapping function $\phi$, and performing linear regression in the high-dimensional space H, wherein, under the kernel function mapping, the training set in H corresponding to the sample training set D is:
$$D' = \{(z_1, y_1), (z_2, y_2), \ldots, (z_n, y_n)\} = \{(\phi(x_1), y_1), (\phi(x_2), y_2), \ldots, (\phi(x_n), y_n)\},$$
wherein z is the nonlinear mapping of x; linear regression is performed in the high-dimensional feature space H, which achieves the effect of nonlinear regression in the original space, and the estimation function is:
$$f(x) = (w \cdot \phi(x)) + b,$$
wherein w and b are model parameters;
penalizing the sample points falling outside the ε-band with a penalty coefficient C, which is expressed as the following primal optimization problem:
wherein $\xi_i$ and $\xi_i^*$ are slack variables;
substituting w into the estimation function to obtain the regression function:
Preferably, the method further comprises:
selecting the portion of the historical data closest to the current time as validation data, and requiring the root mean square error on the validation data to be within a set threshold; otherwise, re-selecting a batch of the historical data as training data and retraining the SVR reaction temperature change rate prediction model, until the root mean square error on the validation set is within the set threshold.
Preferably, the training data are re-selected no more than a set number of times; if the root mean square error on the validation set is still greater than the set threshold, the data set giving the smallest validation root mean square error is selected as the training data.
Preferably, the method further comprises:
performing error correction on the predicted value according to the historical prediction errors over a set past period;
correcting the trend direction of the predicted value according to the temperature of the heat-conducting oil entering the fluidized bed and the temperature of the heat-conducting oil leaving the fluidized bed, which have the highest correlation with the reaction temperature.
The invention achieves the following effects. Variable factors related to the reaction temperature are screened from the variables related to the operation of the MCS synthesis device; historical data corresponding to these variable factors are acquired and preprocessed; an SVR reaction temperature change rate prediction model is established with the ε-SVR machine learning algorithm, using the preprocessed historical data as training data; and current data corresponding to the variable factors are input into the trained model to calculate a predicted reaction temperature change rate for a future period, which is added to the current reaction temperature value to obtain the predicted reaction temperature of the MCS synthesis device for that period. Predicting the reaction temperature during operation makes it possible to analyze the stability and trend of the reaction temperature of the methyl chlorosilane (MCS) synthesis device, and the prediction helps operators adjust the operation in advance, thereby optimizing reaction temperature control.
Drawings
The invention will be described in further detail with reference to the drawings and the detailed description.
FIG. 1 is a schematic flow chart of the method for predicting the reaction temperature of an MCS synthesis device based on a machine learning model in an embodiment of the invention;
FIG. 2 is a schematic flow chart of step S5 of the method for predicting the reaction temperature of an MCS synthesis device based on a machine learning model in an embodiment of the invention;
FIG. 3 is a schematic flow chart of step S6 of the method for predicting the reaction temperature of an MCS synthesis device based on a machine learning model in an embodiment of the invention.
Detailed Description
The technical scheme of the present invention will be further described with reference to the accompanying drawings, but the present invention is not limited to these examples.
The invention provides a method for predicting the reaction temperature of an MCS synthesis device based on a machine learning model. A methyl chlorosilane (MCS) device is a direct-process device for synthesizing methyl chlorosilane monomers, in which methyl chloride is passed directly into silicon powder containing a catalyst and undergoes a gas-solid catalytic reaction to give methyl chlorosilane. The invention analyzes the relation between the reaction temperature change rate and variable factors during operation, such as the heat-conducting oil temperature change rate, the heat-conducting oil flow rate, the fluidized bed density, the silicon powder feeding rate, the fluidized bed reactor top pressure and the feeding rate of methyl chloride below the fluidized bed, and establishes a support vector regression (SVR) model to predict the reaction temperature of the MCS synthesis device for a future period. Predicting the reaction temperature during operation makes it possible to analyze the stability and trend of the reaction temperature of the MCS synthesis device, and the prediction helps operators adjust the operation in advance, thereby optimizing reaction temperature control.
The embodiment of the invention provides a reaction temperature prediction method of an MCS synthesis device based on a machine learning model, which is shown in figure 1 and comprises the following steps:
S1: Screening variable factors related to the reaction temperature from the variables related to the operation of the MCS synthesis device.
Before the SVR reaction temperature change rate prediction model is established, the variable factors related to the reaction temperature need to be screened from the many variables related to device operation. Based on the process of the MCS synthesis device and on exploratory analysis of historical operating data, the variable factors considered in establishing the SVR reaction temperature change rate prediction model comprise: heat-conducting oil flow rate, fluidized bed density, silicon powder feeding rate, catalyst feeding rate, fluidized bed reactor top pressure, methyl chloride feeding rate, temperature change rate of the heat-conducting oil entering the fluidized bed, temperature change rate of the heat-conducting oil leaving the fluidized bed, temperature of the heat-conducting oil entering the fluidized bed, temperature of the heat-conducting oil leaving the fluidized bed, and methyl chloride side spray gun pressure.
S2: historical data corresponding to variable factors related to the reaction temperature are obtained and preprocessed.
Data preprocessing mainly targets two kinds of variables: those that cannot be obtained directly, such as the reaction temperature change rate, the silicon powder feeding rate and the heat-conducting oil temperature change rate of the MCS synthesis device, and those with heavy data noise, such as the heat-conducting oil flow rate and the fluidized bed density. For variables such as the reaction temperature change rate and the silicon powder feeding rate, first-order differencing is applied to the raw data. For noisy data such as the heat-conducting oil flow rate and the fluidized bed reactor top pressure, the data are smoothed by a moving average method to remove data noise.
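The two preprocessing operations can be sketched in Python as follows. This is a minimal illustration only: the function names, the trailing-window form of the moving average and the window length are assumptions, since the description does not specify them.

```python
def first_difference(values, dt=1.0):
    """First-order difference of a raw series, divided by the sample interval dt.

    Turns a directly measured quantity (for example the reaction temperature)
    into a change rate per unit time.
    """
    return [(values[i] - values[i - 1]) / dt for i in range(1, len(values))]

def moving_average(values, window):
    """Trailing moving average; the first points use the samples available so far."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

raw_temperature = [300.0, 300.5, 301.5, 301.0]  # hypothetical readings
temperature_rate = first_difference(raw_temperature)
noisy_flow = [1.0, 2.0, 3.0, 4.0, 5.0]          # hypothetical readings
smooth_flow = moving_average(noisy_flow, window=3)
```

A trailing window is used here so that smoothing never looks at future samples, which matters when the same preprocessing is applied to live data at prediction time.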
During operation of the MCS synthesis device, variable factors such as the heat-conducting oil flow rate, the heat-conducting oil inlet temperature, the heat-conducting oil outlet temperature and the silicon powder feeding rate affect the reaction temperature with different degrees of time delay, so data exhibiting a time delay must be aligned before model training. The invention uses Pearson correlation coefficient analysis to determine the lag time between each variable factor and the reaction temperature data: the sequence of correlation coefficients between the reaction temperature and the variable factor is calculated for lag times of 0 to 30 minutes, the time point with the highest correlation is taken as the lag time, and the data set is then aligned based on this lag time.
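The lag search and alignment can be sketched as below. This is an illustrative reading of the step rather than the exact procedure of the invention: the function names are hypothetical, lags are counted in samples, and "highest correlation" is taken to mean the largest absolute Pearson coefficient.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / math.sqrt(vx * vy)

def best_lag(factor, target, max_lag):
    """Lag (in samples) at which factor[t] correlates most strongly with target[t + lag]."""
    best, best_r = 0, 0.0
    for lag in range(max_lag + 1):
        r = pearson(factor[:len(factor) - lag], target[lag:])
        if abs(r) > abs(best_r):
            best, best_r = lag, r
    return best, best_r

def align(factor, target, lag):
    """Pair each factor sample with the target sample observed `lag` samples later."""
    return list(zip(factor[:len(factor) - lag], target[lag:]))

# Hypothetical data: the target repeats the factor three samples later.
factor = [float((i * 7) % 11) for i in range(40)]
target = [0.0, 0.0, 0.0] + factor[:-3]
lag, r = best_lag(factor, target, max_lag=5)
pairs = align(factor, target, lag)
```

If the data are sampled every 10 seconds, a search over 0 to 30 minutes corresponds to `max_lag=180`.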
S3: Establishing an SVR reaction temperature change rate prediction model with the ε-SVR machine learning algorithm, using the preprocessed historical data as training data.
The invention builds the model on the ε-SVR algorithm. ε-SVR is a method for solving regression problems based on the support vector machine (SVM): it seeks a linear regression equation that fits all the sample points, and the optimal hyperplane it seeks is the one that minimizes the total deviation of the sample points from the hyperplane.
The support vector machine is a machine learning algorithm based on statistical learning theory that works well on classification and regression problems with high-dimensional data. It represents the decision boundary with a subset of the training data, called the support vectors; support vector regression (SVR) is the method of solving regression problems with the support vector machine algorithm. SVR handles small sample sizes, nonlinearity, high dimensionality and local minima well, and also offers fast computation, a stable model and strong generalization ability.
By introducing an ε-insensitive loss function, structural risk replaces empirical risk as the estimate of the expected risk, and the sparsity of the support vector machine is preserved. The insensitive loss function is as follows:
$$c(x, y, f(x)) = |y - f(x)|_{\varepsilon},$$
$$|y - f(x)|_{\varepsilon} = \max\{0,\ |y - f(x)| - \varepsilon\},$$
wherein $\varepsilon$ is a positive number chosen in advance; when the difference between the observed value y at point x and the predicted value f(x) does not exceed $\varepsilon$, the predicted value f(x) at that point is considered to incur no loss.
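As a small illustration, the loss above can be written directly in Python. The function name and the numeric values are hypothetical.

```python
def eps_insensitive_loss(y, f_x, eps):
    """Loss is zero inside the eps-band and grows linearly outside it."""
    return max(0.0, abs(y - f_x) - eps)

inside = eps_insensitive_loss(y=1.0, f_x=1.05, eps=0.1)   # deviation 0.05, within the band
outside = eps_insensitive_loss(y=1.0, f_x=1.5, eps=0.1)   # deviation 0.5, outside the band
```

Samples whose loss is zero do not contribute to the fitted function, which is the source of the sparsity mentioned above.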
The invention defines the data on heat-conducting oil flow rate, fluidized bed density, silicon powder feeding rate, catalyst feeding rate, fluidized bed reactor top pressure, methyl chloride feeding rate, temperature change rate of the heat-conducting oil entering the fluidized bed, temperature change rate of the heat-conducting oil leaving the fluidized bed, temperature of the heat-conducting oil entering the fluidized bed, temperature of the heat-conducting oil leaving the fluidized bed, and methyl chloride side spray gun pressure as data set X, defines the reaction temperature change rate data as data set R, and establishes the sample training set D as the input:
$$D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\},\quad x_i \in X,\ y_i \in R,\ i = 1, 2, \ldots, n.$$
First, the sample training set D is nonlinearly mapped to a high-dimensional Hilbert space H by a nonlinear mapping function $\phi$, and linear regression is then carried out in the high-dimensional space H. Under the kernel function mapping, the training set in H corresponding to D is:
$$D' = \{(z_1, y_1), (z_2, y_2), \ldots, (z_n, y_n)\} = \{(\phi(x_1), y_1), (\phi(x_2), y_2), \ldots, (\phi(x_n), y_n)\},$$
wherein z is the nonlinear mapping of x. Linear regression is then carried out in the high-dimensional feature space H, which achieves the effect of nonlinear regression in the original space, and the estimation function is:
$$f(x) = (w \cdot \phi(x)) + b,$$
wherein w and b are model parameters.
The invention adopts soft-margin ε-support vector regression (ε-SVR). When a linear function is used to regress the sample points, most sample points lie inside the ε-band and a small number lie outside it; the sample points falling outside the ε-band are penalized with a penalty coefficient C, which is expressed as the following primal optimization problem (that is, minimizing the structural risk):
wherein $\xi_i$ and $\xi_i^*$ are slack variables.
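For reference, the soft-margin ε-SVR primal problem in its standard textbook form, written with the symbols used in this description ($w$, $b$, $C$, $\varepsilon$, $z_i = \phi(x_i)$, $\xi_i$, $\xi_i^*$), is given below. Which slack variable belongs to which side of the ε-band is a convention assumed here for illustration.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^*}\quad & \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^{*}\right) \\
\text{s.t.}\quad & y_i - (w\cdot z_i) - b \le \varepsilon + \xi_i, \\
& (w\cdot z_i) + b - y_i \le \varepsilon + \xi_i^{*}, \\
& \xi_i \ge 0,\quad \xi_i^{*} \ge 0,\qquad i = 1, 2, \ldots, n.
\end{aligned}
```

The first term keeps the regression function flat, and the second term charges each sample outside the ε-band in proportion to its distance from the band, weighted by C.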
To solve the above primal optimization problem, it is converted into a dual problem by the Lagrange multiplier method. The Lagrange function is introduced:
wherein $\alpha_i^{(*)} \ge 0$ and $\eta_i^{(*)} \ge 0$ are Lagrange multipliers.
The minimum of the Lagrange function with respect to w, b and $\xi^{(*)}$ is sought. The extremum conditions are:
Substituting the above into the Lagrange function converts the primal optimization problem into the following dual problem:
Solving the dual problem yields $\alpha^{(*)}$. The parameters w and b can be calculated by:
Select some $\alpha_j > 0$ or $\alpha_j^* > 0$ and calculate:
$$b = y_j - (w \cdot z_j) - \varepsilon,$$
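The conversion from the primal problem to the dual problem described in the preceding steps can be written out in the standard textbook form as follows. The sign convention is an assumption: $\alpha_i$ is attached to the constraint $y_i - (w\cdot z_i) - b \le \varepsilon + \xi_i$ and $\alpha_i^*$ to the opposite constraint, which is the convention under which the expression for b with $\alpha_j > 0$ holds.

```latex
\begin{aligned}
L ={}& \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}(\xi_i + \xi_i^{*})
      - \sum_{i=1}^{n}(\eta_i\xi_i + \eta_i^{*}\xi_i^{*}) \\
     & - \sum_{i=1}^{n}\alpha_i\bigl(\varepsilon + \xi_i - y_i + (w\cdot z_i) + b\bigr)
       - \sum_{i=1}^{n}\alpha_i^{*}\bigl(\varepsilon + \xi_i^{*} + y_i - (w\cdot z_i) - b\bigr), \\[4pt]
\frac{\partial L}{\partial w} = 0 \ \Rightarrow\ & w = \sum_{i=1}^{n}(\alpha_i - \alpha_i^{*})\,z_i, \qquad
\frac{\partial L}{\partial b} = 0 \ \Rightarrow\ \sum_{i=1}^{n}(\alpha_i - \alpha_i^{*}) = 0, \qquad
\frac{\partial L}{\partial \xi_i^{(*)}} = 0 \ \Rightarrow\ C - \alpha_i^{(*)} - \eta_i^{(*)} = 0, \\[4pt]
\max_{\alpha,\,\alpha^{*}}\quad & -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}(\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*})\,K(x_i, x_j)
      - \varepsilon\sum_{i=1}^{n}(\alpha_i + \alpha_i^{*}) + \sum_{i=1}^{n} y_i(\alpha_i - \alpha_i^{*}) \\
\text{s.t.}\quad & \sum_{i=1}^{n}(\alpha_i - \alpha_i^{*}) = 0, \qquad 0 \le \alpha_i,\ \alpha_i^{*} \le C .
\end{aligned}
```

Under this convention, a sample with $0 < \alpha_j < C$ lies exactly on the upper edge of the ε-band and gives $b = y_j - (w\cdot z_j) - \varepsilon$, while a sample with $0 < \alpha_j^* < C$ lies on the lower edge and gives $b = y_j - (w\cdot z_j) + \varepsilon$.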
Substituting w into the estimation function gives the regression function:
wherein $x_i$ are called the support vectors in the training set D and $K(x, x_i)$ is the kernel function; $K(x, x_i)$ depends only on the inner product of the nonlinear mapping function $\phi$ and satisfies Mercer's theorem. The invention adopts a linear kernel function as the kernel function:
$$K(x_i, x_j) = x_i \cdot x_j.$$
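To make the role of the kernel concrete, the sketch below evaluates a regression function of the standard SVR form, a weighted sum of kernel values over the support vectors plus b, using the linear kernel. The support vectors, the combined dual coefficients (assumed here to stand for the difference of the two multipliers of each support vector) and the value of b are hypothetical numbers, not results of training.

```python
def linear_kernel(u, v):
    """K(u, v) = u . v, the linear kernel adopted in this description."""
    return sum(a * b for a, b in zip(u, v))

def svr_predict(x, support_vectors, dual_coefs, b, kernel=linear_kernel):
    """Evaluate f(x) = sum_i coef_i * K(x_i, x) + b over the support vectors."""
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, dual_coefs)) + b

# Hypothetical trained quantities for a two-feature problem.
support_vectors = [[1.0, 0.0], [0.0, 2.0]]
dual_coefs = [0.5, -0.25]
b = 0.1

prediction = svr_predict([2.0, 4.0], support_vectors, dual_coefs, b)

# With a linear kernel the sum collapses to an explicit weight vector w.
w = [sum(c * sv[k] for sv, c in zip(support_vectors, dual_coefs)) for k in range(2)]
same_prediction = linear_kernel(w, [2.0, 4.0]) + b
```

Because the kernel is linear, the model can be stored as the single vector w and the scalar b, which keeps prediction cheap when it has to be repeated at every time step.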
S4: Acquiring current data corresponding to the variable factors related to the reaction temperature, inputting the current data into the trained SVR reaction temperature change rate prediction model to calculate a predicted reaction temperature change rate for a future period, and adding the predicted reaction temperature change rate to the current reaction temperature value to obtain the predicted reaction temperature of the MCS synthesis device for the future period.
The current values of the variable factors, such as heat-conducting oil flow rate, fluidized bed density, silicon powder feeding rate, catalyst feeding rate, fluidized bed reactor top pressure, methyl chloride feeding rate, temperature change rate of the heat-conducting oil entering the fluidized bed, temperature change rate of the heat-conducting oil leaving the fluidized bed, temperature of the heat-conducting oil entering the fluidized bed, temperature of the heat-conducting oil leaving the fluidized bed, and methyl chloride side spray gun pressure, are input into the trained SVR reaction temperature change rate prediction model to calculate the predicted reaction temperature change rate for a future period. Adding the predicted reaction temperature change rate to the current reaction temperature value gives the predicted reaction temperature for that period. The prediction horizon depends on the lag time between the reaction temperature and the variable factors.
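The accumulation in this step can be sketched as follows, assuming that the model returns one predicted change rate per future time step and that rates are expressed per step unless a step length `dt` is given. The function name and the numbers are hypothetical.

```python
def predict_temperature(current_temp, predicted_rates, dt=1.0):
    """Accumulate predicted change rates onto the current temperature.

    Returns the predicted temperature after each future step.
    """
    temps = []
    temp = current_temp
    for rate in predicted_rates:
        temp += rate * dt
        temps.append(temp)
    return temps

# Hypothetical values: current reading and three predicted change rates.
trajectory = predict_temperature(300.0, [0.5, -0.25, 0.25])
```

Predicting the change rate and accumulating it, rather than predicting the temperature directly, keeps the forecast anchored to the latest measured value.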
In some embodiments, as shown in fig. 2, the present invention further comprises the steps of:
S5: Verifying the SVR reaction temperature change rate prediction model.
Because the silicon powder raw material and its reactivity vary unpredictably, an optimization mechanism is applied while the model runs. The reaction temperature prediction model of the MCS synthesis device is built by timed training, that is, the model is trained before each prediction. The training data are the most recent 72 hours of operating data in the run period, sampled at 10-second intervals, and the 30% of the data closest to the current time is used as validation data. The root mean square error on the validation data is required to be within 0.1; otherwise, a batch of 48-hour data is selected by stepping backward in 1-hour intervals and the model is retrained, until the root mean square error on the validation set is within 0.1. The training data are re-selected no more than 6 times; if the root mean square error on the validation set is still greater than 0.1, the data set giving the smallest validation root mean square error is selected as the training data.
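The retraining mechanism can be sketched as a small loop. This is one possible reading of the step, under stated assumptions: window lengths are given in samples, the k-th retry trains on a window whose end is moved back by k steps from the start of the validation data, the validation data stay fixed, and `fit` is a hypothetical callable that trains a model on a list of (x, y) pairs and returns a prediction function.

```python
import math

def rmse(model, samples):
    """Root mean square error of `model` on (x, y) pairs."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in samples) / len(samples))

def train_with_validation(samples, fit, first_len, retry_len, step,
                          val_frac=0.3, threshold=0.1, max_retries=6):
    """Train, validate, and re-select training data until the RMSE is acceptable.

    `samples` are time-ordered (x, y) pairs with the newest last.
    """
    window = samples[-first_len:]
    n_val = max(1, round(len(window) * val_frac))
    train, val = window[:-n_val], window[-n_val:]
    best_model = fit(train)
    best_err = rmse(best_model, val)
    for retry in range(1, max_retries + 1):
        if best_err <= threshold:
            break
        # Move the end of the retry window further back on each attempt.
        end = len(samples) - n_val - retry * step
        if end <= 0:
            break
        candidate = samples[max(0, end - retry_len):end]
        model = fit(candidate)
        err = rmse(model, val)
        if err < best_err:
            # Keep the training set that gives the smallest validation error.
            best_model, best_err = model, err
    return best_model, best_err
```

With 10-second sampling, the figures in this step correspond to `first_len=25920` (72 hours), `retry_len=17280` (48 hours) and `step=360` (1 hour).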
In some embodiments, as shown in fig. 3, the present invention further comprises the steps of:
S6: Correcting the predicted reaction temperature of the MCS synthesis device for the future period.
To improve the prediction accuracy and directional reliability of the reaction temperature prediction model of the MCS synthesis device, the predicted value needs to be corrected. First, error correction is performed based on the historical prediction errors of the past 10 minutes. Then the trend direction of the predicted value is corrected according to the heat-conducting oil inlet temperature and the heat-conducting oil outlet temperature, which have the highest correlation with the reaction temperature: when both temperatures show a clear change in the same direction, the direction of the predicted value is checked, the change-rate weight of a predicted value that follows the heat-conducting oil temperature trend is increased, and the change-rate weight of a predicted value that runs against it is reduced.
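A minimal sketch of the two corrections is given below. The description does not state how the error correction or the weights are computed, so everything numeric here is an assumption made for illustration: the bias is taken as the mean of recent errors (actual minus predicted), a "clear" change is taken as at least `min_delta`, and the weights `boost` and `damp` are hypothetical.

```python
def correct_bias(predicted, recent_errors):
    """Shift the prediction by the mean of recent errors (actual minus predicted)."""
    if not recent_errors:
        return predicted
    return predicted + sum(recent_errors) / len(recent_errors)

def reweight_rate(rate, inlet_delta, outlet_delta,
                  min_delta=0.5, boost=1.2, damp=0.8):
    """Re-weight a predicted change rate using the heat-conducting oil trend.

    The weight is applied only when inlet and outlet temperatures both change
    clearly in the same direction.
    """
    same_trend = (inlet_delta * outlet_delta > 0
                  and min(abs(inlet_delta), abs(outlet_delta)) >= min_delta)
    if not same_trend or rate == 0:
        return rate
    follows_oil = rate * inlet_delta > 0
    return rate * (boost if follows_oil else damp)

corrected_temp = correct_bias(300.0, [0.5, 1.5])
boosted = reweight_rate(1.0, inlet_delta=1.0, outlet_delta=2.0)
damped = reweight_rate(-1.0, inlet_delta=1.0, outlet_delta=2.0)
unchanged = reweight_rate(1.0, inlet_delta=1.0, outlet_delta=-2.0)
```

With 10-second sampling, the past 10 minutes correspond to the 60 most recent prediction errors.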
Those skilled in the art may make various modifications or additions to the described embodiments or substitutions thereof without departing from the spirit of the invention or exceeding the scope of the invention as defined in the accompanying claims.