CN114117891A - Rotary kiln energy consumption prediction method based on deep belief network

Info

Publication number: CN114117891A
Application number: CN202111291054.4A
Authority: CN
Inventors: 周杭翱; 秦岭; 杨小健
Original assignee: Nanjing Tech University
Current assignee: Nanjing Tech University
Priority date: 2021-11-02
Filing date: 2021-11-02
Publication date: 2022-03-01

Abstract

The invention discloses a rotary kiln energy consumption prediction method based on a deep belief network (DBN). A DBN model with a nonlinear mapping function is trained directly on the multivariate data to learn its internal representation and capture the nonlinear correlations within the data, and a sliding window model is established for the time-series input layer. On this basis, a multi-index energy consumption early warning model, SW-DBN, is established; it can learn a hierarchy of time-varying delay and nonlinear features, which are useful for energy consumption prediction. The energy consumption of the rotary kiln is predicted in real time through the sliding window model and the multi-index energy consumption early warning model. For cement production planning, the energy consumption prediction system can be used to simulate and analyze production, determine the most effective and cost-saving production parameters, and predict in advance the electricity and coal required during operation. This improves production efficiency and reduces the large errors that arose in the past from relying only on the experience of workers. The method also helps keep the rotary kiln in a stable running state for long periods, reduces the number of kiln shutdowns and the losses caused by shutdown maintenance, and can increase the production benefits of enterprises.

Description

Rotary kiln energy consumption prediction method based on deep belief network

Technical Field

The invention relates to a prediction method based on a deep belief network in a neural network.

Background

The cement industry relies mainly on electricity and fossil fuels and is characterized by high energy consumption. Worldwide, governments and the public and private sectors are seeking ways to improve the energy performance of cement production for reasons of energy security, economy, and the environment. The role of cement energy consumption prediction is discussed here from the perspective of energy conservation and consumption reduction. On the one hand, electricity consumption and coal consumption are the key reference indexes of energy consumption in the cement calcination process. Traditionally, the operator can only read the current energy consumption from the sensors and cannot know its trend. When the energy consumption of the calcination process becomes abnormal, the operator generally adjusts the production equipment based on experience. Such adjustment lags strongly behind the process, wastes energy, and hinders energy conservation and emission reduction in production. Operators therefore urgently need predictive information on energy consumption trends when adjusting production equipment. On the other hand, for cement production planning, an energy consumption prediction system can be used to simulate and analyze production and to determine the most effective and cost-saving production parameters. Accurate prediction of energy consumption provides important reference information for cement production planning and energy control, so predicting the energy consumption of the cement calcination process is of great significance.

Because the cement calcination process is complicated and changeable, the relationships among the variables are nonlinear. In addition, the raw materials take about 50-60 minutes to be completely calcined into clinker, so there is a delay between the raw material feed data and the corresponding energy consumption data. All other variables related to energy consumption also have time-lag characteristics. Because the variables are affected by different factors, the delay time of each variable is different and changes dynamically. The calcination process thus has three main features: uncertainty, nonlinearity, and time-varying delay.

Traditionally, the data-driven models used for nonlinear prediction are support vector machines, artificial neural networks, statistical regression, and the like. However, the variables associated with cement energy consumption are numerous, and the delay time during cement calcination is not constant but changes dynamically, so calculating the delay time of every variable is too complex to be practical. These methods therefore cannot achieve the best effect in the field of cement calcination. Frank et al. showed how to predict time series using a sliding window technique that incorporates more information into the input layer of an artificial neural network. In the cement calcination process, energy consumption is related not only to the variables at the present time; the variables at past times also affect the trend of energy consumption. In this case, using the sliding window technique to handle the cement calcination delay problem has certain advantages.

In addition, in the energy optimization model, the predicted electricity consumption and the predicted coal consumption must be applied at the same time. If these energy consumption values are predicted separately, different time lengths may be required because of differences in data characteristics and complexity. Moreover, the cement calcination process consumes both electricity and heat in the same step, and most variables are related to both electricity consumption and coal consumption, so there is a coupling relationship between electricity consumption and coal consumption. The two indexes should therefore be predicted synchronously in the same model. To our knowledge, most studies predict only one of them, and no existing method predicts the electricity consumption and the coal consumption of the cement calcination process at the same time. A multi-index energy consumption prediction model based on a sliding window deep belief network is therefore proposed. First, the model combines past and current variable data into time-series data through the sliding window method, which avoids having to analyze the complex time-varying delays explicitly. A deep belief network capable of capturing the nonlinear features among the variables is then established to learn the time-series information.

Disclosure of Invention

The invention aims to provide a prediction method aiming at the problem of energy consumption in the rotary kiln operation process.

In order to achieve the above object, the present invention employs a method based on a deep belief network, the method comprising the following steps:

Step one: A DBN model with a nonlinear mapping function is trained directly on the multivariate data to learn its internal representation and capture the nonlinear correlations within the data, and a sliding window model of the time-series input layer is established.

Step two: A multi-index energy consumption early warning model, SW-DBN, is established on the basis of the time series of step one. The SW-DBN can learn a hierarchy of time-varying delay and nonlinear features, which are useful for energy consumption prediction.

Step three: The energy consumption of the rotary kiln is predicted in real time through the sliding window model and the multi-index energy consumption early warning model, and the prediction is fed back to guide the production process. The core flow is shown in FIG. 3; an accurate prediction result is finally obtained through the training algorithm of step one and the prediction algorithm of step two.

2. The rotary kiln prediction method based on the deep belief network as claimed in claim 1, wherein the sliding window model of the time series input layer in the first step is divided into the following 2 steps:

Step one: Without a sliding window, the data sample at each time instant in the time-series input layer is treated as independent, which ignores important temporal information inherent in the time-series data and leads to an unsatisfactory prediction effect. To address this problem, a sliding window technique is applied to each input variable, so that data covering the entire production cycle can be integrated into the model at once. The operating conditions of the cement calcination process change dynamically during normal operation, and the delay time of the variables is difficult to determine. As long as the variation range of the delay time does not exceed the size of the sliding window, all relevant information is contained in the time series. The SW-DBN model can therefore still capture the characteristics of the time-varying delay and obtain an accurate prediction result.

Step two: The input variable data form a matrix of values, where x_i(t) is the value of the i-th variable at time t. The time-series data s_i(t) obtained by the sliding window are as follows:

$$ s_i(t) = [\,x_i(t-\omega+1),\ x_i(t-\omega+2),\ \ldots,\ x_i(t)\,] \tag{1} $$

where ω is the size of the sliding window, t ≤ T is the time index, and T is the number of sample data. The time-series data of the input layer contain the time-varying delay characteristics between the energy consumption variables and each input variable, and therefore provide more time-series features to the DBN model.
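The window construction of equation (1) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function name `sliding_window`, the 0-based time index, and the sample readings are hypothetical and do not come from the patent.

```python
def sliding_window(x, omega):
    """Return the windows s(t) = [x(t-omega+1), ..., x(t)] for every valid t.

    x is the series of one input variable x_i; omega is the window size.
    Time is 0-based here, so the first complete window ends at t = omega - 1.
    """
    if omega < 1 or omega > len(x):
        raise ValueError("window size must satisfy 1 <= omega <= len(x)")
    return [x[t - omega + 1 : t + 1] for t in range(omega - 1, len(x))]


# Hypothetical readings of a single variable, one value per time step.
feed_rate = [10.0, 11.0, 12.0, 13.0, 14.0]
windows = sliding_window(feed_rate, omega=3)
print(windows)
# [[10.0, 11.0, 12.0], [11.0, 12.0, 13.0], [12.0, 13.0, 14.0]]
```

A series of T samples yields T - ω + 1 windows, so a larger window covers longer delays at the cost of fewer training samples and a wider input layer.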

3. The deep belief network-based rotary kiln prediction method as claimed in claim 1, wherein the SW-DBN training process in step two is divided into the following 2 steps:

Step one: Bottom-up unsupervised greedy layer-by-layer pre-training. The SW-DBN is composed of several stacked restricted Boltzmann machines (RBMs), as shown in FIG. 1. The visible layer V^0 and the adjacent hidden layer H^1 form RBM^1, H^1 and H^2 form RBM^2, and so on. Only one RBM is trained at a time, starting from the bottom layer; the parameters of the trained RBM are then fixed and the next RBM is trained. All RBMs are trained using the contrastive divergence algorithm. FIG. 2 shows the structure of an RBM. An RBM is composed of n visible units v = [v_1, ..., v_n] and m hidden units h = [h_1, ..., h_m], where v_i and h_j are the binary states of visible unit i and hidden unit j, a_i and b_j are their respective biases, and w_ij is the real-valued weight between them. Each unit of the visible layer is connected to all units of the hidden layer, and there are no connections between units in the same layer. Given the visible layer, the activation probability of a hidden unit is

$$ p(h_j = 1 \mid v) = \sigma\Big(b_j + \sum_{i=1}^{n} v_i w_{ij}\Big) \tag{2} $$

Given the hidden layer, the activation probability of a visible unit is

$$ p(v_i = 1 \mid h) = \sigma\Big(a_i + \sum_{j=1}^{m} h_j w_{ij}\Big) \tag{3} $$

where σ(x) = 1/(1 + e^{-x}) is the activation function.
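As an illustration of the two conditional activation probabilities, the sketch below assumes the standard RBM forms p(h_j = 1 | v) = σ(b_j + Σ_i v_i w_ij) and p(v_i = 1 | h) = σ(a_i + Σ_j h_j w_ij). The function names and the toy weights are made up for the example and are not taken from the patent.

```python
import math


def sigmoid(x):
    """Logistic activation: sigma(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(v, w, b):
    """p(h_j = 1 | v) for every hidden unit j; w[i][j] links visible i to hidden j."""
    return [sigmoid(b[j] + sum(v[i] * w[i][j] for i in range(len(v))))
            for j in range(len(b))]


def visible_probs(h, w, a):
    """p(v_i = 1 | h) for every visible unit i, using the same weights w[i][j]."""
    return [sigmoid(a[i] + sum(h[j] * w[i][j] for j in range(len(h))))
            for i in range(len(a))]


# Toy RBM with n = 3 visible units and m = 2 hidden units (made-up values).
w = [[0.5, -0.5], [0.0, 1.0], [-1.0, 0.0]]
a = [0.0, 0.0, 0.0]   # visible biases a_i
b = [0.0, 0.0]        # hidden biases b_j

p_h = hidden_probs([1, 0, 1], w, b)
p_v = visible_probs([1, 0], w, a)
print(p_h, p_v)
```

Because units within a layer are not connected, each probability depends only on the opposite layer, which is why both functions can evaluate every unit independently.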

Given training input data, training an RBM means adjusting the parameters θ = (a, b, w) to fit the training samples with the contrastive divergence algorithm. The update formulas are as follows:

$$ \Delta w_{ij} = p(h_j = 1 \mid v^{(0)})\, v_i^{(0)} - p(h_j = 1 \mid v^{(k)})\, v_i^{(k)} \tag{4} $$

$$ \Delta a_i = v_i^{(0)} - v_i^{(k)} \tag{5} $$

$$ \Delta b_j = p(h_j = 1 \mid v^{(0)}) - p(h_j = 1 \mid v^{(k)}) \tag{6} $$

where v^(0) is the training sample, v^(k) is the visible state after k sampling cycles, and k is usually 1. All RBMs are trained layer by layer in this way until the specified number of iterations is reached, which completes the bottom-up unsupervised greedy layer-by-layer training. The pre-training process finally yields initial parameters that represent the time-varying delay and nonlinear characteristics.
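The update rules (4) to (6) with k = 1 can be sketched as follows. This is a simplified single-sample illustration, not the patent's implementation: the learning rate `lr`, the random seed, the sampling of binary states, and the toy data are all assumptions added for the example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def p_hidden(v, w, b):
    return [sigmoid(b[j] + sum(v[i] * w[i][j] for i in range(len(v))))
            for j in range(len(b))]


def p_visible(h, w, a):
    return [sigmoid(a[i] + sum(h[j] * w[i][j] for j in range(len(h))))
            for i in range(len(a))]


def sample(probs, rng):
    """Draw binary states from independent Bernoulli probabilities."""
    return [1 if rng.random() < p else 0 for p in probs]


def cd1_update(v0, w, a, b, lr, rng):
    """One contrastive divergence step (k = 1); updates w, a, b in place."""
    ph0 = p_hidden(v0, w, b)               # p(h_j = 1 | v^(0))
    h0 = sample(ph0, rng)
    v1 = sample(p_visible(h0, w, a), rng)  # reconstruction v^(1)
    ph1 = p_hidden(v1, w, b)               # p(h_j = 1 | v^(1))
    for i in range(len(a)):
        for j in range(len(b)):
            w[i][j] += lr * (ph0[j] * v0[i] - ph1[j] * v1[i])  # eq. (4)
        a[i] += lr * (v0[i] - v1[i])                            # eq. (5)
    for j in range(len(b)):
        b[j] += lr * (ph0[j] - ph1[j])                          # eq. (6)
    return v1


# Toy run: n = 4 visible units, m = 2 hidden units, two made-up binary patterns.
rng = random.Random(0)
n, m = 4, 2
w = [[0.0] * m for _ in range(n)]
a = [0.0] * n
b = [0.0] * m
data = [[1, 1, 0, 0], [0, 0, 1, 1]]
for epoch in range(100):
    for v0 in data:
        cd1_update(v0, w, a, b, lr=0.1, rng=rng)
```

For greedy layer-by-layer pre-training, the hidden probabilities of a trained RBM would serve as the visible data of the next RBM while the trained parameters stay fixed.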

Step two: According to the labels of the output data Y, a supervised backward fitting algorithm is used for weight adjustment and error correction, which also associates the feature hierarchy with the energy consumption data. In most common training methods, the DBN is treated at this stage as a conventional deep neural network whose parameters are updated by a gradient descent algorithm. As the amount of data produced by the sliding window increases, the number of DBN parameters grows and the computational complexity increases. The commonly used gradient descent method is therefore replaced with the Adam algorithm, which is well suited to problems with large amounts of data and parameters, performs well empirically, and compares favorably with other SGD methods. Adam computes individual adaptive learning rates for different parameters from estimates of the first and second moments of the gradient. b1 and b2 are the exponential decay rates of the first and second moment estimates, respectively, and f(w, b) is the stochastic objective function with parameters w and b. At time step t, the gradient of the stochastic objective is

$$ g_t = \nabla_{(w,b)} f_t(w_{t-1}, b_{t-1}) $$

Finally, after the weight adjustment and error correction of the fitting algorithm, the error between the predicted values and the actual data is greatly reduced.
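The Adam update used in the fine-tuning stage can be illustrated with the standard published form of the algorithm. The sketch below is an assumption-laden toy: the default values of `lr`, `b1`, `b2`, and `eps`, the function name `adam_step`, and the quadratic objective are chosen for illustration and are not specified in the patent.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """Apply one Adam update in place; t is the 1-based time step."""
    for k in range(len(theta)):
        m[k] = b1 * m[k] + (1 - b1) * grad[k]        # first moment estimate
        v[k] = b2 * v[k] + (1 - b2) * grad[k] ** 2   # second moment estimate
        m_hat = m[k] / (1 - b1 ** t)                 # bias-corrected first moment
        v_hat = v[k] / (1 - b2 ** t)                 # bias-corrected second moment
        theta[k] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


# Toy objective f(w, b) = (w - 3)^2 + (b + 1)^2, gradient (2(w - 3), 2(b + 1)).
theta = [0.0, 0.0]   # [w, b]
m = [0.0, 0.0]
v = [0.0, 0.0]
for t in range(1, 2001):
    grad = [2 * (theta[0] - 3.0), 2 * (theta[1] + 1.0)]
    adam_step(theta, grad, m, v, t, lr=0.05)
print(theta)  # approaches the minimizer [3, -1]
```

Each parameter receives its own effective step size lr / sqrt(v_hat), which is the per-parameter adaptive learning rate the text refers to.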

The beneficial effects of the invention are as follows: the energy consumption of the rotary kiln during operation is predicted using a sliding window model and a supervised backward fitting algorithm, so the electricity and coal required during operation are predicted in advance and can be prepared for. This improves production efficiency and reduces the large errors that arose in the past from relying only on the experience of workers. The method also helps keep the rotary kiln in a stable running state for long periods, reduces the number of kiln shutdowns and the losses caused by shutdown maintenance, and can increase the production benefits of enterprises.

Drawings

FIG. 1 is an architecture of a SW-DBN.

Fig. 2 is a structure of an RBM.

Fig. 3 is the overall flow of the algorithm.

Detailed Description

Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

In order to achieve the above object, the present invention adopts a method based on a deep belief network, which comprises the following steps:

Step one: A DBN model with a nonlinear mapping function is trained directly on the multivariate data to learn its internal representation and capture the nonlinear correlations within the data, and a sliding window model of the time-series input layer is established. Without a sliding window, the data sample at each time instant in the time-series input layer is treated as independent, which ignores important temporal information inherent in the time-series data and leads to an unsatisfactory prediction effect. To address this problem, a sliding window technique is applied to each input variable, so that data covering the entire production cycle can be integrated into the model at once. The operating conditions of the cement calcination process change dynamically during normal operation, and the delay time of the variables is difficult to determine. As long as the variation range of the delay time does not exceed the size of the sliding window, all relevant information is contained in the time series. The SW-DBN model can therefore still capture the characteristics of the time-varying delay and obtain an accurate prediction result. The input variable data form a matrix of values, where x_i(t) is the value of the i-th variable at time t. The time-series data s_i(t) obtained by the sliding window, with window size ω, are as follows:

$$ s_i(t) = [\,x_i(t-\omega+1),\ x_i(t-\omega+2),\ \ldots,\ x_i(t)\,] \tag{1} $$
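For several input variables, the windows of equation (1) have to be combined into one input vector per time step. One possible layout, shown here purely as an assumption, concatenates the windows s_1(t), s_2(t), ... in variable order; the function name `build_inputs` and the sample matrix are hypothetical.

```python
def build_inputs(X, omega):
    """Build one flattened input row per valid time step.

    X[t][i] is the value of variable i at time t (0-based).
    Each row is the concatenation of s_i(t) = [x_i(t-omega+1), ..., x_i(t)]
    over all variables i, which is an assumed layout for the input layer.
    """
    T = len(X)
    n_vars = len(X[0])
    rows = []
    for t in range(omega - 1, T):
        row = []
        for i in range(n_vars):
            row.extend(X[k][i] for k in range(t - omega + 1, t + 1))
        rows.append(row)
    return rows


# Made-up data: 4 time steps, 2 variables.
X = [[1, 10], [2, 20], [3, 30], [4, 40]]
inputs = build_inputs(X, omega=2)
print(inputs)
# [[1, 2, 10, 20], [2, 3, 20, 30], [3, 4, 30, 40]]
```

With this layout the input layer has ω times as many units as there are variables, which is one reason the parameter count grows with the window size.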

Step two: A multi-index energy consumption early warning model, SW-DBN, is established on the basis of the time series of step one. The SW-DBN can learn a hierarchy of time-varying delay and nonlinear features, which are useful for energy consumption prediction. The first stage is bottom-up unsupervised greedy layer-by-layer pre-training. The SW-DBN is composed of several stacked restricted Boltzmann machines (RBMs), as shown in FIG. 1. The visible layer V^0 and the adjacent hidden layer H^1 form RBM^1, H^1 and H^2 form RBM^2, and so on. Only one RBM is trained at a time, starting from the bottom layer; the parameters of the trained RBM are then fixed and the next RBM is trained. All RBMs are trained using the contrastive divergence algorithm. FIG. 2 shows the structure of an RBM. An RBM is composed of n visible units v = [v_1, ..., v_n] and m hidden units h = [h_1, ..., h_m], where v_i and h_j are the binary states of visible unit i and hidden unit j, a_i and b_j are their respective biases, and w_ij is the real-valued weight between them. Each unit of the visible layer is connected to all units of the hidden layer, and there are no connections between units in the same layer. Given the visible layer, the activation probability of a hidden unit is

$$ p(h_j = 1 \mid v) = \sigma\Big(b_j + \sum_{i=1}^{n} v_i w_{ij}\Big) \tag{2} $$

Given the hidden layer, the activation probability of a visible unit is

$$ p(v_i = 1 \mid h) = \sigma\Big(a_i + \sum_{j=1}^{m} h_j w_{ij}\Big) \tag{3} $$

where σ(x) = 1/(1 + e^{-x}) is the activation function.

Given training input data, training an RBM means adjusting the parameters θ = (a, b, w) to fit the training samples with the contrastive divergence algorithm. The update formulas are as follows:

$$ \Delta w_{ij} = p(h_j = 1 \mid v^{(0)})\, v_i^{(0)} - p(h_j = 1 \mid v^{(k)})\, v_i^{(k)} \tag{4} $$

$$ \Delta a_i = v_i^{(0)} - v_i^{(k)} \tag{5} $$

$$ \Delta b_j = p(h_j = 1 \mid v^{(0)}) - p(h_j = 1 \mid v^{(k)}) \tag{6} $$

where v^(0) is the training sample, v^(k) is the visible state after k sampling cycles, and k is usually 1. All RBMs are trained layer by layer in this way until the specified number of iterations is reached, which completes the bottom-up unsupervised greedy layer-by-layer training. The pre-training process finally yields initial parameters that represent the time-varying delay and nonlinear characteristics.

In the second stage, according to the labels of the output data Y, a supervised backward fitting algorithm is used for weight adjustment and error correction, which also associates the feature hierarchy with the energy consumption data. In most common training methods, the DBN is treated at this stage as a conventional deep neural network whose parameters are updated by a gradient descent algorithm. As the amount of data produced by the sliding window increases, the number of DBN parameters grows and the computational complexity increases. The commonly used gradient descent method is therefore replaced with the Adam algorithm, which is well suited to problems with large amounts of data and parameters, performs well empirically, and compares favorably with other SGD methods. Adam computes individual adaptive learning rates for different parameters from estimates of the first and second moments of the gradient. b1 and b2 are the exponential decay rates of the first and second moment estimates, respectively, and f(w, b) is the stochastic objective function with parameters w and b. At time step t, the gradient of the stochastic objective is

$$ g_t = \nabla_{(w,b)} f_t(w_{t-1}, b_{t-1}) $$

Claims

1. A rotary kiln energy consumption prediction method based on a deep belief network is characterized by comprising the following steps:

step one: directly training the multivariate data with a DBN model having a nonlinear mapping function, learning the internal representation of the multivariate data, capturing the nonlinear correlations within the data, and finally establishing a sliding window model of the time-series input layer;

step two: establishing a multi-index energy consumption early warning model, SW-DBN, on the basis of the time series of step one, wherein the SW-DBN can learn a hierarchy of time-varying delay and nonlinear features, and these features are useful for energy consumption prediction;

step three: predicting the energy consumption of the rotary kiln in real time through the sliding window model and the multi-index energy consumption early warning model, and then feeding the prediction back to guide the production process, wherein the core flow is shown in FIG. 3, and an accurate prediction result is finally obtained through the training algorithm of step one and the prediction algorithm of step two.

2. The deep belief network-based rotary kiln prediction method as claimed in claim 1, wherein the sliding window model of the time series input layer in the step one is divided into the following 2 steps:

step one: without a sliding window, the data sample at each time instant in the time-series input layer is treated as independent, which ignores important temporal information inherent in the time-series data and makes the prediction effect unsatisfactory; to solve this problem, a sliding window technique is applied to each input variable, so that data covering the entire production cycle can be integrated into the model at once; the operating conditions of the cement calcination process change dynamically during normal operation and the delay time of the variables is difficult to determine, but as long as the variation range of the delay time does not exceed the size of the sliding window, all relevant information is contained in the time series, so the SW-DBN model can still capture the characteristics of the time-varying delay and obtain an accurate prediction result;

step two: the input variable data form a matrix of values, x_i(t) is the value of the i-th variable at time t, and the time-series data s_i(t) obtained by the sliding window are as follows:

$$ s_i(t) = [\,x_i(t-\omega+1),\ x_i(t-\omega+2),\ \ldots,\ x_i(t)\,] \tag{1} $$

where ω is the size of the sliding window, t ≤ T is the time index, and T is the number of sample data; the time-series data of the input layer contain the time-varying delay characteristics between the energy consumption variables and each input variable, so the time-series data provide more time-series features to the DBN model.

3. The deep belief network-based rotary kiln prediction method as claimed in claim 1, wherein the SW-DBN training process in step two is divided into the following 2 steps:

step one: bottom-up unsupervised greedy layer-by-layer pre-training, wherein the SW-DBN is composed of several stacked restricted Boltzmann machines (RBMs), as shown in FIG. 1; the visible layer V^0 and the adjacent hidden layer H^1 form RBM^1, H^1 and H^2 form RBM^2, and so on; only one RBM is trained at a time, starting from the bottom layer, after which the parameters of the trained RBM are fixed and the next RBM is trained; all RBMs are trained using the contrastive divergence algorithm; FIG. 2 shows the structure of an RBM; an RBM is composed of n visible units v = [v_1, ..., v_n] and m hidden units h = [h_1, ..., h_m], where v_i and h_j are the binary states of visible unit i and hidden unit j, a_i and b_j are their respective biases, and w_ij is the real-valued weight between them; each unit of the visible layer is connected to all units of the hidden layer, and there are no connections between units in the same layer; given the visible layer, the activation probability of a hidden unit is

$$ p(h_j = 1 \mid v) = \sigma\Big(b_j + \sum_{i=1}^{n} v_i w_{ij}\Big) \tag{2} $$

given the hidden layer, the activation probability of a visible unit is

$$ p(v_i = 1 \mid h) = \sigma\Big(a_i + \sum_{j=1}^{m} h_j w_{ij}\Big) \tag{3} $$

where σ(x) = 1/(1 + e^{-x}) is the activation function;

given training input data, training an RBM means adjusting the parameters θ = (a, b, w) to fit the training samples with the contrastive divergence algorithm, as shown below:

$$ \Delta w_{ij} = p(h_j = 1 \mid v^{(0)})\, v_i^{(0)} - p(h_j = 1 \mid v^{(k)})\, v_i^{(k)} \tag{4} $$

$$ \Delta a_i = v_i^{(0)} - v_i^{(k)} \tag{5} $$

$$ \Delta b_j = p(h_j = 1 \mid v^{(0)}) - p(h_j = 1 \mid v^{(k)}) \tag{6} $$

where v^(0) is the training sample, v^(k) is the visible state after k sampling cycles, and k is usually 1; all RBMs are trained layer by layer in this way until the specified number of iterations is reached, which completes the bottom-up unsupervised greedy layer-by-layer training, and the pre-training process finally yields initial parameters that represent the time-varying delay and nonlinear characteristics;

step two: according to the labels of the output data Y, a supervised backward fitting algorithm is used for weight adjustment and error correction, which also associates the feature hierarchy with the energy consumption data; in most common training methods, the DBN is treated at this stage as a conventional deep neural network whose parameters are updated by a gradient descent algorithm; as the amount of data produced by the sliding window increases, the number of DBN parameters grows and the computational complexity increases, so the commonly used gradient descent method is replaced with the Adam algorithm, which is well suited to problems with large amounts of data and parameters, performs well empirically, and compares favorably with other SGD methods; Adam computes individual adaptive learning rates for different parameters from estimates of the first and second moments of the gradient; b1 and b2 are the exponential decay rates of the first and second moment estimates, respectively, and f(w, b) is the stochastic objective function with parameters w and b; at time step t, the gradient of the stochastic objective is

$$ g_t = \nabla_{(w,b)} f_t(w_{t-1}, b_{t-1}) $$