WO2016101182A1

WO2016101182A1 - Interval type indicator forecasting method based on bayesian network and extreme learning machine

Info

Publication number: WO2016101182A1
Application number: PCT/CN2014/094839
Authority: WO
Inventors: 刘民; 宁克锋; 董明宇; 吴澄
Original assignee: 清华大学
Priority date: 2014-12-23
Filing date: 2014-12-24
Publication date: 2016-06-30
Also published as: CN104537033B; CN104537033A

Abstract

An interval type indicator forecasting method based on a Bayesian network and an extreme learning machine (ELM), which relates to the fields of automatic control, information technology and advanced manufacturing, and in particular to the learning of the parameters of an asymmetric Gaussian distribution Bayesian ELM model and the adaptive adjustment of the asymmetric weights. The method comprises the following steps: in view of the uncertainty of a complex production process, describing production indicators by interval numbers; using an asymmetric Gaussian distribution as the output distribution of an ELM model to obtain a weighted Bayesian ELM model; learning the parameters of the Bayesian ELM model within an empirical Bayesian framework from actual running data of the complex production process; on this basis, learning a pair of reciprocal weights by an adaptive adjustment method; and finally obtaining a forecast value of the interval type indicators. The method can forecast production indicators in an actual production process and can be used to guide operation optimization and dynamic scheduling of the production process.

Description

Interval indicator forecasting method based on Bayesian network and extreme learning machine

Technical field

The invention belongs to the fields of automatic control, information technology and advanced manufacturing, and particularly relates to an interval-type indicator forecasting method based on a Bayesian network and an extreme learning machine (ELM), intended for complex industrial production processes for which a mechanism model is difficult to establish but a large amount of historical production data is available.

Background technique

Production index forecasting is one of the key technologies involved in the operation optimization and dynamic scheduling of production processes. However, in the actual complex production processes of steel, microelectronics and other industries, production data often contain various uncertainties, so the index values forecast by conventional predictive models based on neural networks and support vector machines often deviate substantially from the actual measured values, which degrades operation optimization and dynamic scheduling. Interval-type index forecasting is one effective way to address this problem.

Summary of the invention

The present invention addresses complex production processes for which a mechanism model is difficult to establish but a large amount of historical production data is available, and proposes an interval-type index prediction method based on a Bayesian network and an extreme learning machine (ELM). To handle the uncertainty of complex production processes, the invention describes production indexes with interval numbers, uses actual operating data from the production process, models the interval indicators with asymmetric Gaussian distribution Bayesian and ELM methods, and adaptively adjusts a pair of mutually reciprocal weights to obtain an upper boundary model and a lower boundary model that together give the forecast interval of the production index. The method can predict production indexes in the actual production process and can be used to guide operation optimization and dynamic scheduling of the production process.

An interval type index prediction method based on Bayesian network and extreme learning machine (ELM), characterized in that the method is implemented in the following steps:

Step (1): Data acquisition and preprocessing

A data acquisition system is used to collect data from the actual industrial production process, and the data are processed into the following training data:

{(x_i, t_i)}, i = 1, ..., N, with x_i = (x_{i,1}, ..., x_{i,n})

where x_i and t_i are the input and output of the i-th training sample, N is the number of training samples, and n is the dimension of the input variable;

Step (2): Construct a double ELM model based on asymmetric Gaussian distribution Bayes

Step (2.1): The ELM model can be expressed as follows:

t=h(x)β+ε

Where h(x) is the hidden layer node function of ELM, β is the output layer weight, and ε is the model error;

Step (2.2): The output of the ELM model can be assumed to be an asymmetric Gaussian distribution as follows:

Where b is the variance parameter of the asymmetric Gaussian distribution, and w is the weight of the asymmetric Gaussian distribution;

Step (2.3): The likelihood function of the training data can be written as:

where H_1 and t_1 are the hidden layer output matrix and the output vector of the sample set satisfying t < hβ, respectively, and H_2 and t_2 are the hidden layer output matrix and the output vector of the sample set satisfying t ≥ hβ, respectively;

Step (2.4): A Gaussian prior distribution is placed on the output weights β, i.e.

where M is the number of hidden layer nodes, and a and β_k are parameters of the Gaussian distribution;

Step (2.5): A pair of reciprocal weights (w, 1/w), denoted (w_1, w_2), is used and appropriately adjusted to obtain two weighted Bayesian ELM models (i.e. the double ELM based on asymmetric Gaussian distribution Bayesian):

p(t | a_1, b_1) = ∫ p(t | β_1, b_1, w_1) p(β_1 | a_1) dβ_1

p(t | a_2, b_2) = ∫ p(t | β_2, b_2, w_2) p(β_2 | a_2) dβ_2

Step (3): Initialization of a double ELM model based on asymmetric Gaussian distribution Bayesian

Step (3.1): Initialization of the ELM model

The number of input-layer nodes is set equal to the training sample dimension n, the number of output nodes is 1, and the number of hidden-layer nodes of the single-hidden-layer extreme learning machine is M;

The excitation function h(x, o_l, r_l) of the hidden layer nodes may be a Gaussian function, a Sigmoid function, a sine function, a triangular basis function or a Hard Limit function;

The extreme learning machine is trained on the original N samples {(x_i, t_i)}, i = 1, ..., N: the center o_l and the width r_l of each hidden layer node are determined randomly (when the excitation function h(x, o_l, r_l) is a Gaussian function), or the weight o_l and the offset r_l of each hidden layer node are determined randomly (when the excitation function is a Sigmoid, sine, triangular basis or Hard Limit function), l = 1, 2, ..., M; the ordinary extreme learning machine is then used to calculate the initial hidden layer output matrix H and the initial value of the output layer connection matrix

where

Step (3.2): Initialization of the adaptive adjustment algorithm for the weights (w_1, w_2)

Initialize the weights w = w_1 = w_2 = 1, and set the prediction interval CI_trained = 0, the weight adjustment unit value δ_w = 0.05, the minimum weight value w_min = 0.001, the weight learning rate r_w = 1, and the weight stopping criterion ε_w = 0.00001;
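As a rough illustration of step (3.1), the sketch below draws random Gaussian hidden-node parameters and solves for initial output weights with the Moore-Penrose pseudo-inverse, the usual least-squares recipe for an ordinary ELM. The function names, the sampling ranges for the centers and widths, and the toy data are assumptions made for illustration; the source does not specify them.

```python
import numpy as np

def gaussian_hidden_layer(X, centers, widths):
    """Hidden layer output matrix H (N x M) for Gaussian nodes."""
    # Squared distance between every sample and every node center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * widths[None, :] ** 2))

def init_elm(X, t, M, rng):
    """Randomly draw hidden-node parameters, then solve for output weights."""
    n = X.shape[1]
    # Assumed sampling ranges: centers inside the data range, widths in [0.5, 2].
    centers = rng.uniform(X.min(axis=0), X.max(axis=0), size=(M, n))
    widths = rng.uniform(0.5, 2.0, size=M)
    H = gaussian_hidden_layer(X, centers, widths)
    beta = np.linalg.pinv(H) @ t  # least-squares output weights
    return centers, widths, H, beta

# Toy data standing in for preprocessed production samples (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 3))
t = np.sin(X).sum(axis=1) + 0.05 * rng.standard_normal(200)

centers, widths, H, beta = init_elm(X, t, M=20, rng=rng)
rmse = float(np.sqrt(np.mean((H @ beta - t) ** 2)))
```

Only the output weights are solved for; the hidden-node parameters stay at their random draws, which is what keeps ELM training cheap.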

Step (4): Parameter learning of the Bayesian ELM model with weight w_1:

Step (4.1): Using Bayes' formula, the posterior distribution p(β_1 | t) can be expressed as follows:

Let

Then

where H_{1,1} and t_{1,1} are the hidden layer output matrix and output values corresponding to the training samples with ε < 0, respectively, and H_{1,2} and t_{1,2} are the hidden layer output matrix and output values corresponding to the training samples with ε > 0, respectively, H_1 = [H_{1,1}; H_{1,2}], t = [t_{1,1}; t_{1,2}];

Step (4.2): Using Bayes' formula, the marginal likelihood function p(t | a_1, b_1) can be expressed as follows:

where

then,

Step (4.3): Let

Solving gives

where

Step (4.4): Similarly, let

Solving gives

Step (4.5): Repeat steps (4.1), (4.2) and (4.3) until a_1 and b_1 converge;
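The update formulas of step (4) are not reproduced in this text, so the following is only a stand-in sketch: an iteratively reweighted ridge regression (expectile-style asymmetric least squares) that shows how a sign-dependent weight w can shift the fitted output up or down. It is not the evidence procedure of the source: the ratio `lam` is held fixed instead of re-estimating a_1 and b_1, and the choice to apply w to samples with t < hβ, the function name `asymmetric_ridge` and the toy data are all assumptions.

```python
import numpy as np

def asymmetric_ridge(H, t, w, lam=1e-3, max_iter=100):
    """Ridge fit in which samples lying below the current fit get weight w.

    Assumption: lam stands in for a fixed prior-to-noise precision ratio.
    """
    reg = lam * np.eye(H.shape[1])
    beta = np.linalg.solve(H.T @ H + reg, H.T @ t)  # symmetric starting point
    for _ in range(max_iter):
        # Weight w where the target is under the fit (t < h beta), else 1.
        s = np.where(t < H @ beta, w, 1.0)
        new_beta = np.linalg.solve(H.T @ (s[:, None] * H) + reg, H.T @ (s * t))
        converged = np.allclose(new_beta, beta, rtol=0.0, atol=1e-10)
        beta = new_beta
        if converged:
            break
    return beta

# Toy linear data with an intercept column (illustrative only).
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 300)
H = np.column_stack([np.ones_like(x), x])
t = 2.0 * x + 0.3 * rng.standard_normal(300)

w = 4.0
fit_w = H @ asymmetric_ridge(H, t, w)          # pulled toward low targets
fit_inv = H @ asymmetric_ridge(H, t, 1.0 / w)  # pulled toward high targets
gap = float(np.mean(fit_inv - fit_w))
```

With a reciprocal pair (w, 1/w) the two fits land on opposite sides of the data, which appears to be the mechanism the double model uses to form an interval; with w = 1 the routine reduces to plain ridge regression.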

Step (5): Parameter learning of the Bayesian ELM model with weight w_2:

This step is analogous to step (4), so only the results are given here;

Step (5.1): Calculate the output weights of the ELM model using the following formula,

where H_{2,1} and t_{2,1} are the hidden layer output matrix and output values corresponding to the training samples with ε < 0, respectively, and H_{2,2} and t_{2,2} are the hidden layer output matrix and output values corresponding to the training samples with ε > 0, respectively, H_2 = [H_{2,1}; H_{2,2}], t = [t_{2,1}; t_{2,2}];

Step (5.2): Calculate a_2 and b_2 using the following formulas, respectively

where

Step (5.3): Repeat steps (5.1) and (5.2) until a_2 and b_2 converge;

Step (6): Adaptive adjustment of the weights (w_1, w_2)

Step (6.1): Calculate the average of the prediction interval of the upper bound model and the lower bound model:

Step (6.2): Calculate the difference between the average value of the prediction interval and the target value of the interval:

CI_err = CI_expected - CI_trained

Step (6.3): According to the difference between the average value of the prediction interval of the interval model and the target value of the interval, the weights are adjusted as follows:

w^new = w - CI_err × (w - w_min) × δ_w

w_1 = w^new, w_2 = 1/w^new
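One pass of step (6) can be sketched as below, assuming the update factor is (w - w_min) and that the "average of the prediction interval" means the mean width between the upper and lower bound predictions. The function names and the clamp that keeps the weight at or above w_min are assumptions added for illustration; the defaults are the initial values given in step (3.2).

```python
def mean_interval_width(lower, upper):
    """Average width of the predicted interval over the training samples."""
    return sum(u - l for l, u in zip(lower, upper)) / len(lower)

def adjust_weights(w, ci_trained, ci_expected, delta_w=0.05, w_min=0.001):
    """One adaptive update of the reciprocal weight pair (w_1, w_2)."""
    ci_err = ci_expected - ci_trained
    w_new = w - ci_err * (w - w_min) * delta_w
    # Assumption (not in the source): never let the weight fall below w_min,
    # so that 1 / w_new stays finite and positive.
    w_new = max(w_new, w_min)
    return w_new, 1.0 / w_new, ci_err

# First update from the initial state w = 1, CI_trained = 0,
# with an expected interval of 10 (illustrative target).
w1, w2, ci_err = adjust_weights(w=1.0, ci_trained=0.0, ci_expected=10.0)
# w1 is about 0.5005 and w2 about 1.998

# Mean width of two toy intervals: (8 + 12) / 2 = 10.
width = mean_interval_width([1590.0, 1600.0], [1598.0, 1612.0])
```

Because w_2 is always the reciprocal of w_1, a single scalar controls both bound models, and the sign of CI_err decides whether the pair moves apart or back toward 1.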

Step (7): Repeat step (4), step (5) and step (6) until CI_err satisfies the stopping condition;

Step (8): After the above model parameter learning is complete, interval-type index prediction is performed as follows for an input variable x:

where t_1 and t_2 are the lower bound and upper bound of the predicted value of the interval-type indicator, respectively;

DRAWINGS

Figure 1: Block diagram of the algorithm for the interval-based indicator prediction method based on Bayesian network and extreme learning machine.

Fig. 2 is a graph showing the comparison between the model output and the actual output for the prediction of the molten steel temperature in the LF production process. The abscissa is the sample number, the blue small dot on the ordinate is the actual molten steel temperature value, and the green curve and the red curve are the predicted upper bound value and the predicted lower bound value of the prediction model, respectively.

Fig. 3 shows the adaptive weight adjustment process of the present invention and the corresponding change in the prediction interval for the prediction of the molten steel temperature in the LF production process. The abscissa is the number of model learning iterations; the blue curve and the red curve are the adaptive adjustment processes of the weights of the upper bound model and the lower bound model, respectively, and the green curve is the corresponding prediction interval value during the adjustment.

Detailed description

To verify the effectiveness of the proposed interval extreme learning machine modeling method on interval-number modeling problems, a large number of simulation experiments were carried out. Owing to limited space, detailed implementation steps are given only for two applications: prediction of the molten steel temperature in the LF production process of a steel mill, and prediction of the grinding thickness in the chemical mechanical polishing process of a microelectronics factory:

(1) Prediction of molten steel temperature in refining furnace

Step 1: Refining furnace production data collection

Production data are collected between every two consecutive molten steel temperature measurements. The previous molten steel temperature measurement, ladle condition, heating gear position, heating time, treatment interval time, argon blowing flow rate, wall temperature, flue gas temperature, flue gas flow rate, ambient temperature and the like are taken as the input, and the subsequent molten steel temperature measurement is taken as the output; a total of 579 training samples are obtained.

Step 2: Conduct AB-TELM model training

Following the initialization method of step (3) in the specification, the parameters of the Bayesian ELM model with weight w_1 in the AB-TELM model (hereinafter referred to as the upper bound model), the parameters of the Bayesian ELM model with weight w_2 (hereinafter referred to as the lower bound model), and the parameters of the weight adaptation algorithm are initialized. On this basis, with w_1 and w_2 given, parameter learning of the upper bound model and the lower bound model is carried out according to steps (4) and (5) in the specification, respectively; w_1 and w_2 are then adaptively adjusted according to step (6). The parameter learning of the two models and the adaptive adjustment of w_1 and w_2 are repeated until the model converges. The best hidden-node excitation function and number of hidden layer nodes are determined by cross-validation.

Step 3: Use the AB-TELM model for interval index prediction

In the actual industrial production process, the data acquisition system collects actual production data at the refining furnace site, and the data are processed into the input data required by the AB-TELM model in the same way as the training data in Step 1, yielding 578 test samples; the AB-TELM model parameters obtained in Step 2 are then used to calculate the interval-type index prediction according to step (8).

The results are shown in Figs. 2 and 3. Fig. 2 shows the prediction result of the model when the interval is set to 10 degrees; the red curve is the lower bound of the temperature prediction and the green curve is the upper bound. As Fig. 2 shows, the predictions of the upper bound model are larger than those of the lower bound model, and most of the actual data lie within the prediction interval of the AB-TELM model, which indicates the feasibility of the model. Fig. 3 shows the corresponding adaptive weight adjustment process and the change in the prediction interval: the green curve is the prediction interval, the blue curve is the adaptive adjustment of the lower bound model weight w_1, and the red curve is the adaptive adjustment of the upper bound model weight w_2. As Fig. 3 shows, once the expected prediction interval is set to 10 degrees, w_1 and w_2 are adjusted adaptively according to the error between the model's actual prediction interval and the expected prediction interval, and the expected interval value is reached after 10 iterations.

Table 1 compares the simulation results of the proposed AB-TELM algorithm with the ordinary ELM and with dual models based on the support vector machine (TSVR-l with a linear kernel and TSVR-g with a Gaussian kernel). The performance index is the root mean square error (RMSE). In Table 1, #Nodes is the number of hidden layer nodes of the ELM-type models, and C and ε are the error penalty coefficient and the insensitivity coefficient of the TSVR-type models. Table 1 shows that the test accuracy of AB-TELM is considerably higher than that of the ELM, TSVR-l and TSVR-g models, which indicates the effectiveness of the AB-TELM model proposed by the present invention.
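The two checks discussed above, RMSE and whether measurements fall inside the predicted interval, could be computed along the following lines. This is a minimal sketch: the source names RMSE as its performance index but does not say how interval coverage was assessed, so the `coverage` function is an assumed way to quantify "most of the actual data lie within the prediction interval", and the temperature values are made-up toy numbers, not data from the experiments.

```python
import math

def rmse(predicted, actual):
    """Root mean square error between point predictions and measurements."""
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    )

def coverage(lower, upper, actual):
    """Fraction of measurements that fall inside [lower, upper]."""
    inside = sum(1 for l, u, a in zip(lower, upper, actual) if l <= a <= u)
    return inside / len(actual)

# Toy values for illustration only.
actual = [1590.0, 1600.0, 1610.0, 1595.0]
point = [1592.0, 1601.0, 1613.0, 1594.0]
lower = [1585.0, 1597.0, 1611.0, 1590.0]
upper = [1595.0, 1607.0, 1621.0, 1600.0]

err = rmse(point, actual)             # sqrt((4 + 1 + 9 + 1) / 4)
cov = coverage(lower, upper, actual)  # 3 of 4 measurements are inside
```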

(2) Prediction of grinding thickness of microelectronic chemical mechanical grinding process

Step 1: Production data collection

Collect the grinding time, grinding thickness and product type of each wafer, together with the inspection standard value of the grinding equipment, and group the data by product type. In each group, the grinding time and the grinding equipment inspection standard value are used as the model input data and the grinding thickness is used as the model output data; a total of 1276 training samples are obtained.

Step 2: Conduct AB-TELM model training

Step 3: Use the AB-TELM model for interval index prediction

In the actual industrial production process, the data acquisition system collects actual production data at the CMP site, and the data are processed into the input data required by the AB-TELM model in the same way as the training data in Step 1 to obtain the test samples. The AB-TELM model parameters obtained in Step 2 are then used to calculate the interval-type index prediction according to step (8).

The performance comparison of AB-TELM and the other models on the microelectronic CMP film thickness prediction problem is shown in Table 2. The table shows that TSVR-l performs significantly worse than AB-TELM and TSVR-g. In addition, in terms of simulation time, AB-TELM is significantly better than TSVR-l and TSVR-g.

Table 1 Comparison of performance of AB-TELM and other models in prediction of molten steel temperature in refining furnace

Table 2 Comparison of performance of AB-TELM and TSVR interval models in microelectronic CMP film thickness prediction

	AB-TELM	TSVR-l	TSVR-g
RMSE	171.3005	245.113	172.072
Simulation time (seconds)	3.511967	25.38918	32.90905

Claims

1. An interval-type index modeling method based on a Bayesian network and an extreme learning machine, wherein the method is implemented in the following steps:

Step (1): Data acquisition and preprocessing

A data acquisition system is used to collect data from the actual industrial production process, and the data are processed into the following training data:

{(x_i, t_i)}, i = 1, ..., N, with x_i = (x_{i,1}, ..., x_{i,n})

where x_i and t_i are the input and output of the i-th training sample, N is the number of training samples, and n is the dimension of the input variable;

Step (2): Construct a double ELM model based on asymmetric Gaussian distribution Bayes

Step (2.1): The ELM model can be expressed as follows:

t=h(x)β+ε

Where h(x) is the hidden layer node function of ELM, β is the output layer weight, and ε is the model error;

Step (2.2): The output of the ELM model can be assumed to be an asymmetric Gaussian distribution as follows:

Where b is the variance parameter of the asymmetric Gaussian distribution, and w is the weight of the asymmetric Gaussian distribution;

Step (2.3): The likelihood function of the training data can be written as:

where H_1 and t_1 are the hidden layer output matrix and the output vector of the sample set satisfying t < hβ, respectively, and H_2 and t_2 are the hidden layer output matrix and the output vector of the sample set satisfying t ≥ hβ, respectively;

Step (2.4): A Gaussian prior distribution is placed on the output weights β, i.e.

where M is the number of hidden layer nodes, and a and β_k are parameters of the Gaussian distribution;

Step (2.5): A pair of reciprocal weights (w, 1/w), denoted (w_1, w_2), is used and appropriately adjusted to obtain the double ELM based on asymmetric Gaussian distribution Bayesian:

p(t | a_1, b_1) = ∫ p(t | β_1, b_1, w_1) p(β_1 | a_1) dβ_1

p(t | a_2, b_2) = ∫ p(t | β_2, b_2, w_2) p(β_2 | a_2) dβ_2

Step (3): Initialization of a double ELM model based on asymmetric Gaussian distribution Bayesian

Step (3.1): Initialization of the ELM model

The number of input-layer nodes is set equal to the training sample dimension n, the number of output nodes is 1, and the number of hidden-layer nodes of the single-hidden-layer extreme learning machine is M;

The excitation function h(x, o_l, r_l) of the hidden layer nodes may be a Gaussian function, a Sigmoid function, a sine function, a triangular basis function or a Hard Limit function;

The extreme learning machine is trained on the original N samples {(x_i, t_i)}, i = 1, ..., N: the parameters o_l and r_l, l = 1, 2, ..., M, of the excitation function of each hidden layer node are determined randomly, and the ordinary extreme learning machine is used to calculate the initial hidden layer output matrix H and the initial value of the output layer connection matrix

where

Step (3.2): Initialization of the adaptive adjustment algorithm for the weights (w_1, w_2)

Initialize the weights w = w_1 = w_2 = 1, and set the prediction interval CI_trained = 0, the weight adjustment unit value δ_w = 0.05, the minimum weight value w_min = 0.001, the weight learning rate r_w = 1, and the weight stopping criterion ε_w = 0.00001;

Step (4): Parameter learning of the Bayesian ELM model with weight w_1:

Step (4.1): Using Bayes' formula, the posterior distribution p(β_1 | t) can be expressed as follows:

Let

Then

where H_{1,1} and t_{1,1} are the hidden layer output matrix and output values corresponding to the training samples with ε < 0, respectively, and H_{1,2} and t_{1,2} are the hidden layer output matrix and output values corresponding to the training samples with ε > 0, respectively, H_1 = [H_{1,1}; H_{1,2}], t = [t_{1,1}; t_{1,2}];

Step (4.2): Using Bayes' formula, the marginal likelihood function p(t | a_1, b_1) can be expressed as follows:

where

then,

Step (4.3): Let

Solving gives

where

Step (4.4): Similarly, let

Solving gives

Step (4.5): Repeat steps (4.1), (4.2) and (4.3) until a_1 and b_1 converge;

Step (5): Parameter learning of the Bayesian ELM model with weight w_2:

This step is analogous to step (4), so only the results are given here;

Step (5.1): Calculate the output weights of the ELM model using the following formula,

where H_{2,1} and t_{2,1} are the hidden layer output matrix and output values corresponding to the training samples with ε < 0, respectively, and H_{2,2} and t_{2,2} are the hidden layer output matrix and output values corresponding to the training samples with ε > 0, respectively, H_2 = [H_{2,1}; H_{2,2}], t = [t_{2,1}; t_{2,2}];

Step (5.2): Calculate a_2 and b_2 using the following formulas, respectively

where

Step (5.3): Repeat steps (5.1) and (5.2) until a_2 and b_2 converge;

Step (6): Adaptive adjustment of the weights (w_1, w_2)

Step (6.1): Calculate the average of the prediction interval of the upper bound model and the lower bound model:

Step (6.2): Calculate the difference between the average value of the prediction interval and the target value of the interval:

CI_err = CI_expected - CI_trained

Step (6.3): According to the difference between the average value of the prediction interval of the interval model and the target value of the interval, the weights are adjusted as follows:

w^new = w - CI_err × (w - w_min) × δ_w

w_1 = w^new, w_2 = 1/w^new

Step (7): Repeat step (4), step (5) and step (6) until CI_err satisfies the stopping condition;

Step (8): After the above model parameter learning is complete, interval-type index prediction is performed as follows for an input variable x:

where t_1 and t_2 are the lower bound and upper bound of the predicted value of the interval-type indicator, respectively.

2. Based on the interval-type index forecasting method based on Bayesian network and extreme learning machine described above, and addressing the practical problem of molten steel temperature prediction in a refining furnace, a refining furnace temperature interval prediction method based on Bayesian network and extreme learning machine is further proposed. For the period between two consecutive temperature measurements of the actual refining furnace, the method takes the previous molten steel temperature measurement, ladle condition, heating gear position, heating time, treatment interval time, argon blowing flow rate, wall temperature, flue gas temperature, flue gas flow rate and ambient temperature as the model input training data, and the subsequent measured temperature as the model output training data; the interval-type index forecasting model based on Bayesian network and extreme learning machine is trained on these data, and the trained model can be used for interval prediction of the molten steel temperature. The method is implemented on a computer by the following steps:

Step (1): Collect data between every two consecutive molten steel temperature measurements. In each set of data, the previous molten steel temperature measurement, ladle condition, heating gear position, heating time, treatment interval time, argon blowing flow rate, wall temperature, flue gas temperature, flue gas flow rate and ambient temperature are used as the model input training data, and the subsequent molten steel temperature measurement is used as the model output data;

Step (2): Select the number of input-layer nodes, the number of output nodes, the number of hidden-layer nodes of the single-hidden-layer extreme learning machine, the excitation function of the hidden layer nodes, the asymmetric weights, and the interval target value;

Step (3): Use the interval-type index prediction method based on the Bayesian network and the extreme learning machine of claim 1, training with the data collected in step (2), thereby obtaining a molten steel temperature prediction model of the refining furnace.

3. Based on the interval-type index prediction method based on Bayesian network and extreme learning machine described above, and addressing the practical problem of wafer grinding thickness prediction in the microelectronic chemical mechanical polishing process, a chemical mechanical polishing thickness interval prediction method based on Bayesian network and extreme learning machine is further proposed. The method takes the grinding time of each wafer and the grinding equipment inspection standard value from the actual microelectronic chemical mechanical polishing process as the model input training data, and the wafer grinding thickness as the model output training data; the interval-type index prediction model based on Bayesian network and extreme learning machine is trained on these data, and the trained model can be used for interval prediction of the grinding thickness. The method is implemented on a computer by the following steps:

Step (1): Collect the grinding time, grinding thickness and product type of each wafer, together with the inspection standard value of the grinding equipment, and group the data by product type; in each group, the grinding time and the grinding equipment inspection standard value are used as the model input data, and the grinding thickness is used as the model output data;

Step (2): Select the number of input-layer nodes, the number of output nodes, the number of hidden-layer nodes of the single-hidden-layer extreme learning machine, the excitation function of the hidden layer nodes, the asymmetric weights, and the interval target value;

Step (3): Use the interval-type index prediction method based on the Bayesian network and the extreme learning machine of claim 1, training with the data collected in step (2), thereby obtaining a microelectronic chemical mechanical polishing thickness prediction model.