CN101807046A - Online modeling method based on extreme learning machine with adjustable structure

Online modeling method based on extreme learning machine with adjustable structure

Info

Publication number
CN101807046A
CN101807046A (application number CN 201010119408 / CN201010119408A)
Authority
CN
China
Prior art date
Legal status
Granted
Application number
CN 201010119408
Other languages
Chinese (zh)
Other versions
CN101807046B (en)
Inventor
刘民
李国虎
董明宇
吴澄
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN2010101194082A
Publication of CN101807046A
Application granted
Publication of CN101807046B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an online modeling method based on an extreme learning machine with an adjustable structure. It belongs to the fields of automatic control, information technology and advanced manufacturing, and in particular provides a method for adjusting the structure and parameters of an extreme learning machine during its online learning process so as to accommodate newly acquired data. The method comprises the following steps: define the concept of a category ball; in each learning pass, judge whether the newly acquired data falls outside the category ball and reduces modeling accuracy; if so, add a new hidden node; if not, only adjust the center and radius of the category ball; finally, update the output-layer weights of the extreme learning machine. The method first introduces the category ball to enclose the data already used in earlier training. When determining the parameters of a newly added hidden node, the node's output at the point of the category ball nearest to the new sample is made small enough that its output on the used data can be regarded as zero, and an update formula for the output-layer weights is given. By adding hidden nodes, the method improves online modeling accuracy.

Description

Online modeling method based on structure-adjustable extreme learning machine
Technical Field
The invention belongs to the fields of automatic control, information technology and advanced manufacturing, and particularly relates to a method for adjusting the structure and parameters of an extreme learning machine in the online learning process of the extreme learning machine to accommodate newly acquired data.
Background
In many modeling environments oriented to actual industrial process detection, control and optimization, the data required for modeling often arrive sequentially. Accordingly, academia and industry have proposed online modeling methods (online learning methods) such as RAN, RANEKF, MRAN, GAP-RBF and GGAP-RBF, which adjust the model structure and parameters online according to newly generated data, accommodating new information without re-modeling on all acquired data. However, most of these methods suffer from drawbacks such as many parameters to be tuned and slow training, which seriously limit their practical effectiveness. The recently emerged OS-ELM method reduces the number of parameters to be tuned to one, but it lacks structural adjustability, so its ability to accommodate new information is relatively limited and the model accuracy cannot be further improved.
Disclosure of Invention
To solve the online modeling problem, the invention provides an online modeling method based on a structure-adjustable extreme learning machine (SAO-ELM for short). In SAO-ELM, the basic network structure is the same as that of an ELM (Extreme Learning Machine) network, but the number of hidden nodes can be adjusted during online modeling. The main difficulty in adding hidden nodes during modeling is that the training goal of SAO-ELM is to minimize the error sum of the adjusted model over all training data, yet in each online learning pass the previously used training data must be discarded, which makes the outputs of newly added hidden nodes on those discarded data unknown. To this end, the invention defines a category ball that encloses all the used training data, and records and updates the ball's center and radius as new data arrive. When a hidden node is added, its excitation function is chosen as a Gaussian function, and a suitable center and width are selected so that the node's output at the point of the category ball closest to the new sample is small enough; the output of the newly added hidden node on the discarded data can then be regarded as 0. Under these conditions, an iterative update formula for the output-layer weights when hidden nodes are added can be derived, thereby achieving online modeling based on a structure-adjustable extreme learning machine.
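To make the category-ball mechanism concrete, the following minimal Python sketch (illustrative only; the variable names and test values are not from the patent) computes the point on the ball closest to a new sample and the width bound that keeps the new Gaussian node's output negligible on the enclosed, already discarded data:

```python
import numpy as np

def nearest_point_on_ball(x_o, R, x_a):
    """Point C on the category ball (center x_o, radius R) closest to the new
    sample x_a: x_c = x_o + lambda_1 (x_a - x_o), lambda_1 = R / ||x_a - x_o||."""
    lam1 = R / np.linalg.norm(x_a - x_o)
    return x_o + lam1 * (x_a - x_o)

def width_bound(x_c, a, eps):
    """Largest width b with exp(-||x_c - a|| / b) <= eps, i.e.
    b <= -||x_c - a|| / ln(eps) for 0 < eps < 1."""
    return -np.linalg.norm(x_c - a) / np.log(eps)

# Illustrative values: a unit ball around the used data, new sample outside it.
x_o, R = np.zeros(3), 1.0
x_a = np.array([3.0, 0.0, 0.0])          # new sample; also the new node's center a
x_c = nearest_point_on_ball(x_o, R, x_a)
b = width_bound(x_c, x_a, eps=1e-6)
print(x_c, b)                            # output at x_c equals eps when b hits the bound
```

With b at or below this bound, the node's output everywhere inside the ball is below ε, which is what justifies treating its output on the discarded data as 0.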
An online modeling method based on a structure-adjustable extreme learning machine is characterized by being realized according to the following steps:
step (1): model selection and parameter initialization
Set the number M of hidden-layer nodes of the single-hidden-layer extreme learning machine; the number of input-layer nodes equals the dimension n of the training samples, and the number of output nodes equals the dimension m of the training target;
The excitation function \(G(a_i, b_i, x)\) of the hidden nodes adopts a Gaussian function; the center \(a_i\) and width \(b_i\) of each hidden node, \(i = 1, 2, \ldots, M\), are determined randomly;
Based on the first N samples \(X_0 = \{(x_i, t_i)\}_{i=1}^{N}\), train the extreme learning machine to obtain the initial hidden-layer output matrix \(H_0\) and the output-layer connection matrix \(\beta_0\), where

\[\beta_0 = (H_0^T H_0)^{-1} H_0^T T_0\]

\[H_0 = \begin{bmatrix} G(a_1, b_1, x_1) & \cdots & G(a_M, b_M, x_1) \\ \vdots & & \vdots \\ G(a_1, b_1, x_N) & \cdots & G(a_M, b_M, x_N) \end{bmatrix}_{N \times M}\]

\[T_0 = \begin{bmatrix} t_{11} & \cdots & t_{1m} \\ \vdots & & \vdots \\ t_{N1} & \cdots & t_{Nm} \end{bmatrix} = \begin{bmatrix} t_1^T \\ \vdots \\ t_N^T \end{bmatrix}_{N \times m}\]
Initialize the matrix K so that \(K_0 = H_0^T H_0\); compute and save \(\beta_0\), \(K_0\), and \(P_0 = K_0^{-1}\);
Enclose the initial training sample set \(X_0\) with a category ball O, so that the ball just encloses all the sample points of \(X_0\), and determine the ball's center \(C_0\) and radius \(R_0\);
Step (2): online learning process
When newly added training data \(x_1 = (x_{N+1}, t_{N+1})\) arrive, train the ELM as follows, so that the knowledge stored from \(X_0\) is retained while the new knowledge contained in \(x_1\) is accommodated:
Step (2.1): keep the network structure unchanged and adjust the output-layer connection matrix \(\beta_0\) according to \(x_1\) only; the updated output-layer weight matrix is \(\beta_1\); simultaneously update the matrices \(K_0\) and \(P_0\) to \(K_1\) and \(P_1\):
\[\beta_1 = \beta_0 + P_1 H_1^T (T_1 - H_1 \beta_0)\]

\[P_1 = K_1^{-1} = P_0 - P_0 H_1^T (I + H_1 P_0 H_1^T)^{-1} H_1 P_0\]

\[K_1 = K_0 + H_1^T H_1\]
where \(H_1\) and \(T_1\) are, respectively, the hidden-layer output matrix and the training-target matrix of the ELM for the new sample \(x_1\), i.e.

\[H_1 = \begin{bmatrix} G(a_1, b_1, x_{N+1}) & \cdots & G(a_M, b_M, x_{N+1}) \end{bmatrix}_{1 \times M}\]

\[T_1 = \begin{bmatrix} t_{(N+1)1} & \cdots & t_{(N+1)m} \end{bmatrix} = \begin{bmatrix} t_{N+1}^T \end{bmatrix}_{1 \times m}\]
Step (2.2): use the ELM with the adjusted parameters to compute the training error e on the newly added sample \(x_1\), and judge whether the new sample \(x_1\) lies outside the category ball O; if \(x_1\) is outside ball O and e is larger than the set threshold, abandon all the adjustments of step (2.1) and go to step (2.3); otherwise, go to step (3);
Step (2.3): add one hidden node, setting its center a to \(x_1\); the width b is determined by

\[b \le -\frac{\|x_c - a\|}{\ln \varepsilon}\]
where \(\varepsilon\) is a preset threshold and \(x_c\) is the coordinate of the point on the category ball O closest to the new sample point \(x_1\), which can be determined by

\[x_c = x_o + \lambda_1 (x_a - x_o)\]

where \(x_o\) is the center coordinate of ball O, and \(\lambda_1\) can be determined by

\[\lambda_1 = \frac{\|x_c - x_o\|}{\|x_a - x_o\|} = \frac{R}{\|x_a - x_o\|}\]
where \(x_a\) is the coordinate of the new sample point \(x_1\); readjust the output-layer connection matrix \(\beta_0\) to \(\beta_1\), and accordingly update the matrices \(K_0\) and \(P_0\) to \(K_1\) and \(P_1\), so that
\[\beta_1 = P_1 \begin{bmatrix} K_0 \beta_0 + H_1^T T_1 \\ H_{11}^T T_1 \end{bmatrix}, \qquad K_1 = \begin{bmatrix} K_0 + H_1^T H_1 & H_1^T H_{11} \\ H_{11}^T H_1 & H_{11}^T H_{11} \end{bmatrix}, \qquad P_1 = K_1^{-1} = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}\]

\[A_{11} = P_1' + P_1' (H_1^T H_{11}) R^{-1} (H_{11}^T H_1) P_1', \qquad A_{12} = -P_1' (H_1^T H_{11}) R^{-1}, \qquad A_{21} = A_{12}^T, \qquad A_{22} = R^{-1}\]

\[R = H_{11}^T H_{11} - (H_{11}^T H_1) P_1' (H_1^T H_{11})\]

\[P_1' = (K_0 + H_1^T H_1)^{-1} = P_0 - P_0 H_1^T (I + H_1 P_0 H_1^T)^{-1} H_1 P_0, \qquad P_0 = K_0^{-1}\]
where \(H_{01}\) and \(H_{11}\) are the hidden-layer output matrices of the newly added hidden nodes on the original sample set \(X_0\) and on the new sample point \(x_1\), respectively, i.e.

\[H_{01} = \big[ G(a_{M+j}, b_{M+j}, x_i) \big]_{N \times L}, \quad i = 1, \ldots, N, \qquad H_{11} = \big[ G(a_{M+j}, b_{M+j}, x_{N+k}) \big]_{N_1 \times L}, \quad k = 1, \ldots, N_1, \quad j = 1, \ldots, L\]

\(N_1\) and L are, respectively, the number of sample points in \(x_1\) and the number of newly added hidden nodes; since the new sample points arrive one by one, \(N_1 = L = 1\);
Step (3): update the parameters of the category ball O
Update the parameters of the category ball O, i.e., its center coordinate and radius, so that the new ball \(O_1\) just encloses all the sample points of \(X_0\) and \(x_1\); the update formulas are as follows:
\[R_{\mathrm{new}} = \frac{\|x_a - x_b\|}{2}\]

\[x_{o\_\mathrm{new}} = \frac{x_a + x_b}{2}\]
where \(x_a\) and \(x_b\) are, respectively, the coordinate of the new sample point \(x_1\) and the coordinate of the point on ball O farthest from \(x_1\); \(x_b\) can be calculated from

\[x_b = x_o + \lambda_2 (x_o - x_a)\]

where \(x_o\) is the center coordinate of ball O, and \(\lambda_2\) can be calculated from

\[\lambda_2 = \frac{\|x_b - x_o\|}{\|x_o - x_a\|} = \frac{R}{\|x_o - x_a\|}\]
A large number of simulation tests have been carried out with this online modeling method; the results show that, compared with other online modeling methods, the proposed method achieves higher learning accuracy, and the model it builds also has better generalization performance.
Drawings
FIG. 1: algorithm flow chart provided by the invention, showing the specific implementation steps of the online modeling method based on the structure-adjustable extreme learning machine.
FIG. 2: schematic diagram of a category ball enclosing all data used in the training process, where the small ball O is the category ball enclosing all used data except the newly added training data, and the large ball \(O_1\) is the category ball enclosing all used data together with the new data.
FIG. 3: diagram of a Gaussian function; the central peak is the output at the center of the Gaussian function, and the smaller values at the edges are the outputs farther from the center.
FIG. 4: and in the simulation experiment, the relationship between the training precision and the verification precision along with the change of the number of hidden nodes is shown, wherein a red curve is the relationship between the training precision and the change of the number of hidden nodes, and a green curve is the relationship between the verification precision and the change of the number of hidden nodes.
FIG. 5: an online modeling process schematic diagram in a simulation experiment, wherein fig. 5.1 is a variation relation of verification precision along with the increase of training data, and fig. 5.2 is a variation relation of hidden node number along with the increase of training data.
Detailed Description
The main advantage of the online modeling method based on the structure-adjustable extreme learning machine is that the network structure can be adjusted as needed during online modeling. In keeping with the characteristics of online modeling, learning is carried out whenever new training data arrive; otherwise the existing model is used for prediction, and prediction accuracy gradually improves as the training data accumulate.
The steps involved in the online modeling method based on the structure-adjustable extreme learning machine provided by the invention are explained in detail as follows:
first step, model selection
For the proposed method, model selection only involves determining the number M of initial ELM hidden nodes. The invention uses cross-validation to determine this number: divide the initial training data into two parts, one for training and one for validation; starting from a small number of hidden nodes, train the ELM on the training data and compute the validation error on the validation data; then gradually increase the number of hidden nodes, repeating the training and validation steps; finally, select the number of hidden nodes that minimizes the validation error as the initial number.
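A minimal sketch of this cross-validation procedure, under stated assumptions (toy regression data, random Gaussian centers and widths, and pseudo-inverse training as in the third step below; none of the numeric choices come from the patent):

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_hidden(X, centers, widths):
    # H[i, j] = G(a_j, b_j, x_i) = exp(-||x_i - a_j|| / b_j)
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.exp(-d / widths)

# Toy data standing in for the initial training data.
X = rng.uniform(-1, 1, (200, 2))
T = np.sin(X[:, :1]) + 0.1 * rng.standard_normal((200, 1))
X_tr, T_tr, X_va, T_va = X[:150], T[:150], X[150:], T[150:]

best_M, best_err = None, np.inf
for M in range(5, 51, 5):                     # gradually increase the node count
    centers = X_tr[rng.choice(len(X_tr), M)]  # random centers
    widths = rng.uniform(0.5, 2.0, M)         # random widths
    beta = np.linalg.pinv(gaussian_hidden(X_tr, centers, widths)) @ T_tr
    err = np.linalg.norm(gaussian_hidden(X_va, centers, widths) @ beta - T_va)
    if err < best_err:
        best_M, best_err = M, err
print("initial hidden node count:", best_M)
```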
Second step, model initialization
Model initialization is the initialization of the model parameters. The proposed method adopts the ELM network structure with a Gaussian excitation function for the hidden nodes, so the parameters to be initialized first are the centers \(a_i\) and widths \(b_i\) of the Gaussian functions, \(i = 1, 2, \ldots, M\); \(a_i\) and \(b_i\) are drawn as random numbers from a given distribution. Next, the number of samples initially participating in training is determined: in the invention, the initial number of training samples is chosen as M + 100 for classification models and M + 50 for regression models; the guiding principle is that \(H_0\) must have full column rank. Finally, a category ball \(S_0\) with the smallest radius is used to enclose the initial training data, and the ball's center and radius are recorded as \(C_0\) and \(R_0\).
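A sketch of this initialization under the stated rules (regression case, hence M + 50 initial samples; the uniform sampling ranges are assumptions, and the smallest enclosing ball is approximated here by the smallest ball centered at the data mean):

```python
import numpy as np

rng = np.random.default_rng(1)

n, m, M = 4, 1, 20                     # input dim, output dim, hidden node count
N0 = M + 50                            # initial samples for regression
                                       # (M + 100 would be used for classification)
centers = rng.uniform(-1, 1, (M, n))   # a_i, drawn randomly (assumed distribution)
widths = rng.uniform(0.5, 2.0, M)      # b_i, drawn randomly (assumed distribution)

X0 = rng.uniform(-1, 1, (N0, n))       # placeholder for the initial training inputs
C0 = X0.mean(axis=0)                   # approximate center of the category ball S0
R0 = np.linalg.norm(X0 - C0, axis=1).max()   # radius just enclosing all samples
```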
Thirdly, determining the initial value of the training process data
The initial training-process data include the hidden-layer output matrix H, the output-layer connection matrix β, and the intermediate matrix K together with its inverse \(K^{-1}\), which are used to compute the output-layer connection matrix.
If the initial training data are \(X_0 = \{(x_i, t_i) \mid x_i \in \mathbb{R}^n,\ t_i \in \mathbb{R}^m,\ i = 1, \ldots, N\}\), the corresponding hidden-layer output matrix is

\[H_0 = \begin{bmatrix} G(a_1, b_1, x_1) & \cdots & G(a_M, b_M, x_1) \\ \vdots & & \vdots \\ G(a_1, b_1, x_N) & \cdots & G(a_M, b_M, x_N) \end{bmatrix}_{N \times M}\]

According to the Empirical Risk Minimization (ERM) principle, the objective function for solving the output-layer connection matrix β is \(\min \|F - T_0\| = \min \|H_{0(N \times M)}\, \beta_{(M \times m)} - T_{0(N \times m)}\|\), where \(T_0\) is the training target, i.e.

\[T_0 = \begin{bmatrix} t_{11} & \cdots & t_{1m} \\ \vdots & & \vdots \\ t_{N1} & \cdots & t_{Nm} \end{bmatrix} = \begin{bmatrix} t_1^T \\ \vdots \\ t_N^T \end{bmatrix}_{N \times m}\]
By matrix theory, the solution of the above optimization problem is

\[\beta_0 = H_0^{\dagger} T_0\]

where \(H_0^{\dagger}\) is the pseudo-inverse of the matrix \(H_0\). When \(H_0\) has full column rank, i.e., \(\mathrm{rank}(H_0) = M\),

\[H_0^{\dagger} = (H_0^T H_0)^{-1} H_0^T\]

For convenience of derivation, an intermediate matrix K is introduced such that \(K = H^T H\); then

\[K_0 = H_0^T H_0, \qquad \beta_0 = K_0^{-1} H_0^T T_0\]

Let \(P = K^{-1}\); then

\[P_0 = K_0^{-1} = (H_0^T H_0)^{-1}\]
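These initial quantities map directly into code; a minimal sketch (assuming \(H_0\) has full column rank so that \(K_0\) is invertible; `gaussian_hidden` repeats the helper from the model-selection sketch above):

```python
import numpy as np

def gaussian_hidden(X, centers, widths):
    # H[i, j] = G(a_j, b_j, x_i) = exp(-||x_i - a_j|| / b_j)
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.exp(-d / widths)

def init_training_state(X0, T0, centers, widths):
    """Initial H0, beta0, K0, P0 of the third step."""
    H0 = gaussian_hidden(X0, centers, widths)
    K0 = H0.T @ H0                     # K = H^T H
    P0 = np.linalg.inv(K0)             # P = K^{-1}, valid for full column rank
    beta0 = P0 @ H0.T @ T0             # beta0 = (H0^T H0)^{-1} H0^T T0
    return H0, beta0, K0, P0
```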
Fourthly, fitting the newly added data by only adjusting the weight of the ELM output layer without adjusting the ELM structure
If the newly added data are \(X_1 = \{(x_i, t_i)\}_{i=N+1}^{N+N_1}\) (in fact, \(N_1\) is chosen as 1 in this method, and the case \(N_1 > 1\) can be reduced to the case \(N_1 = 1\)), the corresponding hidden-layer output matrix and training target are, respectively:

\[H_1 = \begin{bmatrix} G(a_1, b_1, x_{N+1}) & \cdots & G(a_M, b_M, x_{N+1}) \\ \vdots & & \vdots \\ G(a_1, b_1, x_{N+N_1}) & \cdots & G(a_M, b_M, x_{N+N_1}) \end{bmatrix}_{N_1 \times M}\]

\[T_1 = \begin{bmatrix} t_{(N+1)1} & \cdots & t_{(N+1)m} \\ \vdots & & \vdots \\ t_{(N+N_1)1} & \cdots & t_{(N+N_1)m} \end{bmatrix} = \begin{bmatrix} t_{N+1}^T \\ \vdots \\ t_{N+N_1}^T \end{bmatrix}_{N_1 \times m}\]
The objective function for the new output-layer connection matrix β then becomes

\[\min \left\| \begin{bmatrix} H_0 \\ H_1 \end{bmatrix} \beta - \begin{bmatrix} T_0 \\ T_1 \end{bmatrix} \right\|\]

whose solution is

\[\beta_1 = K_1^{-1} \begin{bmatrix} H_0 \\ H_1 \end{bmatrix}^T \begin{bmatrix} T_0 \\ T_1 \end{bmatrix}, \qquad K_1 = \begin{bmatrix} H_0 \\ H_1 \end{bmatrix}^T \begin{bmatrix} H_0 \\ H_1 \end{bmatrix}\]
To achieve online modeling, \(\beta_1\) must not depend on \(H_0\) or \(T_0\); it may be a function only of \(P_0\), \(K_0\), and \(\beta_0\). A simple mathematical derivation gives

\[\beta_1 = \beta_0 + K_1^{-1} H_1^T (T_1 - H_1 \beta_0)\]

\[K_1 = K_0 + H_1^T H_1\]
These two equations constitute the ELM training algorithm for the case where the ELM structure is kept unchanged and only the parameters are adjusted.
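A minimal sketch of this parameter-only update, using the recursive form of \(P_1 = K_1^{-1}\) from step (2.1) so that nothing about \(H_0\) or \(T_0\) is needed (the shape conventions are assumptions for the single-sample case):

```python
import numpy as np

def update_weights_only(beta0, K0, P0, H1, T1):
    """Fourth-step update: adjust the output weights, structure unchanged.
    H1: (1, M) hidden-layer output row for the new sample; T1: (1, m) target."""
    K1 = K0 + H1.T @ H1
    S = np.linalg.inv(np.eye(H1.shape[0]) + H1 @ P0 @ H1.T)
    P1 = P0 - P0 @ H1.T @ S @ H1 @ P0          # P1 = (K0 + H1^T H1)^{-1}
    beta1 = beta0 + P1 @ H1.T @ (T1 - H1 @ beta0)
    return beta1, K1, P1
```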
Fifthly, judge whether the training error on the new data \(X_1\) of the ELM obtained by the fourth-step training method meets the requirement, and judge whether the data \(X_1\) lie outside the category ball \(S_0\). If the training error does not meet the requirement and \(X_1\) is outside \(S_0\), go to the sixth step; otherwise, go to the seventh step.
Sixthly, adding a hidden layer node, and then adjusting the output layer connection weight
When a hidden node is added, the hidden-layer output matrix becomes

\[\begin{bmatrix} H_0 & H_{01} \\ H_1 & H_{11} \end{bmatrix}\]

where \(H_{01}\) is the output of the newly added hidden node on the data set \(X_0\) used during training, and \(H_{11}\) is its output on the newly added, not yet used data set \(X_1\). Accordingly, the objective function for the new output-layer connection matrix β becomes

\[\min \left\| \begin{bmatrix} H_0 & H_{01} \\ H_1 & H_{11} \end{bmatrix} \beta - \begin{bmatrix} T_0 \\ T_1 \end{bmatrix} \right\|\]
However, because the used data set \(X_0\) has been discarded before the hidden node is added, \(H_{01}\) is unknown; to solve for β, this unknown-\(H_{01}\) problem must be handled first.
As the Gaussian function diagram (FIG. 3) shows, the output of a hidden node far from its center can be regarded as 0. The data sets \(X_0\) and \(X_1\) are illustrated in FIG. 2, where \(S_0\) encloses the data used during training and point A is the position of the newly added data. Therefore, if the center of the newly added hidden node is chosen as point A, it suffices to make its output at point C small enough, i.e.

\[e^{-\frac{\|x_c - a\|}{b}} \le \varepsilon \;\Rightarrow\; b \le -\frac{\|x_c - a\|}{\ln \varepsilon}\]
where ε is a preselected threshold and \(x_c\) is the coordinate of point C, which can be determined by the following two formulas:

\[x_c = x_o + \lambda_1 (x_a - x_o)\]

\[\lambda_1 = \frac{\|x_c - x_o\|}{\|x_a - x_o\|} = \frac{R}{\|x_a - x_o\|}\]

where \(x_o\) is the center coordinate of the category ball \(S_0\), \(x_a\) is the coordinate of point A, and R is the radius of \(S_0\).
With the center a and width b of the newly added hidden node selected in this way, the node's output at point C is smaller than the very small real number ε, and its output inside the category ball \(S_0\) is smaller still, so it can be regarded as 0; hence \(H_{01}\) can be treated as a zero matrix. The optimization problem after adding the hidden node then becomes

\[\min \left\| \begin{bmatrix} H_0 & 0 \\ H_1 & H_{11} \end{bmatrix} \beta - \begin{bmatrix} T_0 \\ T_1 \end{bmatrix} \right\|\]
Solving this optimization problem gives

\[\beta_1 = \left( \begin{bmatrix} H_0 & 0 \\ H_1 & H_{11} \end{bmatrix}^T \begin{bmatrix} H_0 & 0 \\ H_1 & H_{11} \end{bmatrix} \right)^{-1} \begin{bmatrix} H_0 & 0 \\ H_1 & H_{11} \end{bmatrix}^T \begin{bmatrix} T_0 \\ T_1 \end{bmatrix}\]

Let

\[H = \begin{bmatrix} H_0 \\ H_1 \end{bmatrix}, \qquad \delta H = \begin{bmatrix} 0 \\ H_{11} \end{bmatrix}\]

Then

\[\beta_1 = \left( \begin{bmatrix} H & \delta H \end{bmatrix}^T \begin{bmatrix} H & \delta H \end{bmatrix} \right)^{-1} \begin{bmatrix} H & \delta H \end{bmatrix}^T \begin{bmatrix} T_0 \\ T_1 \end{bmatrix}\]

At this time,

\[K_1 = \begin{bmatrix} H & \delta H \end{bmatrix}^T \begin{bmatrix} H & \delta H \end{bmatrix} = \begin{bmatrix} H^T \\ \delta H^T \end{bmatrix} \begin{bmatrix} H & \delta H \end{bmatrix}\]

Let

\[K_1^{-1} = A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} = \left( \begin{bmatrix} H^T \\ \delta H^T \end{bmatrix} \begin{bmatrix} H & \delta H \end{bmatrix} \right)^{-1}\]
where the blocks of the matrix A are

\[A_{11} = (H^T H)^{-1} + (H^T H)^{-1} (H^T \delta H) R^{-1} (\delta H^T H) (H^T H)^{-1}\]

\[A_{12} = -(H^T H)^{-1} (H^T \delta H) R^{-1}, \qquad A_{21} = A_{12}^T, \qquad A_{22} = R^{-1}\]

with \(R = \delta H^T \delta H - (\delta H^T H)(H^T H)^{-1}(H^T \delta H)\). Substituting the expressions for H and δH gives:
\[A_{11} = (K_0 + H_1^T H_1)^{-1} + (K_0 + H_1^T H_1)^{-1} (H_1^T H_{11}) R^{-1} (H_{11}^T H_1) (K_0 + H_1^T H_1)^{-1}\]

\[A_{12} = -(K_0 + H_1^T H_1)^{-1} (H_1^T H_{11}) R^{-1}, \qquad A_{21} = A_{12}^T, \qquad A_{22} = R^{-1}\]

\[R = H_{11}^T H_{11} - (H_{11}^T H_1)(K_0 + H_1^T H_1)^{-1} (H_1^T H_{11})\]
Combining the above formulas, the update formulas for K, P, and β when hidden nodes are added are:

\[\beta_1 = P_1 \begin{bmatrix} K_0 \beta_0 + H_1^T T_1 \\ H_{11}^T T_1 \end{bmatrix}, \qquad K_1 = \begin{bmatrix} K_0 + H_1^T H_1 & H_1^T H_{11} \\ H_{11}^T H_1 & H_{11}^T H_{11} \end{bmatrix}, \qquad P_1 = K_1^{-1} = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}\]

\[A_{11} = P_1' + P_1' (H_1^T H_{11}) R^{-1} (H_{11}^T H_1) P_1', \qquad A_{12} = -P_1' (H_1^T H_{11}) R^{-1}, \qquad A_{21} = A_{12}^T, \qquad A_{22} = R^{-1}\]

\[R = H_{11}^T H_{11} - (H_{11}^T H_1) P_1' (H_1^T H_{11})\]

\[P_1' = (K_0 + H_1^T H_1)^{-1} = P_0 - P_0 H_1^T (I + H_1 P_0 H_1^T)^{-1} H_1 P_0, \qquad P_0 = K_0^{-1}\]
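A minimal sketch of this sixth-step block update for one added node (\(N_1 = L = 1\)): `H1` is the 1×M output row of the existing nodes on the new sample, `H11` the 1×1 output of the new node on the new sample, and \(H_{01}\) is taken as the zero matrix as derived above:

```python
import numpy as np

def add_node_update(beta0, K0, P0, H1, H11, T1):
    """Block update of beta, K, P when one hidden node is added (H01 = 0)."""
    I1 = np.eye(H1.shape[0])
    P1p = P0 - P0 @ H1.T @ np.linalg.inv(I1 + H1 @ P0 @ H1.T) @ H1 @ P0  # P1'
    R = H11.T @ H11 - (H11.T @ H1) @ P1p @ (H1.T @ H11)
    Rinv = np.linalg.inv(R)
    A11 = P1p + P1p @ (H1.T @ H11) @ Rinv @ (H11.T @ H1) @ P1p
    A12 = -P1p @ (H1.T @ H11) @ Rinv
    P1 = np.block([[A11, A12], [A12.T, Rinv]])           # P1 = K1^{-1}
    K1 = np.block([[K0 + H1.T @ H1, H1.T @ H11],
                   [H11.T @ H1,     H11.T @ H11]])
    beta1 = P1 @ np.vstack([K0 @ beta0 + H1.T @ T1, H11.T @ T1])
    return beta1, K1, P1
```

Note that the block form only inverts the 1×1 matrices R and \(I + H_1 P_0 H_1^T\) rather than the full (M+1)×(M+1) matrix \(K_1\), which is what makes node addition cheap.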
Seventhly, update the category ball \(S_0\) to \(S_1\), so that \(S_1\) contains both the data \(X_0\) in \(S_0\) and the newly added data \(X_1\).
As FIG. 2 shows, the center of \(S_1\) should be the midpoint of A and B, and its radius should be half the length of segment AB, where point A is the position of the new data and point B is the point on ball \(S_0\) farthest from point A. The new category ball therefore has radius and center
\[R_{\mathrm{new}} = \frac{\|x_a - x_b\|}{2}\]

\[x_{o\_\mathrm{new}} = \frac{x_a + x_b}{2}\]
where \(x_b\) is the coordinate of point B, which can be determined by the following two formulas:

\[x_b = x_o + \lambda_2 (x_o - x_a)\]

\[\lambda_2 = \frac{\|x_b - x_o\|}{\|x_o - x_a\|} = \frac{R}{\|x_o - x_a\|}\]
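A minimal sketch of this seventh-step ball update (it presumes, as FIG. 2 does, that the new point \(x_a\) lies outside the old ball):

```python
import numpy as np

def update_ball(x_o, R, x_a):
    """Grow the category ball so it just encloses the old ball and x_a.
    B is the point on the old ball (center x_o, radius R) farthest from x_a."""
    lam2 = R / np.linalg.norm(x_o - x_a)
    x_b = x_o + lam2 * (x_o - x_a)       # x_b = x_o + lambda_2 (x_o - x_a)
    return (x_a + x_b) / 2, np.linalg.norm(x_a - x_b) / 2   # new center, radius
```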
The flow chart of the method provided by the invention is shown in figure 1.
A large number of simulation experiments have been performed with the proposed online modeling method based on the structure-adjustable extreme learning machine; owing to space limitations, only its application effect on actual steelmaking continuous-casting quality prediction data is given here. The data set originates from an industrial site; the input dimension is 84 and the output dimension is 1; the number of training samples is 1056 and the number of test samples is 508.
The invention compares the application effect of the SAO-ELM method with the batch-type BP neural network algorithm and with the OS-ELM method. The BP neural network algorithm is a classical neural network training method but not an online learning algorithm; the OS-ELM method is an online learning algorithm that differs from the algorithm proposed by the invention in that it has no structural adjustability, i.e., it lacks the fifth, sixth, and seventh steps of the invention. The comparison results are shown in Table 1:
TABLE 1 comparison of the Performance of SAO-ELM and other algorithms
As Table 1 shows, the training accuracy and test accuracy of SAO-ELM are the best among the three modeling methods, and its training time is nearly an order of magnitude less than that of the BP algorithm. FIG. 5 shows the online modeling process: as modeling proceeds, the number of hidden nodes keeps increasing and the learning accuracy keeps improving, and the two trends are consistent, which further demonstrates the effectiveness of the proposed SAO-ELM.

Claims (1)

1. An online modeling method based on an Extreme Learning Machine (ELM) with an adjustable structure is characterized in that the method is realized on a computer sequentially according to the following steps:
step (1): model selection and parameter initialization
Set the number M of hidden-layer nodes of the single-hidden-layer extreme learning machine; the number of input-layer nodes equals the dimension n of the training samples, and the number of output nodes equals the dimension m of the training target;
The excitation function \(G(a_i, b_i, x)\) of the hidden nodes adopts a Gaussian function; the center \(a_i\) and width \(b_i\) of each hidden node, \(i = 1, 2, \ldots, M\), are determined randomly;
Based on the first N samples \(X_0 = \{(x_i, t_i)\}_{i=1}^{N}\), train the extreme learning machine to obtain the initial hidden-layer output matrix \(H_0\) and the output-layer connection matrix \(\beta_0\), where

\[\beta_0 = (H_0^T H_0)^{-1} H_0^T T_0\]

\[H_0 = \begin{bmatrix} G(a_1, b_1, x_1) & \cdots & G(a_M, b_M, x_1) \\ \vdots & & \vdots \\ G(a_1, b_1, x_N) & \cdots & G(a_M, b_M, x_N) \end{bmatrix}_{N \times M}\]

\[T_0 = \begin{bmatrix} t_{11} & \cdots & t_{1m} \\ \vdots & & \vdots \\ t_{N1} & \cdots & t_{Nm} \end{bmatrix} = \begin{bmatrix} t_1^T \\ \vdots \\ t_N^T \end{bmatrix}_{N \times m}\]

Initialize the matrix K so that \(K_0 = H_0^T H_0\); compute and save \(\beta_0\), \(K_0\), and \(P_0 = K_0^{-1}\);
Enclose the initial training sample set \(X_0\) with a category ball O, so that the ball just encloses all the sample points of \(X_0\), and determine the ball's center \(C_0\) and radius \(R_0\);
Step (2): online learning process
When newly added training data \(x_1 = (x_{N+1}, t_{N+1})\) arrive, train the ELM as follows, so that the knowledge stored from \(X_0\) is retained while the new knowledge contained in \(x_1\) is accommodated:
Step (2.1): keep the network structure unchanged and adjust the output-layer connection matrix \(\beta_0\) according to \(x_1\) only; the updated output-layer weight matrix is \(\beta_1\); simultaneously update the matrices \(K_0\) and \(P_0\) to \(K_1\) and \(P_1\):
\[\beta_1 = \beta_0 + P_1 H_1^T (T_1 - H_1 \beta_0)\]

\[P_1 = K_1^{-1} = P_0 - P_0 H_1^T (I + H_1 P_0 H_1^T)^{-1} H_1 P_0\]

\[K_1 = K_0 + H_1^T H_1\]
where \(H_1\) and \(T_1\) are, respectively, the hidden-layer output matrix and the training-target matrix of the ELM for the new sample \(x_1\), i.e.

\[H_1 = \begin{bmatrix} G(a_1, b_1, x_{N+1}) & \cdots & G(a_M, b_M, x_{N+1}) \end{bmatrix}_{1 \times M}\]

\[T_1 = \begin{bmatrix} t_{(N+1)1} & \cdots & t_{(N+1)m} \end{bmatrix} = \begin{bmatrix} t_{N+1}^T \end{bmatrix}_{1 \times m}\]
Step (2.2): use the ELM with the adjusted parameters to compute the training error e on the newly added sample \(x_1\), and judge whether the new sample \(x_1\) lies outside the category ball O; if \(x_1\) is outside ball O and e is larger than the set threshold, abandon all the adjustments of step (2.1) and go to step (2.3); otherwise, go to step (3);
Step (2.3): add one hidden node, setting its center a to \(x_1\); the width b is determined by

\[b \le -\frac{\|x_c - a\|}{\ln \varepsilon}\]
where \(\varepsilon\) is a preset threshold and \(x_c\) is the coordinate of the point on the category ball O closest to the new sample point \(x_1\), which can be determined by

\[x_c = x_o + \lambda_1 (x_a - x_o)\]

where \(x_o\) is the center coordinate of ball O, and \(\lambda_1\) can be determined by

\[\lambda_1 = \frac{\|x_c - x_o\|}{\|x_a - x_o\|} = \frac{R}{\|x_a - x_o\|}\]
where \(x_a\) is the coordinate of the new sample point \(x_1\); readjust the output-layer connection matrix \(\beta_0\) to \(\beta_1\), and accordingly update the matrices \(K_0\) and \(P_0\) to \(K_1\) and \(P_1\), so that
\[\beta_1 = P_1 \begin{bmatrix} K_0 \beta_0 + H_1^T T_1 \\ H_{11}^T T_1 \end{bmatrix}, \qquad K_1 = \begin{bmatrix} K_0 + H_1^T H_1 & H_1^T H_{11} \\ H_{11}^T H_1 & H_{11}^T H_{11} \end{bmatrix}, \qquad P_1 = K_1^{-1} = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}\]

\[A_{11} = P_1' + P_1' (H_1^T H_{11}) R^{-1} (H_{11}^T H_1) P_1', \qquad A_{12} = -P_1' (H_1^T H_{11}) R^{-1}, \qquad A_{21} = A_{12}^T, \qquad A_{22} = R^{-1}\]

\[R = H_{11}^T H_{11} - (H_{11}^T H_1) P_1' (H_1^T H_{11})\]

\[P_1' = (K_0 + H_1^T H_1)^{-1} = P_0 - P_0 H_1^T (I + H_1 P_0 H_1^T)^{-1} H_1 P_0, \qquad P_0 = K_0^{-1}\]
where \(H_{01}\) and \(H_{11}\) are the hidden-layer output matrices of the newly added hidden nodes on the original sample set \(X_0\) and on the new sample point \(x_1\), respectively, i.e.

\[H_{01} = \big[ G(a_{M+j}, b_{M+j}, x_i) \big]_{N \times L}, \quad i = 1, \ldots, N, \qquad H_{11} = \big[ G(a_{M+j}, b_{M+j}, x_{N+k}) \big]_{N_1 \times L}, \quad k = 1, \ldots, N_1, \quad j = 1, \ldots, L\]

\(N_1\) and L are, respectively, the number of sample points in \(x_1\) and the number of newly added hidden nodes; since the new sample points arrive one by one, \(N_1 = L = 1\);
Step (3): update the parameters of the category ball O
Update the parameters of the category ball O, i.e., its center coordinate and radius, so that the new ball \(O_1\) just encloses all the sample points of \(X_0\) and \(x_1\); the update formulas are as follows:
\[R_{\mathrm{new}} = \frac{\|x_a - x_b\|}{2}\]

\[x_{o\_\mathrm{new}} = \frac{x_a + x_b}{2}\]
where \(x_a\) and \(x_b\) are, respectively, the coordinate of the new sample point \(x_1\) and the coordinate of the point on ball O farthest from \(x_1\); \(x_b\) can be calculated from

\[x_b = x_o + \lambda_2 (x_o - x_a)\]

where \(x_o\) is the center coordinate of ball O, and \(\lambda_2\) can be calculated from

\[\lambda_2 = \frac{\|x_b - x_o\|}{\|x_o - x_a\|} = \frac{R}{\|x_o - x_a\|}.\]
CN2010101194082A 2010-03-08 2010-03-08 Online modeling method based on extreme learning machine with adjustable structure Expired - Fee Related CN101807046B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010101194082A CN101807046B (en) 2010-03-08 2010-03-08 Online modeling method based on extreme learning machine with adjustable structure


Publications (2)

Publication Number Publication Date
CN101807046A true CN101807046A (en) 2010-08-18
CN101807046B CN101807046B (en) 2011-08-17

Family

ID=42608870

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010101194082A Expired - Fee Related CN101807046B (en) 2010-03-08 2010-03-08 Online modeling method based on extreme learning machine with adjustable structure

Country Status (1)

Country Link
CN (1) CN101807046B (en)



Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5621648A (en) * 1994-08-02 1997-04-15 Crump; Craig D. Apparatus and method for creating three-dimensional modeling data from an object
CN101504736A (en) * 2009-02-27 2009-08-12 江汉大学 Method for implementing neural network algorithm based on Delphi software
CN101576734A (en) * 2009-06-12 2009-11-11 北京工业大学 Dissolved oxygen control method based on dynamic radial basis function neural network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Journal of Shandong University (Natural Science), Vol. 45, No. 5, 2010-05-31, Li Bin et al., "Intelligent optimization strategy of ELM-RBF neural networks", pp. 48-51 *
Power System Technology, Vol. 26, No. 4, 2002-04-30, Ding Jianyong et al., "Online identification of dynamic parameters of synchronous machines based on the ELMAN neural network", pp. 22-25 *
Journal of System Simulation, Vol. 19, No. 23, 2007-12-31, Chang Yuqing et al., "Soft-sensor modeling of biochemical processes based on the extreme learning machine", pp. 5587-5590 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102708381A (en) * 2012-05-09 2012-10-03 江南大学 Improved extreme learning machine combining learning thought of least square vector machine
CN102708381B (en) * 2012-05-09 2014-02-19 江南大学 Improved extreme learning machine combining learning thought of least square vector machine
WO2013182176A1 (en) * 2012-06-06 2013-12-12 Kisters Ag Method for training an artificial neural network, and computer program products
CN103106331A (en) * 2012-12-17 2013-05-15 清华大学 Photo-etching line width intelligence forecasting method based on dimension-reduction and quantity-increment-type extreme learning machine
CN103106331B (en) * 2012-12-17 2015-08-05 清华大学 Based on the lithographic line width Intelligent Forecasting of dimensionality reduction and increment type extreme learning machine
CN104537167A (en) * 2014-12-23 2015-04-22 清华大学 Interval type index forecasting method based on robust interval extreme learning machine
CN104537167B (en) * 2014-12-23 2017-12-15 清华大学 Interval type indices prediction method based on Robust Interval extreme learning machine
CN108229026A (en) * 2018-01-04 2018-06-29 电子科技大学 A kind of electromagnetic field modeling and simulating method based on dynamic core extreme learning machine
CN108229026B (en) * 2018-01-04 2021-07-06 电子科技大学 Electromagnetic field modeling simulation method based on dynamic kernel extreme learning machine
CN111125760A (en) * 2019-12-20 2020-05-08 支付宝(杭州)信息技术有限公司 Model training and predicting method and system for protecting data privacy
CN111125760B (en) * 2019-12-20 2022-02-15 支付宝(杭州)信息技术有限公司 Model training and predicting method and system for protecting data privacy
CN113569038A (en) * 2021-07-28 2021-10-29 北京明略昭辉科技有限公司 Method and device for sorting recalled documents, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
CN101807046B (en) 2011-08-17

Similar Documents

Publication Publication Date Title
CN101807046A (en) Online modeling method based on extreme learning machine with adjustable structure
WO2016101182A1 (en) Interval type indicator forecasting method based on bayesian network and extreme learning machine
CN109255160B (en) Neural network-based unit delay prediction method and unit delay sensitivity calculation method
Chen et al. Kernel least mean square with adaptive kernel size
CN107561503A (en) A kind of adaptive target tracking filtering method based on the Multiple fading factor
JP6404909B2 (en) How to calculate the output model of a technical system
CN106372278A (en) Sensitivity analysis method jointly considering input parameter uncertainty and proxy model uncertainty
CN110008404B (en) Latent semantic model optimization method based on NAG momentum optimization
JP6950756B2 (en) Neural network rank optimizer and optimization method
CN108181812A (en) BP neural network-based valve positioner PI parameter setting method
CN108490115B (en) Air quality abnormity detection method based on distributed online principal component analysis
CN110286595B (en) Fractional order system self-adaptive control method influenced by saturated nonlinear input
CN109447272A (en) A kind of extreme learning machine method based on center of maximum cross-correlation entropy criterion
CN113176022B (en) Segmented neural network pressure sensor pressure detection method and system
CN111310348A (en) Material constitutive model prediction method based on PSO-LSSVM
CN107181474A (en) A kind of kernel adaptive algorithm filter based on functional expansion
CN106296434A (en) A kind of Grain Crop Yield Prediction method based on PSO LSSVM algorithm
Mi et al. Prediction of accumulated temperature in vegetation period using artificial neural network
Lukić et al. Neural networks-based real-time determination of the laser beam spatial profile and vibrational-to-translational relaxation time within pulsed photoacoustics
CN109597006B (en) Optimal design method for magnetic nanoparticle measurement position
CN105092509B (en) A kind of sample component assay method of PCR-based ELM algorithms
Cai et al. Influence of partially known parameter on flaw characterization in Eddy Current Testing by using a random walk MCMC method based on metamodeling
CN111210877A (en) Method and device for deducing physical property parameters
Faqih et al. Multi-Step Ahead Prediction of Lorenz's Chaotic System Using SOM ELM-RBFNN
Zirkohi et al. Design of Radial Basis Function Network Using Adaptive Particle Swarm Optimization and Orthogonal Least Squares.

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20110817

Termination date: 20140308