CN104537391A - Meta learning method of extreme learning machine - Google Patents


Info

Publication number
CN104537391A
CN104537391A CN201410814269.3A
Authority
CN
China
Prior art keywords
elm
meta
original training
learning
base
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410814269.3A
Other languages
Chinese (zh)
Inventor
Shizhong Liao
Chang Feng
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN201410814269.3A priority Critical patent/CN104537391A/en
Publication of CN104537391A publication Critical patent/CN104537391A/en
Pending legal-status Critical Current

Abstract

The invention discloses a meta-learning method for an extreme learning machine. The meta-learning method comprises the following steps: (1) several original training sets are generated; (2) Base-ELMs are trained on the original training sets; (3) the Base-ELMs are used as the hidden-node activation functions of a Meta-ELM, and the Meta-ELM is trained, which comprises calculating the hidden-layer matrix H, calculating the output-layer weight β, and finally obtaining the prediction function f(x) = ⟨β, h(x)⟩. Compared with the prior art, the method reduces the influence of the randomness of the original ELM algorithm on the learning performance of ELM.

Description

A meta-learning method for extreme learning machines
Technical field
The present invention belongs to the field of machine learning techniques, and in particular relates to a learning method for extreme learning machines.
Background technology
The extreme learning machine (ELM) is an important class of learning methods developed on the basis of neural network theory, and is widely applied in fields such as data mining, face recognition, and pattern recognition. ELM is in essence a single-hidden-layer feedforward network (SLFN). Unlike traditional SLFNs, the input weights and biases of the ELM hidden layer are assigned at random, which yields a linear system with fixed parameters; this linear system is then solved by the least-squares method. Huang et al. verified, both theoretically and through extensive experiments, that ELM is an efficient and effective learning algorithm. However, the random assignment of input weights and biases introduces a certain randomness into the algorithm, which affects the generalization performance of ELM.
Statistically, the average of multiple learners has smaller variance than a single learner, and its learning results are more stable. A combined model can therefore perform better than the best single model. This has motivated research on using the combined results of several ELMs to improve the learning performance of a single ELM. The present invention provides a new meta-learning method for ELM, based on different training modes for the base ELMs and different meta-learning modes. The key techniques on which the invention relies are described below.
One, extreme learning machine (ELM)
Given a training data set $S = \{(x_i, y_i) \mid i = 1, \dots, N\}$, where $x_i \in \mathbb{R}^d$ is an input vector, $y_i \in \mathbb{R}$ is the output corresponding to the input, $N$ is the number of training samples, $\mathbb{R}$ is the real field, and $d$ is the dimension of the input data.
The mathematical model of an SLFN with $L$ hidden nodes is:

$$\sum_{i=1}^{L} \beta_i\, G(w_i, b_i, x_j) = t_j, \qquad j = 1, \dots, N,$$
where $\beta_i$ is the output weight of the $i$-th hidden node, $G$ is an activation function with parameters $w_i$ and $b_i$, $x_j$ is the input, $t_j$ is the output, and $N$ is the number of samples.
Written in matrix form, this is $H\beta = T$, where
$$H = \begin{bmatrix} h(x_1) \\ \vdots \\ h(x_N) \end{bmatrix} = \begin{bmatrix} G(w_1, b_1, x_1) & \cdots & G(w_L, b_L, x_1) \\ \vdots & \ddots & \vdots \\ G(w_1, b_1, x_N) & \cdots & G(w_L, b_L, x_N) \end{bmatrix}_{N \times L},$$

$$\beta = \begin{bmatrix} \beta_1 \\ \vdots \\ \beta_L \end{bmatrix}_{L \times 1} \quad \text{and} \quad T = \begin{bmatrix} t_1 \\ \vdots \\ t_N \end{bmatrix}_{N \times 1},$$
where $H$ is the hidden-layer output matrix, $h$ denotes the $L$-dimensional nonlinear feature mapping, $N$ is the number of samples, $L$ is the number of hidden nodes, and the $t_j$ are the outputs.
For the ELM algorithm, all randomly generated parameters in $H$ are kept fixed once produced, so only $\beta$ has to be solved for:

$$\left\| H\hat{\beta} - T \right\| = \min_{\beta} \left\| H\beta - T \right\|.$$
The ELM algorithm comprises three steps:
(1) randomly generate the hidden-layer parameters $(w_i, b_i)$, $i = 1, \dots, L$;
(2) calculate the hidden-layer output matrix $H$ of the SLFN;
(3) calculate the output-layer weight $\hat{\beta} = H^{\dagger} T$, where $H^{\dagger}$ denotes the Moore-Penrose (generalized) inverse of the matrix $H$.
The final prediction function is therefore

$$f(x) = \langle \hat{\beta}, h(x) \rangle.$$
Two, meta learning (Meta-Learning)
Meta-learning is, broadly, a method that learns again from multiple existing learning results. In machine learning it can be understood simply as follows: after several base learners have been obtained, they are combined in some way. Learning methods of this kind include ensemble learning, Boosting, Bagging, Stacking, and so on.
The meta extreme learning machine involves two levels of training: Base-ELM-level training and Meta-ELM-level training. Base-ELM-level training takes the training sets as input and uses ordinary activation functions in the hidden nodes; Meta-ELM-level training uses the trained Base-ELMs as the hidden nodes of a new ELM, and the resulting model is the prediction model. Figure 1 shows the model framework of the meta extreme learning machine, i.e. the single-hidden-layer feedforward network formed by the Meta-ELM. From left to right: the first layer is the input layer, i.e. the raw-data input layer; the second layer is the hidden layer, in which each hidden-node activation function is a single ELM; the third layer is the output layer.
Viewed from the perspective of single-hidden-layer feedforward networks, the Meta-ELM is itself an SLFN; its hidden nodes, however, are not the usual activation functions but ELMs.
As shown in Figure 2, the flow of the meta extreme learning machine (Meta-ELM) algorithm is as follows:
1. Train the Base-ELMs:
1) generate the data sets required for Base-ELM training from the original training data;
2) train a Base-ELM on each of these training sets.
2. Train the Meta-ELM:
1) calculate the hidden-layer matrix H;
2) calculate the output-layer weight β. Meta-ELM thus trains the Base-ELMs in different ways and then trains an upper-level ELM, forming a layered learning model.
Unlike the ordinary ELM, the hidden-layer matrix of the Meta-ELM is built not from simple activation functions but from the individual Base-ELMs:

$$H = \begin{bmatrix} h(x_1) \\ \vdots \\ h(x_N) \end{bmatrix} = \begin{bmatrix} \mathrm{ELM}_1(x_1) & \cdots & \mathrm{ELM}_M(x_1) \\ \vdots & \ddots & \vdots \\ \mathrm{ELM}_1(x_N) & \cdots & \mathrm{ELM}_M(x_N) \end{bmatrix}_{N \times M},$$

where $h(x) = [\mathrm{ELM}_1(x), \mathrm{ELM}_2(x), \dots, \mathrm{ELM}_M(x)]$.
The final prediction function is therefore $f(x) = \langle \beta, h(x) \rangle$.
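The two-level training can be sketched as follows. This is an illustrative Python re-implementation under assumptions (tanh activation, every Base-ELM trained on the full original data, which is the simplest of the four training-set modes described below), not the patented MATLAB/R code.

```python
import numpy as np

def train_base_elm(X, T, L, rng):
    """One Base-ELM; returns its prediction function ELM_m(x)."""
    d = X.shape[1]
    W = rng.uniform(-1.0, 1.0, size=(d, L))   # random hidden parameters, then fixed
    b = rng.uniform(-1.0, 1.0, size=L)
    H = np.tanh(X @ W + b)
    beta = np.linalg.pinv(H) @ T              # least-squares output weights
    return lambda Z: np.tanh(Z @ W + b) @ beta

def train_meta_elm(X, T, M=8, L=40, seed=0):
    """Meta-ELM: M Base-ELMs act as the hidden-node activation functions."""
    rng = np.random.default_rng(seed)
    base = [train_base_elm(X, T, L, rng) for _ in range(M)]
    # Hidden-layer matrix with entries H[j, m] = ELM_m(x_j), shape (N, M).
    H = np.column_stack([f(X) for f in base])
    # Output-layer weight beta learned as in ordinary ELM training.
    beta = np.linalg.pinv(H) @ T
    return lambda Z: np.column_stack([f(Z) for f in base]) @ beta

# Usage on the SinC regression task:
rng = np.random.default_rng(7)
X = rng.uniform(-10.0, 10.0, size=(400, 1))
T = np.sinc(X[:, 0] / np.pi)
f = train_meta_elm(X, T, M=8, L=40)
mse = float(np.mean((f(X) - T) ** 2))
```

Note that the top level is just another ELM solve: the Base-ELM predictions play exactly the role that the activation values $G(w_i, b_i, x_j)$ play in the ordinary ELM.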
List of references:
[1] G.-B. Huang, Q.-Y. Zhu, C.-K. Siew, Extreme Learning Machine: Theory and applications, Neurocomputing 70 (1) (2006) 489–501.
[2] G.-B. Huang, H. Zhou, X. Ding, R. Zhang, Extreme Learning Machine for regression and multiclass classification, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 42 (2) (2012) 513–529.
[3] Z.-L. Sun, T.-M. Choi, K.-F. Au, Y. Yu, Sales forecasting using Extreme Learning Machine with applications in fashion retailing, Decision Support Systems 46 (1) (2008) 411–419.
[4] Y. Lan, Y. C. Soh, G.-B. Huang, Ensemble of online sequential Extreme Learning Machine, Neurocomputing 72 (135) (2009) 3391–3395.
[5] M. van Heeswijk, Y. Miche, E. Oja, A. Lendasse, GPU-accelerated and parallelized ELM ensembles for large-scale regression, Neurocomputing 74 (16) (2011) 2430–2437.
[6] Y. Guo, S. M. Rüger, J. Sutiwaraphun, J. Forbes-Millott, Meta-learning for parallel data mining, in: Proceedings of the Seventh Parallel Computing Workshop, 1997.
[7] D. Serre, Matrices: Theory and applications, Springer-Verlag, 2010.
[8] L. Breiman, Bagging predictors, Machine Learning 24 (2) (1996) 123–140.
Summary of the invention
In view of the above prior art, the present invention proposes a meta-learning method for extreme learning machines that combines multiple learners to reduce the effect of randomness on the learning performance of ELM.
The meta-learning method for an extreme learning machine of the present invention comprises the following steps:
Step 1, generate several original training sets;
Step 2, train a Base-ELM on each original training set;
Step 3, use the Base-ELMs as the hidden-node activation functions of a Meta-ELM and train the Meta-ELM, which comprises calculating the hidden-layer matrix H and calculating the output-layer weight β, finally obtaining the prediction function f(x) = ⟨β, h(x)⟩.
The original training sets in step 1 are produced by one of the following methods: using the original training data directly, sampling the original training data with replacement, splitting the original training data, or subsampling the original training data.
The output-layer weight β in step 3 is obtained in one of the following two ways: by averaging, or by learning the output weights in the manner of the ELM training algorithm.
Compared with the prior art, the present invention reduces the influence of the randomness of the original ELM algorithm on the learning performance of ELM.
Brief description of the drawings
Fig. 1 is the model structure of the meta extreme learning machine of the prior art;
Fig. 2 is the flow chart of the meta extreme learning machine algorithm of the present invention;
Fig. 3 is a schematic diagram comparing the performance of the data-splitting-based Meta-ELM algorithm and the ELM algorithm on the SinC data set;
Fig. 4 is a schematic diagram comparing the variances of the test mean squared errors of the original ELM, Ensemble, and Meta-ELM algorithms on standard regression data sets;
Fig. 5 is a schematic diagram of how the test mean squared error of the subsampling-based Meta-ELM changes with the sampling ratio.
Table 1 compares the test mean squared errors and total training times of Meta-ELM under different training-set generation methods and meta-learning methods.
Detailed description of the embodiments
The present invention is described in detail below with reference to the drawings and specific embodiments, but the scope of the invention is not limited thereto.
The meta-learning method for an extreme learning machine of the present invention comprises the following steps:
Step 1, generate training sets. There are many ways to produce training sets; the present invention mainly uses the following four:
1-1, original training data: no operation is performed on the raw data, and the Base-ELMs are trained on the whole training set; this method is effective on relatively small data sets but becomes hard to train when the data volume is large;
1-2, Bagging: training sets of the same size as the original training set are obtained by sampling with replacement, and a Base-ELM is trained on each;
1-3, splitting: the original training data is divided into several disjoint small training sets of nearly equal size;
1-4, subsampling: training sets are obtained by sampling at a given sampling ratio.
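The four generation modes above can be sketched as sampling over sample indices. This is an illustrative sketch only; the function name, parameter defaults, and the without-replacement reading of mode 1-4 are assumptions, not taken from the patent.

```python
import random

def make_training_sets(n_samples, method, n_sets=5, ratio=0.5, seed=0):
    """Return lists of sample indices, one list per Base-ELM training set.

    method: 'original'  - every set is the whole original data (mode 1-1);
            'bagging'   - sampling with replacement, same size as original (1-2);
            'split'     - disjoint parts of nearly equal size (1-3);
            'subsample' - a fraction `ratio` of the samples per set (1-4).
    """
    rng = random.Random(seed)
    idx = list(range(n_samples))
    if method == "original":
        return [list(idx) for _ in range(n_sets)]
    if method == "bagging":
        return [[rng.randrange(n_samples) for _ in range(n_samples)]
                for _ in range(n_sets)]
    if method == "split":
        rng.shuffle(idx)
        # Strided slices of a shuffled index list: disjoint, near-equal parts.
        return [idx[i::n_sets] for i in range(n_sets)]
    if method == "subsample":
        m = max(1, int(ratio * n_samples))
        return [rng.sample(idx, m) for _ in range(n_sets)]
    raise ValueError(f"unknown method: {method}")
```

For example, `make_training_sets(100, "split", n_sets=4)` returns four disjoint index lists of 25 samples each, which together cover all 100 original samples.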
Step 2, train the Base-ELMs.
Several training sets are obtained in step 1, and a Base-ELM is trained on each of them.
Step 3, train the Meta-ELM.
The Base-ELMs obtained in step 2 serve as the hidden-node activation functions of the Meta-ELM. The output weights are obtained by one of the following two training modes:
3-1, averaging;
3-2, learning the output weights in the manner of the ELM training algorithm.
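The two modes differ only in how β is computed from the hidden-layer matrix H of Base-ELM predictions. A minimal sketch (the function name and shapes are assumptions made for illustration):

```python
import numpy as np

def meta_output_weights(H, T, mode="elm"):
    """Obtain the Meta-ELM output weight beta from H (N x M Base-ELM outputs).

    mode 'average': mode 3-1, every Base-ELM gets the equal weight 1/M;
    mode 'elm':     mode 3-2, beta is learned as in ELM training, beta = H† T.
    """
    M = H.shape[1]
    if mode == "average":
        return np.full(M, 1.0 / M)
    return np.linalg.pinv(H) @ T

# Usage: two base predictions per sample; here the targets happen to equal
# the mean of the base outputs, so both modes recover them exactly.
H = np.array([[1.0, 3.0],
              [2.0, 4.0]])
T = H.mean(axis=1)                       # [2.0, 3.0]
beta_avg = meta_output_weights(H, T, "average")
beta_elm = meta_output_weights(H, T, "elm")
```

Averaging costs nothing extra but weights all base learners equally; the ELM-style solve fits β to the data and can down-weight poor Base-ELMs.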
As shown in Figure 5, the test mean squared error of the subsampling-based Meta-ELM decreases as the sampling ratio increases.
The present invention is implemented in the MATLAB and R programming languages. The specific steps are as follows:
1. generate training sets according to the different training-set generation methods;
2. train a Base-ELM on each training set;
3. use each Base-ELM as a hidden-layer activation function of the Meta-ELM and train the Meta-ELM;
4. train the Meta-ELM on the whole original training data set to obtain the final prediction model.
Table 1 compares the test mean squared errors and total training times of Meta-ELM under different training-set generation methods and meta-learning methods.

Claims (3)

1. A meta-learning method for an extreme learning machine, characterized in that the method comprises the following steps:
step (1), generating several original training sets;
step (2), training a Base-ELM on each original training set;
step (3), using the Base-ELMs as the hidden-node activation functions of a Meta-ELM and training the Meta-ELM, which comprises calculating a hidden-layer matrix H and calculating an output-layer weight β, finally obtaining a prediction function f(x) = ⟨β, h(x)⟩.
2. The meta-learning method for an extreme learning machine as claimed in claim 1, characterized in that the original training sets in step (1) are produced by one of four methods: using the original training data directly, sampling the original training data with replacement, splitting the original training data, or subsampling the original training data.
3. The meta-learning method for an extreme learning machine as claimed in claim 1, characterized in that the output-layer weight β in step (3) is obtained in one of the following two ways: by averaging, or by learning the output weights in the manner of the ELM training algorithm.
CN201410814269.3A 2014-12-23 2014-12-23 Meta learning method of extreme learning machine Pending CN104537391A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410814269.3A CN104537391A (en) 2014-12-23 2014-12-23 Meta learning method of extreme learning machine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410814269.3A CN104537391A (en) 2014-12-23 2014-12-23 Meta learning method of extreme learning machine

Publications (1)

Publication Number Publication Date
CN104537391A true CN104537391A (en) 2015-04-22

Family

ID=52852911

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410814269.3A Pending CN104537391A (en) 2014-12-23 2014-12-23 Meta learning method of extreme learning machine

Country Status (1)

Country Link
CN (1) CN104537391A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105335763A (en) * 2015-12-07 2016-02-17 东华大学 Fabric defect classification method based on improved extreme learning machine
CN106650926A (en) * 2016-09-14 2017-05-10 天津工业大学 Robust boosting extreme learning machine integrated modeling method
CN110808945A (en) * 2019-09-11 2020-02-18 浙江大学 Network intrusion detection method in small sample scene based on meta-learning
CN111366848A (en) * 2019-12-31 2020-07-03 安徽师范大学 Battery health state prediction method based on PSO-ELM algorithm
CN113256223A (en) * 2021-06-18 2021-08-13 深圳远荣智能制造股份有限公司 Goods storage method, storage device, terminal equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103646255A (en) * 2013-11-13 2014-03-19 扬州西岐自动化科技有限公司 Face detection method based on Gabor characteristics and extreme learning machine
CN103810724A (en) * 2014-03-06 2014-05-21 西安电子科技大学 Human motion tracking method based on space embedded extreme learning machine


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SHIZHONG LIAO ET AL.: "Meta-ELM: ELM with ELM hidden nodes", Neurocomputing *



Legal Events

Date Code Title Description
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20150422
