CN105117525B - Bagging extreme learning machine integrated modelling approach - Google Patents

Bagging extreme learning machine integrated modelling approach

Info

Publication number: CN105117525B
Application number: CN201510466504.7A
Authority: CN (China)
Legal status: Active
Other versions: CN105117525A (Chinese)
Inventors: 卞希慧, 李淑娟, 谭小耀, 王江江, 王治国, 刘维国, 陈宗蓬, 王晨
Assignees: Shanghai Sui Hua Industrial Ltd By Share Ltd; Tianjin Polytechnic University
Application filed by: Shanghai Sui Hua Industrial Ltd By Share Ltd and Tianjin Polytechnic University
Classification: Investigating Or Analysing Materials By Optical Means

Abstract

The invention belongs to the field of chemometrics and specifically provides a Bagging extreme learning machine ensemble modelling method. The specific steps of the invention are: collect spectral data of the samples to be measured and determine the content of the component of interest; divide the sample set into a training set and a prediction set; perform bootstrap resampling on the training set, randomly selecting a certain number of samples as a training subset; establish an extreme learning machine (ELM) submodel on the samples of the training subset; repeat this procedure to establish multiple submodels; for an unknown sample, take the simple average of the predictions of the submodels as the final prediction. Compared with the ELM method, the method of the invention has a clear advantage in prediction accuracy and stability. The invention is suitable for the quantitative analysis of complex materials such as oil, tobacco, food and traditional Chinese medicine.

Description

Bagging extreme learning machine integrated modelling approach
Technical field
The invention belongs to the field of chemometrics, and in particular relates to a Bagging extreme learning machine ensemble modelling method.
Background art
Artificial neural networks have been widely used in biology, chemistry, medicine, economics and many other fields because of their powerful adaptive, self-organizing, self-learning and nonlinear mapping capabilities. However, traditional neural network learning algorithms (such as the BP algorithm) require a large number of network training parameters to be set manually, train slowly, and easily converge to local optima. In 2004, Professor Huang Guangbin of Nanyang Technological University, Singapore, proposed a new algorithm for single-hidden-layer feedforward neural networks, named the extreme learning machine (ELM). The core of the ELM algorithm is to turn the training of the neural network into the solution of a least-squares problem, avoiding the defects of artificial neural networks that parameters must be tuned manually and that training easily falls into local optima. Because it is simple to implement, fast to train and generalizes well, ELM has received increasing attention in recent years and has been applied in analytical chemistry, control engineering, image recognition and many other fields. However, because the input weights and hidden-neuron biases of ELM are set randomly, the results of the model are unstable.
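The ELM training just described, with random input weights and biases and a least-squares solve for the output weights only, can be sketched as follows. This is a minimal illustration, not the patent's implementation; the function names and the sigmoid activation are assumptions.

```python
import numpy as np

def fit_elm(X, y, n_hidden=20, seed=None):
    """Train a single-hidden-layer ELM: input weights W and biases b are
    drawn at random; only the output weights beta are fitted, by solving
    a least-squares problem via the pseudo-inverse."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # random input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))                   # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ y                             # least-squares output weights
    return W, b, beta

def predict_elm(model, X):
    W, b, beta = model
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

Because W and b are random, repeated runs produce different beta and hence different predictions; this is exactly the run-to-run instability that the method of the invention is designed to correct.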
Ensemble modelling techniques obtain the final prediction by fusing the results of multiple models, improving the accuracy and stability of the model's predictions. Bagging is a common ensemble modelling method: it uses the bootstrap method to randomly select part of the samples from the training set to build multiple submodels, and then averages the predictions of the submodels to obtain the final result. On the one hand, re-selecting the training set increases the diversity of the ensemble; on the other hand, fusing multiple predictions improves the prediction accuracy of the base model.
The present invention combines the advantages of ELM and Bagging and proposes a Bagging-based ELM ensemble modelling method for the quantitative analysis of complex samples, which retains the fast computation and strong predictive ability of ELM while overcoming its poor stability.
Summary of the invention
The object of the invention is to propose a Bagging extreme learning machine ensemble modelling method with good stability and high prediction accuracy.
The present invention combines the Bagging algorithm with the extreme learning machine (ELM) model to establish a Bagging-based extreme learning machine ensemble method (denoted Bagging ELM). Its flow is shown in Figure 1, and the specific steps are:
(1) Collect the spectral data of the samples to be measured and determine the content of the component of interest by a conventional method; divide the sample set into a training set and a prediction set;
(2) Perform bootstrap resampling on the training set, randomly selecting a certain number of samples as a training subset;
(3) Determine the optimal ELM activation function and number of hidden-layer nodes, and establish an ELM submodel on the samples of the training subset;
Repeat steps (2) and (3) to establish N submodels;
(4) For an unknown sample, take the arithmetic average of the predictions of the submodels as the final prediction.
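Steps (2)-(4) above can be sketched end-to-end as follows. This is an illustrative sketch only: the function name `bagging_elm_predict`, the tanh activation and all default parameter values are assumptions, not specified by the patent.

```python
import numpy as np

def bagging_elm_predict(X_train, y_train, X_test, n_models=100, frac=0.5,
                        n_hidden=10, seed=None):
    """Bagging ELM: draw a bootstrap subset of the training set, fit one ELM
    submodel on it, repeat n_models times, and return the simple average of
    the submodels' predictions for X_test."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    k = max(1, int(n * frac))                  # subset size (truncated to an integer)
    preds = np.zeros(len(X_test))
    for _ in range(n_models):
        idx = rng.integers(0, n, size=k)       # bootstrap resampling, step (2)
        W = rng.uniform(-1, 1, (X_train.shape[1], n_hidden))
        b = rng.uniform(-1, 1, n_hidden)
        H = np.tanh(X_train[idx] @ W + b)      # one ELM submodel, step (3)
        beta = np.linalg.pinv(H) @ y_train[idx]
        preds += np.tanh(X_test @ W + b) @ beta
    return preds / n_models                    # arithmetic average, step (4)
```

Averaging many submodels built on different bootstrap subsets is what damps the randomness of the individual ELM fits.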
In the present invention, the number N of submodels is determined as follows: specify a sufficiently large number of submodels, fix the training-subset size of each data set at 50% of the total number of training samples, compute the root mean square error of prediction (RMSEP), and observe how the RMSEP changes with the number of submodels; when the RMSEP becomes constant or nearly constant (levels off), the corresponding number of submodels is the required number N.
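The RMSEP used throughout is the root mean square error over the prediction set, and the levelling-off check above amounts to watching the RMSEP of the cumulative ensemble average as submodels are added. A sketch, with hypothetical helper names:

```python
import numpy as np

def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def rmsep_vs_n(submodel_preds, y_true):
    """RMSEP of the cumulative ensemble average after 1, 2, ..., N submodels;
    N is chosen where this curve levels off."""
    cum = np.cumsum(submodel_preds, axis=0)    # running sum of predictions
    return [rmsep(y_true, cum[i] / (i + 1)) for i in range(len(submodel_preds))]
```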
In the present invention, the number of samples in the training subset is determined as follows: fix the number of submodels, vary the number of selected samples from 5% to 100% of the training-set size in steps of 5% (truncating when the result is not an integer), and compute the RMSEP; the number of samples at which the RMSEP is smallest, or levels off, is the number of samples drawn in each cycle.
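The 5%-100% grid with truncation of non-integer sizes can be sketched as follows; `evaluate` stands in for a callback that returns the RMSEP for a given subset size and is an assumption of this sketch:

```python
def subset_size_grid(n_train):
    """Candidate subset sizes: 5%, 10%, ..., 100% of the training set,
    truncating ('rounding down') when the product is not an integer."""
    return [max(1, int(n_train * p / 100)) for p in range(5, 101, 5)]

def pick_subset_size(n_train, evaluate):
    """Return the candidate size whose RMSEP (from `evaluate`) is smallest."""
    sizes = subset_size_grid(n_train)
    errors = [evaluate(k) for k in sizes]
    return sizes[errors.index(min(errors))]
```

For the 70-sample training set of Embodiment 1, for example, the grid runs 3, 7, 10, ..., 70 and the 50% point is 35 samples.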
In the present invention, the optimal ELM activation function and number of hidden-layer nodes are determined as follows: observe the RMSEP of the training-set spectra as the chosen activation function and the number of hidden-layer nodes are varied; the activation function and node number at which the RMSEP reaches its minimum are the optimal parameters.
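This selection is a two-dimensional grid search. The sketch below names the candidate activations as in the embodiments (sig, sin, radbas, tribas); the function definitions, the node grid and the `evaluate` callback are assumptions of the sketch:

```python
import numpy as np

# Candidate activation functions, named as in the embodiments.
ACTIVATIONS = {
    "sig": lambda z: 1.0 / (1.0 + np.exp(-z)),            # sigmoid
    "sin": np.sin,
    "radbas": lambda z: np.exp(-z ** 2),                  # radial basis
    "tribas": lambda z: np.maximum(1 - np.abs(z), 0.0),   # triangular basis
}

def pick_elm_params(evaluate, node_grid=range(5, 101)):
    """Grid-search activation function and hidden-node count; `evaluate`
    returns the training-set RMSEP for one (activation, nodes) pair, and
    the pair with minimum RMSEP is taken as optimal."""
    best = None
    for name in ACTIVATIONS:
        for n in node_grid:
            err = evaluate(name, n)
            if best is None or err < best[0]:
                best = (err, name, n)
    return best[1], best[2]
```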
The advantage of the invention is that the modelling method combines the strengths of the Bagging ensemble technique and the extreme learning machine, improves the prediction accuracy and stability of the ELM algorithm, and provides a new modelling method for multivariate calibration in the analysis of complex materials. The method of the invention is widely applicable to the quantitative analysis of complex materials in fields such as oil, tobacco, food and traditional Chinese medicine.
Brief description of the drawings
Fig. 1 is the flow chart of the Bagging extreme learning machine method.
Fig. 2 shows the change of the RMSEP of the fuel oil ultraviolet data with the number of submodels.
Fig. 3 shows the change of the RMSEP of the fuel oil ultraviolet data with the training-subset sample percentage.
Fig. 4 shows the change of the RMSEP of ELM and Bagging ELM on the fuel oil ultraviolet data with the number of prediction runs.
Fig. 5 shows the relation between the mean predicted values and the actual values of the prediction set for Bagging ELM and ELM on the fuel oil ultraviolet data, where (a) and (b) are Bagging ELM and ELM, respectively.
Fig. 6 shows the change of the RMSEP of the ethanol solution near-infrared data with the number of submodels.
Fig. 7 shows the change of the RMSEP of the ethanol solution near-infrared data with the training-subset sample percentage.
Fig. 8 shows the change of the RMSEP of ELM and Bagging ELM on the ethanol solution near-infrared data with the number of prediction runs.
Fig. 9 shows the relation between the mean predicted values and the actual values of the prediction set for Bagging ELM and ELM on the ethanol solution near-infrared data, where (a) and (b) are Bagging ELM and ELM, respectively.
Fig. 10 shows the change of the RMSEP of the diesel near-infrared data with the number of submodels.
Fig. 11 shows the change of the RMSEP of the diesel near-infrared data with the training-subset sample percentage.
Fig. 12 shows the change of the RMSEP of ELM and Bagging ELM on the diesel near-infrared data with the number of prediction runs.
Fig. 13 shows the relation between the mean predicted values and the actual values of the prediction set for Bagging ELM and ELM on the diesel near-infrared data, where (a) and (b) are Bagging ELM and ELM, respectively.
Fig. 14 shows the change of the RMSEP of the blood near-infrared data with the number of submodels.
Fig. 15 shows the change of the RMSEP of the blood near-infrared data with the training-subset sample percentage.
Fig. 16 shows the change of the RMSEP of ELM and Bagging ELM on the blood near-infrared data with the number of prediction runs.
Fig. 17 shows the relation between the mean predicted values and the actual values of the prediction set for Bagging ELM and ELM on the blood near-infrared data, where (a) and (b) are Bagging ELM and ELM, respectively.
Fig. 18 shows the change of the RMSEP of the cigarette near-infrared data with the number of submodels.
Fig. 19 shows the change of the RMSEP of the cigarette near-infrared data with the training-subset sample percentage.
Fig. 20 shows the change of the RMSEP of ELM and Bagging ELM on the cigarette near-infrared data with the number of prediction runs.
Fig. 21 shows the relation between the mean predicted values and the actual values of the prediction set for Bagging ELM and ELM on the cigarette near-infrared data, where (a) and (b) are Bagging ELM and ELM, respectively.
Detailed description of embodiments
For a better understanding of the present invention, the invention is described in further detail below with reference to the following embodiments, but the scope of protection claimed by the invention is not limited to the scope of the embodiments.
Embodiment 1:
This embodiment applies the method to ultraviolet spectral analysis, determining the monoaromatic hydrocarbon content of fuel oil samples. The specific steps are as follows:
(1) Ultraviolet spectra of 115 fuel oil samples were collected over the wavelength range 200-400 nm at a sampling interval of 0.35 nm, giving 572 wavelength points; the spectra were measured with a Varian Cary 3 UV-visible spectrophotometer. The monoaromatic hydrocarbon content was determined with an HP G1205A supercritical fluid chromatograph, with carbon dioxide as carrier at a flow rate of 2 mL·min⁻¹, an oven temperature of 35 °C, an outlet pressure of 150 bar and a flame ionization detector. Following the division of the data set given on the website, 70 samples were used as the training set and 45 samples as the prediction set.
(2) Bootstrap resampling was performed on the training set, randomly selecting a certain number of samples as a training subset.
(3) The optimal ELM activation function and number of hidden-layer nodes were determined, and an ELM submodel was established on the samples of the training subset.
Steps (2)-(3) were repeated to establish multiple submodels.
(4) For an unknown sample, the arithmetic average of the predictions of the submodels was taken as the final prediction.
Determination of the number of submodels: 500 submodels were specified, the training-subset size of each data set was fixed at 50% of the total number of samples, and the root mean square error of prediction (RMSEP) was computed as a function of the number of submodels; the number at which the RMSEP becomes constant or nearly constant (levels off) is the number of models to establish. The change of RMSEP with the number of submodels in this embodiment is shown in Fig. 2: at 500 submodels the RMSEP is almost unchanged, so 500 submodels were established.
Selection of the number of samples: with the number of submodels fixed at 500, the number of selected samples was varied from 5% to 100% of the training-set size in steps of 5% (truncating non-integers) and the RMSEP was computed; the sample number at which the RMSEP is smallest or levels off is the number drawn in each cycle. The change of RMSEP with the training-subset percentage in this embodiment is shown in Fig. 3: when the subset reaches 20-100% of the training samples, the RMSEP reaches its minimum and is almost unchanged, so any subset size in the range 20-100% of the total sample number is acceptable; this example used 50% of the total number of samples.
Determination of the optimal ELM activation function and number of hidden-layer nodes: according to the change of the training-set RMSEP with the chosen activation function and number of hidden-layer nodes, the parameters at the RMSEP minimum are optimal; in this embodiment they are the activation function sin and 9 nodes.
To compare the stability of Bagging ELM and ELM, each method was rerun 20 times and the change of RMSEP with run number was obtained, as shown in Fig. 4. The RMSEP of the ELM algorithm fluctuates considerably over the 20 runs, indicating poor model stability, while the 20 predictions of the Bagging ELM algorithm fluctuate little and are essentially stable, showing good stability; this shows that the run-to-run stability of Bagging ELM is substantially better than that of ELM. Moreover, the RMSEP of Bagging ELM is always lower than that of ELM, showing that the Bagging ensemble improves the prediction accuracy of the ELM algorithm. Fig. 5 shows the relation between the mean predicted values over the 20 runs and the actual values of the prediction set for Bagging ELM and ELM, where the vertical bars represent the deviation over the 20 runs; the deviations of Bagging ELM are far smaller than those of ELM. The correlation coefficient between the predictions of Bagging ELM and the actual values, 0.9921, is higher than that of ELM, 0.9858, and the standard deviation between the predictions of Bagging ELM and the actual values, 0.0001, is lower than that of ELM, 0.0044. This shows that Bagging ELM improves the prediction accuracy of the ELM model and has better stability.
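The 20-rerun stability comparison can be sketched on synthetic data as follows. Everything below, including the data, the tanh activation, the parameter values and the helper names, is an illustrative assumption and not the fuel-oil dataset or the patent's code:

```python
import numpy as np

def elm_predict(X_tr, y_tr, X_te, n_hidden, rng):
    """One ELM fit: random weights, least-squares output layer."""
    W = rng.uniform(-1, 1, (X_tr.shape[1], n_hidden))
    b = rng.uniform(-1, 1, n_hidden)
    beta = np.linalg.pinv(np.tanh(X_tr @ W + b)) @ y_tr
    return np.tanh(X_te @ W + b) @ beta

def rmsep(y, p):
    return float(np.sqrt(np.mean((y - p) ** 2)))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = X @ rng.normal(size=6) + 0.05 * rng.normal(size=100)
X_tr, y_tr, X_te, y_te = X[:70], y[:70], X[70:], y[70:]

elm_runs, bag_runs = [], []
for _ in range(20):                            # rerun each method 20 times
    elm_runs.append(rmsep(y_te, elm_predict(X_tr, y_tr, X_te, 10, rng)))
    # Bagging ELM: average 50 submodels, each on a 50% bootstrap subset
    preds = np.zeros(len(X_te))
    for _ in range(50):
        idx = rng.integers(0, 70, size=35)
        preds += elm_predict(X_tr[idx], y_tr[idx], X_te, 10, rng)
    bag_runs.append(rmsep(y_te, preds / 50))
# The spread of each list across the 20 reruns is what Figs. 4-5 compare.
```

Averaging the submodels shrinks the run-to-run spread of the RMSEP, which is the stability effect reported in this embodiment.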
Embodiment 2:
This embodiment applies the method to near-infrared spectral analysis, determining the ethanol content of solution samples. The specific steps are as follows:
(1) Near-infrared spectra of 95 ethanol solution samples were collected over the wavelength range 850-1049 nm at a sampling interval of 1 nm, giving 200 wavelength points; the spectra were measured with an HP 8453 spectrometer. Following the division of the data set given on the website, 65 samples were used as the training set and 30 samples as the prediction set.
(2) Bootstrap resampling was performed on the training set, randomly selecting a certain number of samples as a training subset.
(3) The optimal ELM activation function and number of hidden-layer nodes were determined, and an ELM submodel was established on the samples of the training subset.
Steps (2)-(3) were repeated to establish multiple submodels.
(4) For an unknown sample, the arithmetic average of the predictions of the submodels was taken as the final prediction.
Determination of the number of submodels: 500 submodels were specified, the training-subset size of each data set was fixed at 50% of the total number of samples, and the RMSEP was computed as a function of the number of submodels; the number at which the RMSEP becomes constant or nearly constant (levels off) is the number of models to establish. The change of RMSEP with the number of submodels in this embodiment is shown in Fig. 6: at 500 submodels the RMSEP is almost unchanged, so 500 submodels were established.
Selection of the number of samples: with the number of submodels fixed at 500, the number of selected samples was varied from 5% to 100% of the training-set size in steps of 5% (truncating non-integers) and the RMSEP was computed; the sample number at which the RMSEP is smallest or levels off is the number drawn in each cycle. The change of RMSEP with the training-subset percentage in this embodiment is shown in Fig. 7: when the subset reaches 30-100% of the training samples, the RMSEP reaches its minimum and is almost unchanged, so any subset size in the range 30-100% of the total sample number is acceptable; this example used 60% of the total number of samples.
Determination of the optimal ELM activation function and number of hidden-layer nodes: according to the change of the training-set RMSEP with the chosen activation function and number of hidden-layer nodes, the parameters at the RMSEP minimum are optimal; in this embodiment they are the activation function radbas and 35 nodes.
To compare the stability of Bagging ELM and ELM, each method was rerun 20 times and the change of RMSEP with run number was obtained, as shown in Fig. 8. The RMSEP of the ELM algorithm fluctuates considerably over the 20 runs, indicating poor model stability, while the 20 predictions of the Bagging ELM algorithm fluctuate little and are essentially stable, showing good stability; this demonstrates that the run-to-run stability of Bagging ELM is substantially better than that of ELM. Moreover, the RMSEP of Bagging ELM is always lower than that of ELM, showing that the Bagging ensemble improves the prediction accuracy of the ELM algorithm. Fig. 9 shows the relation between the mean predicted values over the 20 runs and the actual values of the prediction set for Bagging ELM and ELM, where the vertical bars represent the deviation over the 20 runs; the deviations of Bagging ELM are far smaller than those of ELM. The correlation coefficient between the predictions of Bagging ELM and the actual values, 0.9988, is higher than that of ELM, 0.9957, and the standard deviation between the predictions of Bagging ELM and the actual values, 0.00015, is lower than that of ELM, 0.0023. This demonstrates that Bagging ELM improves the prediction accuracy of the ELM model and has better stability.
Embodiment 3:
This embodiment applies the method to near-infrared spectral analysis, determining the density of diesel oil samples. The specific steps are as follows:
(1) Near-infrared spectra of 263 diesel fuel samples were collected over the wavelength range 750-1550 nm at a sampling interval of 2 nm, giving 401 wavelength points; the data were provided by the US Southwest Research Institute (SWRI), San Antonio, TX, through Eigenvector Research, Inc. (Manson, Washington); download address: http://www.eigenvector.com/Data/SWRI. Following the division of the data set given on the website, 142 samples were used as the training set and 121 samples as the prediction set.
(2) Bootstrap resampling was performed on the training set, randomly selecting a certain number of samples as a training subset.
(3) The optimal ELM activation function and number of hidden-layer nodes were determined, and an ELM submodel was established on the samples of the training subset.
Steps (2)-(3) were repeated to establish multiple submodels.
(4) For an unknown sample, the arithmetic average of the predictions of the submodels was taken as the final prediction.
Determination of the number of submodels: 500 submodels were specified, the training-subset size of each data set was fixed at 50% of the total number of samples, and the RMSEP was computed as a function of the number of submodels; the number at which the RMSEP becomes constant or nearly constant (levels off) is the number of models to establish. The change of RMSEP with the number of submodels in this embodiment is shown in Fig. 10: at 500 submodels the RMSEP is almost unchanged, so 500 submodels were established.
Selection of the number of samples: with the number of submodels fixed at 500, the number of selected samples was varied from 5% to 100% of the training-set size in steps of 5% (truncating non-integers) and the RMSEP was computed; the sample number at which the RMSEP is smallest or levels off is the number drawn in each cycle. The change of RMSEP with the training-subset percentage in this embodiment is shown in Fig. 11: when the subset reaches 40-100% of the training samples, the RMSEP reaches its minimum and is almost unchanged, so any subset size in the range 40-100% of the total sample number is acceptable; this example used 50% of the total number of samples.
Determination of the optimal ELM activation function and number of hidden-layer nodes: according to the change of the training-set RMSEP with the chosen activation function and number of hidden-layer nodes, the parameters at the RMSEP minimum are optimal; in this embodiment they are the activation function tribas and 48 nodes.
To compare the stability of Bagging ELM and ELM, each method was rerun 20 times and the change of RMSEP with run number was obtained, as shown in Fig. 12. The RMSEP of the ELM algorithm fluctuates considerably over the 20 runs, indicating poor model stability, while the 20 predictions of the Bagging ELM algorithm fluctuate little and are essentially stable, showing good stability; this demonstrates that the run-to-run stability of Bagging ELM is substantially better than that of ELM. Moreover, the RMSEP of Bagging ELM is always lower than that of ELM, showing that the Bagging ensemble improves the prediction accuracy of the ELM algorithm. Fig. 13 shows the relation between the mean predicted values over the 20 runs and the actual values of the prediction set for Bagging ELM and ELM, where the vertical bars represent the deviation over the 20 runs; the deviations of Bagging ELM are far smaller than those of ELM. The correlation coefficient between the predictions of Bagging ELM and the actual values, 0.9970, is higher than that of ELM, 0.9923, and the standard deviation between the predictions of Bagging ELM and the actual values, 0.00014, is lower than that of ELM, 0.0031. This demonstrates that Bagging ELM improves the prediction accuracy of the ELM model and has better stability.
Embodiment 4:
This embodiment applies the method to near-infrared transmission analysis, determining the component content of blood samples. The specific steps are as follows:
(1) Near-infrared transmission spectra of 231 blood samples were collected over the wavelength range 1100-2498 nm at a sampling interval of 2 nm, giving 700 wavelength points; the spectra were measured in reflectance mode with a NIRSystems model 6500 spectrometer (NIRSystems, Inc., Silver Springs, USA); download address: http://www.idrc-chambersburg.org/shootout2010.html. Following the division of the data set given on the website, 143 samples were used as the training set and 47 samples as the prediction set.
(2) Bootstrap resampling was performed on the training set, randomly selecting a certain number of samples as a training subset.
(3) The optimal ELM activation function and number of hidden-layer nodes were determined, and an ELM submodel was established on the samples of the training subset.
Steps (2)-(3) were repeated to establish multiple submodels.
(4) For an unknown sample, the arithmetic average of the predictions of the submodels was taken as the final prediction.
Determination of the number of submodels: 500 submodels were specified, the training-subset size of each data set was fixed at 50% of the total number of samples, and the RMSEP was computed as a function of the number of submodels; the number at which the RMSEP becomes constant or nearly constant (levels off) is the number of models to establish. The change of RMSEP with the number of submodels in this embodiment is shown in Fig. 14: at 500 submodels the RMSEP is almost unchanged, so 500 submodels were established.
Selection of the number of samples: with the number of submodels fixed at 500, the number of selected samples was varied from 5% to 100% of the training-set size in steps of 5% (truncating non-integers) and the RMSEP was computed; the sample number at which the RMSEP is smallest or levels off is the number drawn in each cycle. The change of RMSEP with the training-subset percentage in this embodiment is shown in Fig. 15: the RMSEP is smallest when the subset reaches 60% of the training samples, so the training subset drawn in each cycle was 60% of the total number of samples.
Determination of the optimal ELM activation function and number of hidden-layer nodes: according to the change of the training-set RMSEP with the chosen activation function and number of hidden-layer nodes, the parameters at the RMSEP minimum are optimal; in this embodiment they are the activation function sig and 37 nodes.
To compare the stability of Bagging ELM and ELM, each method was rerun 20 times and the change of RMSEP with run number was obtained, as shown in Fig. 16. The RMSEP of the ELM algorithm fluctuates considerably over the 20 runs, indicating poor model stability, while the 20 predictions of the Bagging ELM algorithm fluctuate little and are essentially stable, showing good stability; this demonstrates that the run-to-run stability of Bagging ELM is substantially better than that of ELM. Moreover, the RMSEP of Bagging ELM is always lower than that of ELM, showing that the Bagging ensemble improves the prediction accuracy of the ELM algorithm. Fig. 17 shows the relation between the mean predicted values over the 20 runs and the actual values of the prediction set for Bagging ELM and ELM, where the vertical bars represent the deviation over the 20 runs; the deviations of Bagging ELM are far smaller than those of ELM. The correlation coefficient between the predictions of Bagging ELM and the actual values, 0.9774, is higher than that of ELM, 0.9432, and the standard deviation between the predictions of Bagging ELM and the actual values, 0.0008, is lower than that of ELM, 0.0268. This demonstrates that Bagging ELM improves the prediction accuracy of the ELM model and has better stability.
Embodiment 5:
The present embodiment is to be applied to near-infrared spectrum analysis, and the content value of Virginian-type cigarette Powder samples chlorine is surveyed It is fixed.Specific step is as follows:
(1)The near infrared spectrum data of the Virginian-type cigarette product of 58 trades mark is gathered, wave-number range is 4000~9000 cm-1, the sampling interval is 1 cm-1, totally 5001 wavelength points, cigarette be prepared into powder sample according to YC/331-1996, average grain Spend and measured for 0.45 mm, spectrum using Vector 22/N FT-NIR System (Bruker) spectrometer.Chlorine contains in sample Amount is measured using Auto Analyzer type III Continuous Flow Analysis instrument according to standard method.Using KS group technologies to data Collection is divided, and 38 samples are as training set, and 20 samples are as forecast set.
(2)Boostrap resamplings are carried out to training set sample, randomly select certain amount sample as training Collection.
(3)Extreme learning machine Optimum Excitation function and hidden layer number of nodes are determined, with the Sample Establishing limit of training subset Learning machine submodel.
Repeat(2)-(3)Step is multiple, establishes multiple submodels.
(4)For unknown sample, arithmetic average is made by the prediction result of multiple submodels, obtains final prediction result.
Determination of the number of submodels: a value of 500 is given for the number of submodels, the training-subset size of each data set is fixed at 50% of the total number of samples, and the root mean square error of prediction (RMSEP) is computed as a function of the number of submodels; the number of models to build is the point at which the RMSEP value becomes constant or nearly so (levels off). The variation of RMSEP with the number of submodels in this embodiment is shown in Figure 18: once the ensemble reaches 500 submodels the RMSEP value is almost unchanged, so 500 models are built.
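The selection rule above, growing the ensemble and stopping where the RMSEP curve levels off, can be illustrated with a short sketch. The function names and the running-average formulation are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def rmsep(y_true, y_pred):
    """Root mean square error of prediction (RMSEP)."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def rmsep_vs_n_models(sub_preds, y_true):
    """RMSEP of the running ensemble average after 1..N submodels.

    `sub_preds` is an (N, n_samples) array holding each submodel's
    predictions; the chosen ensemble size is where the returned curve
    becomes flat (constant or almost unchanged)."""
    sub_preds = np.asarray(sub_preds, dtype=float)
    counts = np.arange(1, len(sub_preds) + 1)[:, None]
    running = np.cumsum(sub_preds, axis=0) / counts  # mean of first k submodels
    return [rmsep(y_true, p) for p in running]
```

Plotting the returned list against the submodel count reproduces a curve of the kind shown in Figure 18.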
Selection of the number of samples: with the number of submodels fixed at 500, the number of selected samples is varied from 5% to 100% of the sample count in steps of 5% (non-integer values are truncated to integers, not rounded) and the RMSEP is computed; the sample count at which the RMSEP is minimal or levels off is the number of samples drawn in each cycle. The variation of RMSEP with the training-subset percentage in this embodiment is shown in Figure 19: when the training-subset size reaches 60~100% of the total number of training samples, the RMSEP reaches its minimum and is almost unchanged. Any subset size in the range 60~100% of the total is therefore acceptable; this example takes the training-subset size to be 60% of the total number of samples.
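As a small illustration of the 5%~100% sweep with truncation, the candidate subset sizes can be enumerated as below; `subset_sizes` is a hypothetical helper name. For the 38-sample training set of this embodiment, 60% truncates to 22 samples.

```python
def subset_sizes(n_total):
    """Candidate training-subset sizes: 5% to 100% of the training set
    in 5% steps; non-integer counts are truncated (floored), per the
    patent's truncation rule, rather than rounded."""
    return [n_total * p // 100 for p in range(5, 101, 5)]
```

Each candidate size would then be paired with a full Bagging ELM run to trace the RMSEP curve of Figure 19.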
Determination of the optimal activation function and number of hidden-layer nodes of the extreme learning machine: the RMSEP of the training-set spectra is tracked as the activation function and the number of hidden-layer nodes are varied, and the values at which the RMSEP reaches its minimum are taken as optimal. In this embodiment the optimal activation function and hidden-layer node count are the sig function and 69 nodes.
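A grid search of this kind can be sketched as follows. The helper name, the validation-set formulation, and the candidate activations are illustrative assumptions (`sig`, `sin`, and `hardlim` are common ELM choices, but the patent only confirms `sig` for this embodiment):

```python
import numpy as np

ACTIVATIONS = {
    "sig": lambda z: 1.0 / (1.0 + np.exp(-z)),
    "sin": np.sin,
    "hardlim": lambda z: (z >= 0).astype(float),
}

def grid_search_elm(X, y, X_val, y_val, node_range):
    """Return the (activation, n_hidden, rmsep) triple with the lowest
    RMSEP over all candidate activation functions and node counts."""
    rng = np.random.default_rng(1)
    best = (None, None, np.inf)
    for name, act in ACTIVATIONS.items():
        for n_hidden in node_range:
            W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights
            b = rng.normal(size=n_hidden)
            beta = np.linalg.pinv(act(X @ W + b)) @ y    # output weights
            pred = act(X_val @ W + b) @ beta
            err = float(np.sqrt(np.mean((y_val - pred) ** 2)))
            if err < best[2]:
                best = (name, n_hidden, err)
    return best
```

The pair minimizing the error plays the role of the "sig, 69 nodes" optimum reported above.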
To compare the stability of Bagging ELM and ELM, each method was rerun 20 times and the RMSEP value was recorded as a function of run number, as shown in Figure 20. The figure shows that the RMSEP of the ELM algorithm fluctuates considerably over the 20 runs, indicating poor model stability, whereas the prediction results of the 20 Bagging ELM runs fluctuate only slightly and essentially level off, showing good stability. This demonstrates that the run-to-run stability of Bagging ELM is substantially better than that of ELM. Moreover, the RMSEP of Bagging ELM is consistently lower than that of ELM, showing that Bagging integration improves the prediction accuracy of the ELM algorithm. Figure 21 plots, for Bagging ELM and ELM, the mean predicted values of the prediction set over the 20 runs against the measured values, with vertical bars marking the deviation across the 20 runs; the deviations of Bagging ELM are far smaller than those of ELM. The correlation coefficient between the Bagging ELM predictions and the measured values (0.9762) exceeds that of ELM (0.9635), and the standard deviation between the Bagging ELM predictions and the measured values (0.0007) is below that of ELM (0.0062). This demonstrates that Bagging ELM improves the prediction accuracy of the ELM model and offers better stability.
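The two summary statistics used in this comparison, the correlation coefficient between the run-averaged predictions and the measured values, and the spread of RMSEP across repeated runs, can be computed as sketched below; `agreement_stats` is a hypothetical helper name:

```python
import numpy as np

def agreement_stats(y_true, run_preds):
    """Given predictions from repeated runs, shape (n_runs, n_samples),
    return (corr, spread): the correlation coefficient between the
    run-averaged prediction and the measured values, and the standard
    deviation of the per-run RMSEP values (smaller = more stable)."""
    y_true = np.asarray(y_true, dtype=float)
    run_preds = np.asarray(run_preds, dtype=float)
    mean_pred = run_preds.mean(axis=0)                      # average over runs
    corr = float(np.corrcoef(mean_pred, y_true)[0, 1])
    rmseps = np.sqrt(np.mean((run_preds - y_true) ** 2, axis=1))
    spread = float(rmseps.std(ddof=1))
    return corr, spread
```

Feeding in 20 runs each of ELM and Bagging ELM predictions yields figures comparable to the correlation coefficients and deviations quoted above.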
The embodiments above show that this method provides a Bagging extreme learning machine based modeling method for the quantitative analysis of complex samples, and that it can greatly improve the stability and prediction accuracy of the model.

Claims (2)

1. A Bagging extreme learning machine integrated modelling method, characterized by the following specific steps:
(1) Collect the spectral data of the samples under test and determine the content of the component of interest; divide the sample set into a training set and a prediction set;
(2) Perform bootstrap resampling on the training-set samples, randomly selecting a certain number of samples as a training subset;
The number of samples in the training subset is determined as follows: with the number of submodels fixed, vary the number of selected samples from 5% to 100% of the sample count in steps of 5%, truncating non-integer values to integers, and compute the RMSEP; the sample count at which the RMSEP is minimal or levels off is the number of samples drawn in each cycle;
(3) Determine the optimal activation function and number of hidden-layer nodes of the extreme learning machine, and build an extreme learning machine submodel from the samples of the training subset;
The optimal activation function and number of hidden-layer nodes are determined as follows: track the RMSEP of the training-set spectra as the activation function and number of hidden-layer nodes are varied; when the RMSEP reaches its minimum, the corresponding activation function and node count are the optimal parameters;
Repeat steps (2) and (3) several times to build N submodels; (4) For an unknown sample, take the arithmetic mean of the predictions of the multiple submodels to obtain the final prediction result.
2. The Bagging extreme learning machine integrated modelling method according to claim 1, characterized in that the number N of submodels is determined as follows: give a sufficiently large value for the number of submodels, fix the training-subset size of each data set at 50% of the total number of samples, compute the root mean square error of prediction, denoted RMSEP, and observe its variation with the number of submodels; when the RMSEP value is constant or levels off, the corresponding submodel count is the required number N.
CN201510466504.7A 2015-07-31 2015-07-31 Bagging extreme learning machine integrated modelling approach Active CN105117525B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510466504.7A CN105117525B (en) 2015-07-31 2015-07-31 Bagging extreme learning machine integrated modelling approach

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510466504.7A CN105117525B (en) 2015-07-31 2015-07-31 Bagging extreme learning machine integrated modelling approach

Publications (2)

Publication Number Publication Date
CN105117525A CN105117525A (en) 2015-12-02
CN105117525B true CN105117525B (en) 2018-05-15

Family

ID=54665513

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510466504.7A Active CN105117525B (en) 2015-07-31 2015-07-31 Bagging extreme learning machine integrated modelling approach

Country Status (1)

Country Link
CN (1) CN105117525B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106650926B (en) * 2016-09-14 2019-04-16 天津工业大学 A kind of steady boosting extreme learning machine integrated modelling approach
CN106529680B (en) * 2016-10-27 2019-01-29 天津工业大学 A kind of multiple dimensioned extreme learning machine integrated modelling approach based on empirical mode decomposition
CN107356556A (en) * 2017-07-10 2017-11-17 天津工业大学 A kind of double integrated modelling approach of Near-Infrared Spectra for Quantitative Analysis
CN109325516B (en) * 2018-08-13 2021-02-02 众安信息技术服务有限公司 Image classification-oriented ensemble learning method and device
CN113094892A (en) * 2021-04-02 2021-07-09 辽宁石油化工大学 Oil concentration prediction method based on data elimination and local partial least squares
CN117150877A (en) * 2022-05-23 2023-12-01 北京理工大学 Method for predicting optimal pressing process of press-loading mixed explosive based on Bagging algorithm
CN115691703A (en) * 2022-10-15 2023-02-03 苏州创腾软件有限公司 Drug property prediction method and system based on pharmacokinetic model

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103226728A (en) * 2013-04-07 2013-07-31 北京化工大学 Intelligent detection and yield optimization method for HDPE (high density polyethylene) cascade polymerization reaction course
CN103528990A (en) * 2013-10-31 2014-01-22 天津工业大学 Method for establishing multiple models of near infrared spectrums
CN103593550A (en) * 2013-08-12 2014-02-19 东北大学 Pierced billet quality modeling and prediction method based on integrated mean value staged RPLS-OS-ELM
CN104463251A (en) * 2014-12-15 2015-03-25 江苏科技大学 Cancer gene expression profile data identification method based on integration of extreme learning machines
CN104573699A (en) * 2015-01-21 2015-04-29 中国计量学院 Trypetid identification method based on medium field intensity magnetic resonance dissection imaging

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103226728A (en) * 2013-04-07 2013-07-31 北京化工大学 Intelligent detection and yield optimization method for HDPE (high density polyethylene) cascade polymerization reaction course
CN103593550A (en) * 2013-08-12 2014-02-19 东北大学 Pierced billet quality modeling and prediction method based on integrated mean value staged RPLS-OS-ELM
CN103528990A (en) * 2013-10-31 2014-01-22 天津工业大学 Method for establishing multiple models of near infrared spectrums
CN104463251A (en) * 2014-12-15 2015-03-25 江苏科技大学 Cancer gene expression profile data identification method based on integration of extreme learning machines
CN104573699A (en) * 2015-01-21 2015-04-29 中国计量学院 Trypetid identification method based on medium field intensity magnetic resonance dissection imaging

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on model construction methods for quantitative infrared spectroscopic analysis of linolenic acid; Hao Yong et al.; Journal of Chinese Agricultural Mechanization; 20150531; Vol. 36, No. 3; pp. 164-168 *

Also Published As

Publication number Publication date
CN105117525A (en) 2015-12-02

Similar Documents

Publication Publication Date Title
CN105117525B (en) Bagging extreme learning machine integrated modelling approach
CN107064054B (en) A kind of near-infrared spectral analytical method based on CC-PLS-RBFNN Optimized model
CN103528990B (en) A kind of multi-model Modeling Method of near infrared spectrum
Lin et al. Determination of grain protein content by near-infrared spectrometry and multivariate calibration in barley
Galvao et al. A method for calibration and validation subset partitioning
CN107491784A (en) Tobacco leaf near infrared spectrum quantitative modeling method and application based on deep learning algorithm
CN104020127A (en) Method for rapidly measuring inorganic element in tobacco by near infrared spectrum
Wang et al. Rapid detection of protein content in rice based on Raman and near-infrared spectroscopy fusion strategy combined with characteristic wavelength selection
CN110503156B (en) Multivariate correction characteristic wavelength selection method based on minimum correlation coefficient
CN106372426A (en) Multi-response parameter optimization method based on principal component analysis and neural network
Wei et al. Generalisation of tea moisture content models based on VNIR spectra subjected to fractional differential treatment
Sheng et al. Data fusion strategy for rapid prediction of moisture content during drying of black tea based on micro-NIR spectroscopy and machine vision
Baumann et al. A systematic evaluation of the benefits and hazards of variable selection in latent variable regression. Part II. Practical applications
Li et al. Improvement of NIR prediction ability by dual model optimization in fusion of NSIA and SA methods
Lei et al. Achieving joint calibration of soil Vis-NIR spectra across instruments, soil types and properties by an attention-based spectra encoding-spectra/property decoding architecture
CN112395803B (en) ICP-AES multimodal spectral line separation method based on particle swarm optimization
Chen et al. A geographical traceability method for Lanmaoa asiatica mushrooms from 20 township-level geographical origins by near infrared spectroscopy and ResNet image analysis techniques
Zhang et al. Application of swarm intelligence algorithms to the characteristic wavelength selection of soil moisture content
CN107356556A (en) A kind of double integrated modelling approach of Near-Infrared Spectra for Quantitative Analysis
Chen et al. A novel spectral multivariate calibration approach based on a multiple fitting method
CN109063767A (en) A kind of near infrared spectrum modeling method known together based on sample and variable
Xie et al. Calibration transfer via filter learning
Zhu et al. Raman spectroscopy coupled with metaheuristics-based variable selection models: A method for rapid determination of extra virgin olive oil content in vegetable blend oils
CN116484989A (en) Tobacco near-infrared multicomponent prediction method based on deep migration learning
Yi-Ming et al. Ensemble partial least squares algorithm based on variable clustering for quantitative infrared spectrometric analysis

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 300387 Tianjin city Xiqing District West Binshui Road No. 399

Applicant after: Tianjin Polytechnic University

Applicant after: Shanghai Sui Hua Industrial Limited by Share Ltd

Address before: 300387 Tianjin city Xiqing District West Binshui Road No. 399

Applicant before: Tianjin Polytechnic University

Applicant before: Shanghai Huishan Industrial Co., Ltd.

CB02 Change of applicant information
GR01 Patent grant
GR01 Patent grant