CN105117525A - Bagging extreme learning machine integrated modeling method - Google Patents

Bagging extreme learning machine integrated modeling method

Info

Publication number
CN105117525A
CN105117525A CN201510466504.7A CN201510466504A
Authority
CN
China
Prior art keywords
sample
rmsep
learning machine
extreme learning
submodel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510466504.7A
Other languages
Chinese (zh)
Other versions
CN105117525B (en)
Inventor
卞希慧
李淑娟
谭小耀
王江江
王治国
刘维国
陈宗蓬
王晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANGHAI HUISHAN INDUSTRIAL Co Ltd
Tianjin Polytechnic University
Original Assignee
SHANGHAI HUISHAN INDUSTRIAL Co Ltd
Tianjin Polytechnic University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHANGHAI HUISHAN INDUSTRIAL Co Ltd, Tianjin Polytechnic University filed Critical SHANGHAI HUISHAN INDUSTRIAL Co Ltd
Priority to CN201510466504.7A priority Critical patent/CN105117525B/en
Publication of CN105117525A publication Critical patent/CN105117525A/en
Application granted granted Critical
Publication of CN105117525B publication Critical patent/CN105117525B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Investigating Or Analysing Materials By Optical Means (AREA)

Abstract

The invention belongs to the technical field of chemometrics and particularly relates to a Bagging extreme learning machine integrated modeling method. The method comprises the following steps: collecting spectral data of analyte samples and determining the content of the tested component in each sample; dividing the sample set into a training set and a prediction set; performing bootstrap resampling on the training set and randomly selecting a certain number of samples as a training subset; establishing an extreme learning machine submodel with the samples of the training subset; repeating these steps multiple times to establish a plurality of submodels; and averaging the predictions of the plurality of submodels for unknown samples to obtain the final prediction. Compared with the ELM method, the method of the invention has significant advantages in prediction accuracy and stability. The invention is applicable to the quantitative analysis of complex substances such as petroleum, tobacco, food and traditional Chinese medicine.

Description

Bagging extreme learning machine integrated modeling method
Technical field
The invention belongs to the field of chemometric techniques and specifically relates to a Bagging extreme learning machine integrated modeling method.
Background art
Artificial neural networks, with their powerful capabilities of self-adaptation, self-organization, self-learning and nonlinear mapping, have been widely applied in fields such as biology, chemistry, medicine and economics. However, traditional neural network learning algorithms (such as the BP algorithm) require a large number of network training parameters to be set manually, train slowly, and easily fall into local optima. In 2004, Professor Huang Guangbin of Nanyang Technological University proposed a new algorithm for single-hidden-layer feedforward neural networks, named the extreme learning machine (ELM). The core of the ELM algorithm is to turn the training of the neural network into a least-squares problem, avoiding the drawbacks of artificial neural networks that parameters must be tuned manually and that training easily falls into local optima. Because the ELM algorithm is simple to implement, learns fast and generalizes well, it has received increasing attention in recent years and has been applied in many fields such as analytical chemistry, control engineering and image recognition. However, because the input weights and the biases of the hidden neurons of the ELM are set randomly, the results of the model are unstable.
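By way of illustration only, the following minimal sketch (Python with NumPy; the function names elm_train and elm_predict, the [-1, 1] interval for the random weights and the set of activation functions are assumptions made for this sketch, not taken from the patent) shows the ELM scheme described above: the hidden layer is generated at random and only the output weights are obtained by least squares via the Moore-Penrose pseudoinverse.

import numpy as np

ACTIVATIONS = {
    "sig":    lambda z: 1.0 / (1.0 + np.exp(-z)),           # logistic sigmoid
    "sin":    np.sin,                                        # sine
    "radbas": lambda z: np.exp(-z ** 2),                     # radial basis
    "tribas": lambda z: np.clip(1.0 - np.abs(z), 0.0, None), # triangular basis
}

def elm_train(X, y, n_hidden, activation="sig", rng=None):
    """Fit a single-hidden-layer ELM: random input weights and hidden biases,
    output weights obtained by least squares (Moore-Penrose pseudoinverse)."""
    rng = np.random.default_rng(rng)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # random input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # random hidden biases
    H = ACTIVATIONS[activation](X @ W + b)                   # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ y                             # least-squares output weights
    return {"W": W, "b": b, "beta": beta, "activation": activation}

def elm_predict(model, X):
    """Predict with a trained ELM."""
    H = ACTIVATIONS[model["activation"]](X @ model["W"] + model["b"])
    return H @ model["beta"]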
Ensemble modeling fuses the results of multiple models into a final prediction and can thereby improve the accuracy and stability of model prediction. Bagging, as a commonly used ensemble modeling method, establishes multiple submodels from subsets of samples randomly selected from the training set by the "bootstrap" method, and then averages the predictions of the submodels to obtain the final prediction. On the one hand, re-selecting the training set increases the diversity of the ensemble; on the other hand, fusing multiple predictions improves the prediction accuracy of the base model.
The present invention combines the advantages of ELM and Bagging and proposes an ELM integrated modeling method based on Bagging for the quantitative analysis of complex samples, which retains the advantages of ELM, namely fast computation and strong predictive ability, while overcoming the drawback of its poor stability.
Summary of the invention
The object of the invention is to propose a Bagging extreme learning machine integrated modeling method with good stability and high prediction accuracy.
The present invention combines the Bagging algorithm with the extreme learning machine (ELM) model to establish an ELM ensemble method based on Bagging (denoted BaggingELM). Its flow is shown in Figure 1, and the specific steps are as follows:
(1) Collect the spectral data of the analyte samples and determine the content of the tested component in each sample by a conventional method; divide the sample set into a training set and a prediction set;
(2) Perform bootstrap resampling on the training set samples, randomly selecting a certain number of samples as a training subset;
(3) Determine the optimal activation function and the number of hidden layer nodes of the extreme learning machine, and establish an extreme learning machine submodel with the samples of the training subset;
Repeat steps (2) and (3) multiple times to establish N submodels;
(4) For unknown samples, take the arithmetic mean of the predictions of the multiple submodels to obtain the final prediction.
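A minimal sketch of steps (2) to (4) follows (Python with NumPy, reusing the elm_train and elm_predict helpers of the earlier sketch; the names bagging_elm_train and bagging_elm_predict are illustrative, and sampling with replacement plus truncation of the subset size are assumptions for illustration, since the text only specifies bootstrap resampling of a chosen subset size).

import numpy as np

def bagging_elm_train(X_train, y_train, n_models, subset_fraction,
                      n_hidden, activation="sig", rng=None):
    """Steps (2)-(3): build n_models ELM submodels, each on a bootstrap subset
    whose size is subset_fraction of the training set."""
    rng = np.random.default_rng(rng)
    n_samples = X_train.shape[0]
    subset_size = int(subset_fraction * n_samples)  # round down if not an integer
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, n_samples, size=subset_size)  # bootstrap sample indices
        models.append(elm_train(X_train[idx], y_train[idx],
                                n_hidden, activation, rng))
    return models

def bagging_elm_predict(models, X):
    """Step (4): arithmetic mean of the submodel predictions."""
    return np.mean([elm_predict(m, X) for m in models], axis=0)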
In the present invention, the method for determining the number N of submodels is as follows: given a sufficiently large number of submodels, fix the training subset sample number of each data set at 50% of the total number of samples, compute the root mean square error of prediction (RMSEP) and observe how the RMSEP changes with the number of submodels; when the RMSEP value becomes constant or almost constant (tends to be stable), the corresponding number of submodels is the required number N.
In the present invention, the method for determining the number of samples of the training subset is as follows: fix the number of submodels, vary the number of selected samples from 5% to 100% of the number of training samples in steps of 5% (rounding down when the result is not an integer), and compute the RMSEP value; the number of samples at which the RMSEP is minimal or levels off is the number of samples selected in each cycle.
In the present invention, the specific method for determining the optimal activation function and the number of hidden layer nodes of the extreme learning machine is as follows: examine the RMSEP value obtained from the training set spectra as the activation function and the number of hidden layer nodes are varied; the activation function and the number of hidden layer nodes at which the RMSEP reaches its minimum are the optimal parameters.
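The three selection procedures above can be sketched as follows (Python; the helpers bagging_elm_train, bagging_elm_predict, elm_train and elm_predict are those of the earlier sketches, the function names rmsep, rmsep_vs_n_models, rmsep_vs_subset_fraction and select_activation_and_nodes and the candidate grid of 5 to 100 hidden nodes are illustrative, and RMSEP is computed here on the prediction set as an assumption, since the patent does not detail the validation scheme).

import numpy as np

def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def rmsep_vs_n_models(X_tr, y_tr, X_pr, y_pr, max_models, n_hidden, act):
    """RMSEP versus the number of submodels, with the subset size fixed at 50%
    of the training samples; N is read off where the curve levels out."""
    models = bagging_elm_train(X_tr, y_tr, max_models, 0.5, n_hidden, act)
    return [rmsep(y_pr, bagging_elm_predict(models[:n], X_pr))
            for n in range(1, max_models + 1)]

def rmsep_vs_subset_fraction(X_tr, y_tr, X_pr, y_pr, n_models, n_hidden, act):
    """RMSEP for subset sizes from 5% to 100% of the training samples, in 5% steps."""
    return {round(float(f), 2): rmsep(y_pr, bagging_elm_predict(
                bagging_elm_train(X_tr, y_tr, n_models, f, n_hidden, act), X_pr))
            for f in np.arange(0.05, 1.0001, 0.05)}

def select_activation_and_nodes(X_tr, y_tr, X_pr, y_pr,
                                activations=("sig", "sin", "radbas", "tribas"),
                                node_grid=range(5, 101)):
    """Grid search: the activation function and hidden-node count giving the
    minimum RMSEP for a single ELM are taken as the optimal parameters."""
    best = (None, None, np.inf)
    for act in activations:
        for n_hidden in node_grid:
            err = rmsep(y_pr, elm_predict(elm_train(X_tr, y_tr, n_hidden, act), X_pr))
            if err < best[2]:
                best = (act, n_hidden, err)
    return best  # (activation, n_hidden, rmsep)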
The advantage of the present invention is that the modeling method combines the advantages of the ensemble technique Bagging with those of the extreme learning machine, improving the prediction accuracy and stability of the extreme learning machine algorithm and providing a new modeling method for multivariate calibration analysis of complex materials. The method of the invention can be widely applied to the quantitative analysis of complex materials in fields such as petroleum, tobacco, food and traditional Chinese medicine.
Description of the drawings
Fig. 1 is the flow chart of the Bagging extreme learning machine.
Fig. 2 shows the change of the RMSEP value with the number of submodels for the fuel oil ultraviolet data.
Fig. 3 shows the change of the RMSEP value with the training subset sample percentage for the fuel oil ultraviolet data.
Fig. 4 shows the change of the RMSEP value with the number of prediction runs for the extreme learning machine and the Bagging extreme learning machine on the fuel oil ultraviolet data.
Fig. 5 shows the relationship between the mean predicted values of the prediction set and the actual values for the Bagging extreme learning machine and the extreme learning machine on the fuel oil ultraviolet data, where (a) and (b) correspond to the Bagging extreme learning machine and the extreme learning machine, respectively.
Fig. 6 shows the change of the RMSEP value with the number of submodels for the ethanol solution near-infrared data.
Fig. 7 shows the change of the RMSEP value with the training subset sample percentage for the ethanol solution near-infrared data.
Fig. 8 shows the change of the RMSEP value with the number of prediction runs for the extreme learning machine and the Bagging extreme learning machine on the ethanol solution near-infrared data.
Fig. 9 shows the relationship between the mean predicted values of the prediction set and the actual values for the Bagging extreme learning machine and the extreme learning machine on the ethanol solution near-infrared data, where (a) and (b) correspond to the Bagging extreme learning machine and the extreme learning machine, respectively.
Fig. 10 shows the change of the RMSEP value with the number of submodels for the diesel near-infrared data.
Fig. 11 shows the change of the RMSEP value with the training subset sample percentage for the diesel near-infrared data.
Fig. 12 shows the change of the RMSEP value with the number of prediction runs for the extreme learning machine and the Bagging extreme learning machine on the diesel near-infrared data.
Fig. 13 shows the relationship between the mean predicted values of the prediction set and the actual values for the Bagging extreme learning machine and the extreme learning machine on the diesel near-infrared data, where (a) and (b) correspond to the Bagging extreme learning machine and the extreme learning machine, respectively.
Fig. 14 shows the change of the RMSEP value with the number of submodels for the blood near-infrared data.
Fig. 15 shows the change of the RMSEP value with the training subset sample percentage for the blood near-infrared data.
Fig. 16 shows the change of the RMSEP value with the number of prediction runs for the extreme learning machine and the Bagging extreme learning machine on the blood near-infrared data.
Fig. 17 shows the relationship between the mean predicted values of the prediction set and the actual values for the Bagging extreme learning machine and the extreme learning machine on the blood near-infrared data, where (a) and (b) correspond to the Bagging extreme learning machine and the extreme learning machine, respectively.
Fig. 18 shows the change of the RMSEP value with the number of submodels for the cigarette near-infrared data.
Fig. 19 shows the change of the RMSEP value with the training subset sample percentage for the cigarette near-infrared data.
Fig. 20 shows the change of the RMSEP value with the number of prediction runs for the extreme learning machine and the Bagging extreme learning machine on the cigarette near-infrared data.
Fig. 21 shows the relationship between the mean predicted values of the prediction set and the actual values for the Bagging extreme learning machine and the extreme learning machine on the cigarette near-infrared data, where (a) and (b) correspond to the Bagging extreme learning machine and the extreme learning machine, respectively.
Embodiments
For a better understanding of the present invention, the invention is described in further detail below with reference to the embodiments, but the scope of protection of the invention is not limited to the scope represented by the embodiments.
Embodiment 1:
This embodiment applies the method to ultraviolet spectral analysis to determine the monoaromatics content of fuel oil samples. The specific steps are as follows:
(1) Collect the ultraviolet spectral data of 115 fuel oil samples; the wavelength range is 200-400 nm with a sampling interval of 0.35 nm, comprising 572 wavelength points, and the spectra were recorded on a Varian Cary 3 UV-visible spectrophotometer. The monoaromatics content was determined with an HP G1205A supercritical fluid chromatograph, with carbon dioxide as the carrier gas, a flow rate of 2 mL min-1, an oven temperature of 35 °C, an outlet pressure of 150 bar and a flame ionization detector. According to the division of the data set given on the website, 70 samples are used as the training set and 45 samples as the prediction set.
(2) Perform bootstrap resampling on the training set samples, randomly selecting a certain number of samples as a training subset.
(3) Determine the optimal activation function and the number of hidden layer nodes of the extreme learning machine, and establish an extreme learning machine submodel with the samples of the training subset.
Repeat steps (2)-(3) multiple times to establish multiple submodels.
(4) For unknown samples, take the arithmetic mean of the predictions of the multiple submodels to obtain the final prediction.
Determination of the number of submodels: a value of 500 submodels was given, the training subset sample number of each data set was fixed at 50% of the total number of samples, and the root mean square error of prediction (RMSEP) was computed as a function of the number of submodels; the number of submodels at which the RMSEP value is constant or almost constant (tends to be stable) is the number of models to be established. In this embodiment the change of RMSEP with the number of submodels is shown in Fig. 2; the RMSEP value is almost constant by the time the number of submodels reaches 500, so the number of models established is 500.
Selection of the number of samples: with the number of submodels fixed at 500, the number of selected samples was varied from 5% to 100% of the number of training samples in steps of 5% (rounding down when the result is not an integer) and the RMSEP value was computed; the sample number at which the RMSEP is minimal or levels off is the number of samples selected in each cycle. In this embodiment the change of the RMSEP value with the training subset sample percentage is shown in Fig. 3; when the training subset sample number reaches 20-100% of the total number of training samples, the RMSEP value reaches its minimum and is almost constant, so any training subset sample number between 20% and 100% of the total sample number is acceptable; in this example the training subset sample number is chosen as 50% of the total number of samples.
Determination of the optimal activation function and number of hidden layer nodes of the extreme learning machine: the RMSEP value of the training set spectra was examined as the activation function and the number of hidden layer nodes were varied; in this embodiment the optimal activation function and number of hidden layer nodes corresponding to the minimum RMSEP are the sin activation function and 9 nodes.
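A hypothetical call of the earlier sketches with the settings reported for this embodiment (500 submodels, training subsets of 50% of the training samples, the sin activation function and 9 hidden nodes) might read as follows; X_train, y_train, X_pred and y_pred are placeholders standing for the 70 training and 45 prediction fuel oil spectra and monoaromatics contents, loaded by the user.

# Hypothetical usage of the sketch functions with this embodiment's settings.
models = bagging_elm_train(X_train, y_train, n_models=500,
                           subset_fraction=0.5, n_hidden=9, activation="sin")
print("RMSEP:", rmsep(y_pred, bagging_elm_predict(models, X_pred)))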
To compare the stability of BaggingELM and ELM, the two methods were each rerun 20 times and the change of the RMSEP value with the run number was obtained, as shown in Fig. 4. The RMSEP of the ELM algorithm fluctuates considerably over the 20 reruns, indicating poor model stability, whereas the predictions of the BaggingELM algorithm fluctuate little over the 20 runs and are essentially stable, showing good stability; the stability of BaggingELM over repeated runs is therefore clearly better than that of ELM. On the other hand, the RMSEP value of BaggingELM is always lower than that of ELM, showing that the Bagging ensemble improves the prediction accuracy of the ELM algorithm. Fig. 5 shows the relationship between the mean predicted values of the prediction set over the 20 runs and the actual values for BaggingELM and ELM, where the vertical lines represent the deviation over the 20 runs; the deviation of BaggingELM is far smaller than that of ELM. The correlation coefficient between the BaggingELM predictions and the actual values, 0.9921, is higher than the 0.9858 of ELM, and the standard deviation between the BaggingELM predictions and the actual values, 0.0001, is lower than the 0.0044 of ELM. This shows that BaggingELM improves the prediction accuracy of the ELM model and has better stability.
Embodiment 2:
This embodiment applies the method to near-infrared spectral analysis to determine the ethanol content of solution samples. The specific steps are as follows:
(1) Collect the near-infrared spectral data of 95 ethanol solution samples; the wavelength range is 850-1049 nm with a sampling interval of 1 nm, comprising 200 wavelength points, and the spectra were recorded on an HP 8453 spectrophotometer. According to the division of the data set given on the website, 65 samples are used as the training set and 30 samples as the prediction set.
(2) Perform bootstrap resampling on the training set samples, randomly selecting a certain number of samples as a training subset.
(3) Determine the optimal activation function and the number of hidden layer nodes of the extreme learning machine, and establish an extreme learning machine submodel with the samples of the training subset.
Repeat steps (2)-(3) multiple times to establish multiple submodels.
(4) For unknown samples, take the arithmetic mean of the predictions of the multiple submodels to obtain the final prediction.
Determination of the number of submodels: a value of 500 submodels was given, the training subset sample number of each data set was fixed at 50% of the total number of samples, and the RMSEP was computed as a function of the number of submodels; the number of submodels at which the RMSEP value is constant or almost constant (tends to be stable) is the number of models to be established. In this embodiment the change of RMSEP with the number of submodels is shown in Fig. 6; the RMSEP value is almost constant by the time the number of submodels reaches 500, so the number of models established is 500.
Selection of the number of samples: with the number of submodels fixed at 500, the number of selected samples was varied from 5% to 100% of the number of training samples in steps of 5% (rounding down when the result is not an integer) and the RMSEP value was computed; the sample number at which the RMSEP is minimal or levels off is the number of samples selected in each cycle. In this embodiment the change of the RMSEP value with the training subset sample percentage is shown in Fig. 7; when the training subset sample number reaches 30-100% of the total number of training samples, the RMSEP value reaches its minimum and is almost constant, so any training subset sample number between 30% and 100% of the total sample number is acceptable; in this example the training subset sample number is chosen as 60% of the total number of samples.
Determination of the optimal activation function and number of hidden layer nodes of the extreme learning machine: the RMSEP value of the training set spectra was examined as the activation function and the number of hidden layer nodes were varied; in this embodiment the optimal activation function and number of hidden layer nodes corresponding to the minimum RMSEP are the radbas activation function and 35 nodes.
To compare the stability of BaggingELM and ELM, the two methods were each rerun 20 times and the change of the RMSEP value with the run number was obtained, as shown in Fig. 8. The RMSEP of the ELM algorithm fluctuates considerably over the 20 reruns, indicating poor model stability, whereas the predictions of the BaggingELM algorithm fluctuate little over the 20 runs and are essentially stable, showing good stability; the stability of BaggingELM over repeated runs is therefore clearly better than that of ELM. On the other hand, the RMSEP value of BaggingELM is always lower than that of ELM, showing that the Bagging ensemble improves the prediction accuracy of the ELM algorithm. Fig. 9 shows the relationship between the mean predicted values of the prediction set over the 20 runs and the actual values for BaggingELM and ELM, where the vertical lines represent the deviation over the 20 runs; the deviation of BaggingELM is far smaller than that of ELM. The correlation coefficient between the BaggingELM predictions and the actual values, 0.9988, is higher than the 0.9957 of ELM, and the standard deviation between the BaggingELM predictions and the actual values, 0.00015, is lower than the 0.0023 of ELM. This shows that BaggingELM improves the prediction accuracy of the ELM model and has better stability.
Embodiment 3:
This embodiment applies the method to near-infrared spectral analysis to determine the density of diesel samples. The specific steps are as follows:
(1) Collect the near-infrared spectral data of 263 diesel samples; the wavelength range is 750-1550 nm with a sampling interval of 2 nm, comprising 401 wavelength points. The data were provided by the US Southwest Research Institute (SWRI), San Antonio, TX, through Eigenvector Research, Inc. (Manson, Washington); download address: http://www.eigenvector.com/Data/SWRI. According to the division of the data set given on the website, 142 samples are used as the training set and 121 samples as the prediction set.
(2) Perform bootstrap resampling on the training set samples, randomly selecting a certain number of samples as a training subset.
(3) Determine the optimal activation function and the number of hidden layer nodes of the extreme learning machine, and establish an extreme learning machine submodel with the samples of the training subset.
Repeat steps (2)-(3) multiple times to establish multiple submodels.
(4) For unknown samples, take the arithmetic mean of the predictions of the multiple submodels to obtain the final prediction.
Determination of the number of submodels: a value of 500 submodels was given, the training subset sample number of each data set was fixed at 50% of the total number of samples, and the RMSEP was computed as a function of the number of submodels; the number of submodels at which the RMSEP value is constant or almost constant (tends to be stable) is the number of models to be established. In this embodiment the change of RMSEP with the number of submodels is shown in Fig. 10; the RMSEP value is almost constant by the time the number of submodels reaches 500, so the number of models established is 500.
Selection of the number of samples: with the number of submodels fixed at 500, the number of selected samples was varied from 5% to 100% of the number of training samples in steps of 5% (rounding down when the result is not an integer) and the RMSEP value was computed; the sample number at which the RMSEP is minimal or levels off is the number of samples selected in each cycle. In this embodiment the change of the RMSEP value with the training subset sample percentage is shown in Fig. 11; when the training subset sample number reaches 40-100% of the total number of training samples, the RMSEP value reaches its minimum and is almost constant, so any training subset sample number between 40% and 100% of the total sample number is acceptable; in this example the training subset sample number is chosen as 50% of the total number of samples.
Determination of the optimal activation function and number of hidden layer nodes of the extreme learning machine: the RMSEP value of the training set spectra was examined as the activation function and the number of hidden layer nodes were varied; in this embodiment the optimal activation function and number of hidden layer nodes corresponding to the minimum RMSEP are the tribas activation function and 48 nodes.
To compare the stability of BaggingELM and ELM, the two methods were each rerun 20 times and the change of the RMSEP value with the run number was obtained, as shown in Fig. 12. The RMSEP of the ELM algorithm fluctuates considerably over the 20 reruns, indicating poor model stability, whereas the predictions of the BaggingELM algorithm fluctuate little over the 20 runs and are essentially stable, showing good stability; the stability of BaggingELM over repeated runs is therefore clearly better than that of ELM. On the other hand, the RMSEP value of BaggingELM is always lower than that of ELM, showing that the Bagging ensemble improves the prediction accuracy of the ELM algorithm. Fig. 13 shows the relationship between the mean predicted values of the prediction set over the 20 runs and the actual values for BaggingELM and ELM, where the vertical lines represent the deviation over the 20 runs; the deviation of BaggingELM is far smaller than that of ELM. The correlation coefficient between the BaggingELM predictions and the actual values, 0.9970, is higher than the 0.9923 of ELM, and the standard deviation between the BaggingELM predictions and the actual values, 0.00014, is lower than the 0.0031 of ELM. This shows that BaggingELM improves the prediction accuracy of the ELM model and has better stability.
Embodiment 4:
This embodiment applies the method to near-infrared transmission spectral analysis to determine the component content of blood samples. The specific steps are as follows:
(1) Collect the near-infrared transmission data of 231 blood samples; the wavelength range is 1100-2498 nm with a sampling interval of 2 nm, comprising 700 wavelength points. The spectra were measured in reflectance mode on an NIRSystems Model 6500 spectrometer (NIRSystems, Inc., Silver Springs, USA); download address: http://www.idrc-chambersburg.org/shootout2010.html. According to the division of the data set given on the website, 143 samples are used as the training set and 47 samples as the prediction set.
(2) Perform bootstrap resampling on the training set samples, randomly selecting a certain number of samples as a training subset.
(3) Determine the optimal activation function and the number of hidden layer nodes of the extreme learning machine, and establish an extreme learning machine submodel with the samples of the training subset.
Repeat steps (2)-(3) multiple times to establish multiple submodels.
(4) For unknown samples, take the arithmetic mean of the predictions of the multiple submodels to obtain the final prediction.
Determination of the number of submodels: a value of 500 submodels was given, the training subset sample number of each data set was fixed at 50% of the total number of samples, and the RMSEP was computed as a function of the number of submodels; the number of submodels at which the RMSEP value is constant or almost constant (tends to be stable) is the number of models to be established. In this embodiment the change of RMSEP with the number of submodels is shown in Fig. 14; the RMSEP value is almost constant by the time the number of submodels reaches 500, so the number of models established is 500.
Selection of the number of samples: with the number of submodels fixed at 500, the number of selected samples was varied from 5% to 100% of the number of training samples in steps of 5% (rounding down when the result is not an integer) and the RMSEP value was computed; the sample number at which the RMSEP is minimal or levels off is the number of samples selected in each cycle. In this embodiment the change of the RMSEP value with the training subset sample percentage is shown in Fig. 15; the RMSEP value is minimal when the training subset sample number reaches 60% of the total number of training samples, so the training subset sample number selected in each cycle is 60% of the total number of samples.
Determination of the optimal activation function and number of hidden layer nodes of the extreme learning machine: the RMSEP value of the training set spectra was examined as the activation function and the number of hidden layer nodes were varied; in this embodiment the optimal activation function and number of hidden layer nodes corresponding to the minimum RMSEP are the sig activation function and 37 nodes.
To compare the stability of BaggingELM and ELM, the two methods were each rerun 20 times and the change of the RMSEP value with the run number was obtained, as shown in Fig. 16. The RMSEP of the ELM algorithm fluctuates considerably over the 20 reruns, indicating poor model stability, whereas the predictions of the BaggingELM algorithm fluctuate little over the 20 runs and are essentially stable, showing good stability; the stability of BaggingELM over repeated runs is therefore clearly better than that of ELM. On the other hand, the RMSEP value of BaggingELM is always lower than that of ELM, showing that the Bagging ensemble improves the prediction accuracy of the ELM algorithm. Fig. 17 shows the relationship between the mean predicted values of the prediction set over the 20 runs and the actual values for BaggingELM and ELM, where the vertical lines represent the deviation over the 20 runs; the deviation of BaggingELM is far smaller than that of ELM. The correlation coefficient between the BaggingELM predictions and the actual values, 0.9774, is higher than the 0.9432 of ELM, and the standard deviation between the BaggingELM predictions and the actual values, 0.0008, is lower than the 0.0268 of ELM. This shows that BaggingELM improves the prediction accuracy of the ELM model and has better stability.
Embodiment 5:
This embodiment applies the method to near-infrared spectral analysis to determine the chlorine content of flue-cured (Virginia-type) tobacco powder samples. The specific steps are as follows:
(1) Collect the near-infrared spectral data of flue-cured tobacco products of 58 brands; the wavenumber range is 4000-9000 cm-1 with a sampling interval of 1 cm-1, 5001 wavelength points in total. The tobacco was prepared into powdered samples according to YC/331-1996, with a mean particle size of 0.45 mm, and the spectra were recorded on a Vector 22/N FT-NIR spectrometer (Bruker). The chlorine content of the samples was determined with an AutoAnalyzer III continuous flow analyzer according to the standard method. The data set was divided by the KS grouping method, with 38 samples as the training set and 20 samples as the prediction set.
(2) Perform bootstrap resampling on the training set samples, randomly selecting a certain number of samples as a training subset.
(3) Determine the optimal activation function and the number of hidden layer nodes of the extreme learning machine, and establish an extreme learning machine submodel with the samples of the training subset.
Repeat steps (2)-(3) multiple times to establish multiple submodels.
(4) For unknown samples, take the arithmetic mean of the predictions of the multiple submodels to obtain the final prediction.
Determination of the number of submodels: a value of 500 submodels was given, the training subset sample number of each data set was fixed at 50% of the total number of samples, and the RMSEP was computed as a function of the number of submodels; the number of submodels at which the RMSEP value is constant or almost constant (tends to be stable) is the number of models to be established. In this embodiment the change of RMSEP with the number of submodels is shown in Fig. 18; the RMSEP value is almost constant by the time the number of submodels reaches 500, so the number of models established is 500.
Selection of the number of samples: with the number of submodels fixed at 500, the number of selected samples was varied from 5% to 100% of the number of training samples in steps of 5% (rounding down when the result is not an integer) and the RMSEP value was computed; the sample number at which the RMSEP is minimal or levels off is the number of samples selected in each cycle. In this embodiment the change of the RMSEP value with the training subset sample percentage is shown in Fig. 19; when the training subset sample number reaches 60-100% of the total number of training samples, the RMSEP value reaches its minimum and is almost constant, so any training subset sample number between 60% and 100% of the total sample number is acceptable; in this example the training subset sample number is chosen as 60% of the total number of samples.
Determination of the optimal activation function and number of hidden layer nodes of the extreme learning machine: the RMSEP value of the training set spectra was examined as the activation function and the number of hidden layer nodes were varied; in this embodiment the optimal activation function and number of hidden layer nodes corresponding to the minimum RMSEP are the sig activation function and 69 nodes.
To compare the stability of BaggingELM and ELM, the two methods were each rerun 20 times and the change of the RMSEP value with the run number was obtained, as shown in Fig. 20. The RMSEP of the ELM algorithm fluctuates considerably over the 20 reruns, indicating poor model stability, whereas the predictions of the BaggingELM algorithm fluctuate little over the 20 runs and are essentially stable, showing good stability; the stability of BaggingELM over repeated runs is therefore clearly better than that of ELM. On the other hand, the RMSEP value of BaggingELM is always lower than that of ELM, showing that the Bagging ensemble improves the prediction accuracy of the ELM algorithm. Fig. 21 shows the relationship between the mean predicted values of the prediction set over the 20 runs and the actual values for BaggingELM and ELM, where the vertical lines represent the deviation over the 20 runs; the deviation of BaggingELM is far smaller than that of ELM. The correlation coefficient between the BaggingELM predictions and the actual values, 0.9762, is higher than the 0.9635 of ELM, and the standard deviation between the BaggingELM predictions and the actual values, 0.0007, is lower than the 0.0062 of ELM. This shows that BaggingELM improves the prediction accuracy of the ELM model and has better stability.
The above embodiments show that the method provides a quantitative analysis modeling method for complex samples based on the Bagging extreme learning machine, and that the method can greatly improve the stability and prediction accuracy of the model.

Claims (4)

1. A Bagging extreme learning machine integrated modeling method, characterized in that the specific steps are:
(1) collecting the spectral data of the analyte samples and determining the content of the tested component in each sample; dividing the sample set into a training set and a prediction set;
(2) performing bootstrap resampling on the training set samples, randomly selecting a certain number of samples as a training subset;
(3) determining the optimal activation function and the number of hidden layer nodes of the extreme learning machine, and establishing an extreme learning machine submodel with the samples of the training subset;
repeating steps (2) and (3) multiple times to establish N submodels;
(4) for unknown samples, taking the arithmetic mean of the predictions of the multiple submodels to obtain the final prediction.
2. The Bagging extreme learning machine integrated modeling method according to claim 1, characterized in that the method for determining the number N of submodels is as follows: given a sufficiently large number of submodels, the training subset sample number of each data set is fixed at 50% of the total number of samples, the root mean square error of prediction, denoted RMSEP, is computed, and the change of RMSEP with the number of submodels is observed; when the RMSEP value is constant or tends to be stable, the corresponding number of submodels is the required number N.
3. The Bagging extreme learning machine integrated modeling method according to claim 2, characterized in that the method for determining the number of samples of the training subset is as follows: with the number of submodels fixed, the number of selected samples is varied from 5% to 100% of the number of training samples in steps of 5%, rounding down when the result is not an integer, and the RMSEP value is computed; the number of samples at which the RMSEP is minimal or levels off is the number of samples selected in each cycle.
4. The Bagging extreme learning machine integrated modeling method according to claim 3, characterized in that the specific method for determining the optimal activation function and the number of hidden layer nodes of the extreme learning machine is as follows: the RMSEP value obtained from the training set spectra is examined as the activation function and the number of hidden layer nodes are varied; the activation function and the number of hidden layer nodes at which the RMSEP reaches its minimum are the optimal parameters.
CN201510466504.7A 2015-07-31 2015-07-31 Bagging extreme learning machine integrated modelling approach Active CN105117525B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510466504.7A CN105117525B (en) 2015-07-31 2015-07-31 Bagging extreme learning machine integrated modelling approach

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510466504.7A CN105117525B (en) 2015-07-31 2015-07-31 Bagging extreme learning machine integrated modelling approach

Publications (2)

Publication Number Publication Date
CN105117525A true CN105117525A (en) 2015-12-02
CN105117525B CN105117525B (en) 2018-05-15

Family

ID=54665513

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510466504.7A Active CN105117525B (en) 2015-07-31 2015-07-31 Bagging extreme learning machine integrated modelling approach

Country Status (1)

Country Link
CN (1) CN105117525B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103226728A (en) * 2013-04-07 2013-07-31 北京化工大学 Intelligent detection and yield optimization method for HDPE (high density polyethylene) cascade polymerization reaction course
CN103593550A (en) * 2013-08-12 2014-02-19 东北大学 Pierced billet quality modeling and prediction method based on integrated mean value staged RPLS-OS-ELM
CN103528990A (en) * 2013-10-31 2014-01-22 天津工业大学 Method for establishing multiple models of near infrared spectrums
CN104463251A (en) * 2014-12-15 2015-03-25 江苏科技大学 Cancer gene expression profile data identification method based on integration of extreme learning machines
CN104573699A (en) * 2015-01-21 2015-04-29 中国计量学院 Trypetid identification method based on medium field intensity magnetic resonance dissection imaging

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Hao Yong et al., "Research on modeling methods for quantitative infrared spectroscopic analysis of linolenic acid", Journal of Chinese Agricultural Mechanization *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106650926A (en) * 2016-09-14 2017-05-10 天津工业大学 Robust boosting extreme learning machine integrated modeling method
CN106650926B (en) * 2016-09-14 2019-04-16 天津工业大学 A kind of steady boosting extreme learning machine integrated modelling approach
CN106529680A (en) * 2016-10-27 2017-03-22 天津工业大学 Multiscale extreme learning machine integrated modeling method based on empirical mode decomposition
CN106529680B (en) * 2016-10-27 2019-01-29 天津工业大学 A kind of multiple dimensioned extreme learning machine integrated modelling approach based on empirical mode decomposition
CN107356556A (en) * 2017-07-10 2017-11-17 天津工业大学 A kind of double integrated modelling approach of Near-Infrared Spectra for Quantitative Analysis
CN109325516A (en) * 2018-08-13 2019-02-12 众安信息技术服务有限公司 A kind of integrated learning approach and device towards image classification
CN113094892A (en) * 2021-04-02 2021-07-09 辽宁石油化工大学 Oil concentration prediction method based on data elimination and local partial least squares
CN117150877A (en) * 2022-05-23 2023-12-01 北京理工大学 Method for predicting optimal pressing process of press-loading mixed explosive based on Bagging algorithm
CN115691703A (en) * 2022-10-15 2023-02-03 苏州创腾软件有限公司 Drug property prediction method and system based on pharmacokinetic model

Also Published As

Publication number Publication date
CN105117525B (en) 2018-05-15

Similar Documents

Publication Publication Date Title
CN105117525A (en) Bagging extreme learning machine integrated modeling method
CN104062257B (en) A kind of based on the method for general flavone content near infrared ray solution
CN105300923B (en) Without measuring point model of temperature compensation modification method during a kind of near-infrared spectrometers application on site
CN104020127B (en) A kind of near infrared spectrum is utilized quickly to measure the method for inorganic elements in Nicotiana tabacum L.
CN106841075B (en) COD ultraviolet spectra on-line checking optimization method neural network based
CN103048339B (en) Soil moisture detection method and soil moist detection device
CN101988895A (en) Method for predicting single-type crude oil content in mixed crude oil by near infrared spectrum
CN106650926B (en) A kind of steady boosting extreme learning machine integrated modelling approach
CN108152239A (en) The sample composition content assaying method of feature based migration
CN105486658A (en) Near-infrared physical property parameter measuring method without measuring point temperature compensation
CN110969282A (en) Runoff stability prediction method based on LSTM composite network
CN102830096A (en) Method for measuring element concentration and correcting error based on artificial neural network
CN104730042A (en) Method for improving free calibration analysis precision by combining genetic algorithm with laser induced breakdown spectroscopy
CN105823751B (en) Infrared spectrum Multivariate Correction regression modeling method based on λ-SPXY algorithms
CN105319179B (en) A kind of method using middle infrared spectrum prediction hydrogen sulfide content in desulfurized amine
CN103398971A (en) Chemometrics method for determining cetane number of diesel oil
Cai et al. On-line multi-gas component measurement in the mud logging process based on Raman spectroscopy combined with a CNN-LSTM-AM hybrid model
CN107966499A (en) A kind of method by near infrared spectrum prediction crude oil carbon number distribution
CN105550457B (en) Dynamic Evolution Model bearing calibration and system
CN105466885B (en) Based on the near infrared online measuring method without measuring point temperature-compensating mechanism
CN107356556A (en) A kind of double integrated modelling approach of Near-Infrared Spectra for Quantitative Analysis
CN105092509A (en) Sample component measurement method based on PCR-ELM algorithm
CN108120694A (en) For the multivariate calibration methods and system of Dark sun-cured chemical composition analysis
CN106529680A (en) Multiscale extreme learning machine integrated modeling method based on empirical mode decomposition
CN102706855A (en) Flammable liquid flash point prediction method based on Raman spectroscopy

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
CB02 Change of applicant information

Address after: 300387 Tianjin city Xiqing District West Binshui Road No. 399

Applicant after: Tianjin Polytechnic University

Applicant after: Shanghai Sui Hua Industrial Limited by Share Ltd

Address before: 300387 Tianjin city Xiqing District West Binshui Road No. 399

Applicant before: Tianjin Polytechnic University

Applicant before: Shanghai Huishan Industrial Co., Ltd.

GR01 Patent grant
GR01 Patent grant