CN105117525B - Bagging extreme learning machine integrated modelling approach - Google Patents

Bagging extreme learning machine integrated modelling approach

Info

Publication number: CN105117525B
Application number: CN201510466504.7A
Authority: CN (China)
Legal status: Active
Other versions: CN105117525A (Chinese)
Inventors: 卞希慧, 李淑娟, 谭小耀, 王江江, 王治国, 刘维国, 陈宗蓬, 王晨
Assignees: Shanghai Sui Hua Industrial Ltd By Share Ltd; Tianjin Polytechnic University
Application filed by: Shanghai Sui Hua Industrial Ltd By Share Ltd and Tianjin Polytechnic University
Classification: Investigating Or Analysing Materials By Optical Means

Abstract

The invention belongs to the field of chemometrics and specifically provides a Bagging extreme learning machine ensemble modelling method. The specific steps of the invention are: collect spectral data of the samples to be measured and determine the content of the component of interest; divide the sample set into a training set and a prediction set; perform bootstrap resampling on the training set, randomly selecting a certain number of samples as a training subset; establish an extreme learning machine (ELM) submodel on the samples of the training subset; repeat this procedure to establish multiple submodels; for an unknown sample, take the simple average of the predictions of the submodels as the final prediction. Compared with the ELM method, the method of the invention has a clear advantage in prediction accuracy and stability. The invention is suitable for the quantitative analysis of complex materials such as oil, tobacco, food and traditional Chinese medicine.

Description

Bagging extreme learning machine integrated modelling approach
Technical field
The invention belongs to the field of chemometrics, and in particular relates to a Bagging extreme learning machine ensemble modelling method.
Background art
Artificial neural networks have been widely used in biology, chemistry, medicine, economics and many other fields because of their powerful adaptive, self-organizing, self-learning and nonlinear mapping capabilities. However, traditional neural network learning algorithms (such as the BP algorithm) require a large number of network training parameters to be set manually, train slowly, and easily converge to local optima. In 2004, Professor Huang Guangbin of Nanyang Technological University, Singapore, proposed a new algorithm for single-hidden-layer feedforward neural networks, named the extreme learning machine (ELM). The core of the ELM algorithm is to turn the training of the neural network into the solution of a least-squares problem, avoiding the defects of artificial neural networks that parameters must be tuned manually and that training easily falls into local optima. Because it is simple to implement, fast to train and generalizes well, ELM has received increasing attention in recent years and has been applied in analytical chemistry, control engineering, image recognition and many other fields. However, because the input weights and hidden-neuron biases of ELM are set randomly, the results of the model are unstable.
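The ELM training just described, with random input weights and biases and a least-squares solve for the output weights only, can be sketched as follows. This is a minimal illustration, not the patent's implementation; the function names and the sigmoid activation are assumptions.

```python
import numpy as np

def fit_elm(X, y, n_hidden=20, seed=None):
    """Train a single-hidden-layer ELM: input weights W and biases b are
    drawn at random; only the output weights beta are fitted, by solving
    a least-squares problem via the pseudo-inverse."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # random input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))                   # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ y                             # least-squares output weights
    return W, b, beta

def predict_elm(model, X):
    W, b, beta = model
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

Because W and b are random, repeated runs produce different beta and hence different predictions; this is exactly the run-to-run instability that the method of the invention is designed to correct.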
Ensemble modelling techniques obtain the final prediction by fusing the results of multiple models, improving the accuracy and stability of the model's predictions. Bagging is a common ensemble modelling method: it uses the bootstrap method to randomly select part of the samples from the training set to build multiple submodels, and then averages the predictions of the submodels to obtain the final result. On the one hand, re-selecting the training set increases the diversity of the ensemble; on the other hand, fusing multiple predictions improves the prediction accuracy of the base model.
The present invention combines the advantages of ELM and Bagging and proposes a Bagging-based ELM ensemble modelling method for the quantitative analysis of complex samples, which retains the fast computation and strong predictive ability of ELM while overcoming its poor stability.
Summary of the invention
The object of the invention is to propose a Bagging extreme learning machine ensemble modelling method with good stability and high prediction accuracy.
The present invention combines the Bagging algorithm with the extreme learning machine (ELM) model to establish a Bagging-based extreme learning machine ensemble method (denoted Bagging ELM). Its flow is shown in Figure 1, and the specific steps are:
(1) Collect the spectral data of the samples to be measured and determine the content of the component of interest by a conventional method; divide the sample set into a training set and a prediction set;
(2) Perform bootstrap resampling on the training set, randomly selecting a certain number of samples as a training subset;
(3) Determine the optimal ELM activation function and number of hidden-layer nodes, and establish an ELM submodel on the samples of the training subset;
Repeat steps (2) and (3) to establish N submodels;
(4) For an unknown sample, take the arithmetic average of the predictions of the submodels as the final prediction.
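Steps (2)-(4) above can be sketched end-to-end as follows. This is an illustrative sketch only: the function name `bagging_elm_predict`, the tanh activation and all default parameter values are assumptions, not specified by the patent.

```python
import numpy as np

def bagging_elm_predict(X_train, y_train, X_test, n_models=100, frac=0.5,
                        n_hidden=10, seed=None):
    """Bagging ELM: draw a bootstrap subset of the training set, fit one ELM
    submodel on it, repeat n_models times, and return the simple average of
    the submodels' predictions for X_test."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    k = max(1, int(n * frac))                  # subset size (truncated to an integer)
    preds = np.zeros(len(X_test))
    for _ in range(n_models):
        idx = rng.integers(0, n, size=k)       # bootstrap resampling, step (2)
        W = rng.uniform(-1, 1, (X_train.shape[1], n_hidden))
        b = rng.uniform(-1, 1, n_hidden)
        H = np.tanh(X_train[idx] @ W + b)      # one ELM submodel, step (3)
        beta = np.linalg.pinv(H) @ y_train[idx]
        preds += np.tanh(X_test @ W + b) @ beta
    return preds / n_models                    # arithmetic average, step (4)
```

Averaging many submodels built on different bootstrap subsets is what damps the randomness of the individual ELM fits.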
In the present invention, the number N of submodels is determined as follows: specify a sufficiently large number of submodels, fix the training-subset size of each data set at 50% of the total number of training samples, compute the root mean square error of prediction (RMSEP), and observe how the RMSEP changes with the number of submodels; when the RMSEP becomes constant or nearly constant (levels off), the corresponding number of submodels is the required number N.
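The RMSEP used throughout is the root mean square error over the prediction set, and the levelling-off check above amounts to watching the RMSEP of the cumulative ensemble average as submodels are added. A sketch, with hypothetical helper names:

```python
import numpy as np

def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def rmsep_vs_n(submodel_preds, y_true):
    """RMSEP of the cumulative ensemble average after 1, 2, ..., N submodels;
    N is chosen where this curve levels off."""
    cum = np.cumsum(submodel_preds, axis=0)    # running sum of predictions
    return [rmsep(y_true, cum[i] / (i + 1)) for i in range(len(submodel_preds))]
```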
In the present invention, the number of samples in the training subset is determined as follows: fix the number of submodels, vary the number of selected samples from 5% to 100% of the training-set size in steps of 5% (truncating when the result is not an integer), and compute the RMSEP; the number of samples at which the RMSEP is smallest, or levels off, is the number of samples drawn in each cycle.
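The 5%-100% grid with truncation of non-integer sizes can be sketched as follows; `evaluate` stands in for a callback that returns the RMSEP for a given subset size and is an assumption of this sketch:

```python
def subset_size_grid(n_train):
    """Candidate subset sizes: 5%, 10%, ..., 100% of the training set,
    truncating ('rounding down') when the product is not an integer."""
    return [max(1, int(n_train * p / 100)) for p in range(5, 101, 5)]

def pick_subset_size(n_train, evaluate):
    """Return the candidate size whose RMSEP (from `evaluate`) is smallest."""
    sizes = subset_size_grid(n_train)
    errors = [evaluate(k) for k in sizes]
    return sizes[errors.index(min(errors))]
```

For the 70-sample training set of Embodiment 1, for example, the grid runs 3, 7, 10, ..., 70 and the 50% point is 35 samples.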
In the present invention, the optimal ELM activation function and number of hidden-layer nodes are determined as follows: observe the RMSEP of the training-set spectra as the chosen activation function and the number of hidden-layer nodes are varied; the activation function and node number at which the RMSEP reaches its minimum are the optimal parameters.
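This selection is a two-dimensional grid search. The sketch below names the candidate activations as in the embodiments (sig, sin, radbas, tribas); the function definitions, the node grid and the `evaluate` callback are assumptions of the sketch:

```python
import numpy as np

# Candidate activation functions, named as in the embodiments.
ACTIVATIONS = {
    "sig": lambda z: 1.0 / (1.0 + np.exp(-z)),            # sigmoid
    "sin": np.sin,
    "radbas": lambda z: np.exp(-z ** 2),                  # radial basis
    "tribas": lambda z: np.maximum(1 - np.abs(z), 0.0),   # triangular basis
}

def pick_elm_params(evaluate, node_grid=range(5, 101)):
    """Grid-search activation function and hidden-node count; `evaluate`
    returns the training-set RMSEP for one (activation, nodes) pair, and
    the pair with minimum RMSEP is taken as optimal."""
    best = None
    for name in ACTIVATIONS:
        for n in node_grid:
            err = evaluate(name, n)
            if best is None or err < best[0]:
                best = (err, name, n)
    return best[1], best[2]
```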
The advantage of the invention is that the modelling method combines the strengths of the Bagging ensemble technique and the extreme learning machine, improves the prediction accuracy and stability of the ELM algorithm, and provides a new modelling method for multivariate calibration in the analysis of complex materials. The method of the invention is widely applicable to the quantitative analysis of complex materials in fields such as oil, tobacco, food and traditional Chinese medicine.
Brief description of the drawings
Fig. 1 is the flow chart of the Bagging extreme learning machine method.
Fig. 2 shows the change of the RMSEP of the fuel oil ultraviolet data with the number of submodels.
Fig. 3 shows the change of the RMSEP of the fuel oil ultraviolet data with the training-subset sample percentage.
Fig. 4 shows the change of the RMSEP of ELM and Bagging ELM on the fuel oil ultraviolet data with the number of prediction runs.
Fig. 5 shows the relation between the mean predicted values and the actual values of the prediction set for Bagging ELM and ELM on the fuel oil ultraviolet data, where (a) and (b) are Bagging ELM and ELM, respectively.
Fig. 6 shows the change of the RMSEP of the ethanol solution near-infrared data with the number of submodels.
Fig. 7 shows the change of the RMSEP of the ethanol solution near-infrared data with the training-subset sample percentage.
Fig. 8 shows the change of the RMSEP of ELM and Bagging ELM on the ethanol solution near-infrared data with the number of prediction runs.
Fig. 9 shows the relation between the mean predicted values and the actual values of the prediction set for Bagging ELM and ELM on the ethanol solution near-infrared data, where (a) and (b) are Bagging ELM and ELM, respectively.
Fig. 10 shows the change of the RMSEP of the diesel near-infrared data with the number of submodels.
Fig. 11 shows the change of the RMSEP of the diesel near-infrared data with the training-subset sample percentage.
Fig. 12 shows the change of the RMSEP of ELM and Bagging ELM on the diesel near-infrared data with the number of prediction runs.
Fig. 13 shows the relation between the mean predicted values and the actual values of the prediction set for Bagging ELM and ELM on the diesel near-infrared data, where (a) and (b) are Bagging ELM and ELM, respectively.
Fig. 14 shows the change of the RMSEP of the blood near-infrared data with the number of submodels.
Fig. 15 shows the change of the RMSEP of the blood near-infrared data with the training-subset sample percentage.
Fig. 16 shows the change of the RMSEP of ELM and Bagging ELM on the blood near-infrared data with the number of prediction runs.
Fig. 17 shows the relation between the mean predicted values and the actual values of the prediction set for Bagging ELM and ELM on the blood near-infrared data, where (a) and (b) are Bagging ELM and ELM, respectively.
Fig. 18 shows the change of the RMSEP of the cigarette near-infrared data with the number of submodels.
Fig. 19 shows the change of the RMSEP of the cigarette near-infrared data with the training-subset sample percentage.
Fig. 20 shows the change of the RMSEP of ELM and Bagging ELM on the cigarette near-infrared data with the number of prediction runs.
Fig. 21 shows the relation between the mean predicted values and the actual values of the prediction set for Bagging ELM and ELM on the cigarette near-infrared data, where (a) and (b) are Bagging ELM and ELM, respectively.
Detailed description of embodiments
For a better understanding of the present invention, the invention is described in further detail below with reference to the following embodiments, but the scope of protection claimed by the invention is not limited to the scope of the embodiments.
Embodiment 1:
This embodiment applies the method to ultraviolet spectral analysis, determining the monoaromatic hydrocarbon content of fuel oil samples. The specific steps are as follows:
(1) Ultraviolet spectra of 115 fuel oil samples were collected over the wavelength range 200-400 nm at a sampling interval of 0.35 nm, giving 572 wavelength points; the spectra were measured with a Varian Cary 3 UV-visible spectrophotometer. The monoaromatic hydrocarbon content was determined with an HP G1205A supercritical fluid chromatograph, with carbon dioxide as carrier at a flow rate of 2 mL·min⁻¹, an oven temperature of 35 °C, an outlet pressure of 150 bar and a flame ionization detector. Following the division of the data set given on the website, 70 samples were used as the training set and 45 samples as the prediction set.
(2) Bootstrap resampling was performed on the training set, randomly selecting a certain number of samples as a training subset.
(3) The optimal ELM activation function and number of hidden-layer nodes were determined, and an ELM submodel was established on the samples of the training subset.
Steps (2)-(3) were repeated to establish multiple submodels.
(4) For an unknown sample, the arithmetic average of the predictions of the submodels was taken as the final prediction.
Determination of the number of submodels: 500 submodels were specified, the training-subset size of each data set was fixed at 50% of the total number of samples, and the root mean square error of prediction (RMSEP) was computed as a function of the number of submodels; the number at which the RMSEP becomes constant or nearly constant (levels off) is the number of models to establish. The change of RMSEP with the number of submodels in this embodiment is shown in Fig. 2: at 500 submodels the RMSEP is almost unchanged, so 500 submodels were established.
Selection of the number of samples: with the number of submodels fixed at 500, the number of selected samples was varied from 5% to 100% of the training-set size in steps of 5% (truncating non-integers) and the RMSEP was computed; the sample number at which the RMSEP is smallest or levels off is the number drawn in each cycle. The change of RMSEP with the training-subset percentage in this embodiment is shown in Fig. 3: when the subset reaches 20-100% of the training samples, the RMSEP reaches its minimum and is almost unchanged, so any subset size in the range 20-100% of the total sample number is acceptable; this example used 50% of the total number of samples.
Determination of the optimal ELM activation function and number of hidden-layer nodes: according to the change of the training-set RMSEP with the chosen activation function and number of hidden-layer nodes, the parameters at the RMSEP minimum are optimal; in this embodiment they are the activation function sin and 9 nodes.
To compare the stability of Bagging ELM and ELM, each method was rerun 20 times and the change of RMSEP with run number was obtained, as shown in Fig. 4. The RMSEP of the ELM algorithm fluctuates considerably over the 20 runs, indicating poor model stability, while the 20 predictions of the Bagging ELM algorithm fluctuate little and are essentially stable, showing good stability; this shows that the run-to-run stability of Bagging ELM is substantially better than that of ELM. Moreover, the RMSEP of Bagging ELM is always lower than that of ELM, showing that the Bagging ensemble improves the prediction accuracy of the ELM algorithm. Fig. 5 shows the relation between the mean predicted values over the 20 runs and the actual values of the prediction set for Bagging ELM and ELM, where the vertical bars represent the deviation over the 20 runs; the deviations of Bagging ELM are far smaller than those of ELM. The correlation coefficient between the predictions of Bagging ELM and the actual values, 0.9921, is higher than that of ELM, 0.9858, and the standard deviation between the predictions of Bagging ELM and the actual values, 0.0001, is lower than that of ELM, 0.0044. This shows that Bagging ELM improves the prediction accuracy of the ELM model and has better stability.
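The 20-rerun stability comparison can be sketched on synthetic data as follows. Everything below, including the data, the tanh activation, the parameter values and the helper names, is an illustrative assumption and not the fuel-oil dataset or the patent's code:

```python
import numpy as np

def elm_predict(X_tr, y_tr, X_te, n_hidden, rng):
    """One ELM fit: random weights, least-squares output layer."""
    W = rng.uniform(-1, 1, (X_tr.shape[1], n_hidden))
    b = rng.uniform(-1, 1, n_hidden)
    beta = np.linalg.pinv(np.tanh(X_tr @ W + b)) @ y_tr
    return np.tanh(X_te @ W + b) @ beta

def rmsep(y, p):
    return float(np.sqrt(np.mean((y - p) ** 2)))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = X @ rng.normal(size=6) + 0.05 * rng.normal(size=100)
X_tr, y_tr, X_te, y_te = X[:70], y[:70], X[70:], y[70:]

elm_runs, bag_runs = [], []
for _ in range(20):                            # rerun each method 20 times
    elm_runs.append(rmsep(y_te, elm_predict(X_tr, y_tr, X_te, 10, rng)))
    # Bagging ELM: average 50 submodels, each on a 50% bootstrap subset
    preds = np.zeros(len(X_te))
    for _ in range(50):
        idx = rng.integers(0, 70, size=35)
        preds += elm_predict(X_tr[idx], y_tr[idx], X_te, 10, rng)
    bag_runs.append(rmsep(y_te, preds / 50))
# The spread of each list across the 20 reruns is what Figs. 4-5 compare.
```

Averaging the submodels shrinks the run-to-run spread of the RMSEP, which is the stability effect reported in this embodiment.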
Embodiment 2:
This embodiment applies the method to near-infrared spectral analysis, determining the ethanol content of solution samples. The specific steps are as follows:
(1) Near-infrared spectra of 95 ethanol solution samples were collected over the wavelength range 850-1049 nm at a sampling interval of 1 nm, giving 200 wavelength points; the spectra were measured with an HP 8453 spectrometer. Following the division of the data set given on the website, 65 samples were used as the training set and 30 samples as the prediction set.
(2) Bootstrap resampling was performed on the training set, randomly selecting a certain number of samples as a training subset.
(3) The optimal ELM activation function and number of hidden-layer nodes were determined, and an ELM submodel was established on the samples of the training subset.
Steps (2)-(3) were repeated to establish multiple submodels.
(4) For an unknown sample, the arithmetic average of the predictions of the submodels was taken as the final prediction.
Determination of the number of submodels: 500 submodels were specified, the training-subset size of each data set was fixed at 50% of the total number of samples, and the RMSEP was computed as a function of the number of submodels; the number at which the RMSEP becomes constant or nearly constant (levels off) is the number of models to establish. The change of RMSEP with the number of submodels in this embodiment is shown in Fig. 6: at 500 submodels the RMSEP is almost unchanged, so 500 submodels were established.
Selection of the number of samples: with the number of submodels fixed at 500, the number of selected samples was varied from 5% to 100% of the training-set size in steps of 5% (truncating non-integers) and the RMSEP was computed; the sample number at which the RMSEP is smallest or levels off is the number drawn in each cycle. The change of RMSEP with the training-subset percentage in this embodiment is shown in Fig. 7: when the subset reaches 30-100% of the training samples, the RMSEP reaches its minimum and is almost unchanged, so any subset size in the range 30-100% of the total sample number is acceptable; this example used 60% of the total number of samples.
Determination of the optimal ELM activation function and number of hidden-layer nodes: according to the change of the training-set RMSEP with the chosen activation function and number of hidden-layer nodes, the parameters at the RMSEP minimum are optimal; in this embodiment they are the activation function radbas and 35 nodes.
To compare the stability of Bagging ELM and ELM, each method was rerun 20 times and the change of RMSEP with run number was obtained, as shown in Fig. 8. The RMSEP of the ELM algorithm fluctuates considerably over the 20 runs, indicating poor model stability, while the 20 predictions of the Bagging ELM algorithm fluctuate little and are essentially stable, showing good stability; this demonstrates that the run-to-run stability of Bagging ELM is substantially better than that of ELM. Moreover, the RMSEP of Bagging ELM is always lower than that of ELM, showing that the Bagging ensemble improves the prediction accuracy of the ELM algorithm. Fig. 9 shows the relation between the mean predicted values over the 20 runs and the actual values of the prediction set for Bagging ELM and ELM, where the vertical bars represent the deviation over the 20 runs; the deviations of Bagging ELM are far smaller than those of ELM. The correlation coefficient between the predictions of Bagging ELM and the actual values, 0.9988, is higher than that of ELM, 0.9957, and the standard deviation between the predictions of Bagging ELM and the actual values, 0.00015, is lower than that of ELM, 0.0023. This demonstrates that Bagging ELM improves the prediction accuracy of the ELM model and has better stability.
Embodiment 3:
This embodiment applies the method to near-infrared spectral analysis, determining the density of diesel oil samples. The specific steps are as follows:
(1) Near-infrared spectra of 263 diesel fuel samples were collected over the wavelength range 750-1550 nm at a sampling interval of 2 nm, giving 401 wavelength points; the data were provided by the US Southwest Research Institute (SWRI), San Antonio, TX, through Eigenvector Research, Inc. (Manson, Washington); download address: http://www.eigenvector.com/Data/SWRI. Following the division of the data set given on the website, 142 samples were used as the training set and 121 samples as the prediction set.
(2) Bootstrap resampling was performed on the training set, randomly selecting a certain number of samples as a training subset.
(3) The optimal ELM activation function and number of hidden-layer nodes were determined, and an ELM submodel was established on the samples of the training subset.
Steps (2)-(3) were repeated to establish multiple submodels.
(4) For an unknown sample, the arithmetic average of the predictions of the submodels was taken as the final prediction.
Determination of the number of submodels: 500 submodels were specified, the training-subset size of each data set was fixed at 50% of the total number of samples, and the RMSEP was computed as a function of the number of submodels; the number at which the RMSEP becomes constant or nearly constant (levels off) is the number of models to establish. The change of RMSEP with the number of submodels in this embodiment is shown in Fig. 10: at 500 submodels the RMSEP is almost unchanged, so 500 submodels were established.
Selection of the number of samples: with the number of submodels fixed at 500, the number of selected samples was varied from 5% to 100% of the training-set size in steps of 5% (truncating non-integers) and the RMSEP was computed; the sample number at which the RMSEP is smallest or levels off is the number drawn in each cycle. The change of RMSEP with the training-subset percentage in this embodiment is shown in Fig. 11: when the subset reaches 40-100% of the training samples, the RMSEP reaches its minimum and is almost unchanged, so any subset size in the range 40-100% of the total sample number is acceptable; this example used 50% of the total number of samples.
Determination of the optimal ELM activation function and number of hidden-layer nodes: according to the change of the training-set RMSEP with the chosen activation function and number of hidden-layer nodes, the parameters at the RMSEP minimum are optimal; in this embodiment they are the activation function tribas and 48 nodes.
To compare the stability of Bagging ELM and ELM, each method was rerun 20 times and the change of RMSEP with run number was obtained, as shown in Fig. 12. The RMSEP of the ELM algorithm fluctuates considerably over the 20 runs, indicating poor model stability, while the 20 predictions of the Bagging ELM algorithm fluctuate little and are essentially stable, showing good stability; this demonstrates that the run-to-run stability of Bagging ELM is substantially better than that of ELM. Moreover, the RMSEP of Bagging ELM is always lower than that of ELM, showing that the Bagging ensemble improves the prediction accuracy of the ELM algorithm. Fig. 13 shows the relation between the mean predicted values over the 20 runs and the actual values of the prediction set for Bagging ELM and ELM, where the vertical bars represent the deviation over the 20 runs; the deviations of Bagging ELM are far smaller than those of ELM. The correlation coefficient between the predictions of Bagging ELM and the actual values, 0.9970, is higher than that of ELM, 0.9923, and the standard deviation between the predictions of Bagging ELM and the actual values, 0.00014, is lower than that of ELM, 0.0031. This demonstrates that Bagging ELM improves the prediction accuracy of the ELM model and has better stability.
Embodiment 4:
This embodiment applies the method to near-infrared transmission analysis, determining the component content of blood samples. The specific steps are as follows:
(1) Near-infrared transmission spectra of 231 blood samples were collected over the wavelength range 1100-2498 nm at a sampling interval of 2 nm, giving 700 wavelength points; the spectra were measured in reflectance mode with a NIRSystems model 6500 spectrometer (NIRSystems, Inc., Silver Springs, USA); download address: http://www.idrc-chambersburg.org/shootout2010.html. Following the division of the data set given on the website, 143 samples were used as the training set and 47 samples as the prediction set.
(2) Bootstrap resampling was performed on the training set, randomly selecting a certain number of samples as a training subset.
(3) The optimal ELM activation function and number of hidden-layer nodes were determined, and an ELM submodel was established on the samples of the training subset.
Steps (2)-(3) were repeated to establish multiple submodels.
(4) For an unknown sample, the arithmetic average of the predictions of the submodels was taken as the final prediction.
Determination of the number of submodels: 500 submodels were specified, the training-subset size of each data set was fixed at 50% of the total number of samples, and the RMSEP was computed as a function of the number of submodels; the number at which the RMSEP becomes constant or nearly constant (levels off) is the number of models to establish. The change of RMSEP with the number of submodels in this embodiment is shown in Fig. 14: at 500 submodels the RMSEP is almost unchanged, so 500 submodels were established.
Selection of the number of samples: with the number of submodels fixed at 500, the number of selected samples was varied from 5% to 100% of the training-set size in steps of 5% (truncating non-integers) and the RMSEP was computed; the sample number at which the RMSEP is smallest or levels off is the number drawn in each cycle. The change of RMSEP with the training-subset percentage in this embodiment is shown in Fig. 15: the RMSEP is smallest when the subset reaches 60% of the training samples, so the training subset drawn in each cycle was 60% of the total number of samples.
Determination of the optimal ELM activation function and number of hidden-layer nodes: according to the change of the training-set RMSEP with the chosen activation function and number of hidden-layer nodes, the parameters at the RMSEP minimum are optimal; in this embodiment they are the activation function sig and 37 nodes.
To compare the stability of Bagging ELM and ELM, each method was rerun 20 times and the change of RMSEP with run number was obtained, as shown in Fig. 16. The RMSEP of the ELM algorithm fluctuates considerably over the 20 runs, indicating poor model stability, while the 20 predictions of the Bagging ELM algorithm fluctuate little and are essentially stable, showing good stability; this demonstrates that the run-to-run stability of Bagging ELM is substantially better than that of ELM. Moreover, the RMSEP of Bagging ELM is always lower than that of ELM, showing that the Bagging ensemble improves the prediction accuracy of the ELM algorithm. Fig. 17 shows the relation between the mean predicted values over the 20 runs and the actual values of the prediction set for Bagging ELM and ELM, where the vertical bars represent the deviation over the 20 runs; the deviations of Bagging ELM are far smaller than those of ELM. The correlation coefficient between the predictions of Bagging ELM and the actual values, 0.9774, is higher than that of ELM, 0.9432, and the standard deviation between the predictions of Bagging ELM and the actual values, 0.0008, is lower than that of ELM, 0.0268. This demonstrates that Bagging ELM improves the prediction accuracy of the ELM model and has better stability.
Embodiment 5:
The present embodiment is to be applied to near-infrared spectrum analysis, and the content value of Virginian-type cigarette Powder samples chlorine is surveyed It is fixed.Specific step is as follows:
(1)The near infrared spectrum data of the Virginian-type cigarette product of 58 trades mark is gathered, wave-number range is 4000~9000 cm-1, the sampling interval is 1 cm-1, totally 5001 wavelength points, cigarette be prepared into powder sample according to YC/331-1996, average grain Spend and measured for 0.45 mm, spectrum using Vector 22/N FT-NIR System (Bruker) spectrometer.Chlorine contains in sample Amount is measured using Auto Analyzer type III Continuous Flow Analysis instrument according to standard method.Using KS group technologies to data Collection is divided, and 38 samples are as training set, and 20 samples are as forecast set.
(2)Boostrap resamplings are carried out to training set sample, randomly select certain amount sample as training Collection.
(3)Extreme learning machine Optimum Excitation function and hidden layer number of nodes are determined, with the Sample Establishing limit of training subset Learning machine submodel.
Repeat(2)-(3)Step is multiple, establishes multiple submodels.
(4)For unknown sample, arithmetic average is made by the prediction result of multiple submodels, obtains final prediction result.
Determination of the number of submodels: a value of 500 is given for the number of submodels, the training-subset size of each data set is fixed at 50% of the total number of samples, and the root mean square error of prediction (RMSEP) is computed as a function of the number of submodels; the number of models to build is the point at which the RMSEP value becomes constant or nearly so (levels off). The variation of RMSEP with the number of submodels in this embodiment is shown in Figure 18: once the ensemble reaches 500 submodels the RMSEP value is almost unchanged, so 500 models are built.
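The selection rule above, growing the ensemble and stopping where the RMSEP curve levels off, can be illustrated with a short sketch. The function names and the running-average formulation are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def rmsep(y_true, y_pred):
    """Root mean square error of prediction (RMSEP)."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def rmsep_vs_n_models(sub_preds, y_true):
    """RMSEP of the running ensemble average after 1..N submodels.

    `sub_preds` is an (N, n_samples) array holding each submodel's
    predictions; the chosen ensemble size is where the returned curve
    becomes flat (constant or almost unchanged)."""
    sub_preds = np.asarray(sub_preds, dtype=float)
    counts = np.arange(1, len(sub_preds) + 1)[:, None]
    running = np.cumsum(sub_preds, axis=0) / counts  # mean of first k submodels
    return [rmsep(y_true, p) for p in running]
```

Plotting the returned list against the submodel count reproduces a curve of the kind shown in Figure 18.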
Selection of the number of samples: with the number of submodels fixed at 500, the number of selected samples is varied from 5% to 100% of the sample count in steps of 5% (non-integer values are truncated to integers, not rounded) and the RMSEP is computed; the sample count at which the RMSEP is minimal or levels off is the number of samples drawn in each cycle. The variation of RMSEP with the training-subset percentage in this embodiment is shown in Figure 19: when the training-subset size reaches 60~100% of the total number of training samples, the RMSEP reaches its minimum and is almost unchanged. Any subset size in the range 60~100% of the total is therefore acceptable; this example takes the training-subset size to be 60% of the total number of samples.
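As a small illustration of the 5%~100% sweep with truncation, the candidate subset sizes can be enumerated as below; `subset_sizes` is a hypothetical helper name. For the 38-sample training set of this embodiment, 60% truncates to 22 samples.

```python
def subset_sizes(n_total):
    """Candidate training-subset sizes: 5% to 100% of the training set
    in 5% steps; non-integer counts are truncated (floored), per the
    patent's truncation rule, rather than rounded."""
    return [n_total * p // 100 for p in range(5, 101, 5)]
```

Each candidate size would then be paired with a full Bagging ELM run to trace the RMSEP curve of Figure 19.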
Determination of the optimal activation function and number of hidden-layer nodes of the extreme learning machine: the RMSEP of the training-set spectra is tracked as the activation function and the number of hidden-layer nodes are varied, and the values at which the RMSEP reaches its minimum are taken as optimal. In this embodiment the optimal activation function and hidden-layer node count are the sig function and 69 nodes.
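A grid search of this kind can be sketched as follows. The helper name, the validation-set formulation, and the candidate activations are illustrative assumptions (`sig`, `sin`, and `hardlim` are common ELM choices, but the patent only confirms `sig` for this embodiment):

```python
import numpy as np

ACTIVATIONS = {
    "sig": lambda z: 1.0 / (1.0 + np.exp(-z)),
    "sin": np.sin,
    "hardlim": lambda z: (z >= 0).astype(float),
}

def grid_search_elm(X, y, X_val, y_val, node_range):
    """Return the (activation, n_hidden, rmsep) triple with the lowest
    RMSEP over all candidate activation functions and node counts."""
    rng = np.random.default_rng(1)
    best = (None, None, np.inf)
    for name, act in ACTIVATIONS.items():
        for n_hidden in node_range:
            W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights
            b = rng.normal(size=n_hidden)
            beta = np.linalg.pinv(act(X @ W + b)) @ y    # output weights
            pred = act(X_val @ W + b) @ beta
            err = float(np.sqrt(np.mean((y_val - pred) ** 2)))
            if err < best[2]:
                best = (name, n_hidden, err)
    return best
```

The pair minimizing the error plays the role of the "sig, 69 nodes" optimum reported above.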
To compare the stability of Bagging ELM and ELM, each method was rerun 20 times and the RMSEP value was recorded as a function of run number, as shown in Figure 20. The figure shows that the RMSEP of the ELM algorithm fluctuates considerably over the 20 runs, indicating poor model stability, whereas the prediction results of the 20 Bagging ELM runs fluctuate only slightly and essentially level off, showing good stability. This demonstrates that the run-to-run stability of Bagging ELM is substantially better than that of ELM. Moreover, the RMSEP of Bagging ELM is consistently lower than that of ELM, showing that Bagging integration improves the prediction accuracy of the ELM algorithm. Figure 21 plots, for Bagging ELM and ELM, the mean predicted values of the prediction set over the 20 runs against the measured values, with vertical bars marking the deviation across the 20 runs; the deviations of Bagging ELM are far smaller than those of ELM. The correlation coefficient between the Bagging ELM predictions and the measured values (0.9762) exceeds that of ELM (0.9635), and the standard deviation between the Bagging ELM predictions and the measured values (0.0007) is below that of ELM (0.0062). This demonstrates that Bagging ELM improves the prediction accuracy of the ELM model and offers better stability.
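The two summary statistics used in this comparison, the correlation coefficient between the run-averaged predictions and the measured values, and the spread of RMSEP across repeated runs, can be computed as sketched below; `agreement_stats` is a hypothetical helper name:

```python
import numpy as np

def agreement_stats(y_true, run_preds):
    """Given predictions from repeated runs, shape (n_runs, n_samples),
    return (corr, spread): the correlation coefficient between the
    run-averaged prediction and the measured values, and the standard
    deviation of the per-run RMSEP values (smaller = more stable)."""
    y_true = np.asarray(y_true, dtype=float)
    run_preds = np.asarray(run_preds, dtype=float)
    mean_pred = run_preds.mean(axis=0)                      # average over runs
    corr = float(np.corrcoef(mean_pred, y_true)[0, 1])
    rmseps = np.sqrt(np.mean((run_preds - y_true) ** 2, axis=1))
    spread = float(rmseps.std(ddof=1))
    return corr, spread
```

Feeding in 20 runs each of ELM and Bagging ELM predictions yields figures comparable to the correlation coefficients and deviations quoted above.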
The embodiments above show that this method provides a Bagging extreme learning machine based modeling method for the quantitative analysis of complex samples, and that it can greatly improve the stability and prediction accuracy of the model.

Claims (2)

1. A Bagging extreme learning machine integrated modelling method, characterized by the following specific steps:
(1) Collect the spectral data of the samples under test and determine the content of the component of interest; divide the sample set into a training set and a prediction set;
(2) Perform bootstrap resampling on the training-set samples, randomly selecting a certain number of samples as a training subset;
The number of samples in the training subset is determined as follows: with the number of submodels fixed, vary the number of selected samples from 5% to 100% of the sample count in steps of 5%, truncating non-integer values to integers, and compute the RMSEP; the sample count at which the RMSEP is minimal or levels off is the number of samples drawn in each cycle;
(3) Determine the optimal activation function and number of hidden-layer nodes of the extreme learning machine, and build an extreme learning machine submodel from the samples of the training subset;
The optimal activation function and number of hidden-layer nodes are determined as follows: track the RMSEP of the training-set spectra as the activation function and number of hidden-layer nodes are varied; when the RMSEP reaches its minimum, the corresponding activation function and node count are the optimal parameters;
Repeat steps (2) and (3) several times to build N submodels; (4) For an unknown sample, take the arithmetic mean of the predictions of the multiple submodels to obtain the final prediction result.
2. The Bagging extreme learning machine integrated modelling method according to claim 1, characterized in that the number N of submodels is determined as follows: give a sufficiently large value for the number of submodels, fix the training-subset size of each data set at 50% of the total number of samples, compute the root mean square error of prediction, denoted RMSEP, and observe its variation with the number of submodels; when the RMSEP value is constant or levels off, the corresponding submodel count is the required number N.
CN201510466504.7A 2015-07-31 2015-07-31 Bagging extreme learning machine integrated modelling approach Active CN105117525B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510466504.7A CN105117525B (en) 2015-07-31 2015-07-31 Bagging extreme learning machine integrated modelling approach

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510466504.7A CN105117525B (en) 2015-07-31 2015-07-31 Bagging extreme learning machine integrated modelling approach

Publications (2)

Publication Number Publication Date
CN105117525A CN105117525A (en) 2015-12-02
CN105117525B true CN105117525B (en) 2018-05-15

Family

ID=54665513

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510466504.7A Active CN105117525B (en) 2015-07-31 2015-07-31 Bagging extreme learning machine integrated modelling approach

Country Status (1)

Country Link
CN (1) CN105117525B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106650926B (en) * 2016-09-14 2019-04-16 天津工业大学 A kind of steady boosting extreme learning machine integrated modelling approach
CN106529680B (en) * 2016-10-27 2019-01-29 天津工业大学 A kind of multiple dimensioned extreme learning machine integrated modelling approach based on empirical mode decomposition
CN107356556A (en) * 2017-07-10 2017-11-17 天津工业大学 A kind of double integrated modelling approach of Near-Infrared Spectra for Quantitative Analysis
CN109325516B (en) * 2018-08-13 2021-02-02 众安信息技术服务有限公司 Image classification-oriented ensemble learning method and device
CN113094892A (en) * 2021-04-02 2021-07-09 辽宁石油化工大学 Oil concentration prediction method based on data elimination and local partial least squares
CN117150877A (en) * 2022-05-23 2023-12-01 北京理工大学 Method for predicting optimal pressing process of press-loading mixed explosive based on Bagging algorithm
CN115691703A (en) * 2022-10-15 2023-02-03 苏州创腾软件有限公司 Drug property prediction method and system based on pharmacokinetic model

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103226728A (en) * 2013-04-07 2013-07-31 北京化工大学 Intelligent detection and yield optimization method for HDPE (high density polyethylene) cascade polymerization reaction course
CN103528990A (en) * 2013-10-31 2014-01-22 天津工业大学 Method for establishing multiple models of near infrared spectrums
CN103593550A (en) * 2013-08-12 2014-02-19 东北大学 Pierced billet quality modeling and prediction method based on integrated mean value staged RPLS-OS-ELM
CN104463251A (en) * 2014-12-15 2015-03-25 江苏科技大学 Cancer gene expression profile data identification method based on integration of extreme learning machines
CN104573699A (en) * 2015-01-21 2015-04-29 中国计量学院 Trypetid identification method based on medium field intensity magnetic resonance dissection imaging

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103226728A (en) * 2013-04-07 2013-07-31 北京化工大学 Intelligent detection and yield optimization method for HDPE (high density polyethylene) cascade polymerization reaction course
CN103593550A (en) * 2013-08-12 2014-02-19 东北大学 Pierced billet quality modeling and prediction method based on integrated mean value staged RPLS-OS-ELM
CN103528990A (en) * 2013-10-31 2014-01-22 天津工业大学 Method for establishing multiple models of near infrared spectrums
CN104463251A (en) * 2014-12-15 2015-03-25 江苏科技大学 Cancer gene expression profile data identification method based on integration of extreme learning machines
CN104573699A (en) * 2015-01-21 2015-04-29 中国计量学院 Trypetid identification method based on medium field intensity magnetic resonance dissection imaging

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on model construction methods for quantitative infrared spectroscopic analysis of linolenic acid; Hao Yong et al.; Journal of Chinese Agricultural Mechanization; 20150531; Vol. 36, No. 3; pp. 164-168 *

Also Published As

Publication number Publication date
CN105117525A (en) 2015-12-02

Similar Documents

Publication Publication Date Title
CN105117525B (en) Bagging extreme learning machine integrated modelling approach
CN107064054B (en) A kind of near-infrared spectral analytical method based on CC-PLS-RBFNN Optimized model
CN103528990B (en) A kind of multi-model Modeling Method of near infrared spectrum
Lin et al. Determination of grain protein content by near-infrared spectrometry and multivariate calibration in barley
Galvao et al. A method for calibration and validation subset partitioning
CN107491784A (en) Tobacco leaf near infrared spectrum quantitative modeling method and application based on deep learning algorithm
CN104020127A (en) Method for rapidly measuring inorganic element in tobacco by near infrared spectrum
Wang et al. Rapid detection of protein content in rice based on Raman and near-infrared spectroscopy fusion strategy combined with characteristic wavelength selection
CN110503156B (en) Multivariate correction characteristic wavelength selection method based on minimum correlation coefficient
CN106372426A (en) Multi-response parameter optimization method based on principal component analysis and neural network
Wei et al. Generalisation of tea moisture content models based on VNIR spectra subjected to fractional differential treatment
Sheng et al. Data fusion strategy for rapid prediction of moisture content during drying of black tea based on micro-NIR spectroscopy and machine vision
Baumann et al. A systematic evaluation of the benefits and hazards of variable selection in latent variable regression. Part II. Practical applications
Li et al. Improvement of NIR prediction ability by dual model optimization in fusion of NSIA and SA methods
Lei et al. Achieving joint calibration of soil Vis-NIR spectra across instruments, soil types and properties by an attention-based spectra encoding-spectra/property decoding architecture
CN112395803B (en) ICP-AES multimodal spectral line separation method based on particle swarm optimization
Chen et al. A geographical traceability method for Lanmaoa asiatica mushrooms from 20 township-level geographical origins by near infrared spectroscopy and ResNet image analysis techniques
Zhang et al. Application of swarm intelligence algorithms to the characteristic wavelength selection of soil moisture content
CN107356556A (en) A kind of double integrated modelling approach of Near-Infrared Spectra for Quantitative Analysis
Chen et al. A novel spectral multivariate calibration approach based on a multiple fitting method
CN109063767A (en) A kind of near infrared spectrum modeling method known together based on sample and variable
Xie et al. Calibration transfer via filter learning
Zhu et al. Raman spectroscopy coupled with metaheuristics-based variable selection models: A method for rapid determination of extra virgin olive oil content in vegetable blend oils
CN116484989A (en) Tobacco near-infrared multicomponent prediction method based on deep migration learning
Yi-Ming et al. Ensemble partial least squares algorithm based on variable clustering for quantitative infrared spectrometric analysis

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 300387 Tianjin city Xiqing District West Binshui Road No. 399

Applicant after: Tianjin Polytechnic University

Applicant after: Shanghai Sui Hua Industrial Limited by Share Ltd

Address before: 300387 Tianjin city Xiqing District West Binshui Road No. 399

Applicant before: Tianjin Polytechnic University

Applicant before: Shanghai Huishan Industrial Co., Ltd.

CB02 Change of applicant information
GR01 Patent grant
GR01 Patent grant