CN105117525A - Bagging extreme learning machine integrated modeling method - Google Patents
Abstract
The invention belongs to the technical field of chemometrics and specifically relates to a Bagging extreme learning machine ensemble modeling method. The method comprises the following steps: collect spectral data of the analyte samples and determine the content of the tested component in each sample; divide the sample set into a training set and a prediction set; perform bootstrap resampling on the training set, randomly selecting a certain number of samples as a training subset; build an extreme learning machine submodel from the samples of the training subset; repeat these steps to build multiple submodels; and take a simple average of the submodels' predictions for unknown samples to obtain the final prediction. Compared with a single ELM, the method offers significant advantages in prediction accuracy and stability. The present invention is applicable to the quantitative analysis of complex substances such as petroleum, tobacco, food, and traditional Chinese medicine.
Description
Technical field
The invention belongs to the field of chemometric techniques and specifically relates to a Bagging extreme learning machine ensemble modeling method.
Background art
Artificial neural networks, with their powerful self-adaptation, self-organization, self-learning, and nonlinear mapping capabilities, have been widely applied in biology, chemistry, medicine, economics, and many other fields. However, traditional neural network training algorithms (such as the BP algorithm) require a large number of manually set training parameters, train slowly, and easily converge to local optima. In 2004, Professor Huang Guangbin of Nanyang Technological University proposed a new algorithm for single-hidden-layer feedforward neural networks, named the extreme learning machine (ELM). The core of the ELM algorithm is to transform the training of the neural network into a least-squares problem, avoiding the manual parameter tuning and the tendency to fall into local optima that plague conventional neural networks. Because it is simple to implement, fast to train, and generalizes well, ELM has received increasing attention in recent years and has been applied in analytical chemistry, control engineering, image recognition, and other fields. However, because the input weights and hidden-neuron biases of an ELM are set randomly, the model's results are unstable from run to run.
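The ELM idea described here — random input weights and hidden biases, with only the output weights solved as a least-squares problem — can be sketched as follows. This is a minimal illustration assuming NumPy and a tanh activation; the function names and defaults are hypothetical, not the patent's implementation:

```python
import numpy as np

def elm_train(X, y, n_hidden=60, rng=None):
    """Fit a single-hidden-layer ELM.

    Input weights W and biases b are drawn at random and never trained;
    only the output weights beta are solved, as a least-squares problem.
    """
    rng = np.random.default_rng(rng)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = np.tanh(X @ W + b)              # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ y        # Moore-Penrose least-squares solution
    return (W, b, beta)

def elm_predict(model, X):
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta
```

Because W and b are random, repeated fits of the same data give different predictions — exactly the instability that motivates combining ELM with Bagging.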
Ensemble modeling fuses the results of multiple models into a final prediction, which can improve both the accuracy and the stability of the prediction. Bagging, a common ensemble modeling approach, uses the bootstrap method to randomly select part of the training set to build each of multiple submodels, then averages their predictions to obtain the final result. On the one hand, re-drawing the training set increases the diversity of the ensemble; on the other hand, fusing multiple predictions improves the prediction accuracy of the base model.
The present invention combines the advantages of ELM and Bagging, proposing a Bagging-based ELM ensemble modeling method for the quantitative analysis of complex samples. It retains ELM's fast computation and strong predictive ability while overcoming ELM's poor stability.
Summary of the invention
The object of the invention is to provide a Bagging extreme learning machine ensemble modeling method with good stability and high prediction accuracy.
The present invention combines the Bagging algorithm with the extreme learning machine (ELM) model, establishing a Bagging-based ELM ensemble method (denoted BaggingELM). Its workflow is shown in Figure 1; the specific steps are:
(1) Collect spectral data of the analyte samples and measure the content of the tested component by a conventional method; divide the sample set into a training set and a prediction set.
(2) Perform bootstrap resampling on the training set, randomly selecting a certain number of samples as a training subset.
(3) Determine the optimal extreme learning machine activation function and number of hidden-layer nodes, and build an extreme learning machine submodel from the samples of the training subset.
Repeat steps (2) and (3) to build N submodels.
(4) For unknown samples, take the arithmetic mean of the predictions of the submodels to obtain the final prediction.
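Steps (2)-(4) can be sketched end to end as follows. This is a minimal illustration assuming NumPy; the function name, the tanh activation, and the default parameter values are illustrative assumptions, not fixed by the patent:

```python
import numpy as np

def bagging_elm_predict(X_train, y_train, X_new, n_models=500, subset_frac=0.5,
                        n_hidden=20, activation=np.tanh, seed=0):
    """Bootstrap a training subset, fit an ELM submodel on it, repeat
    n_models times, and average the submodel predictions."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    k = int(n * subset_frac)                     # subset size, truncated to an integer
    total = np.zeros(len(X_new))
    for _ in range(n_models):
        idx = rng.integers(0, n, size=k)         # bootstrap resampling with replacement
        W = rng.uniform(-1.0, 1.0, (X_train.shape[1], n_hidden))
        b = rng.uniform(-1.0, 1.0, n_hidden)
        H = activation(X_train[idx] @ W + b)
        beta = np.linalg.pinv(H) @ y_train[idx]  # least-squares output weights
        total += activation(X_new @ W + b) @ beta
    return total / n_models                      # simple average = final prediction
```

Averaging over submodels smooths out the run-to-run variation that a single random-weight ELM exhibits.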
In the present invention, the number N of submodels is determined as follows: set a sufficiently large number of submodels, fix the training-subset size of each data set at 50% of the total number of samples, compute the root mean square error of prediction (RMSEP), and observe how RMSEP changes with the number of submodels. When the RMSEP value becomes constant or nearly constant (levels off), the corresponding number of submodels is the required number N.
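This stopping rule — grow the ensemble and watch the RMSEP curve level off — can be implemented incrementally. In the sketch below, `train_one()` is a hypothetical callback that returns the predict function of one freshly fitted bootstrap submodel; the window size and tolerance are illustrative choices, not from the patent:

```python
import numpy as np

def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def select_n_submodels(train_one, X_val, y_val, max_models=500, window=10, tol=1e-3):
    """Add submodels until the RMSEP of the running average stops changing."""
    total = np.zeros(len(X_val))
    curve = []
    for m in range(1, max_models + 1):
        total += train_one()(X_val)              # accumulate a new submodel's predictions
        curve.append(rmsep(y_val, total / m))
        # "constant or almost constant": flat over the last `window` submodels
        if m > window and abs(curve[-1] - curve[-1 - window]) < tol:
            return m, curve
    return max_models, curve
```

Plotting `curve` against the submodel count reproduces the kind of diagnostic shown in the figures of the embodiments.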
In the present invention, the training-subset size is determined as follows: fix the number of submodels; vary the number of selected samples from 5% to 100% of the sample count in steps of 5% (truncating to an integer when the count is not a whole number); compute the RMSEP for each size. The size at which RMSEP is minimal, or at which it levels off, is the number of samples drawn in each bootstrap cycle.
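The scan over subset sizes reduces to a simple grid over percentages. In this sketch, `build_and_score` is a hypothetical callback that trains the fixed-size ensemble on bootstrap subsets of k samples and returns its RMSEP:

```python
def scan_subset_sizes(build_and_score, n_train):
    """Try subset sizes from 5% to 100% of the training set in 5% steps and
    return the percentage with the lowest RMSEP (ties broken by smaller size)."""
    scores = {}
    for pct in range(5, 101, 5):
        k = (n_train * pct) // 100        # truncate to an integer sample count
        scores[pct] = build_and_score(k)
    best_pct = min(scores, key=scores.get)
    return best_pct, scores
```

Inspecting the full `scores` dictionary also reveals the plateau region reported in the embodiments, where any size in a range performs equally well.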
In the present invention, the optimal extreme learning machine activation function and number of hidden-layer nodes are determined as follows: track the RMSEP of the training-set spectra as the activation function and the number of hidden nodes are varied; the activation function and node count at which RMSEP reaches its minimum are the optimal parameters.
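This parameter choice amounts to a grid search over candidate activations and node counts, keeping the setting with the lowest RMSEP. The sketch below is illustrative: the candidate set (names follow common ELM toolboxes) and the validation split are assumptions, not prescribed by the patent:

```python
import numpy as np

ACTIVATIONS = {
    "sig":    lambda z: 1.0 / (1.0 + np.exp(-z)),          # logistic sigmoid
    "sin":    np.sin,
    "tribas": lambda z: np.maximum(1.0 - np.abs(z), 0.0),  # triangular basis
    "radbas": lambda z: np.exp(-z ** 2),                   # radial basis
}

def tune_elm(X_tr, y_tr, X_val, y_val, node_grid, seed=0):
    """Return (activation name, node count, RMSEP) minimizing validation RMSEP."""
    best = ("", 0, np.inf)
    for name, act in ACTIVATIONS.items():
        for n_hidden in node_grid:
            rng = np.random.default_rng(seed)   # same random draw for each setting
            W = rng.uniform(-1.0, 1.0, (X_tr.shape[1], n_hidden))
            b = rng.uniform(-1.0, 1.0, n_hidden)
            beta = np.linalg.pinv(act(X_tr @ W + b)) @ y_tr
            err = np.sqrt(np.mean((y_val - act(X_val @ W + b) @ beta) ** 2))
            if err < best[2]:
                best = (name, n_hidden, err)
    return best
```

Reusing the same seed per setting makes the comparison across activations and node counts less noisy.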
The advantage of the present invention is that the modeling method combines the ensemble technique Bagging with the extreme learning machine, improving the prediction accuracy and stability of the ELM algorithm and providing a new modeling method for multivariate calibration analysis of complex materials. The method can be widely applied to the quantitative analysis of complex materials in fields such as petroleum, tobacco, food, and traditional Chinese medicine.
Brief description of the drawings
Fig. 1 is the flow chart of the Bagging extreme learning machine.
Fig. 2 shows the RMSEP of the fuel-oil ultraviolet data as a function of the number of submodels.
Fig. 3 shows the RMSEP of the fuel-oil ultraviolet data as a function of the training-subset sample percentage.
Fig. 4 shows the RMSEP of the extreme learning machine and the Bagging extreme learning machine on the fuel-oil ultraviolet data as a function of the run number.
Fig. 5 shows, for the fuel-oil ultraviolet data, the relationship between the mean predicted values on the prediction set and the actual values, where (a) is the Bagging extreme learning machine and (b) the extreme learning machine.
Fig. 6 shows the RMSEP of the ethanol-solution near-infrared data as a function of the number of submodels.
Fig. 7 shows the RMSEP of the ethanol-solution near-infrared data as a function of the training-subset sample percentage.
Fig. 8 shows the RMSEP of the extreme learning machine and the Bagging extreme learning machine on the ethanol-solution near-infrared data as a function of the run number.
Fig. 9 shows, for the ethanol-solution near-infrared data, the relationship between the mean predicted values on the prediction set and the actual values, where (a) is the Bagging extreme learning machine and (b) the extreme learning machine.
Figure 10 shows the RMSEP of the diesel near-infrared data as a function of the number of submodels.
Figure 11 shows the RMSEP of the diesel near-infrared data as a function of the training-subset sample percentage.
Figure 12 shows the RMSEP of the extreme learning machine and the Bagging extreme learning machine on the diesel near-infrared data as a function of the run number.
Figure 13 shows, for the diesel near-infrared data, the relationship between the mean predicted values on the prediction set and the actual values, where (a) is the Bagging extreme learning machine and (b) the extreme learning machine.
Figure 14 shows the RMSEP of the blood near-infrared data as a function of the number of submodels.
Figure 15 shows the RMSEP of the blood near-infrared data as a function of the training-subset sample percentage.
Figure 16 shows the RMSEP of the extreme learning machine and the Bagging extreme learning machine on the blood near-infrared data as a function of the run number.
Figure 17 shows, for the blood near-infrared data, the relationship between the mean predicted values on the prediction set and the actual values, where (a) is the Bagging extreme learning machine and (b) the extreme learning machine.
Figure 18 shows the RMSEP of the cigarette near-infrared data as a function of the number of submodels.
Figure 19 shows the RMSEP of the cigarette near-infrared data as a function of the training-subset sample percentage.
Figure 20 shows the RMSEP of the extreme learning machine and the Bagging extreme learning machine on the cigarette near-infrared data as a function of the run number.
Figure 21 shows, for the cigarette near-infrared data, the relationship between the mean predicted values on the prediction set and the actual values, where (a) is the Bagging extreme learning machine and (b) the extreme learning machine.
Embodiments
For a better understanding of the present invention, it is described in further detail below in conjunction with embodiments; however, the scope of protection of the present invention is not limited to the scope represented by the embodiments.
Embodiment 1:
This embodiment applies the method to ultraviolet spectral analysis, measuring the monoaromatics content of fuel-oil samples. The specific steps are as follows:
(1) Collect ultraviolet spectra of 115 fuel-oil samples over the wavelength range 200-400 nm with a sampling interval of 0.35 nm (572 wavelength points); spectra were acquired on a Varian Cary 3 UV-visible spectrophotometer. Monoaromatics content was measured with an HP G1205A supercritical fluid chromatograph using carbon dioxide as the carrier gas at a flow rate of 2 mL/min, an oven temperature of 35 °C, an outlet pressure of 150 bar, and a flame ionization detector. Following the division of the data set given on the website, 70 samples were used as the training set and 45 samples as the prediction set.
(2) Perform bootstrap resampling on the training set, randomly selecting a certain number of samples as a training subset.
(3) Determine the optimal extreme learning machine activation function and number of hidden-layer nodes, and build an extreme learning machine submodel from the samples of the training subset.
Repeat steps (2) and (3) to build multiple submodels.
(4) For unknown samples, take the arithmetic mean of the predictions of the submodels to obtain the final prediction.
Determination of the number of submodels: set the number of submodels to 500, fix the training-subset size of the data set at 50% of the total number of samples, and compute the root mean square error of prediction (RMSEP) as a function of the number of submodels; the number at which RMSEP becomes constant or nearly constant (levels off) is the number of models to build. In this embodiment the change of RMSEP with the number of submodels is shown in Fig. 2; beyond 500 submodels the RMSEP value is almost constant, so 500 models are built.
Selection of the training-subset size: with the number of submodels fixed at 500, vary the number of selected samples from 5% to 100% of the sample count in steps of 5% (truncating to an integer when necessary) and compute the RMSEP for each size; the size at which RMSEP is minimal or levels off is the number of samples drawn in each cycle. In this embodiment the change of RMSEP with the training-subset percentage is shown in Fig. 3; when the subset size reaches 20-100% of the total number of training samples, RMSEP reaches its minimum and is almost constant, so any size in that range is acceptable; this example uses 50% of the total number of samples.
Determination of the optimal activation function and number of hidden-layer nodes: track the RMSEP of the training-set spectra as the activation function and node count are varied; in this embodiment the optimal parameters, corresponding to the minimum RMSEP, are the sin activation function with 9 hidden nodes.
To compare the stability of BaggingELM and ELM, both methods were rerun 20 times and the RMSEP was recorded for each run, as shown in Fig. 4. The RMSEP of the ELM algorithm fluctuates considerably over the 20 runs, indicating poor model stability, while the predictions of the BaggingELM algorithm fluctuate much less and are essentially stable, showing good stability; this shows that the run-to-run stability of BaggingELM is clearly better than that of ELM. Moreover, the RMSEP of BaggingELM is always lower than that of ELM, showing that the Bagging ensemble improves the prediction accuracy of the ELM algorithm. Fig. 5 shows the relationship between the mean predicted values over the 20 runs on the prediction set and the actual values for BaggingELM and ELM, where the vertical bars represent the deviation over the 20 runs; the deviations of BaggingELM are far smaller than those of ELM. The correlation coefficient between the BaggingELM predictions and the actual values, 0.9921, is greater than ELM's 0.9858, and the standard deviation between the BaggingELM predictions and the actual values, 0.0001, is smaller than ELM's 0.0044. This shows that BaggingELM improves the prediction accuracy of the ELM model and has better stability.
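The 20-rerun stability comparison can be reproduced in outline as below. This is a self-contained sketch on the user's own data; the helper names, the tanh activation, and the default run counts are illustrative assumptions rather than the patent's exact procedure:

```python
import numpy as np

def _elm(Xtr, ytr, Xte, n_hidden, rng):
    """One ELM fit-and-predict with freshly drawn random weights."""
    W = rng.uniform(-1.0, 1.0, (Xtr.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, n_hidden)
    beta = np.linalg.pinv(np.tanh(Xtr @ W + b)) @ ytr
    return np.tanh(Xte @ W + b) @ beta

def stability_runs(Xtr, ytr, Xte, yte, n_runs=20, n_models=30, n_hidden=15, seed=0):
    """Rerun single ELM and Bagging-ELM n_runs times; return both RMSEP series."""
    rng = np.random.default_rng(seed)
    rmsep = lambda p: float(np.sqrt(np.mean((yte - p) ** 2)))
    single, bagged = [], []
    for _ in range(n_runs):
        single.append(rmsep(_elm(Xtr, ytr, Xte, n_hidden, rng)))
        acc = np.zeros(len(Xte))
        for _ in range(n_models):
            idx = rng.integers(0, len(Xtr), size=len(Xtr) // 2)  # 50% bootstrap subset
            acc += _elm(Xtr[idx], ytr[idx], Xte, n_hidden, rng)
        bagged.append(rmsep(acc / n_models))
    return np.array(single), np.array(bagged)
```

Plotting the two series against the run index gives the kind of comparison shown in Fig. 4; the spread of the bagged series is expected to be the smaller of the two.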
Embodiment 2:
This embodiment applies the method to near-infrared spectral analysis, measuring the ethanol content of ethanol-solution samples. The specific steps are as follows:
(1) Collect near-infrared spectra of 95 ethanol-solution samples over the wavelength range 850-1049 nm with a sampling interval of 1 nm (200 wavelength points); spectra were acquired on an HP 8453 spectrophotometer. Following the division of the data set given on the website, 65 samples were used as the training set and 30 samples as the prediction set.
(2) Perform bootstrap resampling on the training set, randomly selecting a certain number of samples as a training subset.
(3) Determine the optimal extreme learning machine activation function and number of hidden-layer nodes, and build an extreme learning machine submodel from the samples of the training subset.
Repeat steps (2) and (3) to build multiple submodels.
(4) For unknown samples, take the arithmetic mean of the predictions of the submodels to obtain the final prediction.
Determination of the number of submodels: set the number of submodels to 500, fix the training-subset size of the data set at 50% of the total number of samples, and compute the root mean square error of prediction (RMSEP) as a function of the number of submodels; the number at which RMSEP becomes constant or nearly constant (levels off) is the number of models to build. In this embodiment the change of RMSEP with the number of submodels is shown in Fig. 6; beyond 500 submodels the RMSEP value is almost constant, so 500 models are built.
Selection of the training-subset size: with the number of submodels fixed at 500, vary the number of selected samples from 5% to 100% of the sample count in steps of 5% (truncating to an integer when necessary) and compute the RMSEP for each size; the size at which RMSEP is minimal or levels off is the number of samples drawn in each cycle. In this embodiment the change of RMSEP with the training-subset percentage is shown in Fig. 7; when the subset size reaches 30-100% of the total number of training samples, RMSEP reaches its minimum and is almost constant, so any size in that range is acceptable; this example uses 60% of the total number of samples.
Determination of the optimal activation function and number of hidden-layer nodes: track the RMSEP of the training-set spectra as the activation function and node count are varied; in this embodiment the optimal parameters, corresponding to the minimum RMSEP, are the radbas activation function with 35 hidden nodes.
To compare the stability of BaggingELM and ELM, both methods were rerun 20 times and the RMSEP was recorded for each run, as shown in Fig. 8. The RMSEP of the ELM algorithm fluctuates considerably over the 20 runs, indicating poor model stability, while the predictions of the BaggingELM algorithm fluctuate much less and are essentially stable, showing that the run-to-run stability of BaggingELM is clearly better than that of ELM. Moreover, the RMSEP of BaggingELM is always lower than that of ELM, showing that the Bagging ensemble improves the prediction accuracy of the ELM algorithm. Fig. 9 shows the relationship between the mean predicted values over the 20 runs on the prediction set and the actual values for BaggingELM and ELM, where the vertical bars represent the deviation over the 20 runs; the deviations of BaggingELM are far smaller than those of ELM. The correlation coefficient between the BaggingELM predictions and the actual values, 0.9988, is greater than ELM's 0.9957, and the standard deviation between the BaggingELM predictions and the actual values, 0.00015, is smaller than ELM's 0.0023. This shows that BaggingELM improves the prediction accuracy of the ELM model and has better stability.
Embodiment 3:
This embodiment applies the method to near-infrared spectral analysis, measuring the density of diesel samples. The specific steps are as follows:
(1) Collect near-infrared spectra of 263 diesel samples over the wavelength range 750-1550 nm with a sampling interval of 2 nm (401 wavelength points); the data were provided by the U.S. Southwest Research Institute (SWRI), San Antonio, TX through Eigenvector Research, Inc. (Manson, Washington) and downloaded from http://www.eigenvector.com/Data/SWRI. Following the division of the data set given on the website, 142 samples were used as the training set and 121 samples as the prediction set.
(2) Perform bootstrap resampling on the training set, randomly selecting a certain number of samples as a training subset.
(3) Determine the optimal extreme learning machine activation function and number of hidden-layer nodes, and build an extreme learning machine submodel from the samples of the training subset.
Repeat steps (2) and (3) to build multiple submodels.
(4) For unknown samples, take the arithmetic mean of the predictions of the submodels to obtain the final prediction.
Determination of the number of submodels: set the number of submodels to 500, fix the training-subset size of the data set at 50% of the total number of samples, and compute the root mean square error of prediction (RMSEP) as a function of the number of submodels; the number at which RMSEP becomes constant or nearly constant (levels off) is the number of models to build. In this embodiment the change of RMSEP with the number of submodels is shown in Figure 10; beyond 500 submodels the RMSEP value is almost constant, so 500 models are built.
Selection of the training-subset size: with the number of submodels fixed at 500, vary the number of selected samples from 5% to 100% of the sample count in steps of 5% (truncating to an integer when necessary) and compute the RMSEP for each size; the size at which RMSEP is minimal or levels off is the number of samples drawn in each cycle. In this embodiment the change of RMSEP with the training-subset percentage is shown in Figure 11; when the subset size reaches 40-100% of the total number of training samples, RMSEP reaches its minimum and is almost constant, so any size in that range is acceptable; this example uses 50% of the total number of samples.
Determination of the optimal activation function and number of hidden-layer nodes: track the RMSEP of the training-set spectra as the activation function and node count are varied; in this embodiment the optimal parameters, corresponding to the minimum RMSEP, are the tribas activation function with 48 hidden nodes.
To compare the stability of BaggingELM and ELM, both methods were rerun 20 times and the RMSEP was recorded for each run, as shown in Figure 12. The RMSEP of the ELM algorithm fluctuates considerably over the 20 runs, indicating poor model stability, while the predictions of the BaggingELM algorithm fluctuate much less and are essentially stable, showing that the run-to-run stability of BaggingELM is clearly better than that of ELM. Moreover, the RMSEP of BaggingELM is always lower than that of ELM, showing that the Bagging ensemble improves the prediction accuracy of the ELM algorithm. Figure 13 shows the relationship between the mean predicted values over the 20 runs on the prediction set and the actual values for BaggingELM and ELM, where the vertical bars represent the deviation over the 20 runs; the deviations of BaggingELM are far smaller than those of ELM. The correlation coefficient between the BaggingELM predictions and the actual values, 0.9970, is greater than ELM's 0.9923, and the standard deviation between the BaggingELM predictions and the actual values, 0.00014, is smaller than ELM's 0.0031. This shows that BaggingELM improves the prediction accuracy of the ELM model and has better stability.
Embodiment 4:
This embodiment applies the method to near-infrared transmission analysis, measuring the component content of blood samples. The specific steps are as follows:
(1) Collect near-infrared transmission spectra of 231 blood samples over the wavelength range 1100-2498 nm with a sampling interval of 2 nm (700 wavelength points); spectra were measured on a NIRSystems model 6500 spectrometer (NIRSystems, Inc., Silver Springs, USA) and downloaded from http://www.idrc-chambersburg.org/shootout2010.html. Following the division of the data set given on the website, 143 samples were used as the training set and 47 samples as the prediction set.
(2) Perform bootstrap resampling on the training set, randomly selecting a certain number of samples as a training subset.
(3) Determine the optimal extreme learning machine activation function and number of hidden-layer nodes, and build an extreme learning machine submodel from the samples of the training subset.
Repeat steps (2) and (3) to build multiple submodels.
(4) For unknown samples, take the arithmetic mean of the predictions of the submodels to obtain the final prediction.
Determination of the number of submodels: set the number of submodels to 500, fix the training-subset size of the data set at 50% of the total number of samples, and compute the root mean square error of prediction (RMSEP) as a function of the number of submodels; the number at which RMSEP becomes constant or nearly constant (levels off) is the number of models to build. In this embodiment the change of RMSEP with the number of submodels is shown in Figure 14; beyond 500 submodels the RMSEP value is almost constant, so 500 models are built.
Selection of the training-subset size: with the number of submodels fixed at 500, vary the number of selected samples from 5% to 100% of the sample count in steps of 5% (truncating to an integer when necessary) and compute the RMSEP for each size; the size at which RMSEP is minimal or levels off is the number of samples drawn in each cycle. In this embodiment the change of RMSEP with the training-subset percentage is shown in Figure 15; RMSEP is minimal when the subset size reaches 60% of the total number of training samples, so 60% of the total number of samples is drawn in each cycle.
Determination of the optimal activation function and number of hidden-layer nodes: track the RMSEP of the training-set spectra as the activation function and node count are varied; in this embodiment the optimal parameters, corresponding to the minimum RMSEP, are the sig activation function with 37 hidden nodes.
To compare the stability of BaggingELM and ELM, both methods were rerun 20 times and the RMSEP was recorded for each run, as shown in Figure 16. The RMSEP of the ELM algorithm fluctuates considerably over the 20 runs, indicating poor model stability, while the predictions of the BaggingELM algorithm fluctuate much less and are essentially stable, showing that the run-to-run stability of BaggingELM is clearly better than that of ELM. Moreover, the RMSEP of BaggingELM is always lower than that of ELM, showing that the Bagging ensemble improves the prediction accuracy of the ELM algorithm. Figure 17 shows the relationship between the mean predicted values over the 20 runs on the prediction set and the actual values for BaggingELM and ELM, where the vertical bars represent the deviation over the 20 runs; the deviations of BaggingELM are far smaller than those of ELM. The correlation coefficient between the BaggingELM predictions and the actual values, 0.9774, is greater than ELM's 0.9432, and the standard deviation between the BaggingELM predictions and the actual values, 0.0008, is smaller than ELM's 0.0268. This shows that BaggingELM improves the prediction accuracy of the ELM model and has better stability.
Embodiment 5:
This embodiment applies the method to near-infrared spectral analysis, measuring the chlorine content of Virginia-type cigarette powder samples. The specific steps are as follows:
(1) Collect near-infrared spectra of Virginia-type cigarette products of 58 brands over the wavenumber range 4000-9000 cm⁻¹ with a sampling interval of 1 cm⁻¹ (5001 wavelength points in total). The cigarettes were prepared as powdered samples according to YC/331-1996, with a mean particle size of 0.45 mm; spectra were acquired on a Vector 22/N FT-NIR System (Bruker) spectrophotometer. The chlorine content of the samples was measured with an AutoAnalyzer III continuous-flow analyzer according to the standard method. The data set was divided by the KS (Kennard-Stone) grouping method: 38 samples as the training set and 20 samples as the prediction set.
(2) Perform bootstrap resampling on the training set, randomly selecting a certain number of samples as a training subset.
(3) Determine the optimal extreme learning machine activation function and number of hidden-layer nodes, and build an extreme learning machine submodel from the samples of the training subset.
Repeat steps (2) and (3) to build multiple submodels.
(4) For unknown samples, take the arithmetic mean of the predictions of the submodels to obtain the final prediction.
Determination of the number of submodels: set the number of submodels to 500, fix the training-subset size of the data set at 50% of the total number of samples, and compute the root mean square error of prediction (RMSEP) as a function of the number of submodels; the number at which RMSEP becomes constant or nearly constant (levels off) is the number of models to build. In this embodiment the change of RMSEP with the number of submodels is shown in Figure 18; beyond 500 submodels the RMSEP value is almost constant, so 500 models are built.
Selection of the training-subset size: with the number of submodels fixed at 500, vary the number of selected samples from 5% to 100% of the sample count in steps of 5% (truncating to an integer when necessary) and compute the RMSEP for each size; the size at which RMSEP is minimal or levels off is the number of samples drawn in each cycle. In this embodiment the change of RMSEP with the training-subset percentage is shown in Figure 19; when the subset size reaches 60-100% of the total number of training samples, RMSEP reaches its minimum and is almost constant, so any size in that range is acceptable; this example uses 60% of the total number of samples.
Determination of the optimal activation function and number of hidden-layer nodes: track the RMSEP of the training-set spectra as the activation function and node count are varied; in this embodiment the optimal parameters, corresponding to the minimum RMSEP, are the sig activation function with 69 hidden nodes.
To compare the stability of BaggingELM and ELM, both methods were rerun 20 times and the RMSEP was recorded for each run, as shown in Figure 20. The RMSEP of the ELM algorithm fluctuates considerably over the 20 runs, indicating poor model stability, while the predictions of the BaggingELM algorithm fluctuate much less and are essentially stable, showing that the run-to-run stability of BaggingELM is clearly better than that of ELM. Moreover, the RMSEP of BaggingELM is always lower than that of ELM, showing that the Bagging ensemble improves the prediction accuracy of the ELM algorithm. Figure 21 shows the relationship between the mean predicted values over the 20 runs on the prediction set and the actual values for BaggingELM and ELM, where the vertical bars represent the deviation over the 20 runs; the deviations of BaggingELM are far smaller than those of ELM. The correlation coefficient between the BaggingELM predictions and the actual values, 0.9762, is greater than ELM's 0.9635, and the standard deviation between the BaggingELM predictions and the actual values, 0.0007, is smaller than ELM's 0.0062. This shows that BaggingELM improves the prediction accuracy of the ELM model and has better stability.
The embodiments above show that the method provides a complex-sample quantitative-analysis modeling approach based on the Bagging extreme learning machine, and that it can greatly improve both the stability and the prediction accuracy of the model.
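The full procedure (bootstrap resampling, one ELM submodel per resample, simple averaging of submodel predictions) can be sketched end-to-end as follows. This is a minimal illustration under our own assumptions: the function names are ours, and the default parameter values (500 submodels, 60% subset size, 'sig' activation, 69 hidden nodes) are taken from the embodiment above:

```python
import numpy as np

def elm_train(X, y, n_hidden, rng):
    """Train one extreme learning machine: random input weights and
    biases, sigmoid hidden layer, least-squares output weights."""
    W = rng.standard_normal((X.shape[1], n_hidden))   # random input weights
    b = rng.standard_normal(n_hidden)                 # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))            # 'sig' activation
    beta = np.linalg.pinv(H) @ y                      # Moore-Penrose solution
    return W, b, beta

def elm_predict(model, X):
    W, b, beta = model
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

def bagging_elm(X_train, y_train, X_test, n_models=500, frac=0.6,
                n_hidden=69, seed=0):
    """Bagging ELM: bootstrap-resample the training set, fit one ELM
    submodel per resample, and average the submodel predictions."""
    rng = np.random.default_rng(seed)
    n = X_train.shape[0]
    k = int(frac * n)                        # truncate to integer
    preds = np.zeros(X_test.shape[0])
    for _ in range(n_models):
        idx = rng.integers(0, n, size=k)     # bootstrap: sample with replacement
        m = elm_train(X_train[idx], y_train[idx], n_hidden, rng)
        preds += elm_predict(m, X_test)
    return preds / n_models                  # simple average over submodels
```

In practice the spectra in `X_train` would be the measured spectral data and `y_train` the reference content values of the tested component, as in step (1) of the claims.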
Claims (4)
1. A Bagging extreme learning machine integrated modeling method, characterized by the following steps:
(1) collecting spectral data of samples of the analyte and measuring the content of the tested component in each sample; dividing the sample set into a training set and a prediction set;
(2) performing bootstrap resampling on the training-set samples, randomly selecting a number of samples as a training subset;
(3) determining the optimal activation function and number of hidden-layer nodes of the extreme learning machine, and building an extreme learning machine submodel from the samples of the training subset;
repeating steps (2) and (3) multiple times to build N submodels;
(4) for an unknown sample, taking the arithmetic mean of the predictions of the multiple submodels to obtain the final prediction.
2. The Bagging extreme learning machine integrated modeling method according to claim 1, characterized in that the number N of submodels is determined as follows: given a sufficiently large submodel count, fix the training-subset size of each data set at 50% of the total number of samples; compute the root-mean-square error of prediction, denoted RMSEP; and observe how the RMSEP changes with the number of submodels. When the RMSEP becomes constant or levels off, the corresponding submodel count is the required number N.
3. The Bagging extreme learning machine integrated modeling method according to claim 2, characterized in that the number of samples in the training subset is determined as follows: fix the number of submodels; vary the number of selected samples from 5% to 100% of the total sample number in steps of 5%, truncating to an integer when the result is not a whole number; and compute the RMSEP for each setting. The sample number at which the RMSEP reaches its minimum, or at which it levels off, is the number of samples drawn in each cycle.
4. The Bagging extreme learning machine integrated modeling method according to claim 3, characterized in that the optimal activation function and number of hidden-layer nodes of the extreme learning machine are determined as follows: track the RMSEP of the training-set spectra as the chosen activation function and hidden-node count are varied; the activation function and hidden-node count at which the RMSEP reaches its minimum are the optimal parameters.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510466504.7A CN105117525B (en) | 2015-07-31 | 2015-07-31 | Bagging extreme learning machine integrated modelling approach |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105117525A true CN105117525A (en) | 2015-12-02 |
CN105117525B CN105117525B (en) | 2018-05-15 |
Family
ID=54665513
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510466504.7A Active CN105117525B (en) | 2015-07-31 | 2015-07-31 | Bagging extreme learning machine integrated modelling approach |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105117525B (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106529680A (en) * | 2016-10-27 | 2017-03-22 | 天津工业大学 | Multiscale extreme learning machine integrated modeling method based on empirical mode decomposition |
CN106650926A (en) * | 2016-09-14 | 2017-05-10 | 天津工业大学 | Robust boosting extreme learning machine integrated modeling method |
CN107356556A (en) * | 2017-07-10 | 2017-11-17 | 天津工业大学 | A kind of double integrated modelling approach of Near-Infrared Spectra for Quantitative Analysis |
CN109325516A (en) * | 2018-08-13 | 2019-02-12 | 众安信息技术服务有限公司 | A kind of integrated learning approach and device towards image classification |
CN113094892A (en) * | 2021-04-02 | 2021-07-09 | 辽宁石油化工大学 | Oil concentration prediction method based on data elimination and local partial least squares |
CN115691703A (en) * | 2022-10-15 | 2023-02-03 | 苏州创腾软件有限公司 | Drug property prediction method and system based on pharmacokinetic model |
CN117150877A (en) * | 2022-05-23 | 2023-12-01 | 北京理工大学 | Method for predicting optimal pressing process of press-loading mixed explosive based on Bagging algorithm |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103226728A (en) * | 2013-04-07 | 2013-07-31 | 北京化工大学 | Intelligent detection and yield optimization method for HDPE (high density polyethylene) cascade polymerization reaction course |
CN103528990A (en) * | 2013-10-31 | 2014-01-22 | 天津工业大学 | Method for establishing multiple models of near infrared spectrums |
CN103593550A (en) * | 2013-08-12 | 2014-02-19 | 东北大学 | Pierced billet quality modeling and prediction method based on integrated mean value staged RPLS-OS-ELM |
CN104463251A (en) * | 2014-12-15 | 2015-03-25 | 江苏科技大学 | Cancer gene expression profile data identification method based on integration of extreme learning machines |
CN104573699A (en) * | 2015-01-21 | 2015-04-29 | 中国计量学院 | Trypetid identification method based on medium field intensity magnetic resonance dissection imaging |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103226728A (en) * | 2013-04-07 | 2013-07-31 | 北京化工大学 | Intelligent detection and yield optimization method for HDPE (high density polyethylene) cascade polymerization reaction course |
CN103593550A (en) * | 2013-08-12 | 2014-02-19 | 东北大学 | Pierced billet quality modeling and prediction method based on integrated mean value staged RPLS-OS-ELM |
CN103528990A (en) * | 2013-10-31 | 2014-01-22 | 天津工业大学 | Method for establishing multiple models of near infrared spectrums |
CN104463251A (en) * | 2014-12-15 | 2015-03-25 | 江苏科技大学 | Cancer gene expression profile data identification method based on integration of extreme learning machines |
CN104573699A (en) * | 2015-01-21 | 2015-04-29 | 中国计量学院 | Trypetid identification method based on medium field intensity magnetic resonance dissection imaging |
Non-Patent Citations (1)
Title |
---|
HAO Yong et al.: "Study on modeling methods for quantitative infrared spectroscopic analysis of linolenic acid", Journal of Chinese Agricultural Mechanization (《中国农机化学报》) * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106650926A (en) * | 2016-09-14 | 2017-05-10 | 天津工业大学 | Robust boosting extreme learning machine integrated modeling method |
CN106650926B (en) * | 2016-09-14 | 2019-04-16 | 天津工业大学 | A kind of steady boosting extreme learning machine integrated modelling approach |
CN106529680A (en) * | 2016-10-27 | 2017-03-22 | 天津工业大学 | Multiscale extreme learning machine integrated modeling method based on empirical mode decomposition |
CN106529680B (en) * | 2016-10-27 | 2019-01-29 | 天津工业大学 | A kind of multiple dimensioned extreme learning machine integrated modelling approach based on empirical mode decomposition |
CN107356556A (en) * | 2017-07-10 | 2017-11-17 | 天津工业大学 | A kind of double integrated modelling approach of Near-Infrared Spectra for Quantitative Analysis |
CN109325516A (en) * | 2018-08-13 | 2019-02-12 | 众安信息技术服务有限公司 | A kind of integrated learning approach and device towards image classification |
CN113094892A (en) * | 2021-04-02 | 2021-07-09 | 辽宁石油化工大学 | Oil concentration prediction method based on data elimination and local partial least squares |
CN117150877A (en) * | 2022-05-23 | 2023-12-01 | 北京理工大学 | Method for predicting optimal pressing process of press-loading mixed explosive based on Bagging algorithm |
CN115691703A (en) * | 2022-10-15 | 2023-02-03 | 苏州创腾软件有限公司 | Drug property prediction method and system based on pharmacokinetic model |
Also Published As
Publication number | Publication date |
---|---|
CN105117525B (en) | 2018-05-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105117525A (en) | Bagging extreme learning machine integrated modeling method | |
CN104062257B (en) | A kind of based on the method for general flavone content near infrared ray solution | |
CN105300923B (en) | Without measuring point model of temperature compensation modification method during a kind of near-infrared spectrometers application on site | |
CN104020127B (en) | A kind of near infrared spectrum is utilized quickly to measure the method for inorganic elements in Nicotiana tabacum L. | |
CN106841075B (en) | COD ultraviolet spectra on-line checking optimization method neural network based | |
CN103048339B (en) | Soil moisture detection method and soil moist detection device | |
CN101988895A (en) | Method for predicting single-type crude oil content in mixed crude oil by near infrared spectrum | |
CN106650926B (en) | A kind of steady boosting extreme learning machine integrated modelling approach | |
CN108152239A (en) | The sample composition content assaying method of feature based migration | |
CN105486658A (en) | Near-infrared physical property parameter measuring method without measuring point temperature compensation | |
CN110969282A (en) | Runoff stability prediction method based on LSTM composite network | |
CN102830096A (en) | Method for measuring element concentration and correcting error based on artificial neural network | |
CN104730042A (en) | Method for improving free calibration analysis precision by combining genetic algorithm with laser induced breakdown spectroscopy | |
CN105823751B (en) | Infrared spectrum Multivariate Correction regression modeling method based on λ-SPXY algorithms | |
CN105319179B (en) | A kind of method using middle infrared spectrum prediction hydrogen sulfide content in desulfurized amine | |
CN103398971A (en) | Chemometrics method for determining cetane number of diesel oil | |
Cai et al. | On-line multi-gas component measurement in the mud logging process based on Raman spectroscopy combined with a CNN-LSTM-AM hybrid model | |
CN107966499A (en) | A kind of method by near infrared spectrum prediction crude oil carbon number distribution | |
CN105550457B (en) | Dynamic Evolution Model bearing calibration and system | |
CN105466885B (en) | Based on the near infrared online measuring method without measuring point temperature-compensating mechanism | |
CN107356556A (en) | A kind of double integrated modelling approach of Near-Infrared Spectra for Quantitative Analysis | |
CN105092509A (en) | Sample component measurement method based on PCR-ELM algorithm | |
CN108120694A (en) | For the multivariate calibration methods and system of Dark sun-cured chemical composition analysis | |
CN106529680A (en) | Multiscale extreme learning machine integrated modeling method based on empirical mode decomposition | |
CN102706855A (en) | Flammable liquid flash point prediction method based on Raman spectroscopy |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | ||
Address after: 300387 Tianjin city Xiqing District West Binshui Road No. 399 Applicant after: Tianjin Polytechnic University Applicant after: Shanghai Sui Hua Industrial Limited by Share Ltd Address before: 300387 Tianjin city Xiqing District West Binshui Road No. 399 Applicant before: Tianjin Polytechnic University Applicant before: Shanghai Huishan Industrial Co., Ltd. |
|
GR01 | Patent grant | ||