CN105117525A - Bagging extreme learning machine integrated modeling method - Google Patents

Bagging extreme learning machine integrated modeling method

Info

Publication number
CN105117525A
CN105117525A CN201510466504.7A CN201510466504A
Authority
CN
China
Prior art keywords
sample
rmsep
learning machine
extreme learning
submodel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510466504.7A
Other languages
Chinese (zh)
Other versions
CN105117525B (en)
Inventor
卞希慧
李淑娟
谭小耀
王江江
王治国
刘维国
陈宗蓬
王晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANGHAI HUISHAN INDUSTRIAL Co Ltd
Tianjin Polytechnic University
Original Assignee
SHANGHAI HUISHAN INDUSTRIAL Co Ltd
Tianjin Polytechnic University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHANGHAI HUISHAN INDUSTRIAL Co Ltd, Tianjin Polytechnic University filed Critical SHANGHAI HUISHAN INDUSTRIAL Co Ltd
Priority to CN201510466504.7A priority Critical patent/CN105117525B/en
Publication of CN105117525A publication Critical patent/CN105117525A/en
Application granted granted Critical
Publication of CN105117525B publication Critical patent/CN105117525B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Investigating Or Analysing Materials By Optical Means (AREA)

Abstract

The invention belongs to the technical field of chemometrics and particularly relates to a Bagging extreme learning machine integrated modeling method. The method comprises the following steps: collecting spectral data of analyte samples and determining the content of the tested component in each sample; dividing the sample set into a training set and a prediction set; performing bootstrap resampling on the training set and randomly selecting a certain number of samples as a training subset; establishing an extreme learning machine submodel with the samples of the training subset; repeating these steps multiple times to establish a plurality of submodels; and averaging the predictions of the plurality of submodels for unknown samples to obtain the final prediction. Compared with the ELM method, the method of the invention has significant advantages in prediction accuracy and stability. The invention is applicable to the quantitative analysis of complex substances such as petroleum, tobacco, food and traditional Chinese medicine.

Description

Bagging extreme learning machine integrated modeling method
Technical field
The invention belongs to the field of chemometric techniques and specifically relates to a Bagging extreme learning machine integrated modeling method.
Background art
Artificial neural networks, with their powerful capabilities of self-adaptation, self-organization, self-learning and nonlinear mapping, have been widely applied in fields such as biology, chemistry, medicine and economics. However, traditional neural network learning algorithms (such as the BP algorithm) require a large number of network training parameters to be set manually, train slowly, and easily fall into local optima. In 2004, Professor Huang Guangbin of Nanyang Technological University proposed a new algorithm for single-hidden-layer feedforward neural networks, named the extreme learning machine (ELM). The core of the ELM algorithm is to turn the training of the neural network into a least-squares problem, avoiding the drawbacks of artificial neural networks that parameters must be tuned manually and that training easily falls into local optima. Because the ELM algorithm is simple to implement, learns fast and generalizes well, it has received increasing attention in recent years and has been applied in many fields such as analytical chemistry, control engineering and image recognition. However, because the input weights and the biases of the hidden neurons of the ELM are set randomly, the results of the model are unstable.
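By way of illustration only, the following minimal sketch (Python with NumPy; the function names elm_train and elm_predict, the [-1, 1] interval for the random weights and the set of activation functions are assumptions made for this sketch, not taken from the patent) shows the ELM scheme described above: the hidden layer is generated at random and only the output weights are obtained by least squares via the Moore-Penrose pseudoinverse.

import numpy as np

ACTIVATIONS = {
    "sig":    lambda z: 1.0 / (1.0 + np.exp(-z)),           # logistic sigmoid
    "sin":    np.sin,                                        # sine
    "radbas": lambda z: np.exp(-z ** 2),                     # radial basis
    "tribas": lambda z: np.clip(1.0 - np.abs(z), 0.0, None), # triangular basis
}

def elm_train(X, y, n_hidden, activation="sig", rng=None):
    """Fit a single-hidden-layer ELM: random input weights and hidden biases,
    output weights obtained by least squares (Moore-Penrose pseudoinverse)."""
    rng = np.random.default_rng(rng)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # random input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # random hidden biases
    H = ACTIVATIONS[activation](X @ W + b)                   # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ y                             # least-squares output weights
    return {"W": W, "b": b, "beta": beta, "activation": activation}

def elm_predict(model, X):
    """Predict with a trained ELM."""
    H = ACTIVATIONS[model["activation"]](X @ model["W"] + model["b"])
    return H @ model["beta"]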
Ensemble modeling fuses the results of multiple models into a final prediction and can thereby improve the accuracy and stability of model prediction. Bagging, as a commonly used ensemble modeling method, establishes multiple submodels from subsets of samples randomly selected from the training set by the "bootstrap" method, and then averages the predictions of the submodels to obtain the final prediction. On the one hand, re-selecting the training set increases the diversity of the ensemble; on the other hand, fusing multiple predictions improves the prediction accuracy of the base model.
The present invention combines the advantages of ELM and Bagging and proposes an ELM integrated modeling method based on Bagging for the quantitative analysis of complex samples, which retains the advantages of ELM, namely fast computation and strong predictive ability, while overcoming the drawback of its poor stability.
Summary of the invention
The object of the invention is to propose a Bagging extreme learning machine integrated modeling method with good stability and high prediction accuracy.
The present invention combines the Bagging algorithm with the extreme learning machine (ELM) model to establish an ELM ensemble method based on Bagging (denoted BaggingELM). Its flow is shown in Figure 1, and the specific steps are as follows:
(1) Collect the spectral data of the analyte samples and determine the content of the tested component in each sample by a conventional method; divide the sample set into a training set and a prediction set;
(2) Perform bootstrap resampling on the training set samples, randomly selecting a certain number of samples as a training subset;
(3) Determine the optimal activation function and the number of hidden layer nodes of the extreme learning machine, and establish an extreme learning machine submodel with the samples of the training subset;
Repeat steps (2) and (3) multiple times to establish N submodels;
(4) For unknown samples, take the arithmetic mean of the predictions of the multiple submodels to obtain the final prediction.
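A minimal sketch of steps (2) to (4) follows (Python with NumPy, reusing the elm_train and elm_predict helpers of the earlier sketch; the names bagging_elm_train and bagging_elm_predict are illustrative, and sampling with replacement plus truncation of the subset size are assumptions for illustration, since the text only specifies bootstrap resampling of a chosen subset size).

import numpy as np

def bagging_elm_train(X_train, y_train, n_models, subset_fraction,
                      n_hidden, activation="sig", rng=None):
    """Steps (2)-(3): build n_models ELM submodels, each on a bootstrap subset
    whose size is subset_fraction of the training set."""
    rng = np.random.default_rng(rng)
    n_samples = X_train.shape[0]
    subset_size = int(subset_fraction * n_samples)  # round down if not an integer
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, n_samples, size=subset_size)  # bootstrap sample indices
        models.append(elm_train(X_train[idx], y_train[idx],
                                n_hidden, activation, rng))
    return models

def bagging_elm_predict(models, X):
    """Step (4): arithmetic mean of the submodel predictions."""
    return np.mean([elm_predict(m, X) for m in models], axis=0)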
In the present invention, the method for determining the number N of submodels is as follows: given a sufficiently large number of submodels, fix the training subset sample number of each data set at 50% of the total number of samples, compute the root mean square error of prediction (RMSEP) and observe how the RMSEP changes with the number of submodels; when the RMSEP value becomes constant or almost constant (tends to be stable), the corresponding number of submodels is the required number N.
In the present invention, the method for determining the number of samples of the training subset is as follows: fix the number of submodels, vary the number of selected samples from 5% to 100% of the number of training samples in steps of 5% (rounding down when the result is not an integer), and compute the RMSEP value; the number of samples at which the RMSEP is minimal or levels off is the number of samples selected in each cycle.
In the present invention, the specific method for determining the optimal activation function and the number of hidden layer nodes of the extreme learning machine is as follows: examine the RMSEP value obtained from the training set spectra as the activation function and the number of hidden layer nodes are varied; the activation function and the number of hidden layer nodes at which the RMSEP reaches its minimum are the optimal parameters.
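The three selection procedures above can be sketched as follows (Python; the helpers bagging_elm_train, bagging_elm_predict, elm_train and elm_predict are those of the earlier sketches, the function names rmsep, rmsep_vs_n_models, rmsep_vs_subset_fraction and select_activation_and_nodes and the candidate grid of 5 to 100 hidden nodes are illustrative, and RMSEP is computed here on the prediction set as an assumption, since the patent does not detail the validation scheme).

import numpy as np

def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def rmsep_vs_n_models(X_tr, y_tr, X_pr, y_pr, max_models, n_hidden, act):
    """RMSEP versus the number of submodels, with the subset size fixed at 50%
    of the training samples; N is read off where the curve levels out."""
    models = bagging_elm_train(X_tr, y_tr, max_models, 0.5, n_hidden, act)
    return [rmsep(y_pr, bagging_elm_predict(models[:n], X_pr))
            for n in range(1, max_models + 1)]

def rmsep_vs_subset_fraction(X_tr, y_tr, X_pr, y_pr, n_models, n_hidden, act):
    """RMSEP for subset sizes from 5% to 100% of the training samples, in 5% steps."""
    return {round(float(f), 2): rmsep(y_pr, bagging_elm_predict(
                bagging_elm_train(X_tr, y_tr, n_models, f, n_hidden, act), X_pr))
            for f in np.arange(0.05, 1.0001, 0.05)}

def select_activation_and_nodes(X_tr, y_tr, X_pr, y_pr,
                                activations=("sig", "sin", "radbas", "tribas"),
                                node_grid=range(5, 101)):
    """Grid search: the activation function and hidden-node count giving the
    minimum RMSEP for a single ELM are taken as the optimal parameters."""
    best = (None, None, np.inf)
    for act in activations:
        for n_hidden in node_grid:
            err = rmsep(y_pr, elm_predict(elm_train(X_tr, y_tr, n_hidden, act), X_pr))
            if err < best[2]:
                best = (act, n_hidden, err)
    return best  # (activation, n_hidden, rmsep)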
The advantage of the present invention is that the modeling method combines the advantages of the ensemble technique Bagging with those of the extreme learning machine, improving the prediction accuracy and stability of the extreme learning machine algorithm and providing a new modeling method for multivariate calibration analysis of complex materials. The method of the invention can be widely applied to the quantitative analysis of complex materials in fields such as petroleum, tobacco, food and traditional Chinese medicine.
Description of the drawings
Fig. 1 is the flow chart of the Bagging extreme learning machine.
Fig. 2 shows the change of the RMSEP value with the number of submodels for the fuel oil ultraviolet data.
Fig. 3 shows the change of the RMSEP value with the training subset sample percentage for the fuel oil ultraviolet data.
Fig. 4 shows the change of the RMSEP value with the number of prediction runs for the extreme learning machine and the Bagging extreme learning machine on the fuel oil ultraviolet data.
Fig. 5 shows the relationship between the mean predicted values of the prediction set and the actual values for the Bagging extreme learning machine and the extreme learning machine on the fuel oil ultraviolet data, where (a) and (b) correspond to the Bagging extreme learning machine and the extreme learning machine, respectively.
Fig. 6 shows the change of the RMSEP value with the number of submodels for the ethanol solution near-infrared data.
Fig. 7 shows the change of the RMSEP value with the training subset sample percentage for the ethanol solution near-infrared data.
Fig. 8 shows the change of the RMSEP value with the number of prediction runs for the extreme learning machine and the Bagging extreme learning machine on the ethanol solution near-infrared data.
Fig. 9 shows the relationship between the mean predicted values of the prediction set and the actual values for the Bagging extreme learning machine and the extreme learning machine on the ethanol solution near-infrared data, where (a) and (b) correspond to the Bagging extreme learning machine and the extreme learning machine, respectively.
Fig. 10 shows the change of the RMSEP value with the number of submodels for the diesel near-infrared data.
Fig. 11 shows the change of the RMSEP value with the training subset sample percentage for the diesel near-infrared data.
Fig. 12 shows the change of the RMSEP value with the number of prediction runs for the extreme learning machine and the Bagging extreme learning machine on the diesel near-infrared data.
Fig. 13 shows the relationship between the mean predicted values of the prediction set and the actual values for the Bagging extreme learning machine and the extreme learning machine on the diesel near-infrared data, where (a) and (b) correspond to the Bagging extreme learning machine and the extreme learning machine, respectively.
Fig. 14 shows the change of the RMSEP value with the number of submodels for the blood near-infrared data.
Fig. 15 shows the change of the RMSEP value with the training subset sample percentage for the blood near-infrared data.
Fig. 16 shows the change of the RMSEP value with the number of prediction runs for the extreme learning machine and the Bagging extreme learning machine on the blood near-infrared data.
Fig. 17 shows the relationship between the mean predicted values of the prediction set and the actual values for the Bagging extreme learning machine and the extreme learning machine on the blood near-infrared data, where (a) and (b) correspond to the Bagging extreme learning machine and the extreme learning machine, respectively.
Fig. 18 shows the change of the RMSEP value with the number of submodels for the cigarette near-infrared data.
Fig. 19 shows the change of the RMSEP value with the training subset sample percentage for the cigarette near-infrared data.
Fig. 20 shows the change of the RMSEP value with the number of prediction runs for the extreme learning machine and the Bagging extreme learning machine on the cigarette near-infrared data.
Fig. 21 shows the relationship between the mean predicted values of the prediction set and the actual values for the Bagging extreme learning machine and the extreme learning machine on the cigarette near-infrared data, where (a) and (b) correspond to the Bagging extreme learning machine and the extreme learning machine, respectively.
Embodiments
For a better understanding of the present invention, the invention is described in further detail below with reference to the embodiments, but the scope of protection of the invention is not limited to the scope represented by the embodiments.
Embodiment 1:
This embodiment applies the method to ultraviolet spectral analysis to determine the monoaromatics content of fuel oil samples. The specific steps are as follows:
(1) Collect the ultraviolet spectral data of 115 fuel oil samples; the wavelength range is 200-400 nm with a sampling interval of 0.35 nm, comprising 572 wavelength points, and the spectra were recorded on a Varian Cary 3 UV-visible spectrophotometer. The monoaromatics content was determined with an HP G1205A supercritical fluid chromatograph, with carbon dioxide as the carrier gas, a flow rate of 2 mL min-1, an oven temperature of 35 °C, an outlet pressure of 150 bar and a flame ionization detector. According to the division of the data set given on the website, 70 samples are used as the training set and 45 samples as the prediction set.
(2) Perform bootstrap resampling on the training set samples, randomly selecting a certain number of samples as a training subset.
(3) Determine the optimal activation function and the number of hidden layer nodes of the extreme learning machine, and establish an extreme learning machine submodel with the samples of the training subset.
Repeat steps (2)-(3) multiple times to establish multiple submodels.
(4) For unknown samples, take the arithmetic mean of the predictions of the multiple submodels to obtain the final prediction.
Determination of the number of submodels: a value of 500 submodels was given, the training subset sample number of each data set was fixed at 50% of the total number of samples, and the root mean square error of prediction (RMSEP) was computed as a function of the number of submodels; the number of submodels at which the RMSEP value is constant or almost constant (tends to be stable) is the number of models to be established. In this embodiment the change of RMSEP with the number of submodels is shown in Fig. 2; the RMSEP value is almost constant by the time the number of submodels reaches 500, so the number of models established is 500.
Selection of the number of samples: with the number of submodels fixed at 500, the number of selected samples was varied from 5% to 100% of the number of training samples in steps of 5% (rounding down when the result is not an integer) and the RMSEP value was computed; the sample number at which the RMSEP is minimal or levels off is the number of samples selected in each cycle. In this embodiment the change of the RMSEP value with the training subset sample percentage is shown in Fig. 3; when the training subset sample number reaches 20-100% of the total number of training samples, the RMSEP value reaches its minimum and is almost constant, so any training subset sample number between 20% and 100% of the total sample number is acceptable; in this example the training subset sample number is chosen as 50% of the total number of samples.
Determination of the optimal activation function and number of hidden layer nodes of the extreme learning machine: the RMSEP value of the training set spectra was examined as the activation function and the number of hidden layer nodes were varied; in this embodiment the optimal activation function and number of hidden layer nodes corresponding to the minimum RMSEP are the sin activation function and 9 nodes.
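A hypothetical call of the earlier sketches with the settings reported for this embodiment (500 submodels, training subsets of 50% of the training samples, the sin activation function and 9 hidden nodes) might read as follows; X_train, y_train, X_pred and y_pred are placeholders standing for the 70 training and 45 prediction fuel oil spectra and monoaromatics contents, loaded by the user.

# Hypothetical usage of the sketch functions with this embodiment's settings.
models = bagging_elm_train(X_train, y_train, n_models=500,
                           subset_fraction=0.5, n_hidden=9, activation="sin")
print("RMSEP:", rmsep(y_pred, bagging_elm_predict(models, X_pred)))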
To compare the stability of BaggingELM and ELM, the two methods were each rerun 20 times and the change of the RMSEP value with the run number was obtained, as shown in Fig. 4. The RMSEP of the ELM algorithm fluctuates considerably over the 20 reruns, indicating poor model stability, whereas the predictions of the BaggingELM algorithm fluctuate little over the 20 runs and are essentially stable, showing good stability; the stability of BaggingELM over repeated runs is therefore clearly better than that of ELM. On the other hand, the RMSEP value of BaggingELM is always lower than that of ELM, showing that the Bagging ensemble improves the prediction accuracy of the ELM algorithm. Fig. 5 shows the relationship between the mean predicted values of the prediction set over the 20 runs and the actual values for BaggingELM and ELM, where the vertical lines represent the deviation over the 20 runs; the deviation of BaggingELM is far smaller than that of ELM. The correlation coefficient between the BaggingELM predictions and the actual values, 0.9921, is higher than the 0.9858 of ELM, and the standard deviation between the BaggingELM predictions and the actual values, 0.0001, is lower than the 0.0044 of ELM. This shows that BaggingELM improves the prediction accuracy of the ELM model and has better stability.
Embodiment 2:
This embodiment applies the method to near-infrared spectral analysis to determine the ethanol content of solution samples. The specific steps are as follows:
(1) Collect the near-infrared spectral data of 95 ethanol solution samples; the wavelength range is 850-1049 nm with a sampling interval of 1 nm, comprising 200 wavelength points, and the spectra were recorded on an HP 8453 spectrophotometer. According to the division of the data set given on the website, 65 samples are used as the training set and 30 samples as the prediction set.
(2) Perform bootstrap resampling on the training set samples, randomly selecting a certain number of samples as a training subset.
(3) Determine the optimal activation function and the number of hidden layer nodes of the extreme learning machine, and establish an extreme learning machine submodel with the samples of the training subset.
Repeat steps (2)-(3) multiple times to establish multiple submodels.
(4) For unknown samples, take the arithmetic mean of the predictions of the multiple submodels to obtain the final prediction.
Determination of the number of submodels: a value of 500 submodels was given, the training subset sample number of each data set was fixed at 50% of the total number of samples, and the RMSEP was computed as a function of the number of submodels; the number of submodels at which the RMSEP value is constant or almost constant (tends to be stable) is the number of models to be established. In this embodiment the change of RMSEP with the number of submodels is shown in Fig. 6; the RMSEP value is almost constant by the time the number of submodels reaches 500, so the number of models established is 500.
Selection of the number of samples: with the number of submodels fixed at 500, the number of selected samples was varied from 5% to 100% of the number of training samples in steps of 5% (rounding down when the result is not an integer) and the RMSEP value was computed; the sample number at which the RMSEP is minimal or levels off is the number of samples selected in each cycle. In this embodiment the change of the RMSEP value with the training subset sample percentage is shown in Fig. 7; when the training subset sample number reaches 30-100% of the total number of training samples, the RMSEP value reaches its minimum and is almost constant, so any training subset sample number between 30% and 100% of the total sample number is acceptable; in this example the training subset sample number is chosen as 60% of the total number of samples.
Determination of the optimal activation function and number of hidden layer nodes of the extreme learning machine: the RMSEP value of the training set spectra was examined as the activation function and the number of hidden layer nodes were varied; in this embodiment the optimal activation function and number of hidden layer nodes corresponding to the minimum RMSEP are the radbas activation function and 35 nodes.
To compare the stability of BaggingELM and ELM, the two methods were each rerun 20 times and the change of the RMSEP value with the run number was obtained, as shown in Fig. 8. The RMSEP of the ELM algorithm fluctuates considerably over the 20 reruns, indicating poor model stability, whereas the predictions of the BaggingELM algorithm fluctuate little over the 20 runs and are essentially stable, showing good stability; the stability of BaggingELM over repeated runs is therefore clearly better than that of ELM. On the other hand, the RMSEP value of BaggingELM is always lower than that of ELM, showing that the Bagging ensemble improves the prediction accuracy of the ELM algorithm. Fig. 9 shows the relationship between the mean predicted values of the prediction set over the 20 runs and the actual values for BaggingELM and ELM, where the vertical lines represent the deviation over the 20 runs; the deviation of BaggingELM is far smaller than that of ELM. The correlation coefficient between the BaggingELM predictions and the actual values, 0.9988, is higher than the 0.9957 of ELM, and the standard deviation between the BaggingELM predictions and the actual values, 0.00015, is lower than the 0.0023 of ELM. This shows that BaggingELM improves the prediction accuracy of the ELM model and has better stability.
Embodiment 3:
This embodiment applies the method to near-infrared spectral analysis to determine the density of diesel samples. The specific steps are as follows:
(1) Collect the near-infrared spectral data of 263 diesel samples; the wavelength range is 750-1550 nm with a sampling interval of 2 nm, comprising 401 wavelength points. The data were provided by the US Southwest Research Institute (SWRI), San Antonio, TX, through Eigenvector Research, Inc. (Manson, Washington); download address: http://www.eigenvector.com/Data/SWRI. According to the division of the data set given on the website, 142 samples are used as the training set and 121 samples as the prediction set.
(2) Perform bootstrap resampling on the training set samples, randomly selecting a certain number of samples as a training subset.
(3) Determine the optimal activation function and the number of hidden layer nodes of the extreme learning machine, and establish an extreme learning machine submodel with the samples of the training subset.
Repeat steps (2)-(3) multiple times to establish multiple submodels.
(4) For unknown samples, take the arithmetic mean of the predictions of the multiple submodels to obtain the final prediction.
Determination of the number of submodels: a value of 500 submodels was given, the training subset sample number of each data set was fixed at 50% of the total number of samples, and the RMSEP was computed as a function of the number of submodels; the number of submodels at which the RMSEP value is constant or almost constant (tends to be stable) is the number of models to be established. In this embodiment the change of RMSEP with the number of submodels is shown in Fig. 10; the RMSEP value is almost constant by the time the number of submodels reaches 500, so the number of models established is 500.
Selection of the number of samples: with the number of submodels fixed at 500, the number of selected samples was varied from 5% to 100% of the number of training samples in steps of 5% (rounding down when the result is not an integer) and the RMSEP value was computed; the sample number at which the RMSEP is minimal or levels off is the number of samples selected in each cycle. In this embodiment the change of the RMSEP value with the training subset sample percentage is shown in Fig. 11; when the training subset sample number reaches 40-100% of the total number of training samples, the RMSEP value reaches its minimum and is almost constant, so any training subset sample number between 40% and 100% of the total sample number is acceptable; in this example the training subset sample number is chosen as 50% of the total number of samples.
Determination of the optimal activation function and number of hidden layer nodes of the extreme learning machine: the RMSEP value of the training set spectra was examined as the activation function and the number of hidden layer nodes were varied; in this embodiment the optimal activation function and number of hidden layer nodes corresponding to the minimum RMSEP are the tribas activation function and 48 nodes.
To compare the stability of BaggingELM and ELM, the two methods were each rerun 20 times and the change of the RMSEP value with the run number was obtained, as shown in Fig. 12. The RMSEP of the ELM algorithm fluctuates considerably over the 20 reruns, indicating poor model stability, whereas the predictions of the BaggingELM algorithm fluctuate little over the 20 runs and are essentially stable, showing good stability; the stability of BaggingELM over repeated runs is therefore clearly better than that of ELM. On the other hand, the RMSEP value of BaggingELM is always lower than that of ELM, showing that the Bagging ensemble improves the prediction accuracy of the ELM algorithm. Fig. 13 shows the relationship between the mean predicted values of the prediction set over the 20 runs and the actual values for BaggingELM and ELM, where the vertical lines represent the deviation over the 20 runs; the deviation of BaggingELM is far smaller than that of ELM. The correlation coefficient between the BaggingELM predictions and the actual values, 0.9970, is higher than the 0.9923 of ELM, and the standard deviation between the BaggingELM predictions and the actual values, 0.00014, is lower than the 0.0031 of ELM. This shows that BaggingELM improves the prediction accuracy of the ELM model and has better stability.
Embodiment 4:
This embodiment applies the method to near-infrared transmission spectral analysis to determine the component content of blood samples. The specific steps are as follows:
(1) Collect the near-infrared transmission data of 231 blood samples; the wavelength range is 1100-2498 nm with a sampling interval of 2 nm, comprising 700 wavelength points. The spectra were measured in reflectance mode on an NIRSystems Model 6500 spectrometer (NIRSystems, Inc., Silver Springs, USA); download address: http://www.idrc-chambersburg.org/shootout2010.html. According to the division of the data set given on the website, 143 samples are used as the training set and 47 samples as the prediction set.
(2) Perform bootstrap resampling on the training set samples, randomly selecting a certain number of samples as a training subset.
(3) Determine the optimal activation function and the number of hidden layer nodes of the extreme learning machine, and establish an extreme learning machine submodel with the samples of the training subset.
Repeat steps (2)-(3) multiple times to establish multiple submodels.
(4) For unknown samples, take the arithmetic mean of the predictions of the multiple submodels to obtain the final prediction.
Determination of the number of submodels: a value of 500 submodels was given, the training subset sample number of each data set was fixed at 50% of the total number of samples, and the RMSEP was computed as a function of the number of submodels; the number of submodels at which the RMSEP value is constant or almost constant (tends to be stable) is the number of models to be established. In this embodiment the change of RMSEP with the number of submodels is shown in Fig. 14; the RMSEP value is almost constant by the time the number of submodels reaches 500, so the number of models established is 500.
Selection of the number of samples: with the number of submodels fixed at 500, the number of selected samples was varied from 5% to 100% of the number of training samples in steps of 5% (rounding down when the result is not an integer) and the RMSEP value was computed; the sample number at which the RMSEP is minimal or levels off is the number of samples selected in each cycle. In this embodiment the change of the RMSEP value with the training subset sample percentage is shown in Fig. 15; the RMSEP value is minimal when the training subset sample number reaches 60% of the total number of training samples, so the training subset sample number selected in each cycle is 60% of the total number of samples.
Determination of the optimal activation function and number of hidden layer nodes of the extreme learning machine: the RMSEP value of the training set spectra was examined as the activation function and the number of hidden layer nodes were varied; in this embodiment the optimal activation function and number of hidden layer nodes corresponding to the minimum RMSEP are the sig activation function and 37 nodes.
To compare the stability of BaggingELM and ELM, the two methods were each rerun 20 times and the change of the RMSEP value with the run number was obtained, as shown in Fig. 16. The RMSEP of the ELM algorithm fluctuates considerably over the 20 reruns, indicating poor model stability, whereas the predictions of the BaggingELM algorithm fluctuate little over the 20 runs and are essentially stable, showing good stability; the stability of BaggingELM over repeated runs is therefore clearly better than that of ELM. On the other hand, the RMSEP value of BaggingELM is always lower than that of ELM, showing that the Bagging ensemble improves the prediction accuracy of the ELM algorithm. Fig. 17 shows the relationship between the mean predicted values of the prediction set over the 20 runs and the actual values for BaggingELM and ELM, where the vertical lines represent the deviation over the 20 runs; the deviation of BaggingELM is far smaller than that of ELM. The correlation coefficient between the BaggingELM predictions and the actual values, 0.9774, is higher than the 0.9432 of ELM, and the standard deviation between the BaggingELM predictions and the actual values, 0.0008, is lower than the 0.0268 of ELM. This shows that BaggingELM improves the prediction accuracy of the ELM model and has better stability.
Embodiment 5:
This embodiment applies the method to near-infrared spectral analysis to determine the chlorine content of flue-cured (Virginia-type) tobacco powder samples. The specific steps are as follows:
(1) Collect the near-infrared spectral data of flue-cured tobacco products of 58 brands; the wavenumber range is 4000-9000 cm-1 with a sampling interval of 1 cm-1, 5001 wavelength points in total. The tobacco was prepared into powdered samples according to YC/331-1996, with a mean particle size of 0.45 mm, and the spectra were recorded on a Vector 22/N FT-NIR spectrometer (Bruker). The chlorine content of the samples was determined with an AutoAnalyzer III continuous flow analyzer according to the standard method. The data set was divided by the KS grouping method, with 38 samples as the training set and 20 samples as the prediction set.
(2) Perform bootstrap resampling on the training set samples, randomly selecting a certain number of samples as a training subset.
(3) Determine the optimal activation function and the number of hidden layer nodes of the extreme learning machine, and establish an extreme learning machine submodel with the samples of the training subset.
Repeat steps (2)-(3) multiple times to establish multiple submodels.
(4) For unknown samples, take the arithmetic mean of the predictions of the multiple submodels to obtain the final prediction.
Determination of the number of submodels: a value of 500 submodels was given, the training subset sample number of each data set was fixed at 50% of the total number of samples, and the RMSEP was computed as a function of the number of submodels; the number of submodels at which the RMSEP value is constant or almost constant (tends to be stable) is the number of models to be established. In this embodiment the change of RMSEP with the number of submodels is shown in Fig. 18; the RMSEP value is almost constant by the time the number of submodels reaches 500, so the number of models established is 500.
Selection of the number of samples: with the number of submodels fixed at 500, the number of selected samples was varied from 5% to 100% of the number of training samples in steps of 5% (rounding down when the result is not an integer) and the RMSEP value was computed; the sample number at which the RMSEP is minimal or levels off is the number of samples selected in each cycle. In this embodiment the change of the RMSEP value with the training subset sample percentage is shown in Fig. 19; when the training subset sample number reaches 60-100% of the total number of training samples, the RMSEP value reaches its minimum and is almost constant, so any training subset sample number between 60% and 100% of the total sample number is acceptable; in this example the training subset sample number is chosen as 60% of the total number of samples.
Determination of the optimal activation function and number of hidden layer nodes of the extreme learning machine: the RMSEP value of the training set spectra was examined as the activation function and the number of hidden layer nodes were varied; in this embodiment the optimal activation function and number of hidden layer nodes corresponding to the minimum RMSEP are the sig activation function and 69 nodes.
To compare the stability of BaggingELM and ELM, the two methods were each rerun 20 times and the change of the RMSEP value with the run number was obtained, as shown in Fig. 20. The RMSEP of the ELM algorithm fluctuates considerably over the 20 reruns, indicating poor model stability, whereas the predictions of the BaggingELM algorithm fluctuate little over the 20 runs and are essentially stable, showing good stability; the stability of BaggingELM over repeated runs is therefore clearly better than that of ELM. On the other hand, the RMSEP value of BaggingELM is always lower than that of ELM, showing that the Bagging ensemble improves the prediction accuracy of the ELM algorithm. Fig. 21 shows the relationship between the mean predicted values of the prediction set over the 20 runs and the actual values for BaggingELM and ELM, where the vertical lines represent the deviation over the 20 runs; the deviation of BaggingELM is far smaller than that of ELM. The correlation coefficient between the BaggingELM predictions and the actual values, 0.9762, is higher than the 0.9635 of ELM, and the standard deviation between the BaggingELM predictions and the actual values, 0.0007, is lower than the 0.0062 of ELM. This shows that BaggingELM improves the prediction accuracy of the ELM model and has better stability.
The above embodiments show that the method provides a quantitative analysis modeling method for complex samples based on the Bagging extreme learning machine, and that the method can greatly improve the stability and prediction accuracy of the model.

Claims (4)

1. A Bagging extreme learning machine integrated modeling method, characterized in that the specific steps are:
(1) collecting the spectral data of the analyte samples and determining the content of the tested component in each sample; dividing the sample set into a training set and a prediction set;
(2) performing bootstrap resampling on the training set samples, randomly selecting a certain number of samples as a training subset;
(3) determining the optimal activation function and the number of hidden layer nodes of the extreme learning machine, and establishing an extreme learning machine submodel with the samples of the training subset;
repeating steps (2) and (3) multiple times to establish N submodels;
(4) for unknown samples, taking the arithmetic mean of the predictions of the multiple submodels to obtain the final prediction.
2. The Bagging extreme learning machine integrated modeling method according to claim 1, characterized in that the method for determining the number N of submodels is as follows: given a sufficiently large number of submodels, the training subset sample number of each data set is fixed at 50% of the total number of samples, the root mean square error of prediction, denoted RMSEP, is computed, and the change of RMSEP with the number of submodels is observed; when the RMSEP value is constant or tends to be stable, the corresponding number of submodels is the required number N.
3. The Bagging extreme learning machine integrated modeling method according to claim 2, characterized in that the method for determining the number of samples of the training subset is as follows: with the number of submodels fixed, the number of selected samples is varied from 5% to 100% of the number of training samples in steps of 5%, rounding down when the result is not an integer, and the RMSEP value is computed; the number of samples at which the RMSEP is minimal or levels off is the number of samples selected in each cycle.
4. The Bagging extreme learning machine integrated modeling method according to claim 3, characterized in that the specific method for determining the optimal activation function and the number of hidden layer nodes of the extreme learning machine is as follows: the RMSEP value obtained from the training set spectra is examined as the activation function and the number of hidden layer nodes are varied; the activation function and the number of hidden layer nodes at which the RMSEP reaches its minimum are the optimal parameters.
CN201510466504.7A 2015-07-31 2015-07-31 Bagging extreme learning machine integrated modelling approach Active CN105117525B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510466504.7A CN105117525B (en) 2015-07-31 2015-07-31 Bagging extreme learning machine integrated modelling approach

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510466504.7A CN105117525B (en) 2015-07-31 2015-07-31 Bagging extreme learning machine integrated modelling approach

Publications (2)

Publication Number Publication Date
CN105117525A true CN105117525A (en) 2015-12-02
CN105117525B CN105117525B (en) 2018-05-15

Family

ID=54665513

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510466504.7A Active CN105117525B (en) 2015-07-31 2015-07-31 Bagging extreme learning machine integrated modelling approach

Country Status (1)

Country Link
CN (1) CN105117525B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103226728A (en) * 2013-04-07 2013-07-31 北京化工大学 Intelligent detection and yield optimization method for HDPE (high density polyethylene) cascade polymerization reaction course
CN103593550A (en) * 2013-08-12 2014-02-19 东北大学 Pierced billet quality modeling and prediction method based on integrated mean value staged RPLS-OS-ELM
CN103528990A (en) * 2013-10-31 2014-01-22 天津工业大学 Method for establishing multiple models of near infrared spectrums
CN104463251A (en) * 2014-12-15 2015-03-25 江苏科技大学 Cancer gene expression profile data identification method based on integration of extreme learning machines
CN104573699A (en) * 2015-01-21 2015-04-29 中国计量学院 Trypetid identification method based on medium field intensity magnetic resonance dissection imaging

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Hao Yong et al., "Research on modeling methods for quantitative infrared spectroscopic analysis of linolenic acid", Journal of Chinese Agricultural Mechanization *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106650926A (en) * 2016-09-14 2017-05-10 天津工业大学 Robust boosting extreme learning machine integrated modeling method
CN106650926B (en) * 2016-09-14 2019-04-16 天津工业大学 A kind of steady boosting extreme learning machine integrated modelling approach
CN106529680A (en) * 2016-10-27 2017-03-22 天津工业大学 Multiscale extreme learning machine integrated modeling method based on empirical mode decomposition
CN106529680B (en) * 2016-10-27 2019-01-29 天津工业大学 A kind of multiple dimensioned extreme learning machine integrated modelling approach based on empirical mode decomposition
CN107356556A (en) * 2017-07-10 2017-11-17 天津工业大学 A kind of double integrated modelling approach of Near-Infrared Spectra for Quantitative Analysis
CN109325516A (en) * 2018-08-13 2019-02-12 众安信息技术服务有限公司 A kind of integrated learning approach and device towards image classification
CN113094892A (en) * 2021-04-02 2021-07-09 辽宁石油化工大学 Oil concentration prediction method based on data elimination and local partial least squares
CN117150877A (en) * 2022-05-23 2023-12-01 北京理工大学 Method for predicting optimal pressing process of press-loading mixed explosive based on Bagging algorithm
CN115691703A (en) * 2022-10-15 2023-02-03 苏州创腾软件有限公司 Drug property prediction method and system based on pharmacokinetic model

Also Published As

Publication number Publication date
CN105117525B (en) 2018-05-15

Similar Documents

Publication Publication Date Title
CN105117525A (en) Bagging extreme learning machine integrated modeling method
CN104062257B (en) A kind of based on the method for general flavone content near infrared ray solution
CN105300923B (en) Without measuring point model of temperature compensation modification method during a kind of near-infrared spectrometers application on site
CN104020127B (en) A kind of near infrared spectrum is utilized quickly to measure the method for inorganic elements in Nicotiana tabacum L.
CN106841075B (en) COD ultraviolet spectra on-line checking optimization method neural network based
CN103048339B (en) Soil moisture detection method and soil moist detection device
CN101988895A (en) Method for predicting single-type crude oil content in mixed crude oil by near infrared spectrum
CN106650926B (en) A kind of steady boosting extreme learning machine integrated modelling approach
CN108152239A (en) The sample composition content assaying method of feature based migration
CN105486658A (en) Near-infrared physical property parameter measuring method without measuring point temperature compensation
CN110969282A (en) Runoff stability prediction method based on LSTM composite network
CN102830096A (en) Method for measuring element concentration and correcting error based on artificial neural network
CN104730042A (en) Method for improving free calibration analysis precision by combining genetic algorithm with laser induced breakdown spectroscopy
CN105823751B (en) Infrared spectrum Multivariate Correction regression modeling method based on λ-SPXY algorithms
CN105319179B (en) A kind of method using middle infrared spectrum prediction hydrogen sulfide content in desulfurized amine
CN103398971A (en) Chemometrics method for determining cetane number of diesel oil
Cai et al. On-line multi-gas component measurement in the mud logging process based on Raman spectroscopy combined with a CNN-LSTM-AM hybrid model
CN107966499A (en) A kind of method by near infrared spectrum prediction crude oil carbon number distribution
CN105550457B (en) Dynamic Evolution Model bearing calibration and system
CN105466885B (en) Based on the near infrared online measuring method without measuring point temperature-compensating mechanism
CN107356556A (en) A kind of double integrated modelling approach of Near-Infrared Spectra for Quantitative Analysis
CN105092509A (en) Sample component measurement method based on PCR-ELM algorithm
CN108120694A (en) For the multivariate calibration methods and system of Dark sun-cured chemical composition analysis
CN106529680A (en) Multiscale extreme learning machine integrated modeling method based on empirical mode decomposition
CN102706855A (en) Flammable liquid flash point prediction method based on Raman spectroscopy

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
CB02 Change of applicant information

Address after: 300387 Tianjin city Xiqing District West Binshui Road No. 399

Applicant after: Tianjin Polytechnic University

Applicant after: Shanghai Sui Hua Industrial Limited by Share Ltd

Address before: 300387 Tianjin city Xiqing District West Binshui Road No. 399

Applicant before: Tianjin Polytechnic University

Applicant before: Shanghai Huishan Industrial Co., Ltd.

GR01 Patent grant
GR01 Patent grant