CN109459408A - Near-infrared quantitative analysis method based on the sparse regression LAR algorithm - Google Patents

Near-infrared quantitative analysis method based on the sparse regression LAR algorithm

Info

Publication number
CN109459408A
Authority
CN
China
Prior art keywords
covariant
model
regression
lar
coefficient
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201710793695.7A
Other languages
Chinese (zh)
Inventor
刘聪
徐友武
阳程
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yancheng Institute of Technology
Original Assignee
Yancheng Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yancheng Institute of Technology
Priority to CN201710793695.7A priority Critical patent/CN109459408A/en
Publication of CN109459408A publication Critical patent/CN109459408A/en
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/25 Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
    • G01N21/31 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
    • G01N21/35 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
    • G01N21/359 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light using near infrared light
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Immunology (AREA)
  • Algebra (AREA)
  • Pathology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Operations Research (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biochemistry (AREA)
  • Investigating Or Analysing Materials By Optical Means (AREA)

Abstract

Building a quantitative regression model from near-infrared spectra is the core step, and the most complex one, of the whole near-infrared quantitative analysis process. Least angle regression (LAR) is a sparse regression algorithm based on a linear model. LAR resembles forward stagewise regression but uses closed-form calculations to be more computationally efficient. Instead of taking many tiny steps of fixed length on the current variable, it computes the appropriate step length mathematically, moving until the correlation of the next variable catches up. Likewise, instead of making small coefficient adjustments in turn among the currently selected variables until another variable enters the model, LAR jumps directly to that point using the computed step length. Unlike other conventional methods, LAR discards irrelevant variables to produce a sparse model and is therefore less affected by noise. The invention proposes a near-infrared quantitative analysis and detection method based on the sparse regression LAR algorithm, which has clear advantages over conventional methods.

Description

Near-infrared quantitative analysis method based on the sparse regression LAR algorithm
Technical field
The present invention relates to a quantitative analysis and detection method based on near-infrared spectroscopy.
Background art
Building a quantitative regression model from near-infrared spectra is the core step, and the most complex one, of the whole near-infrared quantitative analysis process. Near-infrared quantitative analysis is a high-dimensional, small-sample problem: spectra typically run to thousands of dimensions, and the spectral data are highly linearly correlated across dimensions. The useful signal in a near-infrared spectrum is weak, so the weak signal related to the target quality parameter must be extracted from a large volume of high-dimensional spectral information and a regression prediction model built on it, which is a challenging task. This is exactly the kind of problem that regression methods in machine learning are good at solving. Machine-learning regression methods fall into two broad classes, linear and nonlinear. Methods based on linear models are simple, fast and easy to interpret, so they are widely favored and are the most commonly used for quantitative quality detection with near-infrared spectra. Among linear regression methods, partial least squares (PLS) is the most widely used; multiple linear regression (MLR) and principal component regression (PCR) are also often used.
Multiple linear regression (MLR) was the earliest near-infrared regression modeling method. Because spectral data are highly linearly correlated, plain MLR generally does not perform well. PLS is the most widely used regression method in near-infrared spectral analysis and has been applied to a large number of near-infrared quantitative detection tasks. Because it overcomes the high linear correlation between spectral variables, PLS generally predicts better than MLR. Principal component regression (PCR) performs linear regression on the principal components. Being simple and easy to implement, PCR is also used in some studies, but it performs worse than PLS.
Nonlinear machine-learning methods have also been applied successfully to the near-infrared quantitative detection of agricultural product quality. In terms of model interpretability, however, for example identifying the most relevant spectral bands from the model, linear methods are better than nonlinear ones. Although some special techniques have been proposed for finding and selecting the most important features, these are all complex and computationally expensive, whereas linear methods are comparatively simple and direct and are easy to understand and use. For this reason PLS, a linear-model method, is the most common method in near-infrared spectral analysis.
Because near-infrared (NIR) spectral acquisition involves complex physical and optical phenomena, NIR spectra inevitably contain noise. Noise is generally assumed to have smaller variance than the signal. To reduce noise, PCR discards low-variance directions. PLS also tends to shrink low-variance directions, but it can inflate some high-variance directions at the same time, which makes PLS somewhat unstable. Moreover, PLS reduces the weights of noisy features but does not discard them, so a large amount of noise can still affect its predictive performance. Highly intercorrelated variables also tend to be selected together, leaving a large amount of redundancy in the selected variable set.
Summary of the invention
Least angle regression (LAR) is a sparse regression algorithm based on a linear model. LAR differs from algorithms such as PLS in that it discards irrelevant variables to produce a sparse model and is therefore less affected by noise. LAR is closely related to the lasso (least absolute shrinkage and selection operator): a variant of LAR provides an extremely efficient algorithm for computing the complete lasso path.
LAR is also closely related to forward stepwise regression, a traditional model selection method. Forward stepwise regression starts with all coefficients at zero and then adds one variable at a time, building a sequence of models and updating the least-squares coefficients at each step. At each step it adds the variable that gives the best least-squares fit. The process continues until a stopping criterion is met. Forward stepwise regression is a greedy algorithm because it optimizes each individual step and ignores the effect on later steps.
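The greedy selection described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the procedure of the invention: the function name `forward_stepwise`, the stopping rule (a fixed number of variables `n_select`) and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def forward_stepwise(X, y, n_select):
    """Greedy forward stepwise selection (illustrative sketch).

    At each step, add the variable whose inclusion gives the smallest
    residual sum of squares, then refit all selected coefficients.
    """
    Xc = X - X.mean(axis=0)          # center covariates
    yc = y - y.mean()                # center response
    selected = []
    for _ in range(n_select):        # assumed stopping rule: fixed model size
        best_j, best_rss = None, np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            cols = selected + [j]
            coef, *_ = np.linalg.lstsq(Xc[:, cols], yc, rcond=None)
            rss = float(np.sum((yc - Xc[:, cols] @ coef) ** 2))
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
    # Final least-squares coefficients for the selected variables.
    coef, *_ = np.linalg.lstsq(Xc[:, selected], yc, rcond=None)
    return selected, coef

# Synthetic data (assumed): only columns 3 and 7 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 4.0 * X[:, 3] - 2.0 * X[:, 7] + 0.1 * rng.normal(size=100)
selected, coef = forward_stepwise(X, y, n_select=2)
print(selected, coef)
```

Each newly added coefficient jumps straight to its least-squares value, which is the greedy behavior that stagewise regression and LAR are designed to soften.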
Forward stagewise regression is similar to forward stepwise regression but aims to reduce the negative effects of the greedy behavior. In stepwise regression, each step adds the most useful variable to the model, and its coefficient jumps from zero straight to its least-squares value. Forward stagewise regression picks the same first variable as stepwise regression but changes its coefficient by only a small amount. It then again selects the variable most correlated with the current residual, which may be the same variable as in the previous step, and again changes that coefficient by only a small amount. The process continues in this way. When one variable has a clear initial advantage over the others, it is selected for many consecutive steps. Later, once several variables are in the model, the selection alternates among them. The resulting coefficients are more stable than those obtained by stepwise regression.
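A minimal NumPy sketch of forward stagewise regression follows, to make the "many small fixed-length steps" concrete. The step size `eps`, the iteration count `n_iter`, the function name and the synthetic data are assumptions chosen for illustration; the patent text does not specify them.

```python
import numpy as np

def forward_stagewise(X, y, eps=0.01, n_iter=5000):
    """Forward stagewise regression with a fixed small step (illustrative sketch)."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # standardized covariates
    r = y - y.mean()                            # current residual
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        corr = Xs.T @ r / len(r)                # proportional to correlation with the residual
        j = int(np.argmax(np.abs(corr)))        # most correlated covariate
        step = eps * np.sign(corr[j])           # small move of fixed length
        beta[j] += step
        r = r - step * Xs[:, j]                 # update the residual
    return beta

# Synthetic data (assumed): columns 0 and 2 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.1 * rng.normal(size=200)
beta = forward_stagewise(X, y)
print(np.round(beta, 2))
```

With enough iterations the coefficients settle within a few multiples of `eps` of the least-squares solution, but only after hundreds of tiny steps, which is the cost LAR avoids by computing the step length directly.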
LAR is similar to forward stagewise regression but uses closed-form calculations to be more computationally efficient. Instead of taking many tiny steps of fixed length on the current variable, it computes the appropriate step length mathematically, moving until the correlation of the next variable catches up. Likewise, instead of making small coefficient adjustments in turn among the currently selected variables until another variable enters the model, LAR jumps directly to that point using the computed step length.
At the start, the absolute correlation between the residual and the first covariate is larger than that between the residual and any other covariate. As the regression coefficient of the first covariate moves toward its least-squares value (at which point its correlation with the residual would become zero), its correlation with the residual steadily decreases, and eventually some other covariate has a correlation with the residual of equal absolute value. That covariate then joins the model as the second active (selected) variable. The coefficients of both covariates then move together toward their joint least-squares values until the correlation of a third variable catches up. In a high-dimensional regression problem, further covariates join the model in the same way, each entering when the correlation between the residual and all active variables has dropped to the same level as its own.
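The step length that LAR "calculates mathematically" can be written down explicitly. The following is a sketch of one standard way to derive it, not a formula quoted from the patent. The notation is assumed for the example: r is the current residual, A the set of active covariates, X_A the matrix whose columns are the active covariates, x_k an inactive covariate, and C the common absolute inner product between the residual and each active covariate (covariates are standardized, so inner products are proportional to correlations).

```latex
% Joint least-squares coefficients of the current residual on the active covariates
\delta = (X_A^{\top} X_A)^{-1} X_A^{\top} r

% Moving a fraction \alpha of the way toward the least-squares fit
r(\alpha) = r - \alpha X_A \delta, \qquad 0 \le \alpha \le 1

% Active covariates j \in A: all inner products shrink at the same rate
x_j^{\top} r(\alpha) = (1 - \alpha)\, x_j^{\top} r,
\qquad \lvert x_j^{\top} r(\alpha) \rvert = (1 - \alpha)\, C

% Inactive covariate k, with c_k = x_k^{\top} r and a_k = x_k^{\top} X_A \delta
x_k^{\top} r(\alpha) = c_k - \alpha a_k

% Covariate k catches up when |c_k - \alpha a_k| = (1 - \alpha) C, i.e. at the
% smallest positive value among the two candidates
\alpha_k = \min{}^{+}\left\{ \frac{C - c_k}{C - a_k},\; \frac{C + c_k}{C + a_k} \right\}

% Step actually taken: stop at the first covariate to catch up, or at the
% least-squares fit (\alpha = 1) if none does
\alpha^{*} = \min\left( 1,\; \min_{k \notin A} \alpha_k \right)
```

The second line for the active covariates follows because $X_A^{\top} X_A \delta = X_A^{\top} r$, so each active inner product is scaled by exactly $(1 - \alpha)$ and the ties among active covariates are preserved along the whole step.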
Suppose there are n measured samples in total, each with p covariate measurements and one response measurement. The vector $x_j$ is the $j$-th covariate, of length n ($j = 1, 2, \ldots, p$); y is the response variable (also of length n); $\beta$ is the vector of regression coefficients, of length p, and $\beta_j$ is the regression coefficient of the $j$-th covariate ($j = 1, 2, \ldots, p$); the regression residual r is a vector of length n (one element per sample). The LAR algorithm can be summarized as follows:
1) Standardize all covariates of the near-infrared spectral data so that each has mean zero and variance 1. The initial residual r equals the centered response: $r = y - \bar{y}$ ($\bar{y}$ is the mean of y). All regression coefficients are zero: $\beta_1 = \beta_2 = \cdots = \beta_p = 0$.
2) Find the covariate $x_j$ most correlated with the residual r.
3) Move the regression coefficient $\beta_j$ from 0 toward its least-squares coefficient $\langle x_j, r \rangle$ (the inner product of $x_j$ and the residual r) until some other covariate $x_k$ has as much correlation with the current residual as $x_j$ does.
4) Move the regression coefficients $\beta_j$ and $\beta_k$ together in the direction of the joint least-squares coefficients of the current residual on $(x_j, x_k)$ until some other covariate $x_l$ has as much correlation with the current residual.
5) Continue in this way until the number of covariates in the model equals $\min(n-1, p)$ or all covariates have been added to the model. Once all covariates have entered the LAR model, the result is the same as ordinary least squares.
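The five steps above can be sketched in NumPy as follows. This is an illustrative implementation under stated assumptions, not the inventors' code: the function name `lar_path`, the tolerance `1e-12`, the parametrization of each step as a fraction `alpha` of the way to the joint least-squares fit, and the synthetic data are all choices made for the example.

```python
import numpy as np

def lar_path(X, y):
    """Plain least angle regression path (illustrative sketch).

    Returns the coefficient path on standardized covariates (row k holds the
    coefficients after k steps) and the order in which covariates entered.
    """
    n, p = X.shape
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)     # step 1: mean 0, variance 1
    r = y - y.mean()                              # step 1: centered response
    beta = np.zeros(p)                            # step 1: all coefficients zero
    path = [beta.copy()]
    active = [int(np.argmax(np.abs(Xs.T @ r)))]   # step 2: most correlated covariate
    for _ in range(min(n - 1, p)):                # step 5: at most min(n-1, p) steps
        XA = Xs[:, active]
        # Joint least-squares coefficients of the current residual on the active set.
        delta = np.linalg.lstsq(XA, r, rcond=None)[0]
        fit = XA @ delta
        c = Xs.T @ r                              # inner products with the residual
        a = Xs.T @ fit                            # how fast each inner product changes
        C = np.max(np.abs(c[active]))             # common value for the active set
        alpha, j_new = 1.0, None                  # alpha = 1 reaches least squares
        for j in range(p):
            if j in active:
                continue
            # Fractions at which covariate j ties with the active set.
            for num, den in ((C - c[j], C - a[j]), (C + c[j], C + a[j])):
                if den > 1e-12 and 1e-12 < num / den < alpha:
                    alpha, j_new = num / den, j
        beta[active] += alpha * delta             # steps 3 and 4: move the coefficients
        r = r - alpha * fit
        path.append(beta.copy())
        if j_new is None:                         # nothing left to catch up
            break
        active.append(j_new)
    return np.array(path), active

# Synthetic data (assumed): columns 1 and 4 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
y = 5.0 * X[:, 1] - 3.0 * X[:, 4] + 0.5 * rng.normal(size=50)
path, order = lar_path(X, y)
print(order)
print(np.round(path[-1], 3))
```

Row k of `path` has k non-zero coefficients, and the last row coincides with the ordinary least-squares solution on the standardized covariates, matching the closing remark of step 5.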
Following the steps above, covariates enter the model in order of importance. The optimal model usually discards some irrelevant or unimportant covariates, for example by keeping only the first k covariates. The hyperparameter k, the number of covariates retained in the model, can be determined by cross-validation.
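One way to choose k by cross-validation is sketched below using scikit-learn's `Lars` estimator, whose `n_nonzero_coefs` parameter sets the target number of non-zero coefficients. The use of scikit-learn, the 5-fold split, the mean-squared-error criterion, the helper name `select_k_by_cv` and the synthetic data are assumptions for illustration; the patent does not state which tools or fold counts were used.

```python
import numpy as np
from sklearn.linear_model import Lars
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def select_k_by_cv(X, y, k_values, n_splits=5, seed=0):
    """Pick the number of retained covariates k with the lowest CV error (sketch)."""
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = {}
    for k in k_values:
        # Standardize inside each fold, then run LAR limited to k covariates.
        model = make_pipeline(StandardScaler(), Lars(n_nonzero_coefs=k))
        mse = -cross_val_score(model, X, y, cv=cv,
                               scoring="neg_mean_squared_error")
        scores[k] = float(mse.mean())
    best_k = min(scores, key=scores.get)
    return best_k, scores

# Synthetic data (assumed): 3 informative covariates out of 20.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 20))
y = 5.0 * X[:, 2] - 4.0 * X[:, 9] + 3.0 * X[:, 15] + 0.1 * rng.normal(size=60)
best_k, scores = select_k_by_cv(X, y, k_values=range(1, 11))
print(best_k, round(scores[best_k], 4))
```

Putting the scaler inside the pipeline keeps the standardization statistics from leaking between training and validation folds.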
Specific embodiment
To make the objectives, technical solution and advantages of the invention clearer, the invention is described in further detail below with reference to an embodiment.
Quantitative detection of the internal quality of agricultural products is an important field of research and application for quantitative analysis based on near-infrared spectroscopy. The methods used for near-infrared quantitative analysis are common across different objects of study. The quantitative analysis of the internal quality of navel oranges is used here as the experimental example.
Near-infrared spectra of all samples were acquired as absorbance spectra (log 1/R) with a near-infrared spectrometer in reflectance mode. The scanned wavelength range was 1000 nm to 2499 nm at 1 nm intervals. Two 14.5 halogen lamps served as the light source. The spectrometer probe was perpendicular to the surface of the sample fruit, 10 mm from it.
Spectra were measured at the equator of each fruit sample, that is, at the position of the maximum circumference of the fruit surface. Reflectance spectra of the peel surface were measured at three points on the equator spaced about 120 degrees apart. The spectra from these three points were averaged to give the surface spectrum of the fruit sample at its equator.
Determination of internal quality data. The reference values of the three internal quality parameters, total soluble solids (TSS), titratable acidity (TA) and vitamin C (VC), were measured by traditional destructive chemical methods. TSS is one of the most common quality indices and is highly correlated with the sugar content of navel oranges. TA is a key parameter of internal fruit quality and one of the most important indices affecting taste.
TSS, TA and VC are three important parameters that together give a fairly complete picture of the internal quality of navel oranges. Another important and common quality index is maturity (the solids-to-acid ratio), which is the ratio of TSS to TA and can be calculated from the measured parameters. The actual values of the parameters were measured by traditional destructive methods.
TSS determination: after the peel was removed, the navel orange sample was juiced, and the juice was filtered through double-layer gauze and mixed thoroughly. The supernatant was then measured at room temperature with a Japanese-made hand-held digital refractometer (ATAGO PAL-ES3, Japan) to obtain the TSS content.
TA determination: acid content was determined by acid-base titration. 10 mL of juice was accurately pipetted into a 100 mL volumetric flask, diluted to the mark with distilled water and shaken well; 10 mL of the dilution was transferred to a 100 mL conical flask, 2 drops of 1% phenolphthalein indicator were added, and the solution was titrated with sodium hydroxide solution to the end point, indicated by a uniform pink color. The volume of sodium hydroxide solution consumed was recorded and used to calculate the acid content.
Vitamin C determination: vitamin C content was determined by the 2,6-dichlorophenolindophenol sodium method. 10 mL of raw juice was accurately pipetted into a 100 mL volumetric flask, diluted to the mark with oxalic acid solution of mass concentration 1 g/100 mL and shaken well; 2 mL of the dilution was transferred to a 50 mL conical flask and titrated with standard 2,6-dichlorophenolindophenol sodium solution to the end point, indicated by a uniform light red color. The vitamin C content was calculated from the volume of 2,6-dichlorophenolindophenol sodium solution consumed.
Three quarters of the samples were selected as the training set, used to build the regression models that predict the internal quality parameters; the remaining quarter was used as the test set to assess the predictive performance of the models.
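A three-quarters/one-quarter split can be sketched as follows. The text does not say how samples were assigned to the two sets, so the random permutation, the fixed seed, the function name `split_three_quarters` and the sample count of 100 are assumptions for the example.

```python
import numpy as np

def split_three_quarters(n_samples, seed=0):
    """Return index arrays for a 3/4 training set and a 1/4 test set (sketch)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)      # assumed: random assignment
    n_train = (3 * n_samples) // 4
    return idx[:n_train], idx[n_train:]

# Hypothetical sample count, for illustration only.
train_idx, test_idx = split_three_quarters(100)
print(len(train_idx), len(test_idx))
```

The index arrays can then be used to slice both the spectra matrix and each vector of reference values, so that spectra and reference values stay aligned.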
Prediction accuracy is the core and most important performance measure for machine-learning regression methods. If the accuracy is not within an acceptable range, the regression predictions of the quantitative analysis are meaningless. Five indices were used to evaluate model prediction accuracy: the training correlation coefficient R, the training root-mean-square error RMSEC, the test correlation coefficient r, the test root-mean-square error RMSEP, and the bias. These five are also the indices most commonly used to evaluate regression prediction accuracy comprehensively.
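The indices named above can be computed as sketched below. The patent does not give their formulas, so the definitions used here (Pearson correlation, root-mean-square of prediction errors, and bias as the mean of predicted minus reference values) are the conventional chemometrics ones and should be read as assumptions; the function names and the toy numbers are likewise illustrative.

```python
import numpy as np

def correlation(y_true, y_pred):
    """Pearson correlation between reference and predicted values (R or r)."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

def rmse(y_true, y_pred):
    """Root-mean-square error; RMSEC on training data, RMSEP on test data."""
    err = np.asarray(y_pred) - np.asarray(y_true)
    return float(np.sqrt(np.mean(err ** 2)))

def bias(y_true, y_pred):
    """Mean of predicted minus reference values (assumed sign convention)."""
    return float(np.mean(np.asarray(y_pred) - np.asarray(y_true)))

# Toy values, for illustration only: every prediction is 0.5 too high.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.5, 3.5, 4.5])
print(correlation(y_true, y_pred), rmse(y_true, y_pred), bias(y_true, y_pred))
```

Applying `correlation` and `rmse` to the training set gives R and RMSEC; applying them to the test set gives r and RMSEP.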
Near-infrared spectral data inevitably contain noise. To mitigate its influence, the spectra were preprocessed with moving-average smoothing and standard normal variate (SNV) correction before being used to predict navel orange quality parameters. Moving-average smoothing is a preprocessing method that removes high-frequency noise. SNV is a row-wise transformation that centers and scales each individual spectrum so that it has mean 0 and variance 1, removing noise caused by light scattering.
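The two preprocessing steps can be sketched in NumPy as follows, with one spectrum per row. The window length of 5 points, the edge padding at the ends of each spectrum, the function names and the synthetic spectra are assumptions for illustration; the text does not state the smoothing window that was used.

```python
import numpy as np

def moving_average(spectra, window=5):
    """Smooth each spectrum (row) with a centered moving average (sketch).

    The window must be odd; the ends are padded with the edge values so the
    output keeps the same number of wavelengths.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    pad = window // 2
    padded = np.pad(spectra, ((0, 0), (pad, pad)), mode="edge")
    kernel = np.ones(window) / window
    return np.apply_along_axis(
        lambda s: np.convolve(s, kernel, mode="valid"), 1, padded)

def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row)."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std

# Synthetic spectra (assumed) on the 1000-2499 nm grid at 1 nm intervals.
rng = np.random.default_rng(0)
wavelengths = np.arange(1000, 2500)
spectra = 1.0 + 0.001 * (wavelengths - 1000) + 0.05 * rng.normal(size=(3, 1500))
smoothed = moving_average(spectra, window=5)
corrected = snv(smoothed)
print(corrected.shape)
```

Note that SNV works along rows (one spectrum at a time), whereas the standardization in step 1 of the LAR algorithm works along columns (one wavelength at a time).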
The LAR algorithm was used to build quantitative analysis models from the near-infrared spectra to predict the most important internal quality parameters of navel oranges: TSS, TA and vitamin C.
The results show that LAR regression consistently predicts better than PLS regression, the most widely used method. The nonlinear least squares support vector machine (LS-SVM) is recognized in many current near-infrared quantitative analysis studies as the most accurate method. The prediction accuracy of LAR exceeds that of PLS by a clear margin and is close to that of LS-SVM, with only a small gap.

Claims (1)

1. A near-infrared quantitative analysis method based on the sparse regression LAR algorithm, characterized by comprising the following main steps:
(1) standardizing all covariates of the near-infrared spectral data so that each has mean zero and variance 1, the initial residual r being equal to the centered response and all regression coefficients being zero;
(2) finding the covariate $x_j$ most correlated with the residual r;
(3) moving the regression coefficient $\beta_j$ from 0 toward its least-squares coefficient $\langle x_j, r \rangle$ (the inner product of $x_j$ and the residual r) until some other covariate $x_k$ has as much correlation with the current residual as $x_j$ does;
(4) moving the regression coefficients $\beta_j$ and $\beta_k$ together in the direction of the joint least-squares coefficients of the current residual on $(x_j, x_k)$ until some other covariate $x_l$ has as much correlation with the current residual;
(5) continuing in this way until the number of covariates in the model equals $\min(n-1, p)$ or all covariates have been added to the model, the result being the same as ordinary least squares once all covariates have entered the LAR model;
(6) following the above steps, the selected covariates enter the model in order of importance; the optimal model usually discards some irrelevant or unimportant covariates, for example by keeping only the first k covariates, and the hyperparameter k, the number of covariates retained in the model, can be determined by cross-validation.
CN201710793695.7A 2017-09-06 2017-09-06 A kind of Near-Infrared Quantitative Analysis method based on sparse regression LAR algorithm Withdrawn CN109459408A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710793695.7A CN109459408A (en) 2017-09-06 2017-09-06 A kind of Near-Infrared Quantitative Analysis method based on sparse regression LAR algorithm

Publications (1)

Publication Number Publication Date
CN109459408A true CN109459408A (en) 2019-03-12

Family

ID=65605786

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710793695.7A Withdrawn CN109459408A (en) 2017-09-06 2017-09-06 A kind of Near-Infrared Quantitative Analysis method based on sparse regression LAR algorithm

Country Status (1)

Country Link
CN (1) CN109459408A (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105203498A (en) * 2015-09-11 2015-12-30 天津工业大学 Near infrared spectrum variable selection method based on LASSO

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CONG LIU ET AL.: "A comparative study for least angle regression on NIR spectra analysis to determine internal qualities of navel oranges", 《EXPERT SYSTEMS WITH APPLICATIONS》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113345525A (en) * 2021-06-03 2021-09-03 谱天(天津)生物科技有限公司 Analysis method for reducing influence of covariates on detection result in high-throughput detection
CN116486969A (en) * 2023-06-25 2023-07-25 广东工业大学 Genetic algorithm-based material optimal correlation relation acquisition method and application
CN116486969B (en) * 2023-06-25 2023-09-26 广东工业大学 Genetic algorithm-based material optimal correlation relation acquisition method and application

Similar Documents

Publication Publication Date Title
CN104062263B (en) The near-infrared universal model detection method of light physical property close fruit quality index
CN104062256B (en) A kind of flexible measurement method based near infrared spectrum
CN111855608B (en) Near-infrared nondestructive detection method for apple acidity based on fusion characteristic wavelength selection algorithm
CN105486655B (en) The soil organism rapid detection method of model is intelligently identified based on infrared spectroscopy
CN110455722A (en) Rubber tree blade phosphorus content EO-1 hyperion inversion method and system
Liu et al. A comparative study for least angle regression on NIR spectra analysis to determine internal qualities of navel oranges
CN111968080A (en) Hyperspectrum and deep learning-based method for detecting internal and external quality of Feicheng peaches
Yuan et al. Non-invasive measurements of ‘Yunhe’pears by vis-NIRS technology coupled with deviation fusion modeling approach
CN110987846B (en) Nitrate concentration prediction method based on iPLS-PA algorithm
CN104965973B (en) A kind of Apple Mould Core multiple-factor Non-Destructive Testing discrimination model and method for building up thereof
CN109324013A (en) A method of it is quickly analyzed using Gaussian process regression model building oil property near-infrared
CN101446548A (en) Device for realizing measurement of milk ingredient based on response conversion and method thereof
CN110320165A (en) The Vis/NIR lossless detection method of banana soluble solid content
WO2020186844A1 (en) Self-adaptive surface absorption spectrum analysis method and system, storage medium, and device
CN105548070A (en) Apple soluble solid near-infrared detection part compensation method and system
Liang et al. Determination and visualization of different levels of deoxynivalenol in bulk wheat kernels by hyperspectral imaging
CN102072767A (en) Wavelength similarity consensus regression-based infrared spectrum quantitative analysis method and device
Nturambirwe et al. Detecting bruise damage and level of severity in apples using a contactless nir spectrometer
CN102841069B (en) Method for rapidly identifying types of crude oil by using mid-infrared spectrum
CN106018331B (en) The method for estimating stability and pretreatment optimization method of multi-channel spectral system
CN109459408A (en) A kind of Near-Infrared Quantitative Analysis method based on sparse regression LAR algorithm
CN101788459B (en) Quasi-continuous spectroscopic wavelength combination method
CN109100315B (en) Wavelength selection method based on noise-signal ratio
Zhang et al. Uninformative Biological Variability Elimination in Apple Soluble Solids Content Inspection by Using Fourier Transform Near‐Infrared Spectroscopy Combined with Multivariate Analysis and Wavelength Selection Algorithm
CN113030011A (en) Rapid nondestructive testing method and system for sugar content of fruits

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20190312
