WO2002014812A1 - Automated system and method for spectroscopic analysis
- Publication number
- WO2002014812A1 (PCT/US2001/025165)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- derivative
- smoothing
- transform
- normalization
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01J—MEASUREMENT OF INTENSITY, VELOCITY, SPECTRAL CONTENT, POLARISATION, PHASE OR PULSE CHARACTERISTICS OF INFRARED, VISIBLE OR ULTRAVIOLET LIGHT; COLORIMETRY; RADIATION PYROMETRY
- G01J3/00—Spectrometry; Spectrophotometry; Monochromators; Measuring colours
- G01J3/28—Investigating the spectrum
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N21/00—Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
- G01N21/17—Systems in which incident light is modified in accordance with the properties of the material investigated
- G01N21/25—Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
- G01N21/27—Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands using photo-electric detection ; circuits for computing concentration
- G01N21/274—Calibration, base line adjustment, drift correction
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N21/00—Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
- G01N21/17—Systems in which incident light is modified in accordance with the properties of the material investigated
- G01N21/25—Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
- G01N21/31—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/145—Measuring characteristics of blood in vivo, e.g. gas concentration, pH value; Measuring characteristics of body fluids or tissues, e.g. interstitial fluid, cerebral tissue
- A61B5/14532—Measuring characteristics of blood in vivo, e.g. gas concentration, pH value; Measuring characteristics of body fluids or tissues, e.g. interstitial fluid, cerebral tissue for measuring glucose, e.g. by tissue impedance measurement
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01J—MEASUREMENT OF INTENSITY, VELOCITY, SPECTRAL CONTENT, POLARISATION, PHASE OR PULSE CHARACTERISTICS OF INFRARED, VISIBLE OR ULTRAVIOLET LIGHT; COLORIMETRY; RADIATION PYROMETRY
- G01J3/00—Spectrometry; Spectrophotometry; Monochromators; Measuring colours
- G01J3/28—Investigating the spectrum
- G01J2003/2866—Markers; Calibrating of scan
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01J—MEASUREMENT OF INTENSITY, VELOCITY, SPECTRAL CONTENT, POLARISATION, PHASE OR PULSE CHARACTERISTICS OF INFRARED, VISIBLE OR ULTRAVIOLET LIGHT; COLORIMETRY; RADIATION PYROMETRY
- G01J3/00—Spectrometry; Spectrophotometry; Monochromators; Measuring colours
- G01J3/02—Details
- G01J3/0264—Electrical interface; User interface
Definitions
- the present invention relates to the field of qualitative and quantitative spectroscopic analysis.
- Infrared spectroscopy is a technique which is based upon the vibrational changes of the atoms of a molecule.
- an infrared spectrum is generated by transmitting infrared radiation through a sample of an organic compound and determining what portion of the incident radiation is absorbed by the sample.
- An infrared spectrum is a plot of absorbance (or transmittance) against wavenumber, wavelength, or frequency.
- Infrared radiation is radiation having a wavelength between about 750 nm and about 1000 µm.
- Near-infrared radiation is radiation having a wavelength between about 750 nm and about 2500 nm.
- the near-infrared reflectance or transmittance of a sample is measured at several discrete wavelengths, converted to absorbance or its equivalent reflectance term and then multiplied by a series of regression or weighting coefficients calculated through multiple-linear-regression mathematics.
- the absorbance (A) of an analyte in a non-absorbing solution at a specified wavelength is represented by the equation A = abc, wherein a is the absorptivity constant, b is the pathlength of light through the sample and c is the concentration of the analyte.
- the calibration sample consisted of a predetermined set of standards (i.e. samples of a known composition) which were run under the same conditions as the unknown samples, thereby allowing for the determination of the concentration of the unknowns.
- an automated method for modeling spectral data is provided.
- the samples are analyzed and spectral data is collected by the method of diffuse reflectance, clear transmission, or diffuse transmission.
- one or more constituent values are measured.
- a constituent value is a reference value for the target substance in the sample which is measured by an independent measurement technique.
- a constituent value used in conjunction with identifying a target substance in a pharmaceutical tablet sample might be the concentration of that substance in the tablet sample as measured by high pressure liquid chromatography (HPLC) analysis.
- HPLC high pressure liquid chromatography
- the set of spectral data (with its associated constituent values) is divided into a calibration sub-set and a validation sub-set.
- the calibration sub-set is selected to represent the variability likely to be encountered in the validation sub-set.
- a plurality of data transforms is then applied to the set of spectral data.
- the transforms are applied singly and two at a time.
- the particular transforms used, and the particular combination pairs used are selected based upon the particular method used to analyze the spectral data (e.g. diffuse reflectance, clear transmission, or diffuse transmission as discussed in the detailed description).
- the entries are contained in an external data file, so that the user may change the list to conform to his own needs and judgement as to what constitutes sensible transform pairs.
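The single and pairwise application described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the transform names, the `ALLOWED_PAIRS` set, and the function `candidate_pipelines` are hypothetical stand-ins for the user-editable external list, and the pairs shown are only a subset taken from the combinations named in the text.

```python
from itertools import permutations

# Hypothetical transform names; in the described system the list of
# permitted entries lives in an external, user-editable data file.
TRANSFORMS = ["normalization", "first_derivative", "second_derivative", "smoothing"]

# Hypothetical ordered pairs judged sensible (a subset of those in the text).
ALLOWED_PAIRS = {
    ("normalization", "first_derivative"),
    ("normalization", "second_derivative"),
    ("normalization", "smoothing"),
    ("first_derivative", "normalization"),
    ("first_derivative", "smoothing"),
    ("second_derivative", "normalization"),
    ("second_derivative", "smoothing"),
}


def candidate_pipelines(transforms, allowed_pairs):
    """Return the NULL pipeline, each single transform, and each allowed pair."""
    pipelines = [("NULL",)]                       # untransformed data
    pipelines += [(t,) for t in transforms]       # transforms applied singly
    pipelines += [p for p in permutations(transforms, 2) if p in allowed_pairs]
    return pipelines


pipelines = candidate_pipelines(TRANSFORMS, ALLOWED_PAIRS)
```

Keeping the allowed pairs as data rather than code mirrors the idea that the user can edit the list without touching the program.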
- the plurality of transforms applied to the spectral data includes at least a second derivative and a baseline correction.
- transforms include, but are not limited to the following: performing a normalization of the spectral data, performing a first derivative on the spectral data, performing a second derivative on the spectral data, performing a multiplicative scatter correction on the spectral data, and performing smoothing transforms on the spectral data.
- both the normalization transform and the multiplicative scatter correction transform inherently also perform baseline corrections.
- the normalization transform is combined with each of the first derivative, second derivative, and smoothing transforms; the first derivative transform is combined with the normalization and smoothing transforms; the second derivative transform is combined with the normalization and smoothing transforms; the multiplicative scatter correction transform is combined with absorption-to-reflection, first derivative, second derivative, Kubelka-Munk, and smoothing transforms; the Kubelka-Munk transform is combined with the normalization, first derivative, second derivative, multiplicative scatter correction, and smoothing transforms; the smoothing transform is combined with the absorption-to-reflection, normalization, first derivative, second derivative, multiplicative scatter correction, and Kubelka-Munk transforms; and the absorption-to-reflection transform is combined with the normalization, first derivative, second derivative, multiplicative scatter correction, and smoothing transforms.
- the plurality of transforms applied to the spectral data may further include performing a Kubelka-Munk function, performing a Savitzky-Golay first derivative, performing a Savitzky-Golay second derivative, performing a mean-centering, or performing a conversion from reflectance/transmittance to absorbance.
- the data transforms include performing a second derivative on the spectral data; and performing a normalization, a multiplicative scatter correction or a smoothing transform of the spectral data.
- the data transforms include performing a normalization of the spectral data; and a smoothing transform, a Savitzky-Golay first derivative, or a Savitzky-Golay second derivative of the spectral data.
- the data transforms include performing a first derivative of the spectral data; and a normalization, a multiplicative scatter correction, or a smoothing transform on the spectral data.
- the plurality of data transforms in the embodiments described above may also include a ratio transform, wherein the ratio transform includes a numerator and a denominator and wherein at least one of the numerator and the denominator is another transform.
- the numerator comprises one of a baseline correction, a normalization, a multiplicative scatter correction, a smoothing transform, a Kubelka-Munk function, or conversion from reflectance/transmittance to absorbance when the denominator comprises a baseline correction;
- the numerator comprises a normalization when the denominator comprises a normalization;
- the numerator comprises a first derivative when the denominator comprises a first derivative;
- the numerator comprises a second derivative when the denominator comprises a second derivative;
- the numerator comprises a multiplicative scatter correction when the denominator comprises a multiplicative scatter correction;
- the numerator comprises a Kubelka-Munk function when the denominator comprises
- One or more of a partial least squares, a principal component regression, a neural net, a classical least squares (often abbreviated CLS, and sometimes called The K-matrix Algorithm) or a multiple linear regression analysis (MLR calculations may, for example, be performed using software from The Near Infrared Research Corporation, 21 Terrace Avenue, Suffern, N.Y. 10901) are then performed on the transformed and untransformed (i.e. NULL transform) calibration data sub-sets to obtain corresponding modeling equations for predicting the amount of the target substance in a sample.
- the partial least squares, principal component regression and multiple linear regression are performed on the transformed and untransformed calibration and validation data sets.
- the modeling equations are ranked to select a best model for analyzing the spectral data.
- the system determines, for each modeling equation, how closely the value returned by the modeling equation is to the constituent value(s) for the sample.
- the best modeling equation is the modeling equation which, across all of the samples in the validation sub-set, returned the closest values to the constituent values: i.e., the modeling equation which provided the best correlation to the constituent values.
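A minimal sketch of the ranking step, under stated assumptions: each candidate is fitted on the calibration sub-set by ordinary least squares (standing in for MLR), and candidates are ordered by root-mean-square error on the validation sub-set. The RMSE is only a stand-in for closeness to the constituent values; it is not the patent's Figure of Merit, and the function names `fit_mlr`, `predict`, and `rank_models` are hypothetical.

```python
import numpy as np


def fit_mlr(X_cal, y_cal):
    """Least-squares fit of y = b0 + X @ b on the calibration sub-set."""
    A = np.column_stack([np.ones(len(X_cal)), X_cal])
    coef, *_ = np.linalg.lstsq(A, y_cal, rcond=None)
    return coef


def predict(coef, X):
    """Apply a fitted modeling equation to (transformed) spectral data."""
    return coef[0] + X @ coef[1:]


def rank_models(candidates, y_cal, y_val):
    """Rank candidates by validation RMSE, best (smallest error) first.

    candidates maps a label to (X_cal, X_val), the transformed calibration
    and validation data for that candidate.
    """
    scores = []
    for name, (X_cal, X_val) in candidates.items():
        coef = fit_mlr(X_cal, y_cal)
        residual = predict(coef, X_val) - y_val
        scores.append((name, float(np.sqrt(np.mean(residual ** 2)))))
    return sorted(scores, key=lambda item: item[1])
```

The first entry of the returned list plays the role of the "best model" for the validation sub-set under this simplified criterion.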
- the values are ranked according to a Figure of Merit (described in equations 1 and 2 below).
- a method for generating a modeling equation comprising the steps of (a) operating an instrument so as to generate and store a spectral data set of diffuse reflectance, clear transmission, or diffuse transmission spectrum data points over a selected wavelength range, the spectral data set including spectral data for a plurality of samples; (b) generating and storing a constituent value for each of the plurality of samples, the constituent value being indicative of an amount of a target substance in its corresponding sample; (c) dividing the spectral data set into a calibration sub-set and a validation sub-set; (d) transforming the spectral data in the calibration sub-set and the validation sub-set by applying a plurality of first mathematical functions to the calibration sub-set and the validation sub-set to obtain a plurality of transformed validation data sub-sets and a plurality of transformed calibration data sub-sets; (e) resolving each transformed calibration data sub-set in step (d) by at least one of a second
- covariance is defined, for the purposes of the present invention, as a measure of the tendency of two features to vary together. Where the variance is the average of the squared deviation of a feature from its mean, the covariance is the average of the products of the deviations of the feature values from their means.
- the covariance is a number that measures the dependence between two features.
- the covariance between features is graphed as data clusters.
- NULL transform is defined, for the purposes of the present invention as making no change to the data as originally collected.
- ABS2REFL transform is defined, for purposes of the present invention as converting absorbency to reflectance if the data was originally measured by reflectance, or converting absorbency to transmittance, if the data was originally measured by transmission (the mathematical operation being the same in either case).
- NORMALIZ transform is defined, for purposes of the present invention, as a normalization transform (normalization).
- the mean of each spectrum is subtracted from each wavelength's value for that spectrum, then each wavelength's value is divided by the standard deviation of the entire spectrum. The result is that each transformed spectrum has a mean of zero and a standard deviation of unity.
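The normalization just described (subtract each spectrum's own mean, then divide by its own standard deviation) can be sketched with NumPy. The function name `normaliz` is hypothetical, and the use of the population standard deviation (`ddof=0`) is an assumption, since the text does not say which form is used.

```python
import numpy as np


def normaliz(spectra):
    """Row-wise normalization: one spectrum per row, one wavelength per column."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)   # mean of each spectrum
    std = spectra.std(axis=1, keepdims=True)     # population std (ddof=0), an assumption
    return (spectra - mean) / std


normalized = normaliz([[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 50.0]])
```

Each row of `normalized` then has a mean of zero and a standard deviation of one, regardless of the original offset or scale of the spectrum.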
- BASECORR is defined, for purposes of the present invention as performing a baseline correction.
- the baseline correction shifts the background level of a measurable
- FIRSTDRN transform is defined, for purposes of the present invention as performing a first derivative in the following manner.
- An approximation to the first derivative of the spectrum is calculated by taking the first difference between data at nearby wavelengths.
- a spacing parameter together with the actual wavelength spacing in the data file controls how far apart the wavelengths used for this calculation are. Examples of spacing parameters include but are not limited to the values 1, 2, 4, 6, 9, 12, 15, 18, 21, and 25.
- a spacing value of 1 (unity) causes adjacent wavelengths to be used for the calculation.
- the resulting value of the derivative is assumed to correspond to a wavelength halfway between the two wavelengths used in the computation. Since derivatives of wavelengths too near the ends of the spectrum cannot be computed, the spectrum is truncated to eliminate those wavelengths.
- the value of the spacing parameter is varied such that a FIRSTDRN transform includes a plurality of transforms, each having a different spacing parameter value.
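A minimal sketch of the first-difference approximation with a spacing parameter. The function name `first_difference` is hypothetical, and the spacing is interpreted here as a count of data points between the two wavelengths used, which is an assumption about how the parameter combines with the file's wavelength spacing.

```python
import numpy as np


def first_difference(spectrum, wavelengths, spacing=1):
    """Difference between points `spacing` apart, with midpoint wavelengths.

    The output is shorter than the input by `spacing` points, which is the
    truncation at the ends of the spectrum.
    """
    spectrum = np.asarray(spectrum, dtype=float)
    wavelengths = np.asarray(wavelengths, dtype=float)
    diff = spectrum[spacing:] - spectrum[:-spacing]
    midpoints = (wavelengths[spacing:] + wavelengths[:-spacing]) / 2.0
    return diff, midpoints


values = [0.0, 1.0, 4.0, 9.0, 16.0]
wl = [1000.0, 1002.0, 1004.0, 1006.0, 1008.0]
d1, wl1 = first_difference(values, wl, spacing=1)
d2, wl2 = first_difference(values, wl, spacing=2)
```

Running the function once per spacing value (for example 1, 2, 4, 6, ...) yields the family of first-derivative transforms mentioned above.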
- SECNDDRN transform is defined, for purposes of the present invention as performing a second derivative by taking the second difference (i.e. the difference between data at nearby wavelengths of the FIRSTDRN) as an approximation to the second derivative.
- the spacing parameters, truncation, and other considerations described above with regard to the FIRSTDRN apply equally to the SECNDDRN.
- the second derivative preferably includes variable spacing parameters.
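The second difference can be sketched as a first difference applied twice. The function name `second_difference` is hypothetical, and using the same spacing for both passes is an assumption; the text only says that the spacing considerations of the first derivative carry over.

```python
import numpy as np


def second_difference(spectrum, spacing=1):
    """Second difference: the first difference of the first difference.

    Equivalent to y[i + 2s] - 2*y[i + s] + y[i]; the result is shorter
    than the input by 2 * spacing points (truncation at the ends).
    """
    spectrum = np.asarray(spectrum, dtype=float)
    first = spectrum[spacing:] - spectrum[:-spacing]
    return first[spacing:] - first[:-spacing]


parabola = np.arange(6, dtype=float) ** 2   # 0, 1, 4, 9, 16, 25
dd1 = second_difference(parabola, spacing=1)
dd2 = second_difference(parabola, spacing=2)
```

For a parabola the second difference is constant, which makes it a convenient sanity check: it equals 2 at spacing 1 and 8 at spacing 2.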
- MULTSCAT transform is defined, for purposes of the present invention as Multiplicative Scatter Correction.
- spectra are rotated relative to each other by the effect of particle size on scattering. This is achieved for the spectrum of the i'th sample by fitting using a least squares equation
- MSC Multiplicative Scatter Correction
- SMOOTHNG transform is defined, for purposes of the present invention as a smoothing transform.
- a smoothing parameter specifies how many data points in the spectra are averaged together. Examples of values for smoothing parameters include but are not limited to values of 2, 4, 8, 16, and 32.
- a smoothing value of 2 causes two adjacent wavelengths to be averaged together and the resulting value of the smoothed data is assumed to correspond to a wavelength halfway between the two end wavelengths used in the computation. Since wavelengths too near the ends of the spectrum cannot be computed, the spectrum is truncated to eliminate those wavelengths.
- the smoothing parameter value is varied such that a smoothing transform includes a plurality of smoothing transforms, each having a different smoothing parameter.
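A minimal sketch of the averaging described above, implemented as an unweighted moving average. The function name `smooth` is hypothetical; `np.convolve` with `mode="valid"` performs the truncation at the ends of the spectrum.

```python
import numpy as np


def smooth(spectrum, wavelengths, n_points=2):
    """Average `n_points` adjacent values; report the window-midpoint wavelength."""
    spectrum = np.asarray(spectrum, dtype=float)
    wavelengths = np.asarray(wavelengths, dtype=float)
    kernel = np.ones(n_points) / n_points
    smoothed = np.convolve(spectrum, kernel, mode="valid")
    # Midpoint between the first and last wavelength of each window.
    midpoints = (wavelengths[n_points - 1:] + wavelengths[:len(wavelengths) - n_points + 1]) / 2.0
    return smoothed, midpoints


sm, sm_wl = smooth([1.0, 3.0, 5.0, 7.0], [1100.0, 1102.0, 1104.0, 1106.0], n_points=2)
```

Calling `smooth` once for each smoothing value (2, 4, 8, ...) gives the plurality of smoothing transforms mentioned above.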
- the term KUBLMUNK transform is defined, for purposes of the present invention as a Kubelka-Munk transform.
- the Kubelka-Munk transform specifies a transform of the data corresponding to a theoretical study of the behavior of light in scattering samples, which specifies how the reflected light should vary as the composition of the samples vary.
- the transform is a two-step procedure: first the absorbency (log (1/R)) data is transformed to reflectance; then the reflectance is transformed to the Kubelka-Munk function.
- the Kubelka-Munk equation specifies that the absolute reflectance should be used, however the absolute reflectance is difficult to obtain.
- a more commonly used method uses the calculated reflectance of the sample (R₀). The calculated reflectance is obtained by measuring a diffuse reflector with high reflectance (as close to 100% reflectance as can be attained) and using this measurement to represent the radiation illuminating the sample.
- a more accurate value for the reflectance of the sample is obtained by using the actual reflectance of the reference standard.
- the value for reflectance can be set to unity, a known value other than unity, or if unknown, the value may be entered as an automatic variable value (similar to the smoothing and spacing parameters for the smoothing transform and derivative transform).
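A minimal sketch of the two-step Kubelka-Munk transform, using the standard Kubelka-Munk function F(R) = (1 - R)^2 / (2R). Several points are assumptions rather than statements from the text: the logarithm is taken as base 10, the function name `kubelka_munk` is hypothetical, and the reference standard's reflectance is applied as a simple multiplicative scale on the calculated reflectance.

```python
import numpy as np


def kubelka_munk(absorbance, reference_reflectance=1.0):
    """Convert log(1/R) data to the Kubelka-Munk function.

    reference_reflectance is the (assumed known) reflectance of the
    reference standard; 1.0 corresponds to setting it to unity.
    """
    absorbance = np.asarray(absorbance, dtype=float)
    reflectance = 10.0 ** (-absorbance)                 # step 1: log10(1/R) -> R
    reflectance = reflectance * reference_reflectance   # assumed scaling by the standard
    return (1.0 - reflectance) ** 2 / (2.0 * reflectance)  # step 2: F(R)


km = kubelka_munk([1.0, 0.0])
```

When the reference reflectance is unknown, the same function could be called over a range of candidate values, in the same spirit as the varied smoothing and spacing parameters.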
- RATIO transform is defined, for purposes of the present invention as a transform which divides a numerator by a denominator.
- the data to be used for numerator and denominator are separately and independently transformed. Neither numerator nor denominator may itself be a ratio transform, but any other transform is permitted.
- MEANCNTR transform is defined, for the purposes of the present invention as a transform which calculates the mean of all the spectra in the data set computed, wavelength by wavelength. Then the difference of each individual spectrum from the mean spectrum is computed.
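Mean-centering as described can be sketched in a few lines of NumPy. The function name `mean_center` is hypothetical; spectra are assumed to be stored one per row, with one column per wavelength.

```python
import numpy as np


def mean_center(spectra):
    """Subtract the wavelength-by-wavelength mean spectrum from every spectrum."""
    spectra = np.asarray(spectra, dtype=float)
    mean_spectrum = spectra.mean(axis=0)   # mean over all spectra, per wavelength
    return spectra - mean_spectrum


centered = mean_center([[1.0, 2.0], [3.0, 6.0]])
```

After centering, the mean of every wavelength column is zero, so the remaining values describe how each sample deviates from the average spectrum.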
- SGDERIV1 and SGDERIV2 transforms are defined, for the purposes of the present invention as transforms for smoothing fluctuations in the data using first derivative or second derivative respectively as described by the Savitzky-Golay method. Any order derivative is smoothed by applying a coefficient to the function. For example, a first derivative would have a coefficient value of 1, a second derivative would have a coefficient value of 2, and so on.
- These transforms preferably use variable spacing parameters in the
- Mahalanobis distance is defined, for purposes of this invention as the Mahalanobis distance:
- $D^2 = (X - \bar{X})' \, M \, (X - \bar{X})$, where:
- $X$ is a multi-dimensional vector describing the value of the spectral data of a given sample at several wavelengths
- $\bar{X}$ (mean X) is a multidimensional vector describing the mean value, at each of the wavelengths, of all of the samples in the calibration data set (the group mean)
- $(X - \bar{X})'$ is the transpose of the matrix $(X - \bar{X})$
- $M$ is a matrix describing the distance measures in space
- $D^2$ is the square of the Mahalanobis distance between the given sample and the group mean of the calibration data set. Mark,
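A minimal sketch of the squared Mahalanobis distance, D^2 = (X - mean X)' M (X - mean X). The function names are hypothetical. Taking M as the inverse of the calibration set's covariance matrix is the conventional choice and is an assumption here; the text only describes M as a matrix describing the distance measures in space.

```python
import numpy as np


def inverse_covariance(calibration):
    """Assumed choice of M: (pseudo-)inverse of the calibration covariance."""
    calibration = np.asarray(calibration, dtype=float)
    return np.linalg.pinv(np.cov(calibration, rowvar=False))


def mahalanobis_squared(x, mean_x, m):
    """Squared distance between one sample and the group mean."""
    delta = np.asarray(x, dtype=float) - np.asarray(mean_x, dtype=float)
    return float(delta @ m @ delta)


calibration = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
group_mean = calibration.mean(axis=0)
m_matrix = inverse_covariance(calibration)
d_squared = mahalanobis_squared([3.0, 1.0], group_mean, m_matrix)
```

With M set to the identity matrix the expression reduces to the squared Euclidean distance, which is a quick way to check an implementation.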
- the system (and method) in accordance with the preferred embodiment described below is implemented as a software package which runs under a Desktop interpreter and uses the Chemometric Toolbox (Applied Chemometrics, 77 Beach Street, Sharon, Mass. 02067).
- the system allows a user to create and search through a large variety of data, to automatically perform multiple transforms on the data, and to automatically select the data giving the best results based on predetermined criteria.
- a manual mode is also available, which provides the operator with a method of quickly searching a large amount of data.
- the data may be selected from, but is not limited to, samples generated by agricultural processes including wheat data from a variety of wheat crops, process development samples including scale-up samples, raw materials samples or samples generated by biological processes including blood samples used in predicting clinical chemistry parameters (e.g. blood glucose levels).
- the program is started from the main Desktop command window by typing an appropriate command such as "ANALYZE".
- the "Data type Selection" window 99 will appear as shown in Figure 1. This window is divided roughly into two sets of functions: a set of primary functions
- the primary functions 100 allow the operator to choose the fundamental type of operation: automatic or manual search.
- the data type selections 110 allow the operator to specify options to be used and operations to be performed during the automatic search. Except for "User Name", these options are ignored if Manual operation is selected.
- the set of selections 100 is sub-divided into a set of data format selections 101 and a set of operation selections 102.
- the user may select from the following data formats: Vision data format, ASCII data format, JCAMP data format, GRAMS data format and NSAS binary file output format.
- This selection allows the operator to specify the format corresponding to the file containing the data to be analyzed.
- Also included in the data format selections 101 is a "Reanalyze results" option, which allows the user to reanalyze previously processed spectral data. This option will be described in more detail below.
- no default entry is provided, and if neither a data format nor "Reanalyze results" is selected, the program displays an error message and exits.
- the Vision data format refers to the data format used by the VISION program package, provided by FOSS/NIRSystems (FOSS/NIRSystems, Inc. 12101 Tech Road, Silver
- the VISION program has the capability of saving data in an ASCII file of a specified format with a .TXT file extension. Selecting the VISION data format from data format selection 101 causes the system package to accept, input and convert Vision format data files into the Desktop internal data structures that are compatible with the program package.
- JCAMP-DX is a public-domain standard data format created to ease the problem of transferring spectral information between otherwise incompatible data representations.
- the official specification for this standard format has been published (Applied Spectroscopy, 42(1), p.151 (1988)) and is provided as an auxiliary data format by many instrument manufacturers.
- GRAMS is a widely used software program provided by Galactic Industries (Galactic Industries, Corp., 395 Main Street, Salem, New Hampshire 03079). Among its features is a set of data converters that allow it to import many different proprietary data formats from instrument vendors, and even from other software programs.
- the data format for GRAMS is binary in nature. Files in this format carry the .SPC extension. It should be noted that files with the .SPC extension contain only spectral data. Constituent information about the samples in the file, which is needed to perform calibration calculations, is contained in an auxiliary file having the same name, with a .CFL extension. Both files must be present in the same directory in order for the system to perform calibration calculations using this format.
- NSAS binary file output format uses two files, one for the spectral data and one to contain the constituent information. These files have the same file name; the file containing the spectral data has the extension .DA and the file containing the constituent values has the extension .CN.
- ASCII format refers to a special simplified format for presenting data which may be selected for use with the preferred embodiment of the present invention.
- the format is defined as follows. A file containing ASCII data must contain only valid ASCII
- each row of the file contains a set of related information. Each row must be terminated with a carriage return/linefeed pair of characters. There should be no blank rows in the file.
- the first header row contains five numerical values: nspec numwave numconst firstwl lastwl, where "nspec" is the number of spectra contained in the file; "numwave" is the number of wavelengths representing each spectrum; "numconst" is the number of constituents associated with each spectrum; "firstwl" is the wavelength corresponding to the first spectral value in each data row; and "lastwl" is the wavelength corresponding to the last spectral value in each data row.
- the second header row contains a list of the names of the constituents whose values are in the dataset.
- the number of names must match the value specified in the first header row.
- Each name must be preceded by a space, and there may be no embedded spaces in any of the names: name(1) name(2) ... name(numconst).
- the data rows immediately follow the header rows. There is one data row for each spectrum; the number of data rows must match the value specified in the header.
- Each row contains three types of information: ID Spec(1) ... Spec(numwave) Const(1) ... Const(numconst), where:
- ID is any ASCII identifier for the spectrum. It cannot contain any embedded spaces;
- Spec represents the spectral data values in standard floating-point format. Each row must contain “numwave” values. If any values are missing, they may be represented by a zero. Each value must be preceded by at least one space.
- Const represents the constituent values in standard floating-point format. Each row must contain “numconst” values. If any values are missing, they may be represented by a zero. Each value must be preceded by at least one space.
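
The file layout described above can be illustrated with a short reader. This is only a sketch in Python, not the program's own input routine: the names `AsciiSpectra` and `parse_ascii_spectra` are hypothetical, and for convenience the sketch accepts bare linefeeds as well as the carriage return/linefeed pair required by the format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AsciiSpectra:
    """Contents of one simplified-ASCII data file (hypothetical container)."""
    first_wl: float
    last_wl: float
    constituent_names: List[str]
    ids: List[str]
    spectra: List[List[float]]
    constituents: List[List[float]]


def parse_ascii_spectra(text: str) -> AsciiSpectra:
    """Parse two header rows followed by one data row per spectrum."""
    # splitlines() handles CR/LF terminators (and, leniently, bare LF).
    rows = text.splitlines()
    if len(rows) < 2:
        raise ValueError("two header rows are required")
    if any(not row.strip() for row in rows):
        raise ValueError("blank rows are not allowed")

    # First header row: nspec numwave numconst firstwl lastwl
    head = rows[0].split()
    if len(head) != 5:
        raise ValueError("first header row must contain five values")
    nspec, numwave, numconst = (int(v) for v in head[:3])
    first_wl, last_wl = float(head[3]), float(head[4])

    # Second header row: constituent names, no embedded spaces.
    names = rows[1].split()
    if len(names) != numconst:
        raise ValueError("number of names must match numconst")

    data_rows = rows[2:]
    if len(data_rows) != nspec:
        raise ValueError("number of data rows must match nspec")

    ids, spectra, constituents = [], [], []
    for row in data_rows:
        fields = row.split()
        # Each data row: ID, numwave spectral values, numconst constituents.
        if len(fields) != 1 + numwave + numconst:
            raise ValueError(f"wrong number of values in row {fields[:1]}")
        ids.append(fields[0])
        spectra.append([float(v) for v in fields[1:1 + numwave]])
        constituents.append([float(v) for v in fields[1 + numwave:]])

    return AsciiSpectra(first_wl, last_wl, names, ids, spectra, constituents)
```

Because missing values are represented by a zero, the reader cannot tell a missing value from a true zero; that ambiguity is a property of the format as described, not of the sketch.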
- Figure 15 shows an exemplary data file in ASCII format for an exemplary wheat data set.
- Figure 15 includes 31 spectra, each containing 176 readings over the wavelength range 1100 to 2500 nm, and 2 constituents whose names are "PROT1" (a first
- a constituent value is a reference value for the target substance in the sample which is measured by an independent measurement technique.
- two constituent values are used, which correspond to two measurements performed on the sample by the same instrument. It should be noted, however, that the multiple constituents could alternatively be measurements from different constituents (e.g., protein and moisture in wheat) or from different instruments, or different types of instruments, or different measurement techniques altogether.
- the ASCII format provides a uniform format which a user can use if the data to be analyzed is not in any of the other supported formats. In this regard, the user can simply edit the data from another format to conform to the ASCII format described above.
- the "Reanalyze results" button may be selected instead of a data file format. This selection reuses the results from a previous automatic search, and an error will result if this is selected without having previously subjected a data file to an automatic search. If this option is selected, the data used will be subject to any previous (original) wavelength selections, sample selections, spectrum averaging, and will use the previously specified reference laboratory error value.
- the operations selections 102 include four choices: automatic quick search, automatic thorough search, manual control of model generation (which also includes a diagnostic capability) and Prediction.
- the Quick and Thorough Automatic search options are used to perform an automatic search through the various combinations of data transformations and algorithms to produce an optimal modeling equation.
- the automatic quick search and the automatic thorough search perform all the transforms specified by the Data Type selection.
- the automatic quick search uses a default parameter value for transforms that require a parameter (e.g., derivative
- the manual search option allows user interaction and a quick search of the data for diagnosing the data set (the diagnostic capability), with the attendant ability of the user to guide the process of generating a modeling equation.
- the prediction search option uses an existing modeling equation to predict the constituent values (i.e., the amount of the target substance in the sample) from an existing data file.
- the data type selections 110 are used to set various operational parameters for the automatic search procedure and (with the exception of the user name entry) are ignored when manual operation is selected.
- the user name entry 104.1 is used to identify the user of the program who generated a particular modeling equation file. This entry is copied into the header of each modeling equation file. If left blank, the user's name is replaced with the entry "<not specified>".
- the "user name" entry is the only entry in the data selection functions 110 that is operative for both the Automatic and Manual operation modes.
- the Comment field 104.2 allows the user to enter comments regarding the search (for example, to identify the nature of the search). The contents of the comment field 104.2 are also copied into the header of each modeling equation file.
- the data type entry 105 identifies the manner in which the data being analyzed was collected.
- the method of data collection determines the transforms and transform pairs that make sense in a given situation.
- the set of data collection methods preferably includes:
- Diffuse reflectance (abbreviated diffrefl)
- Clear transmission (abbreviated cleartrn)
- Diffuse transmission (abbreviated difftran)
- Figures 16, 17, and 18 are matrices which illustrate which combinations of data transforms are used for the diffuse reflectance, clear transmission, and diffuse transmission data collection methods. In the illustrated embodiment, the set of transform combinations is the same for each of these data collection methods.
- the transforms illustrated in Figures 16, 17 and 18, which are defined above, are NULLS, BASECORR, ABS2REFL, NORMALIZ, FIRSTDRV, SECNDDRV, MULTSCAT, KUBLMUNK, SMOOTHNG, RATIO, MEANCNTR, SGDERIV1, and SGDERIV2.
- the reference lab error entry 106 represents the standard deviation of the constituent values for the data.
- standard tests specific to certain known compounds which are performed to determine sample purity. Examples of such tests include but are not limited to standard error of analysis as provided by the National Standards Laboratory or laboratory specific method error determined by experimentation (e.g. moisture analysis, determination of heavy metal content of a sample, and standard chemical analysis).
- a spectroscopic calibration cannot be expected to correlate with the constituent values any better than the constituent values correlate with a set of external and internal standards used for determining performance of these tests. Therefore, the reference laboratory error, which represents the degree to which the constituent values correlate with themselves, can be used as a criterion for selecting which modeling equations to use. If the reference laboratory error is unknown, other criteria can be used. The other criteria include but are not limited to F-test and statistical testing methods.
- Prior to performing an automatic search, the data is divided into calibration and
- the preselect data entry 107 is used to select whether this division is performed automatically or manually.
- the default value is "automatic".
- the % in calibration field 108 determines the approximate fraction of the total number of samples to be included in the calibration subset.
- the percentage of samples included in the calibration subset is preferably selected to provide maximum robustness and variability and is dependent on the total number of samples analyzed.
- the remaining samples are included in the validation subset.
- the selection of which samples are included in which subset is made randomly. Therefore, on subsequent runs different samples may be in the two subsets. Samples for the set can be selected by randomly dividing the samples into subsets or by randomly assigning each sample a number between 0 and 1.
- the system then preferably sets a cut-off value which corresponds to the % in calibration value (e.g., a cut-off value of 0.5 for % in calibration of 50%), with those samples falling below the value assigned to the calibration set and those falling at or above the value assigned to the validation set (or vice versa).
- It should be noted that the decision to group values equal to the cut-off value with the set of values "above" the cut-off value is arbitrary, and that values equal to the cut-off value could alternatively be grouped with the set of values below the cut-off value.
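
The random-number variant of the division can be sketched as follows. This is a minimal illustration in Python rather than the program's actual routine; the function name `split_by_cutoff` and its parameters are assumptions made for the example. Each sample draws a number in [0, 1), and draws below the cut-off go to the calibration subset.

```python
import random
from typing import List, Optional, Sequence, Tuple


def split_by_cutoff(
    sample_ids: Sequence[str],
    pct_in_calibration: float,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[str]]:
    """Randomly divide samples into calibration and validation subsets."""
    if rng is None:
        rng = random.Random()  # unseeded: subsets differ from run to run
    cutoff = pct_in_calibration / 100.0  # e.g. 50% -> 0.5
    calibration: List[str] = []
    validation: List[str] = []
    for sample_id in sample_ids:
        draw = rng.random()  # uniform in [0, 1)
        # Draws equal to the cut-off are grouped with the "above" set.
        if draw < cutoff:
            calibration.append(sample_id)
        else:
            validation.append(sample_id)
    return calibration, validation
```

Because each sample is assigned independently, the calibration subset holds only approximately the requested percentage, which is consistent with the "approximate fraction" wording used for field 108.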
- the desired number of spectrally unique samples for a calibration set may be selected from a large sample pool based on a Gauss-Jordan algorithm or Mahalanobis distance algorithm criterion.
- the Gauss-Jordan algorithm selects for spectral nonlinearities using the sample with the largest absolute value of absorbance ("A-sample") as the sample-selection criterion. For example, to obtain a 50% calibration set from a sample set of 1000 samples using this algorithm, the A-sample, along with 499 samples with the value closest to the value of the A-sample would be selected for the calibration set.
- the Mahalanobis distance algorithm selects samples having the farthest position from the center of a circumscribed ellipse in a multidimensional distance. For example, to obtain a 50% calibration set from a sample set of 1000 samples, using this algorithm, the 500 samples with the greatest Mahalanobis distance would be selected for the calibration
- the wavelength selection box 109 is selected when the user wishes to use only certain preselected portions of the available data spectra. If a check is entered in the box, then the operator is given the opportunity to select the wavelength ranges to use at a subsequent section of the program operation.
- the Spectrum averaging box 103 is checked when a user wishes to average together (wavelength by wavelength) several spectra from the same sample. This is sometimes useful, particularly if the spectra are noisy.
- a window opens labeled "# of spectra to average" in which the user can indicate how many spectra in the data file are to be averaged together to create each of the actual spectra used for the calibration modeling. The following criteria should be met before utilizing the spectrum averaging function:
- the spectra to be averaged together actually represent the same sample, even if they are measured from different aliquots of that sample.
- Each sample has been measured the same number of times, and that number of spectra, for each sample, is included in the data file.
- the number entered in the "# of spectra to average" window equals the number of spectra actually collected for each sample, or is left at the default value of unity.
- the system automatically checks that the total number of samples in the data file is an integer multiple of the number specified to be averaged together. However, as the system
- each spectrum in the data file is assumed to be the spectrum of a different sample.
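
The wavelength-by-wavelength averaging and the integer-multiple check can be sketched as below. This Python example is illustrative only: the function name `average_replicates` is hypothetical, and it assumes that the replicate spectra of each sample are stored consecutively in the data file, which the description does not state explicitly.

```python
from typing import List, Sequence


def average_replicates(
    spectra: Sequence[Sequence[float]], n_to_average: int = 1
) -> List[List[float]]:
    """Average each consecutive group of n_to_average spectra."""
    if n_to_average < 1:
        raise ValueError("number of spectra to average must be at least 1")
    # The total number of spectra must be an integer multiple of the group size.
    if len(spectra) % n_to_average != 0:
        raise ValueError("spectrum count is not a multiple of the group size")
    averaged: List[List[float]] = []
    for start in range(0, len(spectra), n_to_average):
        group = spectra[start:start + n_to_average]
        # zip(*group) walks the group wavelength by wavelength.
        averaged.append([sum(values) / n_to_average for values in zip(*group)])
    return averaged
```

With the default value of unity, every spectrum forms its own group and is returned unchanged, matching the case in which each spectrum is treated as a different sample.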
- the Bias in FOM box 111 controls the calculation of the Figure of Merit used to compare the various modeling equations. If left unchecked, the Figure of Merit is calculated from the SEE and the SEP. If box 111 is checked, then a text-input box opens up, labeled "Bias weight", and the relative weight of the bias used to determine the Figure of Merit is entered in this box. The default value is unity (1), which causes it to have the same amount of weight in determining the Figure of Merit as the SEE. The formula used is described below.
- SEE is the standard deviation, corrected for degrees of freedom, for the residuals due to differences between actual values (which, in this context, are the constituent values) and the MR predicted values within the calibration set (which, in this context, are the values returned by applying the spectral data in the calibration sub-set (which corresponds to the constituent values) to the modeling equation for which the FOM is being calculated).
- SEP is the standard deviation for the residuals due to differences between actual values (which, in this context, are the constituent values) and the MR predicted values outside the calibration set (which, in this context, are the values returned by applying the spectral data in the validation sub-set (which corresponds to the constituent values) to the modeling equation for which the FOM is being calculated).
- the Select button 112 causes the system to proceed to the next step of program operation using the state of the options existing when the button is pressed, and the cancel button 113 clears the Desktop memory area and exits the program, returning the operator to the Desktop command window.
- a user enters one of Quick Automatic, Thorough Automatic, Manual or Predict modes by making the appropriate selection from the select operation menu 102.
- Selection of automatic mode causes the system to perform an automatic search through multiple data transforms, algorithms and parameters, and to select the best modeling equation as described below.
- the computer first displays a sequence of standard file selection windows.
- the first file selection menu prompts the user to select the directory and file name (assigned by the user) containing the input spectral data.
- When OPEN is pressed from this window, an OUTPUT FILE SELECTION window appears.
- the user then enters the directory and file name to receive the modeling equation and other results.
- the Save button is then pressed.
- a Constituent Selection window 200 will then appear as illustrated in Figure 2. From this window, the user selects one or more constituents for calibration from the select constituents menu 201. If the data file contains more than one constituent, the constituent list will also appear in an "AUX/Indicator variables" menu 202. The user can select any of the constituents as an auxiliary or indicator variable as long as that constituent is not also selected in the select constituents menu 201. Duplication will result in an error. Contiguous sets of constituents may be selected by pressing and holding down the left-hand mouse button while running the cursor over the desired constituents. Noncontiguous sets of constituents may be selected by pressing down the
- Selection of automatic mode may also cause other input windows to appear, depending upon which options were activated from the Data type Selection window. These windows are listed below in the order in which they appear, keeping in mind that, depending on the settings selected in previous windows, not all of these will appear in any given program run.
- wavelength selection window 205 (Figures 3(a-b))
- wavelength selection can be performed either graphically or numerically.
- graphical selections and manual selections may be freely combined.
- one or more wavelength ranges may be graphically selected for analysis by clicking the left mouse button while the cursor is at the wavelength in the spectrum corresponding to the beginning of the desired wavelength range. As the cursor is moved over the spectrum, the selected wavelengths are indicated by having the area underneath them filled in, as shown in Figure 3(b). The left mouse button is then pressed a second time to select the end of the desired wavelength range. If multiple wavelength ranges overlap or are contiguous, they will be combined into a single wavelength range.
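
The combining of overlapping or contiguous ranges can be sketched as a standard interval merge. The Python below is an illustration, not the program's code; `merge_wavelength_ranges` is a hypothetical name, and "contiguous" is assumed here to mean that one range ends exactly where the next begins.

```python
from typing import Iterable, List, Tuple

Range = Tuple[float, float]


def merge_wavelength_ranges(ranges: Iterable[Range]) -> List[Range]:
    """Combine overlapping or touching (lower, upper) wavelength ranges."""
    # Normalise each pair so lower <= upper, then sort by lower limit.
    ordered = sorted((min(a, b), max(a, b)) for a, b in ranges)
    merged: List[Range] = []
    for lower, upper in ordered:
        if merged and lower <= merged[-1][1]:
            # Overlaps or touches the previous range: extend it.
            previous_lower, previous_upper = merged[-1]
            merged[-1] = (previous_lower, max(previous_upper, upper))
        else:
            merged.append((lower, upper))
    return merged
```

The same routine would apply whether the ranges were picked graphically or typed in, since both kinds of selection feed one list of ranges.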
- If the user wishes to enter the wavelength range(s) manually, he or she presses the "Manual" button 207 on the Wavelength selection window 205. This causes the manual selection window 210 to be displayed as shown in Figure 4. From window 210, the user can manually type in numerical values for the lower and upper limits for each desired wavelength range. Referring to Figure 4, the manual selections are entered in contiguous fields beginning with the top fields 211. In the illustrated embodiment, the manual entry window provides for entering only eight wavelength ranges.
- Pressing the "Cancel" button 214 from the Manual Wavelength Selection window 210 will close that window and discard any entries made, returning operation to the graphical input mode.
- Pressing the "Graph" button 213 from the Manual Wavelength Selection window 210, in contrast, will return operation to the graphical wavelength selection window 205 without discarding the manual entries previously made.
- the manually entered wavelength range values will be retained and added to those selected graphically.
- pressing the "Select" button 212 from the Manual Wavelength Selection window 210 will close that window and add any numerical values entered to the list of wavelength ranges to be used in the search.
- the manual sample selection window 230 allows the user to select particular samples (and only those samples) to be used for the calibration and validation subsets.
- the corresponding sample ID for each sample in the data file is displayed in both the calibration samples box 231 and the validation samples box 232.
- Contiguous sets of samples may be selected for inclusion in the validation or calibration subsets by pressing and holding down the left-hand mouse button while running the cursor over the desired samples.
- Noncontiguous sets of samples may be selected by releasing the mouse button, moving the cursor to the desired sample ID and pressing down the CONTROL key before pressing the left mouse button again to select the noncontiguous samples.
- no sample is included in both the calibration subset and the validation subset
- Pressing the Select button 233 from the window 230 continues program operation with the commencement of the automatic search, and pressing the cancel button 234 clears the Desktop memory area and exits the program, returning the operator to the Desktop command window.
- the manual sample selection window 230 (if active) is the last window that the operator will observe before the program enters the automatic search functions. If window 230 is not active, the last window observed will be one of the windows 200, 205, and 210, depending on the particular selections made in the data type selection window 99 of Figure 1.
- the system displays progress reports to indicate that the program is indeed active, and also to allow the operator to estimate how far along the search sequence the computer is at any given time.
- the search process begins. Initially, the computer performs the sample selection, data and wavelength editing specified during the configuration process, and then begins the search process.
- the data transforms determined by the specifications contained in the Data type window 99 are executed on the calibration sub-set and on the validation sub-set.
- Not all possible pairs of transforms are performed during the automatic search. Rather, only those pairs which make chemical, spectroscopic and/or physical sense are performed.
- the particular transform pairs which "make sense" in a given spectroscopic situation are selected based on the method of data collection, i.e. those pairs that make chemical, spectroscopic and physical sense for Diffuse Reflectance, Clear Transmission, or Diffuse Transmission.
- the Kubelka-Munk transform would normally be used only in conjunction with diffuse reflectance measurements.
- Because the MEANCTR transform is normally used as an early step in most calibration algorithms, it is not included in the automatic search protocol. Instead, this transform is preferably used only for visual inspection of the data during manual calibration/troubleshooting operation.
- In manual mode, any pair of transforms may be performed in succession (whether they "make sense" or not). The user is warned, however, that inappropriate transforms may result in, for example, divide-by-zero errors or other anomalous results.
- Some of the transforms, particularly the smoothing transform and derivative transforms, have parameters (e.g., spacing of data points for the first derivative, second derivative, Savitzky-Golay first and second derivative and smoothing transforms) associated with them.
- the smoothing transform can be used as transform #1 251, transform #2 252, or both transform #1 and transform #2. If the smoothing transform is used for both the first and second transform, then both transforms share the same value of the smoothing factor. If a first derivative or second derivative transform is used for both transforms, or if one of each type of derivative transform is used for each transform in either order, then they share the same value of the spacing parameter.
- If the smoothing transform or the first or second derivative transform is used as part of a ratio transform, it can appear in any of four places: as the numerator or denominator transform of either the first or second ratio transform. Regardless of where or how many times the smoothing transform or the derivative transform appears as part of a ratio transform, the same value of the smoothing parameter or spacing parameter is used throughout the ratio transform. This value is taken from the same set of available values as the corresponding parameter for the first or second transform, and may be the same as or different from it.
- the first derivative or second derivative can be used as transform #1, transform #2, or both the first and second transform. If a first derivative or second derivative transform is used for both transforms, or if one of each type of derivative transform is used for each transform in either order, then the derivatives share the same value of the spacing parameter.
- the data is divided into calibration and validation subsets. As described above, this division may be performed automatically, with the computer randomly selecting which samples to include in each subset. Alternatively, the division of the data may be performed manually, with the operator selecting the samples to include in each subset.
- the algorithms include PLS (partial least squares), PCR (principal component regression), and MLR (multiple linear regression), thereby producing three modeling equations for each set of transformed calibration data.
- Additional algorithms may include artificial neural networks, selections based on Mahalanobis Distances or through use of the Gauss-Jordan algorithm.
- Each of the modeling equations is then applied to the validation data sub-set.
- the validation data applied to each modeling equation has been transformed in the same manner as the calibration data which generated that modeling equation.
- the "best" modeling equation is selected on the basis of a "Figure of Merit", which is computed using a weighted sum of the SEE and SEP, the SEP being given twice the weight of the SEE.
- This Figure Of Merit (FOM) is also displayed in the manual mode, along with the SEE and SEP.
- the FOM is calculated using one of two equations as follows, depending on whether the "Bias in FOM" box is checked:
- $\mathrm{FOM} = \sqrt{\left(\mathrm{SEE}^2 + 2\,\mathrm{SEP}^2\right) / 3}$
- $\mathrm{FOM} = \sqrt{\left(\mathrm{SEE}^2 + 2\,\mathrm{SEP}^2 + W\,b^2\right) / \left(3 + W\right)}$
- where SEE is the Standard Error of Estimate from the calculations on the calibration data;
- SEP is the Standard Error of Prediction from the calculations on the validation data;
- b is the bias of the validation data, bias being the mean difference between the predicted values and corresponding constituent values for the sample; and
- W is the weighting factor for the bias, entered in the "Bias weight" box that is displayed when the Bias in FOM box 111 is checked.
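
The two forms of the Figure of Merit can be sketched in Python as follows. This is an illustration under a stated assumption: the FOM is read here as the square root of a weighted mean of squares, with the SEP given twice the weight of the SEE and the squared bias given weight W. The function name `figure_of_merit` and its parameters are hypothetical.

```python
import math
from typing import Optional


def figure_of_merit(
    see: float,
    sep: float,
    bias: Optional[float] = None,
    bias_weight: float = 1.0,
) -> float:
    """Return the FOM; pass bias only when "Bias in FOM" is checked."""
    if bias is None:
        # Box unchecked: SEE has weight 1, SEP has weight 2.
        return math.sqrt((see ** 2 + 2.0 * sep ** 2) / 3.0)
    # Box checked: the squared bias enters with weight W (default unity).
    weighted_sum = see ** 2 + 2.0 * sep ** 2 + bias_weight * bias ** 2
    return math.sqrt(weighted_sum / (3.0 + bias_weight))
```

One consequence of this form is that when SEE, SEP and the bias magnitude are all equal, the FOM equals that common value regardless of W, so the weight only matters when the bias differs from the other two statistics.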
- bias (b) is calculated for a set of 14 validation samples:
- the bias (b) is the mean residual, -0.0712 in this case. It may be computed either as the mean of all the individual residual values, or as the difference between the means of the actual (reference) values and predicted (by the instrument/model) values.
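
The equivalence of the two ways of computing the bias can be checked with a short Python sketch. The function names are hypothetical, and the sign convention (residual taken as actual minus predicted) is an assumption; since the bias enters the Figure of Merit only as a square, the choice of sign does not affect the FOM.

```python
from typing import Sequence


def bias_from_residuals(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Bias as the mean of the individual residuals (actual - predicted)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    residuals = [a - p for a, p in zip(actual, predicted)]
    return sum(residuals) / len(residuals)


def bias_from_means(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Bias as the difference between the mean actual and mean predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(actual) / len(actual) - sum(predicted) / len(predicted)
```

The two results agree up to floating-point rounding because the mean of differences equals the difference of means.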
- the parameter values and the performance statistics are all saved internally.
- the results are sorted according to the Figure of Merit (FOM), and the modeling equation corresponding to the data transform and algorithm providing the lowest value for the FOM is determined, and designated as the best modeling equation.
- This file is then saved as an equation file (.EQN extension) using the file name specified from the OUTPUT FILE SELECTION window described above.
- the intermediate results are saved in a file having the same name as the equation file, but having a .MAT extension. This file is required to be present in the same subdirectory as the equation file if the "Reanalyze" selection of the Data Type window 99 is to be used.
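
The sort-and-select step that designates the best modeling equation can be sketched as follows. The record type `SearchResult` and the function `rank_results` are hypothetical names introduced for illustration; the actual internal layout of the saved results is not described.

```python
from typing import Iterable, List, NamedTuple


class SearchResult(NamedTuple):
    """One candidate modeling equation from the search (hypothetical record)."""
    transform_1: str
    transform_2: str
    algorithm: str
    fom: float


def rank_results(results: Iterable[SearchResult]) -> List[SearchResult]:
    """Sort candidates by Figure of Merit, lowest (best) first."""
    return sorted(results, key=lambda result: result.fom)


def best_result(results: Iterable[SearchResult]) -> SearchResult:
    """Return the candidate with the lowest Figure of Merit."""
    ranked = rank_results(results)
    if not ranked:
        raise ValueError("no search results to rank")
    return ranked[0]
```

Keeping the full ranked list, rather than only the winner, mirrors the saving of intermediate results so that they can be reloaded later without repeating the search.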
- Manual mode may be used to manually perform a search on a formatted data file. It is also used in conjunction with the Reanalyze function on the Data type selection window 99. If a data file format is specified then the search is performed as described under "Automatic" operation, except no results are saved until a manual save is performed. If “Reanalyze” is selected, then the previously saved results from the search are reloaded and used. This saves considerable time as compared to re-running the search each time. In addition, as described below, the "manual" reanalyze function allows the user to add models to the models from the previously executed automatic search. As described previously, when Manual mode is selected, all entries in the right-hand side of the Data Type window are inactive, except User Name.
- SELECTION window. From this window, the user specifies the file to reload.
- the file is used for both input and output.
- the manual function also allows the operator to add models to the ones resulting from the previously executed automatic search. Thus, at this point a previously created file containing the models from the automatic search is selected.
- the computer then reloads the file and displays a PLOT CONTROL window 240.
- the plot control window 240 is the primary gateway to the functions and operations of Manual mode. This window is divided into five main sections: Result Window 245, Transform Control 242, Algorithm Control 243, Prior Result Control 241, and Operation Control Buttons 244. Transform parameters and functions become visible as they are required.
- Figure 6 shows the window with no expansion for Transform #1 or for Transform #2.
- Figure 7 shows a single level of expansion for Transform #1 and maximum expansion of Transform #2.
- the system, at all times, internally contains a modeling equation, which can be regarded as the current modeling equation. This modeling equation is updated as required.
- the Result Window 245 displays six performance statistics which correspond to the current modeling equation for the constituent that is displayed in the "Constituents" selection window 247 of the Algorithm Control section 243: the SEE for the calibration data, the Correlation Coefficient for the calibration data (R(cal)), the SEP for the validation data, the Correlation Coefficient for the validation data (R(val)), the bias of the predictions on the validation data; and the Figure of Merit. This provides a capsule summary of the performance statistics for the current modeling equation.
- the transform control section 242 shows the data transform applied to the calibration and validation spectral data at the top of this section, and the current values of any required parameters are displayed below. Since this is a manual mode window, any of the transforms described above can be selected as the first or second transform. There is no requirement (as there is in the automatic search) that the transforms or combinations thereof "make sense" for a given set of data. As illustrated in Figure 6(a), if a transform does not require any parameters, then none are shown. Referring to Figure
- selection box(es) are provided in the window 242 to allow the user to select or otherwise input values for the appropriate parameters.
- Each of the selection boxes for the parameters indicated may be scrolled through all their allowed values independently.
- the internal current modeling equation is updated to reflect the modeling equation corresponding to the new transform, and the statistical summary presented in the Result Window is also updated to present the performance statistics corresponding to the new modeling equation.
- the algorithm control section 243 is displayed as two noncontiguous sections of the window 240 which contain the following related features: Algorithm selection 248, Constituent selection 247, and the Number of Factors selection 249.
- the Number of Factors selection box 249 allows the user to choose the number of factors (for PLS or PCR calibration calculations) or the number of wavelengths (for MLR calibration calculations) to use in calculating the modeling equation.
- the Algorithm selection box 248 provides the user with the capability of determining which of the available algorithms are to be used to create a modeling equation from the data (as transformed according to the specifications of the Transform Control section). As discussed above, the algorithms preferably include: PLS, PCR, and MLR. Each time a new algorithm is selected, the internal modeling equation is updated according to the data transform specifications and number of factors, and the statistics displayed in the Result Window are updated accordingly.
- the user can have the results for any of the constituents selected for calibration displayed in the Result Window 245. Selecting a different constituent also updates the entries in the Prior Result Control section 241.
- the results from the automatic search are sorted according to the Figure of Merit, and then the modeling equation corresponding to the transform and algorithm resulting in the lowest value of the Figure of Merit is saved in the .EQN file.
- the entries in the Plot Control window 240 i.e., the default entries
- the data transforms 251, 252, parameter values 253, 254, algorithm 248, number of factors 249 and performance statistics 245 are set to correspond to the modeling equation saved in the .EQN file of re-analyzed data.
- the save equation button 256 is used to save the current modeling equation.
- the current modeling equation is the model which corresponds to the settings of the Transform Control section 242 and Algorithm Control section 243 of the plot control window 240.
- the modeling equation is saved in the ASCII file corresponding to the specified file name.
- the modeling equation is saved at the end of the file, and is added to any modeling equations already saved in that file.
- the Save Equation function should not be used if the file specified is open to any other program, including the Edit Equation function.
- the exit button 255 causes the system to return to the desktop control window.
- the Edit Equation button 257 invokes the text editor and automatically loads the file containing the modeling equations selected by the operator.
- the Plot loading button 258 is used to invoke a plotting function
- Selecting the Plot calib. data button 259 from Figure 6 causes the system to plot all the spectra in the calibration data set.
- the plotted calibration data is transformed according to the specifications set forth in the Transform Control section 242.
- An exemplary plot is presented in Figure 10, wherein the calibration data is transformed to a first derivative.
- the sample ID for that spectrum is displayed in a text box.
- the ID for the spectrum nearest the cursor is displayed.
- Selecting the Calib mean, SD button 261 causes the system to plot the mean and standard deviation of the calibration/validation spectral data after the data has been transformed according to the specifications in the data transform section 242 of the Plot Control window 240.
- Figure 11 shows an illustrative plot from the test wheat data set of Figure 15 with a first derivative transform applied.
- Selecting the Calib Corr Plot button 263 (or Valid Corr Plot 264) causes the system to plot the correlation coefficient between the constituent values and the calibration/validation data, transformed according to the specifications in the Transform Control section 242.
- Figure 12 shows an exemplary plot using wheat test data of Figure 15 transformed to a first derivative.
- the Scatterplots section 246 of Figure 6 includes a calibration button 265, a validation button 266, an x-variable box 267, and a y-variable box 268.
- the x-variable and y-variable boxes indicate the variable to be plotted along the corresponding axis.
- the user may select one of the following values: Actual, Predicted, Residual, Scores, Mahal Dist, or Seq. No.
- Selection of the calibration button 265 or the validation button 266 causes the corresponding data set to be scatterplotted with the x and y variables contained in the boxes 267 and 268.
- the plots are made on a sample-by-sample basis.
- the value of the variable indicated in the "X-axis" window is plotted against the value of the variable indicated in the "Y-axis" window. If the mouse button is pressed while the cursor is within the boundaries of the plot area, the sample ID corresponding to the plotted data closest to the cursor is displayed.
- two lines are plotted encompassing the data. These lines are constructed at +/- 2 times the SEE of the calibration model from the 45-degree line (which itself is not shown) representing perfect matching of the predicted and actual values.
- the "Actual" value is the value of the constituent indicated in the Constituent window of the Plot Control window
- the Predicted value is the value of the constituent as calculated by the current modeling equation, using the data as transformed according to the specifications in the Transform Control section of the window, and the model as specified by the algorithm section
- the residual value is the difference between the Actual and Predicted values as described above.
- the Mahalanobis value refers to the distance of the sample, based on the data as
- the Seq. No. refers to the order in which the samples are present in the data; usually the order in which they were measured.
- the scores value is dependent on the algorithm used to create the model in use. If the algorithm is PLS or PCR, then the scores corresponding to the factor number indicated in the "# of factors" window are used for the corresponding axis. If the algorithm is MLR, then the data, transformed according to the specifications in the Transform Control section of the window, are used for the corresponding axis, with the wavelength chosen being the one indicated in the "# of Factors" window.
- selecting the Prediction function from the select operation menu 102 allows the user to apply a modeling equation to a set of spectroscopic data in order to calculate the analytical values for the constituents (i.e. target substance of the sample which generated the spectroscopic data) corresponding to the modeling equation.
- when Predict is selected from the Data Type Selection window 99, most of the menu 99 is inactive, with only the various input data formats 100 available for selection.
- once the select button 112 is actuated, the user is prompted, via three file-selection screens, to select a data file, a modeling equation file, and a results file.
- the data in the data file is subjected to the same data transformations as the data which gave rise to the modeling equations in the modeling equation file.
- the system displays a constituent selection window 300 (Figure 14), which prompts the user to specify which constituent in the data file is to be compared to each of the constituents predicted by the modeling equation.
- the screen shown in Figure 14 is presented multiple times: once for each modeling equation in the modeling equation file.
- the header line 301 identifies which constituent from the modeling equation file is
- the entries in the select constituent box 302 are the selections from the data file against which the predicted value is compared. If the selection <none> is chosen, then the predicted value is written to the result file without a comparison value from the data file, and without residuals or any other calculated statistics.
- in order to predict the amount of the target substance (i.e. constituent) in a future sample (for which the constituent value is not known), the user simply selects "none" in the constituent box 302, and the system returns the predicted value for the amount of the constituent in the same.
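
The prediction workflow described above (apply a saved modeling equation to new spectra after the same data transform, then optionally compare against a known constituent value) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `first_derivative`, `fit_mlr`, and `predict`, the dictionary layout of the "equation", the first-difference derivative, and the sign convention residual = actual - predicted are all assumptions made for the example, and MLR is used only because it is the simplest of the three named algorithms.

```python
import numpy as np

def first_derivative(spectra):
    # First difference along the wavelength axis. An assumed stand-in for
    # the derivative transform, not the formula used by the patent.
    return np.diff(spectra, axis=1)

def fit_mlr(spectra, values, columns, transform=first_derivative):
    # Calibration: ordinary least squares (with intercept) on the selected
    # columns of the transformed spectra. The returned dict is a
    # hypothetical in-memory "modeling equation".
    X = transform(spectra)[:, columns]
    A = np.column_stack([np.ones(X.shape[0]), X])
    coef, *_ = np.linalg.lstsq(A, values, rcond=None)
    return {"coef": coef, "columns": list(columns), "transform": transform}

def predict(equation, spectra, actual=None):
    # Prediction: apply the SAME transform and column selection that
    # produced the equation, then evaluate the linear model.
    X = equation["transform"](spectra)[:, equation["columns"]]
    A = np.column_stack([np.ones(X.shape[0]), X])
    predicted = A @ equation["coef"]
    result = {"predicted": predicted}
    if actual is not None:
        # A comparison constituent was selected: report residuals too.
        # Sign convention (actual - predicted) is an assumption.
        result["actual"] = np.asarray(actual, dtype=float)
        result["residual"] = result["actual"] - predicted
    # With actual=None (the "<none>" selection) only predictions are returned.
    return result

# Synthetic data: 20 calibration "spectra" with 8 wavelength points each.
rng = np.random.default_rng(0)
calib = rng.normal(size=(20, 8))
deriv = first_derivative(calib)
true_values = 3.0 + 2.0 * deriv[:, 1] - 1.5 * deriv[:, 4]

equation = fit_mlr(calib, true_values, columns=[1, 4])

# Future samples with unknown constituent values: predictions only.
new_spectra = rng.normal(size=(5, 8))
unknown = predict(equation, new_spectra)

# Samples with known values: predictions plus residuals.
checked = predict(equation, calib, actual=true_values)
```

Storing the transform alongside the coefficients mirrors the requirement that the data file be subjected to the same transformations as the data that gave rise to the modeling equation; a PLS or PCR equation would carry factor loadings instead of selected wavelength columns.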
Landscapes
- Physics & Mathematics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Analytical Chemistry (AREA)
- Biochemistry (AREA)
- General Health & Medical Sciences (AREA)
- Immunology (AREA)
- Pathology (AREA)
- Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Theoretical Computer Science (AREA)
- Investigating Or Analysing Materials By Optical Means (AREA)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/344,333 US20040064299A1 (en) | 2001-08-10 | 2001-08-10 | Automated system and method for spectroscopic analysis |
AU2001286439A AU2001286439A1 (en) | 2000-08-10 | 2001-08-10 | Automated system and method for spectroscopic analysis |
EP01965883A EP1315953A4 (fr) | 2000-08-10 | 2001-08-10 | Systeme et procede automatises pour analyse spectroscopique |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/636,041 | 2000-08-10 | ||
US09/636,041 US6549861B1 (en) | 2000-08-10 | 2000-08-10 | Automated system and method for spectroscopic analysis |
US22663700P | 2000-08-21 | 2000-08-21 | |
US60/226,637 | 2000-08-21 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2002014812A1 (fr) | 2002-02-21 |
Family
ID=26920723
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2001/025165 WO2002014812A1 (fr) | 2000-08-10 | 2001-08-10 | Systeme et procede automatises pour analyse spectroscopique |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP1315953A4 (fr) |
AU (1) | AU2001286439A1 (fr) |
WO (1) | WO2002014812A1 (fr) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5243546A (en) * | 1991-01-10 | 1993-09-07 | Ashland Oil, Inc. | Spectroscopic instrument calibration |
US5610836A (en) * | 1996-01-31 | 1997-03-11 | Eastman Chemical Company | Process to use multivariate signal responses to analyze a sample |
US5830133A (en) * | 1989-09-18 | 1998-11-03 | Minnesota Mining And Manufacturing Company | Characterizing biological matter in a dynamic condition using near infrared spectroscopy |
US6115673A (en) * | 1997-08-14 | 2000-09-05 | Instrumentation Metrics, Inc. | Method and apparatus for generating basis sets for use in spectroscopic analysis |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5435309A (en) * | 1993-08-10 | 1995-07-25 | Thomas; Edward V. | Systematic wavelength selection for improved multivariate spectral analysis |
US5641962A (en) * | 1995-12-05 | 1997-06-24 | Exxon Research And Engineering Company | Non linear multivariate infrared analysis method (LAW362) |
US5606164A (en) * | 1996-01-16 | 1997-02-25 | Boehringer Mannheim Corporation | Method and apparatus for biological fluid analyte concentration measurement using generalized distance outlier detection |
US6223133B1 (en) * | 1999-05-14 | 2001-04-24 | Exxon Research And Engineering Company | Method for optimizing multivariate calibrations |
-
2001
- 2001-08-10 WO PCT/US2001/025165 patent/WO2002014812A1/fr not_active Application Discontinuation
- 2001-08-10 AU AU2001286439A patent/AU2001286439A1/en not_active Abandoned
- 2001-08-10 EP EP01965883A patent/EP1315953A4/fr not_active Withdrawn
Non-Patent Citations (1)
Title |
---|
See also references of EP1315953A4 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1353156A2 (fr) * | 2002-04-12 | 2003-10-15 | Toyota Jidosha Kabushiki Kaisha | Méthode d'estimation d'une réflectance |
EP1353156A3 (fr) * | 2002-04-12 | 2004-05-12 | Toyota Jidosha Kabushiki Kaisha | Méthode d'estimation d'une réflectance |
US7283244B2 (en) | 2002-04-12 | 2007-10-16 | Toyota Jidosha Kabushiki Kaisha | Reflectance estimating method |
WO2012049666A2 (fr) | 2010-10-15 | 2012-04-19 | Verrana, Llc | Analyse de mots de données par spectroscopie |
US8517274B2 (en) | 2010-10-15 | 2013-08-27 | Verrana Llc | Data word analysis by spectroscopy |
CN116008208A (zh) * | 2023-03-27 | 2023-04-25 | 山东省科学院海洋仪器仪表研究所 | 一种海水硝酸盐浓度特征光谱波段的选择方法 |
Also Published As
Publication number | Publication date |
---|---|
EP1315953A1 (fr) | 2003-06-04 |
EP1315953A4 (fr) | 2005-03-02 |
AU2001286439A1 (en) | 2002-02-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6549861B1 (en) | Automated system and method for spectroscopic analysis | |
US20040064299A1 (en) | Automated system and method for spectroscopic analysis | |
US7194369B2 (en) | On-site analysis system with central processor and method of analyzing | |
US6560546B1 (en) | Remote analysis system | |
US6791674B2 (en) | Analytical method and apparatus for blood using near infrared spectroscopy | |
US8494818B2 (en) | Analyzing spectral data for the selection of a calibration model | |
EP3236245B1 (fr) | Système d'analyse d'échantillons | |
AU2002318275A1 (en) | On-site analysis system with central processor and method of analysing | |
Swierenga et al. | Comparison of two different approaches toward model transferability in NIR spectroscopy | |
Chalus et al. | Near-infrared determination of active substance content in intact low-dosage tablets | |
US20060190137A1 (en) | Chemometric modeling software | |
Chiappini et al. | MVC1_GUI: A MATLAB graphical user interface for first-order multivariate calibration. An upgrade including artificial neural networks modelling | |
CN106932360A (zh) | 便携式近红外光谱食品快速检测与建模一体化系统和方法 | |
Eliaerts et al. | Comparison of spectroscopic techniques combined with chemometrics for cocaine powder analysis | |
US8108170B1 (en) | Method and system for increasing optical instrument calibration and prediction accuracy within and across different optical instrument platforms | |
EP0578798A1 (fr) | Appareil visant a analyser un prelevement medical. | |
EP1315953A1 (fr) | Systeme et procede automatises pour analyse spectroscopique | |
Dreassi et al. | Transfer of calibration in near-infrared reflectance spectrometry | |
Blanco et al. | Wavelength calibration transfer between diode array UV-visible spectrophotometers | |
Dunko et al. | Moisture assay of an antifungal by near-infrared diffuse reflectance spectroscopy | |
Chaminade et al. | Data treatment in near infrared spectroscopy | |
JP2002506991A (ja) | 自動較正方法 | |
Minet et al. | Local vs global methods applied to large near infrared databases covering high variability | |
US7378283B2 (en) | Reaction monitoring of chiral molecules using fourier transform infrared vibrational circular dichroism spectroscopy | |
Zhang et al. | Spectral noise-to-signal ratio priority method with application for visible and near-infrared analysis of whole blood viscosity |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | AK | Designated states | Kind code of ref document: A1; Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW |
 | AL | Designated countries for regional patents | Kind code of ref document: A1; Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
 | WWE | WIPO information: entry into national phase | Ref document number: 2001965883; Country of ref document: EP |
 | WWP | WIPO information: published in national office | Ref document number: 2001965883; Country of ref document: EP |
 | REG | Reference to national code | Ref country code: DE; Ref legal event code: 8642 |
 | WWE | WIPO information: entry into national phase | Ref document number: 10344333; Country of ref document: US |
 | NENP | Non-entry into the national phase | Ref country code: JP |
 | WWW | WIPO information: withdrawn in national office | Ref document number: 2001965883; Country of ref document: EP |