US5446681A  Method of estimating property and/or composition data of a test sample  Google Patents
 Publication number
 US5446681A
 Authority
 US
 Grant status
 Grant
 Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
 Expired - Lifetime
Classifications

 G—PHYSICS
 G01—MEASURING; TESTING
 G01R—MEASURING ELECTRIC VARIABLES; MEASURING MAGNETIC VARIABLES
 G01R35/00—Testing or calibrating of apparatus covered by the preceding groups
 G01R35/005—Calibrating; Standards or reference devices, e.g. voltage or resistance standards, "golden" references

 G—PHYSICS
 G01—MEASURING; TESTING
 G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
 G01N21/00—Investigating or analysing materials by the use of optical means, i.e. using infrared, visible or ultraviolet light
 G01N21/17—Systems in which incident light is modified in accordance with the properties of the material investigated
 G01N21/25—Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
 G01N21/31—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
 G01N21/35—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
 G01N21/359—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light using near infrared light

 G—PHYSICS
 G01—MEASURING; TESTING
 G01R—MEASURING ELECTRIC VARIABLES; MEASURING MAGNETIC VARIABLES
 G01R23/00—Arrangements for measuring frequencies; Arrangements for analysing frequency spectra
 G01R23/16—Spectrum analysis; Fourier analysis

 G—PHYSICS
 G01—MEASURING; TESTING
 G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
 G01N21/00—Investigating or analysing materials by the use of optical means, i.e. using infrared, visible or ultraviolet light
 G01N21/17—Systems in which incident light is modified in accordance with the properties of the material investigated
 G01N21/25—Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
 G01N21/31—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
 G01N21/35—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
 G01N21/3577—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light for analysing liquids, e.g. polluted water

 G—PHYSICS
 G01—MEASURING; TESTING
 G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
 G01N33/00—Investigating or analysing materials by specific methods not covered by the preceding groups
 G01N33/22—Fuels, explosives
Abstract
A method of operating a spectrometer to determine property and/or composition data of a sample comprises an on-line spectral measurement of the sample using a computer-controlled spectrometer, statistical analysis of the sample data based upon a statistical model using sample calibration data, and automatically identifying a sample if necessary based upon statistical and expert system (rule-based) criteria. Sample collection based upon statistical criteria consists of performing a statistical or rule-based check against the model to identify measurement data which are indicative of chemical species not in the samples already stored in the model. Samples identified either by a statistical criterion or using an expert system (rule-based decisions) are used preferably to isolate the liquid sample using a computer-controlled automatic sampling system. The samples can be saved for subsequent laboratory analysis in removable sample containers. The results of the laboratory analysis together with the on-line spectroscopic measurements are preferably used to update the model being used for the analysis.
Description
This is a continuation of application Ser. No. 07/990,715, filed on Dec. 15, 1992, which is a continuation of Ser. No. 07/596,435, filed on Oct. 12, 1990, both of which are abandoned.
This invention relates to a method of estimating unknown property and/or composition data (also referred to herein as "parameters") of a sample. Examples of property and composition data are chemical composition measurements (such as the concentration of individual chemical components as, for example, benzene, toluene, xylene, or the concentrations of a class of compounds as, for example, paraffins), physical property measurements (such as density, index of refraction, hardness, viscosity, flash point, pour point, vapor pressure), performance property measurement (such as octane number, cetane number, combustibility), and perception (smell/odor, color).
The infrared (12500-400 cm^{-1}) spectrum of a substance contains absorption features due to the molecular vibrations of the constituent molecules. The absorptions arise from both fundamentals (single quantum transitions occurring in the mid-infrared region from 4000-400 cm^{-1}) and combination bands and overtones (multiple quanta transitions occurring in the mid- and the near-infrared region from 12500-4000 cm^{-1}). The position (frequency or wavelength) of these absorptions contains information as to the types of molecular structures that are present in the material, and the intensity of the absorptions contains information about the amounts of the molecular types that are present. To use the information in the spectra for the purpose of identifying and quantifying either components or properties requires that a calibration be performed to establish the relationship between the absorbances and the component or property that is to be estimated. For complex mixtures, where considerable overlap between the absorptions of individual constituents occurs, such calibrations must be accomplished using multivariate data analysis methods.
In complex mixtures, each constituent generally gives rise to multiple absorption features corresponding to different vibrational motions. The intensities of these absorptions will all vary together in a linear fashion as the concentration of the constituent varies. Such features are said to have intensities which are correlated in the frequency (or wavelength) domain. This correlation allows these absorptions to be mathematically distinguished from random spectral measurement noise which shows no such correlation. The linear algebra computations which separate the correlated absorbance signals from the spectral noise form the basis for techniques such as Principal Components Regression (PCR) and Partial Least Squares (PLS). As is well known in the art, PCR is essentially the analytical mathematical procedure of Principal Components Analysis (PCA) followed by regression analysis. Reference is directed to "An Introduction to Multivariate Calibration and Analysis", Analytical Chemistry, Vol. 59, No. 17, Sep. 1, 1987, pages 1007 to 1017, for an introduction to multiple linear regression (MLR), PCR and PLS.
PCR and PLS have been used to estimate elemental and chemical compositions and to a lesser extent physical or thermodynamic properties of solids and liquids based on their mid- or near-infrared spectra. These methods involve: [1] the collection of mid- or near-infrared spectra of a set of representative samples; [2] mathematical treatment of the spectral data to extract the Principal Components or latent variables (e.g. the correlated absorbance signals described above); and [3] regression of these spectral variables against composition and/or property data to build a multivariate model. The analysis of new samples then involves the collection of their spectra, the decomposition of the spectra in terms of the spectral variables, and the application of the regression equation to calculate the composition/properties.
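The three steps above, and the subsequent analysis of a new sample, can be illustrated by the following minimal sketch of Principal Components Regression. It assumes the calibration spectra are the columns of a matrix X (f frequencies by n samples) and uses NumPy's singular value decomposition; the function names `pcr_fit` and `pcr_predict` are hypothetical, and no mean-centering or other preprocessing is applied.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Build a PCR model from calibration data.

    X: f-by-n matrix whose columns are calibration spectra.
    y: length-n vector of known property/composition values.
    k: number of principal components to retain (chosen by the analyst).
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U_k = U[:, :k]      # eigenspectra (principal component spectra)
    s_k = s[:k]         # singular values
    V_k = Vt[:k].T      # n-by-k scores of the calibration samples
    # Regress the property values against the scores: y ~ V_k @ b.
    b, *_ = np.linalg.lstsq(V_k, y, rcond=None)
    return U_k, s_k, b

def pcr_predict(x, U_k, s_k, b):
    """Estimate the property of a new sample from its spectrum x (length f)."""
    # Decompose the spectrum in terms of the eigenspectra to obtain its scores.
    v = (x @ U_k) / s_k
    # Apply the regression equation.
    return float(v @ b)
```

For noise-free spectra that are linear mixtures of k pure-component spectra, such a model reproduces any property that is linear in the component concentrations; with real data the choice of k controls how much noise enters the model.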
Providing the components of the sample under test are included in the calibration samples used to build the predictive model, then, within the limits of the inherent accuracy of the predictions obtainable from the model, an accurate estimate of the property and/or composition data of the test sample will be obtained from its measured spectrum. However, if one or more of the components of the test sample are not included in the calibration samples on which the model is based, then prediction of the property and/or composition data will be inaccurate, because the predictive model produces a "best fit" of the calibration data to the test sample where some of the calibration data is inappropriate for that test sample. The present invention addresses, and seeks to overcome, this problem.
The method of the present invention is for estimating property and/or composition data of a test sample. An application of particular practical importance is the analysis of hydrocarbon test samples or for ascertaining the hydrocarbon content of a hydrocarbon/water mixture, whether in phase separated or emulsion form. The inventive method involves a number of steps. Firstly, a spectral measurement is performed on the test sample. Then, property and/or composition data of the test sample can be estimated from its measured spectrum on the basis of a predictive model correlating calibration sample spectra to known property and/or composition data of those calibration samples. In the present method, a determination is made, on the basis of a check of the measured spectrum against the predictive model, as to whether or not the measured spectrum is within the range of calibration sample spectra in the model. If the result of the check is negative, a response is generated accordingly.
In this way, if the result of the check is affirmative (i.e., the measured spectrum is indicative of chemical compositions encompassed by the samples included in the predictive model), then the person performing the analysis can be satisfied that the corresponding property and/or composition prediction is likely to be accurate (of course within the limits of the inherent accuracy of the predictive model). However, even then, further tests may be made to optimize the effectiveness of the checking of the sample under test, in order to increase the confidence level of the prediction made for each test sample which passes all the tests. This preferred way of performing the invention will be described in more detail hereinbelow. Correspondingly, if the result of the check is negative, then the analyst knows that any corresponding prediction is likely to provide unreliable results.
The response to a negative result of the check can take one of many forms. For example, it could be simply to provide a warning or alarm to the operator. The prediction of property and/or composition data can be made even when a warning or alarm is generated, but the warning or alarm indicates to the analyst that the prediction is likely to be unreliable. Preferably, any test sample for which the result of the check is negative (i.e. the measured spectrum is not within the range of calibration sample spectra in the model) is physically isolated. It can then be taken to the laboratory for independent analysis of its property and/or composition (determined by the standard analytical technique used in generating the property and/or composition data for the initial model). Preferably, the model is adapted to allow the data separately determined in this way, together with the corresponding measured spectral data, to be entered into the model database, so that the model thereby becomes updated with this additional data, so as to enlarge the predictive capability of the model. In this way, the model "learns" from identification of a test sample for which it cannot perform a reliable prediction, so that next time a similar sample is tested containing chemical species of the earlier sample (and assuming any other chemical species it contains correspond to those of other calibration samples in the model), a reliable prediction of property and/or composition data for that similar sample will be made.
Various forms of predictive model are possible for determining the correlation between the spectra of the calibration samples and their known property and/or composition data. Thus, the predictive model can be based on principal components analysis or partial least squares analysis of the calibration sample spectra. In any eigenvector-based predictive model such as the foregoing, whether or not the measured spectrum of the test sample is within the range of the calibration sample spectra in the model can be determined in the following manner. A simulated spectrum for the test sample is determined by deriving coefficients for the measured test spectrum from the dot (scalar) products of the measured test spectrum with each of the model eigenspectra and by adding together the model eigenspectra scaled by the corresponding coefficient. Then, a comparison is made between the simulated spectrum calculated in this way and the measured spectrum as an estimate of whether or not the measured spectrum is within the range of the calibration sample spectra in the model. This comparison, according to a preferred way of performing the invention, can be made by determining a residual spectrum as the difference between the simulated spectrum and the measured spectrum, by calculating the Euclidean norm by summing the squares of the magnitudes of the residual spectrum at discrete frequencies, and by evaluating the magnitude of the Euclidean norm. A large value, determined with reference to a preselected threshold distance, is indicative that the required data prediction of the test sample cannot accurately be made, while a Euclidean norm lower than the threshold indicates an accurate prediction can be made.
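The residual check just described can be sketched as follows, assuming the model eigenspectra are available as the orthonormal columns of a matrix `U_k`. The function names and the `threshold` argument are hypothetical; the threshold would be preselected by the analyst. The square root is taken here so that the returned value is a true Euclidean norm; comparing the sum of squares against a squared threshold is equivalent.

```python
import numpy as np

def residual_norm(x, U_k):
    """Euclidean norm of the residual between a measured and a simulated spectrum.

    x:   measured test spectrum (length f).
    U_k: f-by-k matrix whose orthonormal columns are the model eigenspectra.
    """
    coeffs = U_k.T @ x         # dot product of x with each eigenspectrum
    simulated = U_k @ coeffs   # eigenspectra scaled by coefficients and summed
    residual = x - simulated   # part of x the model cannot reproduce
    return float(np.sqrt(np.sum(residual ** 2)))

def within_model_range(x, U_k, threshold):
    """True if the measured spectrum passes the check (assumed threshold)."""
    return residual_norm(x, U_k) <= threshold
```

A spectrum that lies entirely in the space spanned by the eigenspectra gives a residual norm of zero; absorptions from chemical species absent from the calibration set appear in the residual and increase the norm.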
The preferred way of performing the invention described above employs a statistical validity check against the predictive model. However, as an alternative to a statistical check, a rule-based check may be made. Examples of rule-based checks are pattern recognition techniques and/or comparison with spectra of computerized spectral libraries.
The calibration sample spectra may contain spectral data due to the measurement process itself, e.g. due to baseline variations and/or ex-sample interferences (such as due to water vapor or carbon dioxide). This measurement process spectral data can be removed from the calibration sample spectra prior to defining the predictive model by orthogonalizing the calibration sample spectra to one or more spectra modeling the measurement process data. This will be described in further detail hereinbelow under the heading "CONSTRAINED PRINCIPAL SPECTRA ANALYSIS (CPSA)".
Even though a test sample has passed the above-described validity check, further checking may be desirable. For example, despite passing the validity check, the property and/or composition data prediction may be an extrapolation from the range of data covered by the calibration samples used for forming the predictive model. It is therefore preferred that the Mahalanobis distance is determined for the measured spectrum and the test sample "accepted" from this further test if the magnitude of the Mahalanobis distance is below an appropriate predetermined amount selected by the analyst. If the calculated Mahalanobis distance is above the appropriate predetermined amount, a similar response as described hereinabove for a negative check is initiated.
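By way of illustration only, a Mahalanobis distance check might be computed in the space of principal component scores as sketched below. The passage above does not give a formula, so the uncentered score-space form used here, D^{2} = t^{t}(T^{t}T)^{-1}t, is an assumption, as are the function name and the layout of the score matrix `T` (one row per calibration sample).

```python
import numpy as np

def mahalanobis_sq(t, T):
    """Squared Mahalanobis distance of a test sample in score space.

    t: length-k score vector of the test spectrum.
    T: n-by-k matrix of calibration sample scores (one row per sample).
    Assumes the uncentered form D^2 = t^T (T^T T)^-1 t.
    """
    M = T.T @ T
    # Solve M z = t rather than forming the inverse explicitly.
    return float(t @ np.linalg.solve(M, t))
```

The resulting value would be compared with the predetermined amount selected by the analyst; a value above that amount suggests the prediction is an extrapolation.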
Another statistical check is to ascertain whether the test sample is lying in a region in which the number of calibration samples in the predictive model is sparse. This check can be made by calculating the Euclidean norm derived for each test sample/calibration sample pair and comparing the calculated Euclidean norms with a threshold value which, if exceeded, indicates that the sample has failed to pass this additional statistical check. In that case, a similar response as described hereinabove for a negative check is initiated.
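One possible reading of this sparse-region check is sketched below: the Euclidean distance from the test sample to each calibration sample is computed in score space, and the sample fails if even the nearest calibration sample is farther away than the threshold. Computing the distances in score space, using the minimum distance, and the names used are all assumptions made for illustration.

```python
import numpy as np

def nearest_neighbor_distance(t, T):
    """Smallest Euclidean distance between a test sample and any calibration sample.

    t: length-k score vector of the test spectrum.
    T: n-by-k matrix of calibration sample scores (one row per sample).
    """
    distances = np.linalg.norm(T - t, axis=1)  # one distance per sample pair
    return float(distances.min())

def in_populated_region(t, T, threshold):
    """True if at least one calibration sample lies within the assumed threshold."""
    return nearest_neighbor_distance(t, T) <= threshold
```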
The method disclosed herein finds particular application to on-line estimation of property and/or composition data of hydrocarbon test samples. Conveniently and suitably, all or most of the above-described steps are performed by a computer system of one or more computers with minimal or no operator interaction required.
It has been remarked above that the prediction can be based on principal components analysis, and also that spectral data in the calibration sample spectra due to the measurement process itself can be removed by an orthogonalization procedure. The combination of principal components analysis and the aforesaid orthogonalization procedure is referred to herein as Constrained Principal Spectra Analysis, abbreviated to "CPSA". The present invention can employ any numerical analysis technique (such as PCR, PLS or MLR) through which the predictive model can be obtained to provide an estimation of unknown property and/or composition data. It is preferred that the selected numerical analysis technique be CPSA. CPSA is described in detail in the present assignee's U.S. patent application Ser. No. 07/597,910 of James M. Brown, filed on Oct. 15, 1990 (now U.S. Pat. No. 5,121,337), the contents of which are expressly incorporated herein by reference. The relevant disclosure of this patent application of James M. Brown will be described below.
In another aspect, the invention provides apparatus for estimating property and/or composition data of a hydrocarbon test sample. The apparatus comprises spectrometer means for performing a spectral measurement on a test sample, and also computer means. The computer means serves three main purposes. The first is for estimating the property and/or composition data of the test sample from its measured spectrum on the basis of a predictive model correlating calibration sample spectra to known property and/or composition data for those calibration samples. The second is for determining, on the basis of a check (such as described hereinabove) of the measured spectrum against the predictive model, whether or not the measured spectrum is within the range of the calibration sample spectra in the model. The third function of the computer means is for generating a response (the nature of which has been described hereinabove in detail with reference to the inventive method) if the result of the check is negative.
The computer means is generally arranged to determine the predictive model according to all the calibration sample spectra data and all the known property and/or composition data of the calibration samples in its database. The computer means may be further arranged to respond to further such data inputted to its database for storage therein, so that the predictive model thereby becomes updated according to the further such data. The inputted property and/or composition data is derived by a separate method, such as by laboratory analysis.
The Constrained Principal Spectra Analysis (CPSA), being a preferred implementation of the inventive method and apparatus, will now be described in detail.
In CPSA, the spectral data of a number (n) of calibration samples is corrected for the effects of data arising from the measurement process itself (rather than from the sample components). The spectral data for n calibration samples is quantified at f discrete frequencies to produce a matrix X (of dimension f by n) of calibration data. The first step in the method involves producing a correction matrix U_{m} of dimension f by m comprising m digitized correction spectra at the discrete frequencies f, the correction spectra simulating data arising from the measurement process itself. The other step involves orthogonalizing X with respect to U_{m} to produce a corrected spectra matrix X_{c} whose spectra are orthogonal to all the spectra in U_{m}. Due to this orthogonality, the spectra in matrix X_{c} are statistically independent of spectra arising from the measurement process itself. If (as would normally be the case) the samples are calibration samples used to build a predictive model interrelating known property and composition data of the n samples and their measured spectra so that the model can be used to estimate unknown property and/or composition data of a sample under consideration from its measured spectrum, the estimated property and/or composition data will be unaffected by the measurement process itself. In particular, neither baseline variations nor spectra due for example to water vapor or carbon dioxide vapor in the atmosphere of the spectrometer will introduce any error into the estimates. It is also remarked that the spectra can be absorption spectra and the preferred embodiments described below all involve measuring absorption spectra. However, this is to be considered as exemplary and not limiting on the scope of the invention as defined by the appended claims, since the method disclosed herein can be applied to other types of spectra such as reflection spectra and scattering spectra (such as Raman scattering).
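The orthogonalization step can be illustrated by the following minimal sketch, which assumes that the columns of the correction matrix U_{m} have already been made orthonormal, so that the projection of X onto the correction spectra is simply U_{m}(U_{m}^{t}X). The function name is hypothetical.

```python
import numpy as np

def orthogonalize_to_corrections(X, U_m):
    """Remove from each spectrum in X its projection onto the correction spectra.

    X:   f-by-n matrix of calibration spectra (columns are samples).
    U_m: f-by-m matrix of correction spectra, assumed to have orthonormal columns.
    Returns X_c, whose columns are orthogonal to every column of U_m.
    """
    return X - U_m @ (U_m.T @ X)
```

For example, if U_{m} contains only a normalized constant column, a constant baseline offset added to every spectrum in X leaves X_{c} unchanged.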
Although the description given herein relates to NIR (near-infrared) and MIR (mid-infrared), it will be understood that the method finds applications in other spectral measurement wavelength ranges including, for example, ultraviolet, visible spectroscopy and Nuclear Magnetic Resonance (NMR) spectroscopy.
Generally, the data arising from the measurement process itself are due to two effects. The first is due to baseline variations in the spectra. The baseline variations arise from a number of causes such as light source temperature variations during the measurement, reflectance, scattering or absorbances from the cell windows, and changes in the temperature (and thus the sensitivity) of the instrument detector. These baseline variations generally exhibit spectral features which are broad (correlate over a wide frequency range). The second type of measurement process signal is due to ex-sample chemical compounds present during the measurement process, which give rise to sharper line features in the spectrum. For current applications, this type of correction generally includes absorptions due to water vapor and/or carbon dioxide in the atmosphere in the spectrometer. Absorptions due to hydroxyl groups in optical fibers could also be treated in this fashion. Corrections for contaminants present in the samples can also be made, but generally only in cases where the concentration of the contaminant is sufficiently low as to not significantly dilute the concentrations of the sample components, and where no significant interactions between the contaminant and sample component occur. It is important to recognize that these corrections are for signals that are not due to components in the sample. In this context, "sample" refers to that material upon which property and/or component concentration measurements are conducted for the purpose of providing data for the model development. By "contaminant," we refer to any material which is physically added to the sample after the property/component measurement but before or during the spectral measurement.
The present corrective method can be applied to correct only for the effect of baseline variations, in which case these variations can be modeled by a set of preferably orthogonal, frequency (or wavelength) dependent polynomials which form the matrix U_{m} of dimension f by m where m is the order of the polynomials and the columns of U_{m} are preferably orthogonal polynomials, such as Legendre polynomials. Alternatively the corrective method can be applied to correct only for the effect of ex-sample chemical compounds (e.g. due to the presence in the atmosphere of carbon dioxide and/or water vapor). In this case, the spectra that form the columns of U_{m} are preferably orthogonal vectors that are representative of the spectral interferences produced by such chemical compounds. It is preferred, however, that both baseline variations and ex-sample chemical compounds are modelled in the manner described to form two correction matrices U_{p} of dimension f by p and X_{s}, respectively. These matrices are then combined into the single matrix U_{m}, whose columns are the columns of U_{p} and X_{s} arranged side-by-side.
In a preferred way of performing the invention, in addition to matrix X of spectral data being orthogonalized relative to the correction matrix U_{m}, the spectra or columns of U_{m} are all mutually orthogonal. The production of the matrix U_{m} having mutually orthogonal spectra or columns can be achieved by firstly modeling the baseline variations by a set of orthogonal frequency (or wavelength) dependent polynomials which are computer-generated simulations of the baseline variations and form the matrix U_{p}, and then at least one, and usually a plurality, of spectra of ex-sample chemical compounds (e.g. carbon dioxide and water vapor) which are actual spectra collected on the instrument, are supplied to form the matrix X_{s}. Next the columns of X_{s} are orthogonalized with respect to U_{p} to form a new matrix X_{s}'. This removes baseline effects from ex-sample chemical compound corrections. Then, the columns of X_{s}' are orthogonalized with respect to one another to form a new matrix U_{s}, and lastly U_{p} and U_{s} are combined to form the correction matrix U_{m}, whose columns are the columns of U_{p} and U_{s} arranged side-by-side. It would be possible to change the order of the steps such that firstly the columns of X_{s} are orthogonalized to form a new matrix of vectors and then the (mutually orthogonal) polynomials forming the matrix U_{p} are orthogonalized relative to these vectors and then combined with them to form the correction matrix U_{m}. However, this is less preferred because it defeats the advantage of generating the polynomials as being orthogonal in the first place, and it will also mix the baseline variations in with the spectral variations due to ex-sample chemical compounds and make them less useful as diagnostics of instrument performance.
In a real situation, the sample spectral data in the matrix X will include not only spectral data due to the measurement process itself but also data due to noise. Therefore, once the matrix X (dimension f by n) has been orthogonalized with respect to the correction matrix U_{m} (dimension f by m), the resulting corrected spectral matrix X_{c} will still contain noise data. This can be removed in the following way. Firstly, a singular value decomposition is performed on matrix X_{c} in the form X_{c} = UΣV^{t}, where U is a matrix of dimension f by n and contains the principal component spectra as columns, Σ is a diagonal matrix of dimension n by n and contains the singular values, and V is a matrix of dimension n by n and contains the principal component scores, V^{t} being the transpose of V. In general, the principal components that correspond to noise in the spectral measurements in the original n samples will have singular values which are small in magnitude relative to those due to the wanted spectral data, and can therefore be distinguished from the principal components due to real sample components. Accordingly, the next step in the method involves removing from U, Σ and V the k+1 through n principal components that correspond to the noise, to form the new matrices U', Σ' and V' of dimensions f by k, k by k and n by k, respectively. When these matrices are multiplied together, the resulting matrix, corresponding with the earlier corrected spectra matrix X_{c}, is free of spectral data due to noise.
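The truncation described above can be sketched with NumPy's singular value decomposition as follows. The function name is hypothetical and the number k of components to keep is supplied by the caller; the returned matrices correspond to U', Σ' and V' with the dimensions stated above.

```python
import numpy as np

def truncate_principal_components(X_c, k):
    """Keep the first k principal components of the corrected spectra matrix.

    X_c: f-by-n corrected spectra matrix (requires f >= n for the stated shapes).
    Returns U' (f-by-k), Sigma' (k-by-k, diagonal) and V' (n-by-k).
    """
    # Reduced SVD: U is f-by-n, s holds the n singular values, Vt is n-by-n.
    U, s, Vt = np.linalg.svd(X_c, full_matrices=False)
    U_k = U[:, :k]
    S_k = np.diag(s[:k])
    V_k = Vt[:k].T
    return U_k, S_k, V_k
```

The product `U_k @ S_k @ V_k.T` is then the rank-k approximation of X_{c} from which the components attributed to noise have been discarded.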
For the selection of the number (k) of principal components to keep in the model, a variety of statistical tests suggested in the literature could be used but the following steps have been found to give the best results. Generally, the spectral noise level is known from experience with the instrument. From a visual inspection of the eigenspectra (the columns of matrix U resulting from the singular value decomposition), a trained spectroscopist can generally recognize when the signal levels in the eigenspectra are comparable with the noise level. By visual inspection of the eigenspectra, an approximate number of terms, k, to retain can be selected. Models can then be built with, for example, k-2, k-1, k, k+1, k+2 terms in them and the standard errors and PRESS (Predicted Residual Error Sum of Squares) values are inspected. The smallest number of terms needed to obtain the desired precision in the model or the number of terms that give the minimum PRESS value is then selected. This choice is made by the spectroscopist, and is not automated. A Predicted Residual Error Sum of Squares is calculated by applying a predictive model for the estimation of property and/or component values for a test set of samples which were not used in the calibration but for which the true value of the property or component concentration is known. The difference between the estimated and true values is squared, and summed for all the samples in the test set (the square root of the quotient of the sum of squares and the number of test samples is sometimes calculated to express the PRESS value on a per sample basis). A PRESS value can be calculated using a cross validation procedure in which one or more of the calibration samples are left out of the data matrix during the calibration, and then analyzed with the resultant model, and the procedure is repeated until each sample has been left out once.
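A leave-one-out version of the cross validation procedure can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the model is a plain principal components regression without mean-centering, exactly one sample is left out at a time, and the function name `press_loo` is hypothetical.

```python
import numpy as np

def press_loo(X, y, k):
    """Leave-one-out PRESS for a k-component principal components regression.

    X: f-by-n matrix of calibration spectra (columns are samples).
    y: length-n vector of known property values.
    """
    n = X.shape[1]
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i                 # leave sample i out
        U, s, Vt = np.linalg.svd(X[:, keep], full_matrices=False)
        U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k].T
        b = V_k.T @ y[keep]                      # regression on orthonormal scores
        scores_i = (X[:, i] @ U_k) / s_k         # scores of the left-out sample
        y_hat = scores_i @ b                     # estimate for the left-out sample
        press += (y[i] - y_hat) ** 2             # accumulate squared error
    return float(press)
```

Evaluating `press_loo` for k-2, k-1, k, k+1 and k+2 terms gives the values that the spectroscopist would inspect; `np.sqrt(press / n)` expresses the result on a per sample basis.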
The polynomials that are used to model background variations are merely one type of correction spectrum. The difference between the polynomials and the other "correction spectra" modeling ex-sample chemical compounds is twofold. First, the polynomials may conveniently be computer-generated simulations of the background (although this is not essential and they could instead be simple mathematical expressions or even actual spectra of background variations) and can be generated by the computer to be orthogonal. The polynomials may be Legendre polynomials which are used in the actual implementation of the correction method since they save computation time. There is a specific recursive algorithm to generate the Legendre polynomials that is disclosed in the description hereinafter. Generally, each row of the U_{p} matrix corresponds to a given frequency (or wavelength) in the spectrum. The columns of the U_{p} matrix will be related to this frequency. The elements of the first column would be a constant, the elements of the second column would depend linearly on the frequency, the elements of the third column would depend on the square of the frequency, etc. The exact relationship is somewhat more complicated than that if the columns are to be orthogonal. The Legendre polynomials are generated to be orthonormal, so that it is not necessary to effect a singular value decomposition or a Gram-Schmidt orthogonalization to make them orthogonal. Alternatively, any set of suitable polynomial terms could be used, which are then orthogonalized using singular value decomposition or a Gram-Schmidt orthogonalization. Alternatively, actual spectra collected on the instrument to simulate background variation can be used and orthogonalized via one of these procedures. The other "correction spectra" are usually actual spectra collected on the instrument to simulate interferences due to ex-sample chemical compounds, e.g. the spectrum of water vapor, the spectrum of carbon dioxide vapor, or the spectrum of the optical fiber of the instrument. Computer-generated spectra could be used here if the spectra of water vapor, carbon dioxide, etc. can be simulated. The other difference for the implementation of the correction method is that these "correction spectra" are not orthogonal initially, and therefore it is preferred that they be orthogonalized as part of the procedure. The polynomials and the ex-sample chemical compound "correction spectra" could be combined into one matrix, and orthogonalized in one step to produce the correction vectors. In practice, however, this is not the best procedure, since the results would be sensitive to the scaling of the polynomials relative to the ex-sample chemical compound "correction spectra". If the ex-sample chemical compound "correction spectra" are collected spectra, they will include some noise. If the scaling of the polynomials is too small, the contribution of the noise in these "correction spectra" to the total variance in the correction matrix U_{m} would be larger than that of the polynomials, and noise vectors would end up being included in the ex-sample chemical compound correction vectors. To avoid this, preferably the polynomials are generated first, the ex-sample chemical compound "correction spectra" are orthogonalized to the polynomials, and then the correction vectors are generated by performing a singular value decomposition (described below) on the orthogonalized "correction spectra".
As indicated above, a preferred way of performing the correction for measurement process spectral data is firstly to generate the orthogonal set of polynomials which model background variations, then to orthogonalize any "correction spectra" due to ex-sample chemical compounds (e.g. carbon dioxide and/or water vapor) to this set to produce a set of "correction vectors", and finally to orthogonalize the resultant "correction vectors" among themselves using singular value decomposition. If multiple examples of "correction spectra", e.g. several spectra of water vapor, are used, the final number of "correction vectors" will be less than the number of initial "correction spectra". The ones eliminated correspond to the measurement noise. Essentially, principal components analysis (PCA) is being performed on the orthogonalized "correction spectra" to separate the real measurement process data being modelled from the random measurement noise.
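The three-stage construction of the correction matrix can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: it assumes Python with NumPy, it orthonormalizes Legendre terms on the discrete frequency grid with a QR factorization (one of the alternatives mentioned above) rather than using the recursive algorithm, and the Gaussian "water-vapor-like" band and the function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre

def polynomial_corrections(f, p):
    """Return an f by p matrix U_p whose columns are orthonormal polynomials."""
    grid = np.linspace(-1.0, 1.0, f)
    L = legendre.legvander(grid, p - 1)   # columns: Legendre degree 0 .. p-1
    Q, _ = np.linalg.qr(L)                # orthonormalize on the discrete grid
    return Q

def correction_matrix(U_p, X_s, n_keep):
    """Orthogonalize correction spectra X_s (f by s) to U_p, keep n_keep eigenspectra."""
    X_s_prime = X_s - U_p @ (U_p.T @ X_s)             # remove background content
    U_s, s, _ = np.linalg.svd(X_s_prime, full_matrices=False)
    return np.hstack([U_p, U_s[:, :n_keep]])          # discarded terms model noise

rng = np.random.default_rng(1)
f = 200
grid = np.linspace(-1.0, 1.0, f)
band = np.exp(-((grid - 0.3) / 0.05) ** 2)            # hypothetical interference band
# Three noisy examples of the same interference at different strengths
X_s = np.outer(band, [1.0, 0.8, 1.2]) + 1e-3 * rng.standard_normal((f, 3))

U_p = polynomial_corrections(f, 3)
U_m = correction_matrix(U_p, X_s, n_keep=1)           # 3 spectra reduce to 1 vector
```

Here three example spectra of one interference yield a single retained correction vector, which mirrors the statement that the final number of "correction vectors" is less than the number of initial "correction spectra".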
It is remarked that the columns of the correction matrix U_{m} do not have to be mutually orthogonal for the correction method to work, as long as the columns of the data matrix X are orthogonalized to those of the correction matrix U_{m}. However, the steps for generating the U_{m} matrix to have orthogonal columns are performed to simplify the computations required in the orthogonalization of the spectral data X of the samples relative to the correction matrix U_{m}, and to provide a set of statistically independent correction terms that can be used to monitor the measurement process. By initially orthogonalizing the correction spectra X_{s} due to ex-sample chemical compounds to U_{p}, which models background variations, any background contribution to the resulting correction spectra is removed prior to the orthogonalization of these correction spectra among themselves. This procedure effectively achieves a separation of the effects of background variations from those of ex-sample chemical compound variations, allowing these corrections to be used as quality control features in monitoring the performance of an instrument during the measurement of spectra of unknown materials, as will be discussed hereinbelow.
When applying the technique for correcting for the effects of measurement process spectral data in the development of a method of estimating unknown property and/or composition data of a sample under consideration, the following steps are performed. Firstly, respective spectra of n calibration samples are collected, the spectra being quantified at f discrete frequencies (or wavelengths) and forming a matrix X of dimension f by n. Then, in the manner described above, a correction matrix U_{m} of dimension f by m is produced. This matrix comprises m digitized correction spectra at the discrete frequencies f, the correction spectra simulating data arising from the measurement process itself. The next step is to orthogonalize X with respect to U_{m} to produce a corrected spectra matrix X_{c} whose spectra are each orthogonal to all the spectra in U_{m}. The method further requires that c property and/or composition data are collected for each of the n calibration samples to form a matrix Y of dimension n by c (c≧1). Then, a predictive model is determined correlating the elements of matrix Y to matrix X_{c}. Different predictive models can be used, as will be explained below. The property and/or composition estimating method further requires measuring the spectrum of the sample under consideration at the f discrete frequencies to form a matrix of dimension f by 1. The unknown property and/or composition data of the sample is then estimated from its measured spectrum using the predictive model. Generally, each property and/or component is treated separately for building models and produces a separate f by 1 prediction vector. The prediction is just the dot product of the unknown spectrum and the prediction vector.
By combining all the prediction vectors into a matrix P of dimension f by c, the prediction involves multiplying the spectrum matrix (a vector of dimension f can be considered as a 1 by f matrix) by the prediction matrix to produce a 1 by c vector of predictions for the c properties and components.
As mentioned in the preceding paragraph, various forms of predictive model are possible. The predictive model can be determined from a mathematical solution to the equation Y=X_{c} ^{t} P+E, where X_{c} ^{t} is the transpose of the corrected spectra matrix X_{c}, P is the predictive matrix of dimension f by c, and E is a matrix of residual errors from the model and is of dimension n by c. The validity of the equation Y=X_{c} ^{t} P+E follows from the inverse statement of Beer's law, which itself can be expressed in the form that the radiation-absorbance of a sample is proportional to the optical pathlength through the sample and the concentration of the radiation-absorbing species in that sample. Then, for determining the vector y_{u} of dimension 1 by c containing the estimates of the c property and/or composition data for the sample under consideration, the spectrum x_{u} of the sample under consideration, x_{u} being of dimension f by 1, is measured and y_{u} is determined from the relationship y_{u} =x_{u} ^{t} P, x_{u} ^{t} being the transpose of matrix x_{u}.
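The dimensions involved in Y=X_{c} ^{t} P+E and y_{u} =x_{u} ^{t} P can be checked with a short sketch. It assumes Python with NumPy, random stand-ins for X, Y and the correction matrix, and a pseudo-inverse solution; the pseudo-inverse is only one of the possible solution techniques and is chosen here as an assumption for brevity.

```python
import numpy as np

rng = np.random.default_rng(2)
f, n, c, m = 60, 12, 2, 2            # frequencies, samples, properties, correction terms

# Stand-ins (assumptions): random orthonormal U_m, random spectra X and data Y
U_m, _ = np.linalg.qr(rng.standard_normal((f, m)))
X = rng.standard_normal((f, n))
Y = rng.standard_normal((n, c))

X_c = X - U_m @ (U_m.T @ X)          # corrected spectra, f by n
P = np.linalg.pinv(X_c.T) @ Y        # f by c, minimum-norm least squares solution
E = Y - X_c.T @ P                    # n by c residual errors

x_u = rng.standard_normal((f, 1))    # spectrum of the sample under consideration
y_u = x_u.T @ P                      # 1 by c estimates
```

Because f exceeds n in this sketch, the unreduced solution reproduces Y essentially exactly; this is one reason a rank-reduced technique such as PCR or PLS is normally preferred for real, noisy spectra.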
Although, in a preferred implementation of this invention, the equation Y=X_{c} ^{t} P+E is solved to determine the predictive model, the invention could also be used with models whose equation is represented (by essentially the statement of Beer's law) as X_{c} =AY^{t} +E, where A is an f by c matrix. In this case, the matrix A would first be estimated as A=X_{c} Y(Y^{t} Y)^{-1}. The estimation of the vector y_{u} of dimension 1 by c containing the c property and/or composition data for the sample under consideration from the spectrum x_{u} of the sample under consideration would then involve using the relationship y_{u} =x_{u} ^{t} A(A^{t} A)^{-1}. This calculation, which is a constrained form of the K-matrix method, is more restricted in application, since the required inversion of Y^{t} Y requires that Y contain concentration values for all sample components, and not contain property data.
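The K-matrix form can be sketched in a few lines. This assumes Python with NumPy and synthetic data in which Y holds the concentrations of all c components; the variable `A_true` (hypothetical pure-component spectra) exists only to generate the example and is not part of the method.

```python
import numpy as np

rng = np.random.default_rng(3)
f, n, c = 40, 10, 3
A_true = rng.random((f, c))                  # hypothetical pure-component spectra
Y = rng.random((n, c))                       # concentrations of all c components
X_c = A_true @ Y.T + 1e-4 * rng.standard_normal((f, n))   # X_c = A Y^t + E

A = X_c @ Y @ np.linalg.inv(Y.T @ Y)         # f by c estimate of A

y_true = np.array([[0.2, 0.5, 0.3]])
x_u = A_true @ y_true.T                      # f by 1 spectrum of the unknown
y_u = x_u.T @ A @ np.linalg.inv(A.T @ A)     # 1 by c estimate
```

The explicit inverse of Y^{t} Y in this sketch only exists when Y has c linearly independent columns, which reflects the restriction noted above.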
The mathematical solution to the equation Y=X_{c} ^{t} P+E (or X_{c} =AY^{t} +E) can be obtained by any one of a number of mathematical techniques which are known per se, such as linear least squares regression, sometimes otherwise known as multiple linear regression (MLR), principal components analysis/regression (PCA/PCR) and partial least squares (PLS). As mentioned above, an introduction to these mathematical techniques is given in "An Introduction to Multivariate Calibration and Analysis", Analytical Chemistry, Vol. 59, No. 17, Sep. 1, 1987, Pages 1007 to 1017.
The purpose of generating the correction matrix U_{m} and of orthogonalizing the spectral data matrix X to U_{m} is twofold. Firstly, predictive models based on the resultant corrected data matrix X_{c} are insensitive to the effects of background variations and ex-sample chemical components modeled in U_{m}, as explained above. Secondly, the dot (scalar) products generated between the columns of U_{m} and those of X contain information about the magnitude of the background and ex-sample chemical component interferences that are present in the calibration spectra, and as such, provide a measure of the range of values for the magnitude of these interferences that were present during the collection of the calibration spectral data. During the analysis of a spectrum of a material having unknown properties and/or composition, similar dot products can be formed between the unknown spectrum, x_{u}, and the columns of U_{m}, and these values can be compared with those obtained during the calibration as a means of checking that the measurement process has not changed significantly between the time the calibration is accomplished and the time the predictive model is applied for the estimation of properties and components for the sample under test. These dot products thus provide a means of performing a quality control assessment on the measurement process.
The dot products of the columns of U_{m} with those of the spectral data matrix X contain information about the degree to which the measurement process data contribute to the individual calibration spectra. This information is generally mixed with information about the calibration sample components. For example, the dot product of a constant vector (a first order polynomial) will contain information about the total spectral integral, which is the sum of the integral of the sample absorptions, and the integral of the background. The information about calibration sample components is, however, also contained in the eigenspectra produced by the singular value decomposition of X_{c}. It is therefore possible to remove that portion of the information which is correlated to the sample components from the dot products so as to recover values that are uncorrelated to the sample components, i.e. values that represent the true magnitude of the contributions of the measurement process signals to the calibration spectra. This is accomplished by the following steps:
(1) A matrix V_{m} of dimension n by m is formed as the product X^{t} U_{m}, the individual elements of V_{m} being the dot products of the columns of X with those of U_{m};
(2) The corrected data matrix X_{c} is formed, and its singular value decomposition is computed as UΣV^{t};
(3) A regression of the form V_{m} =VZ+R is calculated to establish the correlation between the dot products and the scores of the principal components: VZ represents the portion of the dot products which is correlated to the sample components and the regression residuals R represent the portion of the dot products that is uncorrelated to the sample components, which are in fact the measurement process signals for the calibration samples;
(4) In the analysis of a sample under test, the dot products of the unknown spectrum with each of the correction spectra (columns of U_{m}) are calculated to form a vector v_{m}, the corrected spectrum x_{c} is calculated, the scores for the corrected spectrum are calculated as v=x_{c} ^{t} UΣ^{-1}, and the uncorrelated measurement process signal values are calculated as r=v_{m} -vZ. The magnitude of these values is then compared to the range of values in R as a means of comparing the measurement process during the analysis of the unknown to that during the calibration.
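Steps (1) to (4) can be sketched as follows, assuming Python with NumPy, a random orthonormal stand-in for U_{m}, and synthetic calibration spectra; the function name `process_signal` is hypothetical. Because the retained columns of V are orthonormal, the least squares solution of V_{m} =VZ+R reduces to Z=V^{t} V_{m} in this sketch.

```python
import numpy as np

rng = np.random.default_rng(4)
f, n, m, k = 80, 15, 2, 3
U_m, _ = np.linalg.qr(rng.standard_normal((f, m)))    # stand-in correction matrix
pure = rng.random((f, k))                             # hypothetical component spectra
X = pure @ rng.random((k, n)) + U_m @ (0.1 * rng.standard_normal((m, n)))

V_m = X.T @ U_m                                       # step (1): n by m dot products
X_c = X - U_m @ (U_m.T @ X)                           # step (2): corrected spectra
U, s, Vt = np.linalg.svd(X_c, full_matrices=False)
U, s, V = U[:, :k], s[:k], Vt[:k].T                   # keep k principal components
Z = V.T @ V_m                                         # step (3): regression coefficients
R = V_m - V @ Z                                       # measurement process signals

def process_signal(x_u):
    """Step (4): uncorrelated measurement process values r for one spectrum."""
    v_m = x_u @ U_m                                   # dot products with U_m columns
    x_c = x_u - U_m @ (U_m.T @ x_u)                   # corrected spectrum
    v = (x_c @ U) / s                                 # scores, v = x_c^t U Sigma^-1
    return v_m - v @ Z

r_cal = process_signal(X[:, 0])                       # reproduces row 0 of R
within_range = bool(np.all((r_cal >= R.min(axis=0)) & (r_cal <= R.max(axis=0))))
```

The simple minimum/maximum range test at the end is an assumption made for illustration; the source states only that the magnitude of r is compared to the range of values in R.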
It will be appreciated that the performance of the above disclosed correction method and method of estimating the unknown property and/or composition data of the sample under consideration involves extensive mathematical computations. In practice, such computations are made by computer means comprising a computer or computers, which is connected to the instrument. In a measurement mode, the computer means receives the measured output spectrum of the calibration sample, ex-sample chemical compound or test sample. In a correction mode in conjunction with the operator, the computer means stores the calibration spectra to form the matrix X, calculates the correction matrix U_{m}, and orthogonalizes X with respect to the correction matrix U_{m}. In addition, the computer means operates in a storing mode to store the c known property and/or composition data for the n calibration samples to form the matrix Y of dimension n by c (c≧1). In a model building mode, the computer means determines, in conjunction with the operator, a predictive model correlating the elements of matrix Y to those of matrix X_{c}. Lastly, the computer means is arranged to operate in a prediction mode in which it estimates the unknown property and/or compositional data of the sample under consideration from its measured spectrum using the determined predictive model correlating the elements of matrix Y to those of matrix X_{c}.
In more detail, the steps involved according to a preferred way of making a prediction of property and/or composition data of a sample under consideration can be set out as follows. Firstly, a selection of samples for the calibration is made by the operator or a laboratory technician. Then, in either order, the spectra and properties/composition of these samples need to be measured, collected and stored in the computer means by the operator and/or laboratory technician, together with spectra of ex-sample chemical compounds to be used as corrections. In addition, the operator selects the computer-generated polynomial corrections used to model baseline variations. The computer means generates the correction matrix U_{m} and then orthogonalizes the calibration sample spectra (matrix X) to produce the corrected spectral matrix X_{c} and, if PCR is used, performs the singular value decomposition on matrix X_{c}. The operator has to select (in PCR) how many of the principal components to retain as correlated data and how many to discard as representative of (uncorrelated) noise. Alternatively, if the PLS technique is employed, the operator has to select the number of latent variables to use. If MLR is used to determine the correlation between the corrected spectral matrix X_{c} and the measured property and/or composition data Y, then a selection of frequencies needs to be made such that the number of frequencies at which the measured spectra are quantized is less than the number of calibration samples. Whichever technique is used to determine the correlation (i.e. the predictive model) interrelating X_{c} and Y, having completed the calibration, the laboratory technician measures the spectrum of the sample under consideration, which is used by the computer means to compute predicted property and/or composition data based on the predictive model.
The object of Principal Components Analysis (PCA) is to isolate the true number of independent variables in the spectral data so as to allow for a regression of these variables against the dependent property/composition variables. The spectral data matrix, X, contains the spectra of the n samples to be used in the calibration as columns of length f, where f is the number of data points (frequencies or wavelengths) per spectrum. The object of PCA is to decompose the f by n X matrix into the product of several matrices. This decomposition can be accomplished via a Singular Value Decomposition:
X = UΣV^{t} (1)
where U (the left eigenvector matrix) is of dimension f by n, Σ (the diagonal matrix containing the singular values σ) is of dimension n by n, and V^{t} is the transpose of V (the right eigenvector matrix) which is of dimension n by n. Since some versions of PCA perform the Singular Value Decomposition on the transpose of the data matrix, X^{t}, and decompose it as VΣU^{t}, the use of the terms left and right eigenvectors is somewhat arbitrary. To avoid confusion, U will be referred to as the eigenspectrum matrix since the individual column vectors of U (the eigenspectra) are of the same length, f, as the original calibration spectra. The term eigenvectors will only be used to refer to the V matrix. The matrices in the singular value decomposition have the following properties:
U^{t} U = I_{n} (2)
VV^{t} = V^{t} V = I_{n} (3)
X^{t} X = VΛV^{t} and XX^{t} = UΛU^{t} (4)
where I_{n} is the n by n identity matrix, and Λ is the matrix containing the eigenvalues, λ (the squares of the singular values), on the diagonal and zeros off the diagonal. Note that the product UU^{t} does not yield an identity matrix for n less than f. Equations 2 and 3 imply that both the eigenspectra and eigenvectors are orthonormal. In some versions of PCA, the U and Σ matrices are combined into a single matrix. In this case, the eigenspectra are orthogonal but are normalized to the singular values.
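Equations 1 to 4 can be verified numerically with a short sketch, assuming Python with NumPy and a random f by n matrix in place of real spectra. The `full_matrices=False` option of `numpy.linalg.svd` returns U with dimension f by n, matching the convention used here.

```python
import numpy as np

rng = np.random.default_rng(5)
f, n = 30, 6
X = rng.standard_normal((f, n))                      # stand-in for the spectral matrix

U, s, Vt = np.linalg.svd(X, full_matrices=False)     # U: f by n, s: n values, Vt: n by n
V = Vt.T
Lam = np.diag(s ** 2)                                # eigenvalues = squared singular values

eq1 = np.allclose(X, U @ np.diag(s) @ Vt)
eq2 = np.allclose(U.T @ U, np.eye(n))
eq3 = np.allclose(V @ V.T, np.eye(n)) and np.allclose(V.T @ V, np.eye(n))
eq4 = np.allclose(X.T @ X, V @ Lam @ V.T) and np.allclose(X @ X.T, U @ Lam @ U.T)

# U U^t is an f by f projector of rank n, not an identity, when n < f
uut_is_identity = np.allclose(U @ U.T, np.eye(f))
```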
The object of the variable reduction is to provide a set of independent variables (the Principal Components) against which the dependent variables (the properties or compositions) can be regressed. The basic regression equation for direct calibration is
Y = X^{t} P (5)
where Y is the n by c matrix containing the property/composition data for the n samples and c properties/components, and P is the f by c matrix of regression coefficients which relate the property/composition data to the spectral data. We will refer to the c columns of P as prediction vectors, since during the analysis of a spectrum x (dimension f by 1), the prediction of the properties/components (y of dimension 1 by c) for the sample is obtained by
y = x^{t} P (6)
Note that for a single property/component, the prediction is obtained as the dot product of the spectrum of the unknown and the prediction vector. The solution to equation 5 is
[X^{t}]^{-1} Y = [X^{t}]^{-1} X^{t} P = P (7)
where [X^{t}]^{-1} is the inverse of the X^{t} matrix. The matrix X^{t} is of course nonsquare and rank deficient (f>n), and cannot be directly inverted. Using the singular value decompositions, however, the inverse can be approximated as
[X^{t}]^{-1} = UΣ^{-1} V^{t} (8)
where Σ^{-1} is the inverse of the square singular value matrix and contains 1/σ on the diagonal. Using equations 7 and 8, the prediction vector matrix becomes
P = UΣ^{-1} V^{t} Y (9)
As was noted previously, the objective of the PCA is to separate systematic (frequency correlated) signal from random noise. The eigenspectra corresponding to the larger singular values represent the systematic signal, while those corresponding to the smaller singular values represent the noise. In general, in developing a stable model, these noise components will be eliminated from the analysis before the prediction vectors are calculated. If the first k<n eigenspectra are retained, the matrices in equation 1 become U' (dimension f by k), Σ' (dimension k by k) and V' (dimension n by k).
X = U'Σ'V'^{t} + E (10)
where E is an f by n error matrix. Ideally, if all the variations in the data due to sample components are accounted for in the first k eigenspectra, E contains only random noise. It should be noted that the product V'V'^{t} no longer yields an identity matrix. To simplify notation the ' will be dropped, and U,Σ and V will henceforth refer to the rank reduced matrices. The choice of k, the number of eigenspectra to be used in the calibration, is based on statistical tests and some prior knowledge of the spectral noise level.
Although the prediction of a property/component requires only a single prediction vector, the calculation of uncertainties on the prediction requires the full rank reduced V matrix. In practice, a two step, indirect calibration method is employed in which the singular value decomposition of the X matrix is calculated (equation 1), and then the properties/compositions are separately regressed against the eigenvectors
Y = VB + E (11)
B = V^{t} Y (12)
During the analysis, the eigenvector for the unknown spectrum is obtained
v = x^{t} UΣ^{-1} (13)
and the predictions are made as
y = vB (14)
The indirect method is mathematically equivalent to the direct method of equation 10, but readily provides the values needed for estimating uncertainties on the prediction.
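The equivalence of the two routes can be checked with a minimal sketch, assuming Python with NumPy and random stand-in data. The direct route forms the prediction vectors P=UΣ^{-1} V^{t} Y and applies y=x^{t} P; the indirect route uses equations 12 to 14.

```python
import numpy as np

rng = np.random.default_rng(6)
f, n, c, k = 50, 12, 2, 4
X = rng.standard_normal((f, n))      # stand-in calibration spectra
Y = rng.standard_normal((n, c))      # stand-in property/composition data
x = rng.standard_normal(f)           # stand-in unknown spectrum

U, s, Vt = np.linalg.svd(X, full_matrices=False)
U, s, V = U[:, :k], s[:k], Vt[:k].T  # rank-reduced matrices (first k terms)

# Direct route: prediction vectors, then y = x^t P
P = U @ np.diag(1.0 / s) @ V.T @ Y
y_direct = x @ P

# Indirect route: regress Y on the eigenvectors, then score the unknown
B = V.T @ Y                          # equation 12
v = (x @ U) / s                      # equation 13: v = x^t U Sigma^-1
y_indirect = v @ B                   # equation 14
```

The indirect route keeps v and V available, which are the quantities needed later for uncertainty estimates.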
Equation 6 shows how the prediction vector, P, is used in the analysis of an unknown spectrum. We assume that the unknown spectrum can be separated as the sum of two terms, the spectrum due to the components in the unknown, x_{c}, and the measurement process related signals for which we want to develop constraints, x_{s}. The prediction then becomes
y = x^{t} P = x_{c}^{t} P + x_{s}^{t} P (15)
If the prediction is to be insensitive to the measurement process signals, the second term in equation 15 must be zero. This implies that the prediction vector must be orthogonal to the measurement process signal spectra. From equation 10, the prediction vector is a linear combination of the eigenspectra, which in turn are themselves linear combinations of the original calibration spectra (U=XVΣ^{-1}). If the original calibration spectra are all orthogonalized to a specific measurement process signal, the resulting prediction vector will also be orthogonal, and the prediction will be insensitive to the measurement process signal. This orthogonalization procedure serves as the basis for the Constrained Principal Spectra Analysis algorithm.
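The effect of the orthogonalization on equation 15 can be demonstrated with a small sketch. It assumes Python with NumPy, a constant baseline offset as the only measurement process signal, and random stand-in spectra; the helper name `fit` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
f, n = 100, 10
baseline = np.ones(f) / np.sqrt(f)          # unit-norm constant offset signal
X = rng.random((f, n)) + rng.random(n)      # calibration spectra with random offsets
Y = rng.random((n, 1))

def fit(X_cal, k=5):
    """Prediction vector P = U Sigma^-1 V^t Y using the first k eigenspectra."""
    U, s, Vt = np.linalg.svd(X_cal, full_matrices=False)
    return U[:, :k] @ np.diag(1.0 / s[:k]) @ Vt[:k] @ Y

P_plain = fit(X)                                     # no constraint applied
X_c = X - np.outer(baseline, baseline @ X)           # orthogonalize to the baseline
P_c = fit(X_c)                                       # constrained prediction vector

x = rng.random(f)
shift = 5.0 * np.ones(f)                             # x_s: an offset added to the unknown
delta_plain = ((x + shift) @ P_plain - x @ P_plain).item()
delta_c = ((x + shift) @ P_c - x @ P_c).item()       # x_s^t P_c, zero up to rounding
```

In this sketch `delta_c` vanishes to rounding error because every retained eigenspectrum of X_{c} is orthogonal to the baseline, whereas `delta_plain` generally does not.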
In the Constrained Principal Spectra Analysis (CPSA) program, two types of measurement process signals are considered. The program internally generates a set of orthonormal, frequency dependent polynomials, U_{p}. U_{p} is a matrix of dimension f by p where p is the maximum order (degree minus one) of the polynomials, and it contains columns which are orthonormal Legendre polynomials defined over the spectral range used in the analysis. The polynomials are intended to provide constraints for spectral baseline effects. In addition, the user may supply spectra representative of other measurement process signals (e.g. water vapor spectra). These correction spectra (a matrix X_{s} of dimension f by s where s is the number of correction spectra), which may include multiple examples of a specific type of measurement process signal, are first orthogonalized relative to the polynomials via a Gram-Schmidt orthogonalization procedure
X_{s}' = X_{s} - U_{p} (U_{p}^{t} X_{s}) (16)
A Singular Value Decomposition of the resultant correction spectra is then performed,
X_{s}' = U_{s} Σ_{s} V_{s}^{t} (17)
to generate a set of orthonormal correction eigenspectra, U_{s}. The user selects the first s' terms corresponding to the number of measurement related signals being modeled, and generates the full set of correction terms, U_{m}, which includes both the polynomials and selected correction eigenspectra. These correction terms are then removed from the calibration data, again using a Gram-Schmidt orthogonalization procedure
X_{c} = X - U_{m} (U_{m}^{t} X) (17)
The Principal Components Analysis of the corrected spectra, X_{c}, then proceeds via the Singular Value Decomposition
X_{c} = U_{c} Σ_{c} V_{c}^{t} (18)
and the predictive model is developed using the regression
Y = V_{c} B (19)
The resultant prediction vector
P_{c} = U_{c} Σ_{c}^{-1} V_{c}^{t} Y (20)
is orthogonal to the polynomial and correction eigenspectra, U_{m}. The resulting predictive model is thus insensitive to the modeled measurement process signals. In the analysis of an unknown, the contributions of the measurement process signals to the spectrum can be calculated as
v_{m} = Σ_{m}^{-1} U_{m}^{t} x (21)
and these values can be compared against the values for the calibration, V_{m}, to provide diagnostics as to whether the measurement process has changed relative to the calibration.
The results of the procedure described above are mathematically equivalent to including the polynomial and correction terms as spectra in the data matrix, and using a constrained least square regression to calculate the B matrix in equation 12. The constrained least square procedure is more sensitive to the scaling of the correction spectra since they must account for sufficient variance in the data matrix to be sorted into the k eigenspectra that are retained in the regression step. By orthogonalizing the calibration spectra to the correction spectra before calculating the singular value decomposition, we eliminate the scaling sensitivity.
The Constrained Principal Spectra Analysis method allows measurement process signals which are present in the spectra of the calibration samples, or might be present in the spectra of samples which are later analyzed, to be modeled and removed from the data (via a Gram-Schmidt orthogonalization procedure) prior to the extraction of the spectral variables, which is performed via a Singular Value Decomposition (16). The spectral variables thus obtained are first regressed against the pathlengths for the calibration spectra to develop a model for independent estimation of pathlength. The spectral variables are rescaled to a common pathlength based on the results of the regression and then further regressed against the composition/property data to build the empirical models for the estimation of these parameters. During the analysis of new samples, the spectra are collected and decomposed into the constrained spectral variables, the pathlength is calculated and the data is scaled to the appropriate pathlength, and then the regression models are applied to calculate the composition/property data for the new materials. The orthogonalization procedure ensures that the resultant measurements are constrained so as to be insensitive (orthogonal) to the modeled measurement process signals. The internal pathlength calculation and renormalization automatically corrects for pathlength or flow variations, thus minimizing errors due to data scaling.
The development of the empirical model consists of the following steps:
(1.1) The properties and/or component concentrations for which empirical models are to be developed are independently determined for a set of representative samples, i.e. the calibration set. The independent measurements are made by standard analytical tests including, but not limited to: elemental compositional analysis (combustion analysis, X-ray fluorescence, broad line NMR); component analysis (gas chromatography, mass spectroscopy); other spectral measurements (IR, UV/visible, NMR, color); physical property measurements (API or specific gravity, refractive index, viscosity or viscosity index); and performance property measurements (octane number, cetane number, combustibility). For chemicals applications where the number of sample components is limited, the compositional data may reflect weights or volumes used in preparing calibration blends.
(1.2) Absorption spectra of the calibration samples are collected over a region or regions of the infrared, the data being digitized at discrete frequencies (or wavelengths) whose separation is less than the width of the absorption features exhibited by the samples.
(2.0) The Constrained Principal Spectra Analysis (CPSA) algorithm is applied to generate the empirical model. The algorithm consists of the following 12 steps:
(2.1) The infrared spectral data for the calibration spectra is loaded into the columns of a matrix X, which is of dimension f by n where f is the number of frequencies or wavelengths in the spectra, and n is the number of calibration samples.
(2.2) Frequency dependent polynomials, U_{p}, (a matrix whose columns are orthonormal Legendre polynomials having dimension f by p where p is the maximum order of the polynomials) are generated to model possible variations in the spectral baseline over the spectral range used in the analysis.
(2.3) Spectra representative of other types of measurement process signals (e.g. water vapor spectra, carbon dioxide, etc.) are loaded into a matrix X_{s} of dimension f by s where s is the number of correction spectra used.
(2.4) The correction spectra are orthogonalized relative to the polynomials via a Gram-Schmidt orthogonalization procedure
X_{s}' = X_{s} - U_{p} (U_{p}^{t} X_{s}) (2.4)
(2.5) A Singular Value Decomposition of the correction spectra is then performed,
X_{s}' = U_{s} Σ_{s} V_{s}^{t} (2.5)
to generate a set of orthonormal correction eigenspectra, U_{s}. Σ_{s} are the corresponding singular values, and V_{s} are the corresponding right eigenvectors, ^{t} indicating the matrix transpose.
(2.6) The full set of correction terms, U_{m} =U_{p} +U_{s} (i.e. the columns of U_{p} together with those of U_{s}), which includes both the polynomials and correction eigenspectra, is then removed from the calibration data, again using a Gram-Schmidt orthogonalization procedure
X_{c} = X - U_{m} (U_{m}^{t} X) (2.6)
(2.7) The Singular Value Decomposition of the corrected spectra, X_{c}, is then performed
X_{c} = U_{c} Σ_{c} V_{c}^{t} (2.7)
(2.8) The eigenspectra from step (2.7) are examined and a subset of the first k eigenspectra which correspond to the larger singular values in Σ_{c} is retained. The k+1 through n eigenspectra which correspond to spectral noise are discarded.
X_{c} = U_{k} Σ_{k} V_{k}^{t} + E_{k} (2.8)
(2.9) The k right eigenvectors from the singular value decomposition, V_{k}, are regressed against the pathlength values for the calibration spectra, Y_{p} (an n by 1 column vector),
Y_{p} = V_{k} B_{p} + E_{p} (2.9a)
where E_{p} is the regression error. The regression coefficients, B_{p}, are calculated as
B_{p} = (V_{k}^{t} V_{k})^{-1} V_{k}^{t} Y_{p} = V_{k}^{t} Y_{p} (2.9b)
(2.10) An estimation of the pathlengths for the calibration spectra is calculated as
\hat{Y}_{p} = V_{k} B_{p} (2.10)
An n by n diagonal matrix N is then formed, the i^{th} diagonal element of N being the ratio of the average pathlength for the calibration spectra, \bar{y}_{p}, divided by the estimated pathlength value for the i^{th} calibration sample (the i^{th} element of \hat{Y}_{p}).
(2.11) The right eigenvector matrix is then renormalized as
V_{k}' = NV_{k} (2.11)
(2.12) The renormalized matrix is regressed against the properties and/or concentrations, Y (an n by c matrix containing the values for the n calibration samples and the c properties/concentrations), to obtain the regression coefficients for the models,
Y = V_{k}' B + E (2.12a)
B = (V_{k}'^{t} V_{k}')^{-1} V_{k}'^{t} Y (2.12b)
(3.0) The analysis of a new sample with unknown properties/components proceeds by the following steps:
(3.1) The absorption spectrum of the unknown is obtained under the same conditions used in the collection of the calibration spectra.
(3.2) The absorption spectrum, x_{u}, is decomposed into the constrained variables,
x_{u} = U_{k} Σ_{k} v_{u}^{t} (3.2a)
v_{u} = Σ^{-1} U_{k}^{t} x_{u} (3.2b)
(3.3) The pathlength for the unknown spectrum is estimated as
\hat{y}_{p} = v_{u} B_{p} (3.3)
(3.4) The eigenvector for the unknown is rescaled as
v_{u}' = v_{u} (\bar{y}_{p} / \hat{y}_{p}) (3.4)
where \bar{y}_{p} is the average pathlength for the calibration spectra in (2.10) and \hat{y}_{p} is the pathlength estimated in (3.3).
(3.5) The properties/concentrations are estimated as
y_{u} = v_{u}' B (3.5)
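Steps (2.6) to (2.12) and (3.2) to (3.5) can be traced end to end with a minimal sketch. It assumes Python with NumPy and noise-free synthetic spectra in which each spectrum is a pathlength times a mixture of three hypothetical pure-component spectra whose fractions sum to one, plus a constant and linear baseline standing in for U_{m}; the function name `analyze` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(8)
f, n, c, k = 120, 20, 3, 3
grid = np.linspace(-1.0, 1.0, f)
U_m, _ = np.linalg.qr(np.vander(grid, 2, increasing=True))  # constant + linear terms
pure = rng.random((f, c))                        # hypothetical pure-component spectra
conc = rng.dirichlet(np.ones(c), size=n)         # n by c, each row sums to 1
path = rng.uniform(0.8, 1.2, size=n)             # calibration pathlengths
X = pure @ (conc * path[:, None]).T + U_m @ rng.standard_normal((2, n))
Y, Y_p = conc, path

# Calibration, steps (2.6) to (2.12)
X_c = X - U_m @ (U_m.T @ X)                      # (2.6)
U, s, Vt = np.linalg.svd(X_c, full_matrices=False)
U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k].T        # (2.7), (2.8)
B_p = V_k.T @ Y_p                                # (2.9b)
Y_p_hat = V_k @ B_p                              # (2.10)
y_p_bar = Y_p.mean()
N = np.diag(y_p_bar / Y_p_hat)                   # average / estimated pathlength
V_kp = N @ V_k                                   # (2.11)
B = np.linalg.solve(V_kp.T @ V_kp, V_kp.T @ Y)   # (2.12b)

def analyze(x_u):
    """Steps (3.2) to (3.5) for one spectrum of length f."""
    v_u = (x_u @ U_k) / s_k                      # (3.2b), held as a 1-D score vector
    y_p_hat = v_u @ B_p                          # (3.3)
    v_up = v_u * (y_p_bar / y_p_hat)             # (3.4)
    return y_p_hat, v_up @ B                     # (3.5)

c_true = np.array([0.5, 0.3, 0.2])
x_u = 1.1 * (pure @ c_true) + U_m @ np.array([0.7, -0.4])   # pathlength 1.1 + baseline
path_est, y_u = analyze(x_u)
```

In this idealized case the estimated pathlength and composition reproduce the values used to build the unknown, even though its baseline differs from any calibration spectrum; with real, noisy spectra the agreement would only be approximate.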
(4.1) The spectral region used in the calibration and analysis may be limited to subregions so as to avoid intense absorptions which may be outside the linear response range of the spectrometer, or to avoid regions of low signal content and high noise.
(5.1) The samples used in the calibration may be restricted by excluding any samples which are identified as multivariate outliers by statistical testing.
(6.1) The regression in steps (2.9) and (2.12) may be accomplished via a stepwise regression (17) or PRESS based variable selection (18), so as to limit the number of variables retained in the empirical model to a subset of the first k variables, thereby eliminating variables which do not show statistically significant correlation to the parameters being estimated.
(7.1) The Mahalanobis statistic for the unknown, D_{u}^{2}, given by
D_{u}^{2} = v_{u}' (V_{k}'^{t} V_{k}')^{-1} v_{u}'^{t}   (7.1)
can be used to determine if the estimation is based on an interpolation or extrapolation of the model by comparing the value for the unknown to the average of similar values calculated for the calibration samples.
(7.2) The uncertainty on the estimated value can also be estimated based on the standard error from the regression in (2.12) and the Mahalanobis statistic calculated for the unknown.
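The statistic in (7.1) can be sketched as follows. The example assumes NumPy and uses random numbers as a stand-in for the renormalized calibration scores; `mahalanobis_sq` is a hypothetical name. It also illustrates a general property of this statistic: its average over the n calibration samples equals k/n, which gives a natural reference value for the comparison described in (7.1).

```python
import numpy as np

def mahalanobis_sq(v, V_cal):
    """D^2 = v (V_cal^t V_cal)^-1 v^t, following equation (7.1)."""
    return float(v @ np.linalg.solve(V_cal.T @ V_cal, v))

rng = np.random.default_rng(2)
n, k = 10, 3
V_cal = rng.normal(size=(n, k))        # stand-in for the calibration scores V_k'

# Statistic for every calibration sample and its average.
d_cal = np.array([mahalanobis_sq(row, V_cal) for row in V_cal])
d_mean = d_cal.mean()                  # equals k / n

# A score vector five times farther from the origin than calibration sample 0.
v_far = 5.0 * V_cal[0]
d_far = mahalanobis_sq(v_far, V_cal)   # 25 times the value for sample 0
```

How much larger than `d_mean` a value must be before the estimate is treated as an extrapolation is a design choice that the text leaves open.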
(8.1) In the analysis of an unknown with spectrum x_{u}, the contributions of the measurement process signals to the spectrum can be calculated as
v.sub. =Σ.sub.c.sup.1 U.sub. .sup.t x.sub.u ( 8.1)
These values can be compared against the values for the calibration, V.sub. , to provide diagnostics as to whether the measurement process has changed relative to the calibration.
These and other features and advantages of the invention will now be described, by way of example, with reference to the accompanying single drawing.
The single drawing is a flow chart indicating one preferred way of performing the method of this invention.
The flow chart of the single drawing gives an overview of the steps involved in a preferred way of carrying out the inventive method. The reference numerals used in the drawing relate to the method operations identified below.
1), 2), 3), and 4)
These deal with updating the estimation model and will be described later.
5) Perform Online Measurement
An infrared absorption spectrum of the sample under consideration is measured. However, the method is applicable to any absorption spectrum. The methodology is also applicable to a wide variety of other spectroscopic measurement techniques including ultraviolet, visible light, Nuclear Magnetic Resonance (NMR), reflection, and photoacoustic spectroscopy, etc.
The spectrum obtained from performing the online measurement is stored on a computer used to control the analyzer operation, and will hereafter be referred to as the test spectrum of the test sample.
6) Data Collection Operation Valid
The spectrum and any spectrometer status information that is available are examined to verify that the spectrum collected is valid from the standpoint of spectrometer operations (not statistical comparison to estimation models). The principal criteria for the validity checks are that there is no apparent invalid data which may have resulted from mechanical or electronic failure of the spectrometer. Such failings can most readily be identified by examining the spectrum for unusual features including but not limited to severe baseline errors, zero data or infinite (overrange) data.
If the data collection operation is deemed to be valid, processing continues with the analysis of the data which is collected. If the data collection operation is deemed to be invalid, diagnostic routines are executed in order to perform spectrometer and measurement system diagnostics [16] (numbers in [ ] refer to operation numbers in the attached FIGURE). These may consist of specially written diagnostics or may consist of internal diagnostics contained in the spectrometer system. In either event, the results of the diagnostics are stored on the operational computer and process operations are notified that there is a potential malfunction of the spectrometer [17]. Control returns to operation [5] in order to perform the online measurement again since some malfunctions may be intermittent and collection of valid data may be successfully resumed upon retry.
The objective of the diagnostics performed is to isolate the cause of failure to a system module component for easy maintenance. Therefore, as part of the diagnostic procedure, it may be necessary to introduce calibration and/or standard reference samples into the sample cell in order to perform measurements under a known set of conditions which can be compared to an historical database stored on the computer. The automatic sampling system is able to introduce such samples into the sample cell upon demand from the analyzer computer.
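The validity checks of operation [6] might be sketched as below. This is only an illustration: the function name, the two numeric limits, and the use of the minimum absorbance as a baseline indicator are assumptions and are not taken from the specification.

```python
import math

def spectrum_is_valid(absorbances, baseline_limit=0.5, overrange=5.0):
    """Return False when the spectrum shows signs of an instrument fault.

    baseline_limit and overrange are hypothetical placeholder limits.
    """
    if not absorbances:
        return False                      # no data collected
    if any(not math.isfinite(a) for a in absorbances):
        return False                      # infinite or undefined data
    if all(a == 0.0 for a in absorbances):
        return False                      # zero data
    if max(absorbances) >= overrange:
        return False                      # overrange reading
    if abs(min(absorbances)) > baseline_limit:
        return False                      # severe baseline offset (crude proxy)
    return True

ok = spectrum_is_valid([0.10, 0.25, 0.40, 0.15])
dead = spectrum_is_valid([0.0, 0.0, 0.0])
saturated = spectrum_is_valid([0.1, float("inf")])
shifted = spectrum_is_valid([1.2, 1.5, 1.3])
```

A spectrum that fails such a check would trigger the diagnostics of operation [16] and a retry of operation [5].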
7) Calculate Coefficients and Inferred Spectrum from Model and Measured Spectrum
The measured spectrum of the sample under test, the measured spectrum being the spectral magnitudes at several discrete measurement frequencies (or wavelengths) across the frequency band of the spectrum, is used with the model to calculate several model estimation parameters which are intermediate to the calculation of the property and/or composition data parameter estimates. In the case where the model is an eigenvector-based model, such as is the case when PCA, PLS, CPSA or similar methods are used, the dot (scalar) products of the measured test spectrum with the model eigenspectra yield coefficients which are a measure of the degree to which the eigenspectra can be used to represent the test spectrum. If, in CPSA and PCR, the coefficients are further scaled by 1/σ, the result would be the scores, v_{u}, defined in equation 3.2b of the foregoing section of the specification describing CPSA. Such scaling is not required in the generation of the simulated spectrum. Back calculation of the simulated test sample spectrum is performed by adding together the model eigenspectra scaled by the corresponding coefficients. For models which are not eigenvector-based methods, calculations can be defined which can be used to calculate the simulated spectrum of the test sample corresponding to the parameter estimation model.
The residual between the measured test sample spectrum and the simulated test sample spectrum is calculated at each measurement wavelength or frequency. The resulting residual spectrum is used in operation [8].
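For an eigenvector-based model, operation [7] can be sketched in a few lines. The example assumes orthonormal eigenspectra and uses made-up four-point spectra; the function names are hypothetical.

```python
def dot(a, b):
    """Dot (scalar) product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def simulate_spectrum(spectrum, eigenspectra):
    """Return (coefficients, simulated spectrum, residual spectrum)."""
    # Coefficients: dot products of the test spectrum with each eigenspectrum.
    coeffs = [dot(spectrum, e) for e in eigenspectra]
    # Simulated spectrum: eigenspectra scaled by their coefficients and summed.
    simulated = [sum(c * e[j] for c, e in zip(coeffs, eigenspectra))
                 for j in range(len(spectrum))]
    # Residual at each measurement frequency.
    residual = [m - s for m, s in zip(spectrum, simulated)]
    return coeffs, simulated, residual

# Two orthonormal eigenspectra over four measurement frequencies (illustrative).
eigenspectra = [[1.0, 0.0, 0.0, 0.0],
                [0.0, 1.0, 0.0, 0.0]]
measured = [2.0, 3.0, 0.5, 0.0]

coeffs, simulated, residual = simulate_spectrum(measured, eigenspectra)
```

Here the feature at the third frequency is not represented by either eigenspectrum, so it survives in the residual spectrum that is passed on to operation [8].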
8) Calculate Measurement Statistical Test Values
From the coefficients and residual spectra available from operation [7] and the measured test sample spectrum from operation [5], several statistical test values can be calculated which are subsequently used in operations [9-11]. Preferred statistics are described in the discussion of operations [9-11] and are particularly useful for eigenvector-based methods. The calculation in the current operation is to provide statistical measures which can be used to assess the appropriateness of the model for estimating parameters for the test sample. Any method, statistical test or tests, any inferential test, or any rule-based test which can be used for model assessment either singly or in combination may be used.
9) Does Test Sample Spectrum Fall within the Range of the Calibration Spectra in the Model
In the case of a principal components (or PLS) based analysis, this test refers to an examination of the Euclidean norm calculated from the residual spectrum by summing the squared residuals calculated at each measurement frequency or wavelength. The simulated spectrum only contains eigenspectra upon which the model is based. Therefore spectral features representing chemical species which were not present in the original calibration samples used to generate the model will be contained in the residual spectrum. The Euclidean norm for a test sample containing chemical species which were not included in the calibration samples used to generate the model will be significantly larger than the Euclidean norms calculated for the calibration spectra used to generate the model. As noted in operation [8], any statistic, test or procedure may be used which provides an assessment of whether chemical species are present in the test sample which are not contained in the calibration samples. In particular, pattern recognition techniques and/or comparison to spectra contained in computerized spectral libraries may be used in conjunction with the residual spectrum.
In a preferred way of performing the invention, the magnitude of the Euclidean norm is evaluated to see if the test sample spectrum falls within the range of the calibration sample spectra used to generate the model, i.e. is the Euclidean norm small with respect to the Euclidean norms of the calibration sample spectra calculated in a similar fashion. A small Euclidean norm is taken as indication that no chemical species are present in the test sample that were not present in the calibration samples. If negative (a large Euclidean norm), the sample spectrum is archived and a spot sample collected for further laboratory analysis. This is performed in operation [12]. The sampling system with the analyzer is capable of automatically capturing spot samples upon command by the analyzer control computer.
In the context of this test, chemical species are being thought of as chemical components which are contained in the sample as opposed to external interferences such as water vapor which will also show up here and must be distinguished from chemical components which are present in the sample. This can be done by modelling the measured water vapor spectrum and by orthogonalizing the calibration spectra thereto, as described above in relation to CPSA.
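The residual test of operation [9] can be sketched as follows. The comparison rule used here (the largest calibration value multiplied by a `factor`) is an assumption made for illustration; the text only requires that the value be small relative to the values obtained for the calibration spectra.

```python
def residual_norm(residual):
    """Sum of squared residuals over all measurement frequencies."""
    return sum(r * r for r in residual)

def within_calibration_range(test_residual, calibration_residuals, factor=1.0):
    """True when the test norm does not exceed factor * largest calibration norm.

    factor is a hypothetical tuning parameter.
    """
    limit = factor * max(residual_norm(r) for r in calibration_residuals)
    return residual_norm(test_residual) <= limit

# Illustrative residual spectra for two calibration samples.
calibration_residuals = [[0.01, -0.02, 0.00],
                         [0.02, 0.01, -0.01]]

familiar = within_calibration_range([0.01, 0.01, 0.01], calibration_residuals)
unfamiliar = within_calibration_range([0.30, 0.00, 0.00], calibration_residuals)
```

A negative result, as for the second sample, would lead to archiving the spectrum and capturing a spot sample in operation [12].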
10) Does Test Sample Parameter Estimation involve Interpolation of the Model
If the sample is selected as acceptable in operation [9], it is preferable to examine the validity of the model with respect to accurately estimating properties of this sample. Any method of determining statistical accuracy of parameter estimates or confidence levels is appropriate. A preferred way of achieving this is for the Mahalanobis distance (as defined above in equation (7.1) of the section of this specification describing the development of an empirical model in CPSA) to be used to determine the appropriateness of the model calibration data set for estimating the sample. The Mahalanobis distance is a metric which is larger when a test sample spectrum is farther from the geometric center of the group of spectra used for the model calculation as represented on the hyperspace defined by the principal components or eigenspectra used in the model. Thus, a large value of the Mahalanobis distance indicates that the property estimate is an extrapolation from the range of data covered by the model calibration. This does not necessarily mean that the estimate is wrong, only that the uncertainty in the estimate may be larger (or the confidence level smaller) than desirable and this fact must be communicated in all subsequent uses of the data.
If the estimate is found to be uncertain (large Mahalanobis distance), it is desirable to archive the sample spectrum and capture a spot sample for subsequent laboratory analysis using the computer controlled automatic sampling system [operation 12].
11) Does Test Sample Spectrum Fall in a Populated Region of Data in Calibration Model
Even though the sample may lie within the data space covered by the model (small value of Mahalanobis distance), the sample may lie in a region in which the number of calibration samples in the model set is sparse. In this case, it is desirable to archive the sample spectrum and capture a spot sample [12] so that the model may be improved. Any standard statistical test of distance may be used in order to make this determination. In particular the intersample Mahalanobis distance calculated for each test sample/calibration sample pair may be examined in order to arrive at the decision as to whether or not the samples should be saved. An intersample Mahalanobis distance is defined as the sum of the squares of the differences between the scores for the test sample spectrum and those for the calibration sample spectrum, scores being calculated by equation (3.2b) of the section of this specification describing the development of an empirical model in CPSA. A negative response results if all the intersample Mahalanobis distances are greater than a predetermined threshold value selected to achieve the desired distribution of calibration sample spectra variability, in which case, it is desirable to archive the sample spectrum and capture a spot sample for subsequent laboratory analysis using the computer controlled automatic sampling system [12].
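The test of operation [11] can be sketched as below, using the inter-sample distance exactly as defined in the paragraph above (sum of squared score differences). The score values and the threshold are made up for illustration, and the function names are hypothetical.

```python
def intersample_distance(scores_a, scores_b):
    """Sum of squared differences between two score vectors."""
    return sum((a - b) ** 2 for a, b in zip(scores_a, scores_b))

def in_populated_region(test_scores, calibration_scores, threshold):
    """False only if every calibration sample is farther away than threshold."""
    return any(intersample_distance(test_scores, c) <= threshold
               for c in calibration_scores)

# Illustrative two-component scores for three calibration samples.
calibration_scores = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

near = in_populated_region([0.1, 0.1], calibration_scores, threshold=0.05)
isolated = in_populated_region([3.0, 3.0], calibration_scores, threshold=0.05)
```

An isolated sample such as the second one would be archived and a spot sample captured [12], so that the sparse region of the calibration set can be filled in.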
13) Calculate Parameter and Confidence Interval Estimates
After having performed the statistical tests indicated in operations [9], [10] and [11] and possibly collecting a spot sample as indicated in step [12], the parameters are now estimated from the model. For CPSA, this involves calculating the scores (equation 3.2b) and then estimating the parameters (equations 3.3 to 3.5). The actual numerical calculation performed will depend upon the type of model being used. For the case of an eigenvector-based analysis (such as PCR, PLS), the method is a vector projection method identical to that described above as CPSA.
14) Transmit Estimate(s) to Process Monitor/Control Computer
Having calculated the parameter estimates and the statistical tests, the parameter estimates and estimates of the parameter uncertainties are now available. These may be transmitted to a separate process monitor or control computer normally located in the process control center. The results may be used by operations for many purposes including process control and process diagnostics. Data transmission may be in analog or digital form.
15) Transmit Estimate(s) to Analyzer Workstation
The analyzer is normally operated totally unattended (stand alone). It is desirable for the results of the analysis and statistical tests to be transferred to a workstation which is generally available to analyzer and applications engineers. This is indicated in operation [15]. While the availability of the data on a separate workstation may be convenient, it is not essential to the operation of the analyzer system.
1) Archived Test Spectrum and Lab Data for Model Update Present
In the event that the samples have been captured and spectra archived for subsequent model updating, it is necessary to update the estimation model. This can only be carried out once the results of laboratory analysis are available along with the archived spectrum.
If model updating is not needed, operation continues with [5].
Model Updating
Model updating consists of operations [2], [3], and [4]. Any or all of the operations may be performed on the analyzer computer or may be performed offline on a separate computer. In the latter case, the results of the updated model must be transferred to the analyzer control computer.
2) Full Model and Regression Calculation Necessary
If the sample which is being included in the model did not result from a negative decision in operation [9], it is not necessary to carry out the calculation which produces the model eigenspectra. This is because operation [9] did not identify the inclusion of additional eigenspectra into the model as being necessary. In this case, only a new regression is required, and the process continues with operation [4].
3) Calculate New Model Using CPSA or Equivalent
The calibration model data base is updated by including the additional spectrum and corresponding laboratory data. The database may be maintained on a separate computer and models developed on that computer. The entire model generation procedure is repeated using the expanded set of data. This, for example, would mean rerunning the CPSA model or whichever numerical methods have been used originally. If this step is performed offline, then the updated eigenspectra must be transferred to the analyzer computer.
Model updating methods could be developed which would allow an updated model to be estimated without having to rerun the entire model building procedure.
4) Perform New Regression and Update Model Regression Coefficients
A regression is performed using the scores for the updated calibration set and the laboratory measurements of composition and/or property parameters to obtain regression coefficients which will be used to perform the parameter and confidence interval estimation of operation [13]. The regression step is identical to that described above for CPSA (equations 2.9a and b in the section on the development of an empirical model hereinabove). If this step is performed offline, then the regression coefficients must be transferred to the analyzer computer.
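The routing between operations [1] to [4] can be summarized in a short sketch. The dictionary keys and the function name are hypothetical; only the decision logic follows the description above.

```python
def plan_model_update(archived_samples):
    """Return the list of update operations to run, by operation number.

    Each archived sample is a dict with two hypothetical keys:
    'lab_data_available' and 'failed_operation_9'.
    """
    ready = [s for s in archived_samples if s["lab_data_available"]]
    if not ready:
        return []          # nothing to update; continue with operation [5]
    if any(s["failed_operation_9"] for s in ready):
        return [3, 4]      # new eigenspectra are needed, then a new regression
    return [4]             # eigenspectra unchanged; new regression only

waiting = plan_model_update(
    [{"lab_data_available": False, "failed_operation_9": True}])
regression_only = plan_model_update(
    [{"lab_data_available": True, "failed_operation_9": False}])
full_rebuild = plan_model_update(
    [{"lab_data_available": True, "failed_operation_9": False},
     {"lab_data_available": True, "failed_operation_9": True}])
```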
The steps described above allow the estimation of property and/or composition parameters by performing online measurements of the absorbance spectrum of a fluid or gaseous process stream. Mathematical analysis provides high quality estimates of the concentration of chemical components and the concentrations of classes of chemical components. Physical and performance parameters which are directly or indirectly correlated to chemical component concentrations are estimable. Conditions for the measurement of the absorbance spectra are specified so as to provide redundant spectral information thereby allowing the computation of method diagnostic and quality assurance measures.
The steps comprising the methodology are performed in an integrative manner so as to provide continuous estimates for method adjustment, operations diagnosis and automated sample collection. Different aspects of the methodology are set out below in numbered paragraphs (1) to (10).
(1.) Selection of the subset region for the absorbance spectra measurements
(1.1) The measurement of the infrared spectrum in the various subregions can be accomplished by the use of different infrared spectrometer equipment. Selection of the appropriate subregion(s) is accomplished by obtaining the spectrum of a representative sample in each of the candidate subregions and selecting the subregion(s) in which absorptions are found which are due directly or indirectly to the chemical constituent(s) of interest. Criteria for selection of the appropriate subregion(s) may be summarized as follows:
The procedure of this paragraph is applicable over a wide range of possible absorption spectrum measurements. No single spectrometer can cover the entire range of applicability. Therefore, it is necessary to select a subset region which matches that which is available in a spectrometer as well as providing significant absorption features for the chemical constituents which are in the sample and which are correlated to the composition and/or property parameter for which a parameter estimate is to be calculated. The criteria for selecting the preferred wavelength subset region include subjective and objective measurements of spectrometer performance, practical sample thickness constraints, achievable sample access, and spectrometer detector choice considerations. The preferred subset region for measuring liquid hydrocarbon process streams is one in which the thickness of the sample is approximately 1 cm. This is achievable in the region from 800 nm to 1600 nm which corresponds to a subset of the near infrared region for which spectrometer equipment which is conveniently adaptable to online measurement is currently available. The named region is a further subset of the range which is possible to measure using a single spectrometer. The further restriction on range is preferred in order to include sufficient range to encompass all absorptions having a similar dynamic range in absorbance and restricted to one octave in wavelength.
(2.) Criteria for selection and measurement of samples for the calibration model calculation.
(2.1) Samples are collected at various times to obtain a set of samples (calibration samples) which are representative of the range of process stream composition variation.
(2.2) Absorption spectra of the samples may be obtained either online during the sample collection procedure or measured separately in a laboratory using the samples collected.
(2.3) The property and/or composition data for which the calibration model is to be generated are separately measured for the collected samples using standard analytical laboratory techniques.
(3.) Calibration model calculation
(3.1) The calibration model is obtained using any one of several multivariate methods and the samples obtained are designated calibration samples. Through the application of the method, a set of eigenspectra are obtained which are a specific transformation of the calibration spectra. They are retained for the property/composition estimation step. An important preferred feature of the invention allows for the updating of the predictive model by collecting samples during actual operation. This will permit a better set of samples to be collected as previously unrecognized samples are analyzed and the relevant data entered into the predictive model. Therefore it is not particularly important how the samples are obtained or which model calculation method is used for the initial predictive model. It is preferable that the initial calibration model be developed using the same method which is likely to be used for developing the model from the updated sample set. Methods which can be used for the calibration model calculation are:
(3.1.1) Constrained Principal Spectra Analysis as described hereinabove is the preferred method.
(3.1.2) Principal components regression discussed above is an alternative method.
(3.1.3) Partial least squares analysis, which is a specific implementation of the more general principal components regression.
(3.1.4) Any specific algorithm which is substantially the same as the above.
(3.1.5) A neural network algorithm, such as back propagation, which is used to produce a parameter estimation model. This technique may have particular advantage for handling nonlinear property value estimation.
(4.) Property/composition estimation
(4.1) Property and/or composition data are estimated according to the following equation as explained above (equation 3.5):
y_{u} = v_{u}' B
(5.) Calibration model validation
Calibration model validation refers to the process of determining whether or not the initial calibration model is correct. Examples of validating the calibration model would be crossvalidation or PRESS referred to hereinabove.
(5.1) Additional samples which are not used in the calibration model calculation (paragraph (3) above) are collected (test set) and measured.
(5.1.1) Spectra are measured for these samples either online or in a laboratory using the samples which have been collected.
(5.1.2) Property and/or composition data are obtained separately from the same standard analytical laboratory analyses referred to in paragraph (2.3) above.
(5.2) Property and/or composition data are estimated using equations (3.3-3.5) in the description of CPSA hereinabove and validated by comparison to the laboratory obtained property and/or composition data.
(6.) Online absorption spectrum measurement
(6.1) Any infrared spectrometer having measurement capabilities in the subset wavelength region determined in paragraph (1) above may be used.
(6.2) Sampling of the process stream is accomplished either by extracting a sample from the process stream using a slip stream or by inserting an optical probe into the process stream.
(6.2.1) Slip stream extraction is used to bring the sample to an absorption spectrum measuring cell. The spectrum of the sample in the cell is measured either by having positioned the cell directly in the light path of the spectrometer or indirectly by coupling the measurement cell to the spectrometer light path using fiber optic technology. Slip stream extraction with indirect fiber optic measurement technology is the preferred online measurement method. During the measurement, the sample may either be continuously flowing, in which case the spectrum obtained is a time averaged spectrum, or a valve may be used to stop the flow during the spectral measurement.
(6.2.2) Insertion sampling is accomplished by coupling the optical measurement portion of the spectrometer to the sample stream using fiber optic technology.
(7.) Process parameter (online property and/or composition) estimation.
(7.1) Spectra are measured online for process stream samples during process operation. Several choices of techniques for performing the spectral measurement are available as described in paragraph (6) immediately above.
(7.2) Parameter estimation is carried out using the equation in paragraph (4.1) above.
(8.) Calibration model updating
(8.1) Spot test samples for which the estimated parameter(s) are significantly different from the laboratory measured parameter(s) as determined in paragraphs (9) and (10) below are added to the calibration set and the calibration procedure is repeated starting with paragraph (3) to obtain an updated calibration model as set out in the equation in paragraph (3.1) above.
(8.2) Samples which are measured online are compared to the samples used in the calibration model using the methods described in paragraphs (9) and (10) below. Samples which fail the tests in (9) or (10) are noted and aliquots are collected for laboratory analysis of the property/composition and verification of the spectrum. The online measured spectrum and the laboratory determined property/composition data for any such sample are added to the calibration data set and the calibration procedure is repeated starting with paragraph (3) to obtain an updated calibration model.
(9.) Diagnostic and quality assurance measures
(9.1) Diagnostics are performed by calculating several parameters which measure the similarity of the test sample spectrum to the spectra of samples used in the calibration.
(9.1.1) Vector-based distance and similarity measurements are used to validate the spectral measurements. These include, but are not limited to,
(9.1.1.1) Mahalanobis distances and/or Euclidean norms to determine the appropriateness of the calibration set for estimating the sample.
(9.1.1.2) Residual spectrum (the difference between the actual spectrum and the spectrum estimated from the eigenspectra used in the parameter estimation) to determine if unexpected components having significant absorbance are present.
(9.1.1.3) Values of the projection of the spectrum onto any individual eigenspectrum or combination of the eigenspectra to determine if the range of composition observed is included in the calibration set.
(9.1.1.4) Vector estimators of spectrometer system operational conditions (such as wavelength error, radiation source variability, and optical component degradation) which would affect the validity of the parameter estimation or the error associated with the parameter estimated.
(9.1.2) Experience-based diagnostics commonly obtained by control chart techniques, frequency distribution analysis, or any similar techniques which evaluate the current measurement (either spectral or parameter) in terms of the past experience available either from the calibration sample set or the past online sample measurements.
(10.) Process control, optimization and diagnosis
(10.1) Parameters are calculated in real time which are diagnostic of process operation and which can be used for control and/or optimization of the process and/or diagnosis of unusual or unexpected process operation conditions.
(10.1.1) Examples of parameters which are based on the spectral measurement of a single process stream include chemical composition measurements (such as the concentration of individual chemical components as, for example, benzene, toluene, xylene, or the concentration of a class of compounds as, for example, paraffins); physical property measurements (such as density, index of refraction, hardness, viscosity, flash point, pour point, vapor pressure); performance property measurement (such as octane number, cetane number, combustibility); and perception (such as smell/odor, color).
(10.1.2) Parameters which are based on the spectral measurements of two or more streams sampled at different points in the process, thereby measuring the difference (delta) attributable to the process included between the sampling points along with any delayed effect of the process between the sampling points.
(10.1.3) Parameters which are based on one or more spectral measurements along with other process operational measurements (such as temperatures, pressures, flow rates) are used to calculate a multiparameter (multivariate) process model.
(10.2) Real-time parameters as described in paragraph (10.1) can be used for:
(10.2.1) Process operation monitoring.
(10.2.2) Process control either as part of a feedback or a feedforward control strategy.
(10.2.3) Process diagnosis and/or optimization by observing process response and trends.
It will be appreciated that many modifications to or additions in the inventive method and apparatus disclosed herein are possible without departing from the spirit and scope of the invention as defined by the appended claims. All such modifications and additions will be obvious to the skilled addressee, are contemplated by the disclosure of the present invention, and will not be further described herein.
The concepts and mathematical techniques disclosed in this patent specification are relatively complex. Although the invention has been fully described in the foregoing pages, there is set out hereinafter, under the heading "Appendix," a detailed description of the mathematical techniques used, starting from first principles. This Appendix forms part of the disclosure of the present specification.
1.0 Definition of Mathematical Terms, Symbols and Equations:
1.1 Scalars:
Variables representing scalar quantities will be designated by lowercase, normal typeface letters, e.g. x or y. Scalar quantities are real numbers. Examples would include but not be limited to the concentration of a single component in a single sample, the value of a single property for a single sample, the absorbance at a single frequency for a single sample.
1.2 Vectors:
Vectors will be represented as lowercase, boldface letters. Vectors are a one dimensional array of scalar values. Two types of vectors will be distinguished. ##EQU1## The corresponding row vector,
v^{t} = (v_{1} v_{2} v_{3} v_{4} v_{5} v_{6} v_{7} v_{8} v_{9})
is also of dimension 9. The superscript ^{t} indicates that the row vector is the transpose of the corresponding column vector.
Normal face, lowercase letters with subscripts will be used to refer to the individual elements of the vector, e.g. v_{1}, v_{2}, v_{3}, etc., which are scalar quantities. The subscripts 1, 2, 3, etc. indicate the position of the quantity in the vector. The expression v_{i} will be used to represent a general position i in the vector. Bold face, lowercase letters with subscripts will be used to represent one of a set of vectors, i.e. v_{1}, v_{2}, . . . v_{n} would be a set of n vectors.
1.2.1 Product of a Scalar Times a Vector: ##EQU2## If v^{t} =(v_{1} v_{2} v_{3} v_{4} v_{5} v_{6} v_{7} v_{8} v_{9}) then av^{t} =(av_{1} av_{2} av_{3} av_{4} av_{5} av_{6} av_{7} av_{8} av_{9})
1.2.2 Dot Product of two vectors:
The dot product of a column and a row vector is a scalar quantity. The dot product is defined only if the vectors are of the same dimension. The dot product is the sum of the products of the individual elements of the two vectors, i.e. ##EQU3## Two vectors will be referred to as being orthogonal if their dot product is equal to zero. If a vector z is orthogonal to vectors u and v, it is orthogonal to any linear combination of u and v, i.e.
if z^{t} u=z^{t} v=0 and y=ua+vb for any scalar coefficients a and b then z^{t} y=z^{t} ua+z^{t} vb=0a+0b=0.
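A small numerical check of this property is sketched below; the vectors and coefficients are arbitrary illustrative values and `dot` is a hypothetical helper.

```python
def dot(u, v):
    """Sum of the products of corresponding elements of two vectors."""
    if len(u) != len(v):
        raise ValueError("dot product requires vectors of the same dimension")
    return sum(a * b for a, b in zip(u, v))

z = [0.0, 0.0, 1.0]
u = [1.0, 2.0, 0.0]
v = [3.0, -1.0, 0.0]

# z is orthogonal to both u and v.
z_dot_u = dot(z, u)
z_dot_v = dot(z, v)

# Any linear combination y = ua + vb is therefore orthogonal to z as well.
a, b = 2.5, -4.0
y = [a * ui + b * vi for ui, vi in zip(u, v)]
z_dot_y = dot(z, y)
```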
1.2.3 Length of a Vector:
The length of a vector is the square root of the dot product of the vector with its transpose, and will be designated by ∥v∥, i.e.
∥v∥ = ∥v^{t}∥ = (v^{t} v)^{0.5}
A vector will be referred to as normal, unit or normalized if the length of the vector is equal to one. A zero vector refers to a vector of zero length, i.e. one whose elements are all zero.
1.2.4 An Orthonormal Set of Vectors:
A set of vectors u_{1}, u_{2}, . . . u_{n} forms an orthonormal set if the dot products u_{i} ^{t} u_{i} =1 and u_{i} ^{t} u_{j} =0 for i not equal to j, where i and j run from 1 to n. The individual vectors are normalized, and the dot product between any two different vectors is equal to zero.
1.2.5 Orthogonalization and Normalization of Vectors:
Given a vector u of length ∥u∥, it is possible to define a new, normal vector u' as u'=u/∥u∥. u' is obtained from u by dividing each of the elements by the length of the vector. This procedure is referred to as normalization of the vector.
If u is a normal vector, and v is a vector such that the dot product u^{t} v=d, the vector z=v-du is orthogonal to u, i.e. u^{t} z=u^{t} v-du^{t} u=d-d=0. The procedure for calculating z is referred to as orthogonalization of the vectors.
Given a set of n vectors, x_{1}, x_{2}, . . . x_{n}, it is possible to derive a new set of n vectors, u_{1}, u_{2}, . . . u_{n}, that form an orthonormal set. This procedure, which is referred to as Gram-Schmidt Orthogonalization, involves the following steps:
[1] choosing an initial vector, x_{1}, and normalizing it to produce u_{1} ;
[2] choosing a second vector, x_{2}, orthogonalizing it to u_{1}, and normalizing the resultant vector to produce u_{2} ;
[3] choosing a third vector, x_{3}, orthogonalizing it to both u_{1} and u_{2}, and normalizing the result to produce u_{3} ;
[4] continuing the process of choosing vectors x_{i}, orthogonalizing them to the vectors u_{1}, u_{2}, . . . u_{i-1} which were already calculated, and normalizing the resultant vectors to produce new u_{i} until all vectors have been processed.
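The four steps above can be sketched as follows. This is a minimal illustration that assumes the input vectors are linearly independent. The function names are hypothetical, and each projection is removed from the running remainder (the "modified" ordering, which gives the same result in exact arithmetic).

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(u):
    """Divide each element by the length of the vector."""
    norm = math.sqrt(dot(u, u))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [a / norm for a in u]

def gram_schmidt(xs):
    """Return an orthonormal set derived from the vectors in xs."""
    us = []
    for x in xs:
        z = list(x)
        for u in us:
            d = dot(u, z)                               # d = u^t z
            z = [zi - d * ui for zi, ui in zip(z, u)]   # z = z - d*u
        us.append(normalize(z))
    return us

# Illustrative, linearly independent input vectors (hypothetical values).
basis = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
```

Each vector in basis has unit length, and the dot product of any two different vectors in basis is zero to within rounding error.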
1.3 Matrices:
A matrix is a two dimensional array of scalar values. Matrices will be designated as uppercase, boldfaced letters. For example, ##EQU4## The general element of the matrix X, x_{ij}, indicates the value in the i^{th} row and the j^{th} column. The dimensions of a matrix, i.e. the numbers of rows and columns, will be indicated as italicized, lowercase letters, i.e. a matrix X of dimension f by n where in the above example f is 9, and n is 4. The individual columns of a matrix are column vectors, e.g. the vector x_{1} would be the vector containing the same elements as the first column of the matrix X. The individual rows of a matrix are row vectors, e.g. the vector x_{i} ^{t} is the i^{th} row of the matrix X.
1.3.1 Matrix Transpose:
The transpose of a matrix X is obtained by interchanging the rows and columns, and will be designated by X^{t}. For example, ##EQU5## If the matrix X is of dimension f by n, then the matrix X^{t} is of dimension n by f.
1.3.2 Matrix Multiplication:
If U is a matrix of dimension f by k, and V is a matrix of dimension k by n, then the product matrix Z=UV is a matrix of dimension f by n. The z_{ij} element of Z is calculated by taking the dot product of the i^{th} row vector u_{i} ^{t} from U and the j^{th} column vector v_{j} from V. ##EQU6## The product is only defined if the inner dimensions of the two matrices, e.g. k in this example, are the same. The vectors formed by the columns of the product matrix Z are linear combinations of the column vectors of U, i.e. ##EQU7##
The product of more than two matrices, e.g. ABC, is obtained by taking the products of the matrices two at a time, i.e. the product of AB times C, or A times the product of BC.
The transpose of a product of two matrices is the product of the transposes in opposite order, i.e.
(AB).sup.t =B.sup.t A.sup.t.
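A minimal sketch of the matrix product and of the transpose rule, with matrices stored as lists of rows. The function names and the numerical values are hypothetical.

```python
def transpose(m):
    """Interchange the rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(u, v):
    """Product Z = UV; z_ij is the dot product of row i of U and column j of V."""
    if len(u[0]) != len(v):
        raise ValueError("inner dimensions must be the same")
    vt = transpose(v)
    return [[sum(a * b for a, b in zip(row, col)) for col in vt] for row in u]

A = [[1, 2], [3, 4], [5, 6]]   # 3 by 2
B = [[1, 0, 2], [0, 1, 3]]     # 2 by 3
Z = matmul(A, B)               # 3 by 3

# The transpose of a product is the product of the transposes in opposite order.
same = transpose(Z) == matmul(transpose(B), transpose(A))
```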
1.3.3 Products of Vectors and Matrices:
The product of a row vector u^{t} of dimension k by a matrix V of dimension k by n is a row vector z^{t} of dimension n, i.e. ##EQU8##
then z^{t} =(z_{1} z_{2} z_{3})=u^{t} V=(u_{1} v_{11} +u_{2} v_{21} u_{1} v_{12} +u_{2} v_{22} u_{1} v_{13} +u_{2} v_{23})
The individual elements of z^{t}, z_{i}, are calculated as the dot product of u^{t} with the individual column vectors of V, v_{i}. The product of a matrix U of dimension f by k with a column vector v of dimension k is a column vector z which is calculated by taking the dot products of the row vectors of U, u_{i} ^{t}, with v. ##EQU9##
1.3.4 Square, Diagonal, Unit and Unitary Matrices:
A square matrix is a matrix whose two dimensions are equal. The diagonal elements of a matrix are the elements x_{ii}. The off-diagonal elements of a matrix are the elements x_{ij} for which i≠j. A diagonal matrix is a matrix for which the off-diagonal elements are zero. A unit matrix, herein designated as I, is a square, diagonal matrix where all the diagonal elements are equal to one. The product of a matrix with a unit matrix is itself, i.e. AI=A. A unitary matrix is a square matrix U for which
U.sup.t U=UU.sup.t =I.
The column and row vectors of a unitary matrix are orthonormal sets of vectors.
1.3.5 Matrix Inverse:
For a square matrix A of dimension n by n, the inverse of A, A^{-1}, is the matrix for which
AA.sup.-1 =A.sup.-1 A=I.
The inverse of A does not exist if the rank of A (see section 1.3.6) is less than n. Various computer algorithms are available for calculating the inverse of a square matrix (1).
A nonsquare matrix U of dimension f by n may have a right inverse,
UU.sup.-1 =I if f<n
or it may have a left inverse,
U.sup.-1 U=I if f>n.
If f<n, and UU.sup.-1 =I then U.sup.-1 =U.sup.t (UU.sup.t).sup.-1.
If f>n, and U.sup.-1 U=I then U.sup.-1 =(U.sup.t U).sup.-1 U.sup.t.
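As an illustration of the left inverse for f>n, the sketch below forms (U^t U)^-1 U^t for a 3 by 2 matrix. It assumes n=2 so that the 2 by 2 inverse can be written in closed form. The function names and the values in U are hypothetical.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def inverse_2x2(m):
    """Closed-form inverse of a 2 by 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

U = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]                # f = 3, n = 2
Ut = transpose(U)
left_inverse = matmul(inverse_2x2(matmul(Ut, U)), Ut)   # (U^t U)^-1 U^t, 2 by 3

identity = matmul(left_inverse, U)     # 2 by 2, equal to I within rounding
projection = matmul(U, left_inverse)   # 3 by 3, not an identity matrix
```

The product taken in the other order, U times the left inverse, is a 3 by 3 matrix that is not an identity matrix, which is why only a left inverse exists when f>n.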
1.3.6 Eigenvalues and Eigenvectors:
For a square matrix A of dimension n by n, the equation
Av=vλ.sub.i
is called the eigenvalue equation for A. The eigenvalues of A, λ_{i}, are the scalars for which the equation possesses nonzero solutions in v. The corresponding nonzero vectors v_{i} are the eigenvectors of A. For a symmetric matrix A, such as the X^{t} X and XX^{t} matrices used below, the eigenvectors can be chosen to form an orthonormal set of vectors. The eigenvalue equation may also be written as
AV=VΛ
or as
V.sup.t AV=Λ
where Λ is a diagonal matrix containing the eigenvalues λ_{i}, and V is the unitary matrix whose columns are the individual eigenvectors. The eigenvalues of A can be obtained by solving the equation
det(A-λI)=0
where det indicates the determinant of the matrix. Various computer algorithms are available for solving for the eigenvalues and eigenvectors of a square matrix (1).
The rank of a matrix A is equal to the number of nonzero eigenvalues of A. The rank of a square matrix A of dimension n by n is at most n. If the rank of A is less than n, then A is singular and cannot be inverted.
1.3.7 Singular Value Decomposition of a Matrix:
For a nonsquare matrix X of dimension f by n, the singular value decomposition of X is given by
X=UΣV.sup.t
If f>n, then U (dimension f by n), Σ (dimension n by n), and V (dimension n by n) matrices have the following properties:
U.sup.t U=I
Σ is a diagonal matrix
V.sup.t V=VV.sup.t =I
If f<n, then U (dimension f by f), Σ (dimension f by f), and V (dimension n by f) matrices have the following properties:
U.sup.t U=UU.sup.t =I
Σ is a diagonal matrix
V.sup.t V=I
The vectors which make up the columns of U are referred to as the left eigenvectors of X, and the vectors which make up the columns of V are referred to as the right eigenvectors of X. The diagonal elements of Σ, σ_{ii}, are referred to as the singular values of X, and are in the order σ_{11} ≧σ_{22} ≧σ_{33} ≧ . . . ≧σ_{kk} where k is the smaller of f or n.
The singular value decomposition of X^{t} is VΣU^{t}.
The singular value decomposition is related to the eigenvalue analysis in the following manner. The eigenvalue equation for the square matrix X^{t} X can be written as
X.sup.t XV=VΛ.
By substituting the singular value decomposition for X and X^{t} we obtain
X.sup.t XV=VΣ.sup.t U.sup.t UΣV.sup.t V=VΛ.
Using the fact that U^{t} U=I and V^{t} V=I and that Σ^{t} =Σ we obtain
X.sup.t XV=VΣ.sup.2
where Σ^{2} =ΣΣ. From this it can be seen that V is the matrix of eigenvectors for the X^{t} X matrix, and that
Σ.sup.2 =Λ
i.e. the singular values of X are the square roots of the eigenvalues of X^{t} X. Similarly, for the matrix XX^{t}
XX.sup.t U=UΣV.sup.t VΣU.sup.t U=UΣ.sup.2 =UΛ
U is the matrix of eigenvectors of the XX^{t} matrix. Note that if f≧n, X^{t} X and XX^{t} have at most n nonzero eigenvalues. Similarly if f≦n, X^{t} X and XX^{t} have at most f nonzero eigenvalues.
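The relationship between the singular values of X and the eigenvalues of X^t X can be checked numerically. The sketch below uses power iteration, which is a simple stand-in chosen for brevity and is not one of the algorithms referred to in the text. It estimates only the largest singular value, and the function name and the matrix X are hypothetical.

```python
import math

def largest_singular_value(X, iterations=200):
    """Estimate the largest singular value of X (a list of f rows, each of
    length n) by power iteration on the n by n matrix X^t X."""
    f, n = len(X), len(X[0])
    v = [1.0] * n  # arbitrary starting vector
    for _ in range(iterations):
        Xv = [sum(X[i][j] * v[j] for j in range(n)) for i in range(f)]
        w = [sum(X[i][j] * Xv[i] for i in range(f)) for j in range(n)]  # X^t X v
        norm = math.sqrt(sum(a * a for a in w))
        v = [a / norm for a in w]
    Xv = [sum(X[i][j] * v[j] for j in range(n)) for i in range(f)]
    sigma = math.sqrt(sum(a * a for a in Xv))  # sqrt(v^t X^t X v) for unit v
    return sigma, v

X = [[3.0, 0.0], [0.0, 2.0], [0.0, 0.0]]   # f = 3, n = 2
sigma, v = largest_singular_value(X)
eigenvalue = sigma ** 2                    # largest eigenvalue of X^t X
```

For this X the matrix X^t X is diagonal with elements 9 and 4, so the estimate converges to a singular value of 3, the square root of the largest eigenvalue.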
If X is of dimension f by n, then the inverse of X is given by VΣ^{-1} U^{t}, i.e.
X.sup.-1 X=VΣ.sup.-1 U.sup.t X=VΣ.sup.-1 U.sup.t UΣV.sup.t =VV.sup.t
XX.sup.-1 =XVΣ.sup.-1 U.sup.t =UΣV.sup.t VΣ.sup.-1 U.sup.t =UU.sup.t
where Σ^{-1} is a diagonal matrix containing 1/σ_{ii} on the diagonal. Similarly, the inverse of X^{t} is given by UΣ^{-1} V^{t}, i.e.
X.sup.t (X.sup.t).sup.-1 =X.sup.t UΣ.sup.-1 V.sup.t =VΣU.sup.t UΣ.sup.-1 V.sup.t =VV.sup.t
(X.sup.t).sup.-1 X.sup.t =UΣ.sup.-1 V.sup.t X.sup.t =UΣ.sup.-1 V.sup.t VΣU.sup.t =UU.sup.t
If X is square and of rank k=f=n, then
VV.sup.t =I and UU.sup.t =I
In this case, X and X^{t} both have right and left inverses.
If f>n, UU^{t} is not an identity matrix, and the expressions for the right inverse of X and for the left inverse of X^{t} are only approximate. Similarly, if f<n, VV^{t} is not an identity matrix, and the expressions given for the left inverse of X and for the right inverse of X^{t} are only approximate.
If Σ contains k nonzero singular values, then
X=UΣV.sup.t =U'Σ'V'.sup.t
where Σ' is the k by k matrix containing the nonzero singular values, and U' and V' are the f by k and n by k matrices obtained from U and V by eliminating the columns which correspond to the singular values that are zero. Similarly,
X.sup.t =V'Σ'U'.sup.t
X.sup.-1 =V'Σ'.sup.-1 U'.sup.t
(X.sup.t).sup.-1 =U'Σ'.sup.-1 V'.sup.t
Similarly, if the k+1 through n singular values of X are close to zero, these singular values and the corresponding columns of U and V may be eliminated to provide an approximation to the X matrix X≅U'Σ'V'^{t}. Approximations for X^{t} and for the inverses of X and X^{t} are obtained similarly.
In developing regression models, it is generally assumed that small singular values correspond to random errors in the X data, and that removal of these values improves the stability and robustness of the regression models. To simplify the notation in the discussion below, the ' will be dropped, and the symbols U, Σ, and V will represent the f by k, k by k, and n by k matrices obtained after eliminating the singular values that are at, or close to zero.
The matrix U can be calculated as
U=XVΣ.sup.-1
which implies that the column vectors in U are linear combinations of the column vectors in X.
It should be noted that in some versions of Principal Component Analysis, the X matrix is decomposed into only two matrices
X=LV.sup.t
The L matrix is the product of the U and Σ matrices, such that
L.sup.t L=Λ.
1.4.1 Multiple Linear Regression, Least Squares Regression:
If y is a column vector of length n, and X is a matrix of dimension f by n, then the matrix equation
y=X.sup.t p+e
is a linear regression equation relating the dependent variable y to the independent variables X. p is a column vector of length f containing the regression coefficients, and e is the vector containing the residuals for the regression model. A least squares regression is one that minimizes e^{t} e, i.e. the sum of the squares of the residuals.
1.4.2 Solution for f≦n:
If X is f by n, y is of dimension n, and f≦n, then the solution to the regression equation that minimizes ∥e∥ is obtained by multiplying both sides of the equation in 1.4.1 by the left inverse of X^{t}, i.e.
y=X.sup.t p+e→(XX.sup.t).sup.-1 Xy=(XX.sup.t).sup.-1 XX.sup.t p+(XX.sup.t).sup.-1 Xe=p+(XX.sup.t).sup.-1 Xe
The quantity (XX^{t})^{-1} Xy is the least squares estimate of p, and is designated p̂.
p̂=(XX.sup.t).sup.-1 Xy
The regression vector p̂ is a linear combination of the column vectors in X. The estimates of y, i.e. ŷ, are given by
ŷ=X.sup.t p̂=X.sup.t (XX.sup.t).sup.-1 Xy
The residuals for the model are
e=y-ŷ
The Standard Error of Estimate (SEE) for the model is ##EQU10##
The regression model may be used to estimate the scalar value y_{u} corresponding to a new vector of independent variables x_{u}
ŷ.sub.u =x.sub.u.sup.t p̂=x.sub.u.sup.t (XX.sup.t).sup.-1 Xy
Based on the model, the probability is α that the true value y_{u} lies in the range ##EQU11## where t is the Student's t distribution value for probability α and n-f degrees of freedom, and d^{2} is given by
d.sup.2 =x.sub.u.sup.t (XX.sup.t).sup.-1 x.sub.u
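The least squares solution for f≦n can be sketched for the small case f=2, where the inverse of XX^t has a closed form. The data are hypothetical, and the SEE is taken to be the square root of e^t e divided by n-f degrees of freedom, which is an assumption because the SEE equation is not reproduced in the text above.

```python
import math

def least_squares(X, y):
    """Least squares estimate p_hat = (X X^t)^-1 X y for a 2 by n matrix X
    (two independent variables, with the n samples stored as columns)."""
    n = len(y)
    # Elements of the symmetric 2 by 2 matrix X X^t and of the vector X y.
    a = sum(X[0][j] * X[0][j] for j in range(n))
    b = sum(X[0][j] * X[1][j] for j in range(n))
    d = sum(X[1][j] * X[1][j] for j in range(n))
    g0 = sum(X[0][j] * y[j] for j in range(n))
    g1 = sum(X[1][j] * y[j] for j in range(n))
    det = a * d - b * b
    if det == 0:
        raise ValueError("X X^t is singular and cannot be inverted")
    inv = [[d / det, -b / det], [-b / det, a / det]]
    p_hat = [inv[0][0] * g0 + inv[0][1] * g1,
             inv[1][0] * g0 + inv[1][1] * g1]
    return p_hat, inv

X = [[1.0, 1.0, 1.0, 1.0], [0.0, 1.0, 2.0, 3.0]]   # f = 2, n = 4
y = [1.1, 2.9, 5.2, 6.8]
p_hat, inv = least_squares(X, y)

y_hat = [X[0][j] * p_hat[0] + X[1][j] * p_hat[1] for j in range(len(y))]
e = [yj - yh for yj, yh in zip(y, y_hat)]           # residuals e = y - y_hat
f, n = 2, len(y)
see = math.sqrt(sum(r * r for r in e) / (n - f))    # assumed form of the SEE

x_u = [1.0, 4.0]                                    # new sample
y_u_hat = x_u[0] * p_hat[0] + x_u[1] * p_hat[1]
t0 = inv[0][0] * x_u[0] + inv[0][1] * x_u[1]
t1 = inv[1][0] * x_u[0] + inv[1][1] * x_u[1]
d2 = x_u[0] * t0 + x_u[1] * t1                      # x_u^t (X X^t)^-1 x_u
```

For these values the regression vector is approximately (1.09, 1.94), the estimate for the new sample is approximately 8.85, and d^2 is approximately 1.5.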
An equivalent solution for this case can be obtained using the singular value decomposition of X^{t}, i.e.
UΣ.sup.-1 V.sup.t y=p+UΣ.sup.-1 V.sup.t e
The solution can be seen to be equivalent since
(XX.sup.t).sup.-1 X=(UΣV.sup.t VΣU.sup.t).sup.-1 UΣV.sup.t =(UΣ.sup.-2 U.sup.t)UΣV.sup.t =
UΣ.sup.-1 V.sup.t.
1.4.3 Solutions for f>n:
If f>n, the matrix X^{t} does not possess a left inverse since XX^{t} is a f by f matrix of rank at most n. In this case, an approximate solution to the regression equations can be derived using methods such as Principal Components Regression or Partial Least Squares.
1.4.3.1 Principal Components Regression:
Principal Components Regression (PCR) uses the singular value decomposition of the X^{t} matrix to approximate a solution to the regression equation, i.e. the equation
y=X.sup.t p+e
is solved as
UΣ.sup.-1 V.sup.t y=UΣ.sup.-1 V.sup.t X.sup.t p+UΣ.sup.-1 V.sup.t e=p+UΣ.sup.-1 V.sup.t e
where U is of dimension f by k, Σ and Σ^{-1} are k by k, and V is n by k. The estimate of the regression vector is
p̂=UΣ.sup.-1 V.sup.t y=XVΣ.sup.-2 V.sup.t y
p̂ is a linear combination of the column vectors of both U and X. The estimate of y is
ŷ=X.sup.t p̂=X.sup.t UΣ.sup.-1 V.sup.t y=VΣU.sup.t UΣ.sup.-1 V.sup.t y=VV.sup.t y.
The residuals from the model are
e=y-ŷ=y-VV.sup.t y
The Standard Error of Estimate is ##EQU12## For a new vector of independent variables x_{u}, the scalar estimate ŷ_{u} is obtained as
ŷ.sub.u =x.sub.u.sup.t p̂=x.sub.u.sup.t UΣ.sup.-1 V.sup.t y=v.sub.u.sup.t V.sup.t y where v.sub.u.sup.t =x.sub.u.sup.t UΣ.sup.-1
Based on the model, the probability is α that the true value y_{u} lies in the range
ŷ.sub.u -t·SEE·√(1+d.sup.2) ≦y.sub.u ≦ŷ.sub.u +t·SEE·√(1+d.sup.2)
where t is the Student's t distribution value for probability α and n-k degrees of freedom, and d^{2} is given by
d.sup.2 =x.sub.u.sup.t (XX.sup.t).sup.-1 x.sub.u =v.sub.u.sup.t ΣU.sup.t (UΣV.sup.t VΣU.sup.t).sup.-1 UΣv.sub.u =
v.sub.u.sup.t ΣU.sup.t (UΛU.sup.t).sup.-1 UΣv.sub.u
d.sup.2 =v.sub.u.sup.t ΣU.sup.t UΛ.sup.-1 U.sup.t UΣv.sub.u =v.sub.u.sup.t ΣΛ.sup.-1 Σv.sub.u =v.sub.u.sup.t v.sub.u
An alternative method for calculating a Principal Components Regression involves decomposing the X matrix into UΣV^{t}, and regressing y versus V, i.e.
X=UΣV.sup.t and y=Vb+e
where b is a vector of regression coefficients. The solution is
V.sup.t y=V.sup.t Vb+V.sup.t e=b+V.sup.t e.
The estimate of the coefficients is
b̂=(V.sup.t V).sup.-1 V.sup.t y=V.sup.t y
The estimate ŷ is
ŷ=Vb̂=VV.sup.t y
which is the same as obtained above. For a new vector of independent variables x_{u}, v_{u} ^{t} is calculated as
v.sub.u.sup.t =x.sub.u.sup.t UΣ.sup.-1
and the estimate of the scalar y_{u} is
ŷ.sub.u =v.sub.u.sup.t b̂=v.sub.u.sup.t V.sup.t y=x.sub.u.sup.t UΣ.sup.-1 V.sup.t y=x.sub.u.sup.t p̂
which is again the same as that obtained above. The residuals, SEE and d^{2} are also the same as those shown for the first PCR method.
1.4.3.2 Stepwise and PRESS Regression:
An advantage to using the alternative method for calculating a Principal Components Regression is that the coefficients b_{i} which are small and statistically equivalent to zero may be detected and removed from the model (i.e. set to zero) prior to estimation of y, thereby reducing the variation in the estimate. The coefficients which can be set to zero may be determined using, for example, a stepwise regression algorithm (2).
In the first step of a possible stepwise regression calculation, a series of equations of the form
y=v.sub.i b.sub.i +e.sub.i
is constructed, where v_{i} are the column vectors of the V matrix. The scalar coefficient b_{i}
b.sub.i =v.sub.i.sup.t y
that produces the minimal sum of squares of residuals, e_{i} ^{t} e_{i}, is initially chosen. If b_{k} is the coefficient that produced the minimum sum of squares of residuals, then the 1st and k^{th} columns of V are interchanged. The one variable model at the end of the first step is then
y=v.sub.1 b.sub.1 +e.sub.1
In the second step, a series of equations of the form
y=v.sub.1 b.sub.1 +v.sub.i b.sub.i +e.sub.i
are then constructed for each column vector v_{i} where i is not equal to 1. The equations are solved for the various combinations of b_{1} and b_{i}, and the residual sum of squares e_{i} ^{t} e_{i} is calculated for all the combinations. If b_{k} is the coefficient that produced the minimum residual sum of squares, then e_{k} ^{t} e_{k} is compared to e_{1} ^{t} e_{1} via an F-test (3). If the inclusion of b_{k} produces a statistically significant improvement in the sum of the squares of the residuals, the 2^{nd} and k^{th} columns of V are interchanged. At the end of the second step, the two variable model then becomes
y=v.sub.1 b.sub.1 +v.sub.2 b.sub.2 +e.sub.2
If no coefficient b_{k} produces a statistically significant decrease in the sum of squares of the residuals, then the one variable model previously determined is the final model, and the stepwise procedure is completed.
If, at the end of the second step, a two variable model is found, a series of models of the form
y=v.sub.1 b.sub.1 +v.sub.2 b.sub.2 +v.sub.i b.sub.i +e.sub.i
are tested for all i not equal to 1 or 2. If, in this third step, a coefficient b_{k} is found such that e_{k} ^{t} e_{k} is less than e_{2} ^{t} e_{2} by a statistically significant amount, then the 3^{rd} and k^{th} columns of V are interchanged, and the three variable model becomes
y=v.sub.1 b.sub.1 +v.sub.2 b.sub.2 +v.sub.3 b.sub.3 +e.sub.3
At this point, models of the form
y=v.sub.1 b.sub.1 +v.sub.3 b.sub.3 +e.sub.13
y=v.sub.2 b.sub.2 +v.sub.3 b.sub.3 +e.sub.23
are formed to test for deletion of variables from the model. If e_{13} ^{t} e_{13} is not greater than e_{3} ^{t} e_{3} by a statistically significant amount, then the 2^{nd} and 3^{rd} column vectors of V are interchanged and a new two variable model is obtained. Similarly, if e_{23} ^{t} e_{23} is not greater than e_{3} ^{t} e_{3} by a statistically significant amount, then the 1^{st} and 3^{rd} columns of V are interchanged to produce a new two variable model. At this point the procedure repeats the third step of testing possible three variable models. If a new three variable model which produces a statistically significant reduction in the sum of the squares of the residuals is found, tests are again conducted on deleting one of the three variables. If no new three variable model is found, the two variable model is the final model, and the stepwise procedure is finished. If a three variable model is generated from which no variables can be deleted, the third step is completed, and the procedure continues with a fourth step in which four variable models are tested. If a four variable model is generated from which no variables can be deleted, a fifth step is conducted to test five variable models. This stepwise procedure continues until no variables can be added or deleted to produce a statistically significant change in the residuals, or until all the k variables are included in the model, whichever comes first. Algorithms for efficiently conducting this stepwise procedure have been published (2).
An alternative to the stepwise regression calculation is a stepwise Predictive Residual Sum of Squares (PRESS) algorithm. This algorithm can also be used to determine which coefficients to include in the model. For PRESS, a model is built setting the i^{th} dependent variable y_{i}, and the corresponding row vector, v_{i} ^{t}, to zero.
The model has the form
(y-y.sub.i)=(V-v.sub.i.sup.t)b(i)+e
where (y-y_{i}) is the vector obtained from y by setting y_{i} to zero, and (V-v_{i} ^{t}) is the matrix obtained from V by setting the i^{th} row to zero. For example, ##EQU13## The estimate of the coefficient vector for this model is
b̂(i)=[(V-v.sub.i.sup.t).sup.t (V-v.sub.i.sup.t)].sup.-1 (V-v.sub.i.sup.t).sup.t (y-y.sub.i)
b̂(i)=[(V.sup.t V).sup.-1 +(V.sup.t V).sup.-1 v.sub.i (1-v.sub.i.sup.t (V.sup.t V).sup.-1
v.sub.i).sup.-1 v.sub.i.sup.t (V.sup.t V).sup.-1 ](V-v.sub.i.sup.t).sup.t (y-y.sub.i)
The estimate of y_{i} based on this model, ŷ(i), is then given by
ŷ(i)=v.sub.i.sup.t b̂(i)=v.sub.i.sup.t [(V.sup.t V).sup.-1 +(V.sup.t V).sup.-1 v.sub.i (1-
v.sub.i.sup.t (V.sup.t V).sup.-1 v.sub.i).sup.-1 v.sub.i.sup.t (V.sup.t V).sup.-1 ](V-v.sub.i.sup.t).sup.t (y-y.sub.i)
The estimate may be simplified to yield
ŷ(i)=(1-q.sub.i).sup.-1 ŷ.sub.i -q.sub.i (1-q.sub.i).sup.-1 y.sub.i
where
q.sub.i =v.sub.i.sup.t (V.sup.t V).sup.-1 v.sub.i
and ŷ_{i} is the estimate based on a model including all the variables
ŷ.sub.i =v.sub.i.sup.t (V.sup.t V).sup.-1 V.sup.t y.
The residual for the i^{th} dependent variable is then
y.sub.i -ŷ(i)=(1-q.sub.i).sup.-1 (y.sub.i -ŷ.sub.i)
The PRESS statistic is the sum of the squares of these residuals over all i=1 to n.
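The simplified residual expression can be checked against an explicit refit with one sample left out. The sketch below assumes a model with a single column in V, so that V^t V is a scalar and no matrix inversion is needed. The function names and the data are hypothetical.

```python
def press_single_column(v, y):
    """PRESS for the one-variable model y = v*b + e, using the shortcut
    y_i - yhat(i) = (y_i - yhat_i) / (1 - q_i)."""
    vtv = sum(a * a for a in v)                   # V^t V (a scalar here)
    b = sum(a * c for a, c in zip(v, y)) / vtv    # coefficient from all samples
    press = 0.0
    residuals = []
    for v_i, y_i in zip(v, y):
        q_i = v_i * v_i / vtv                     # q_i = v_i^t (V^t V)^-1 v_i
        r = (y_i - v_i * b) / (1.0 - q_i)
        residuals.append(r)
        press += r * r                            # sum of squared residuals
    return press, residuals

def leave_one_out_residual(v, y, i):
    """Refit the model without sample i and return y_i minus its prediction."""
    vtv = sum(a * a for k, a in enumerate(v) if k != i)
    vty = sum(a * c for k, (a, c) in enumerate(zip(v, y)) if k != i)
    return y[i] - v[i] * (vty / vtv)

v = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.2, 3.9]
press, residuals = press_single_column(v, y)
explicit = [leave_one_out_residual(v, y, i) for i in range(len(y))]
```

The lists residuals and explicit agree to within rounding error, so the n separate refits are not needed to obtain PRESS.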
In the first step of the PRESS algorithm, n by 1 matrices V_{j} are formed from the k columns of V. The n values of q_{i} are calculated as
q.sub.i =v.sub.i.sup.t (V.sub.j.sup.t V.sub.j).sup.-1 v.sub.i
and the residuals and PRESS values are calculated using the equations above. If V_{k} is the matrix which yields the minimum value of PRESS, then the 1^{st} and the k^{th} columns of V are interchanged.
In the second step, n by 2 matrices V_{j} are formed using the 1st and j^{th} (j≠1) columns of V. The q_{i}, residuals and PRESS values are again calculated. If no matrix V_{k} yields a PRESS value less than that calculated in step 1, then the one variable model from the first step is the final model and the procedure is finished. If a V_{k} is found that yields a smaller PRESS value, then the 2^{nd} and k^{th} columns of V are interchanged, and a third step is conducted.
The third step involves forming n by 3 matrices V_{j} using the 1^{st}, 2^{nd} and j^{th} (j≠1,2) columns of V. q_{i}, residuals, and PRESS values are again calculated. If no matrix V_{k} yields a PRESS value lower than that calculated in step 2, then the two variable model is the final model, and the procedure is finished. Otherwise, the 3^{rd} and k^{th} columns of V are interchanged and the procedure continues.
The m^{th} step involves forming n by m matrices V_{j} using the 1^{st}, 2^{nd}, . . . (m-1)^{th} and j^{th} (j≠1,2, . . . , m-1) columns of V, and calculating q_{i}, the residuals and PRESS. If no matrices yield PRESS values less than that from the previous step, the model having m-1 variables is the final model, and the procedure is finished. If a matrix V_{k} yields a PRESS value that is less than that calculated in the previous step, the m^{th} and k^{th} columns of V are interchanged, and another step is initiated.
A computer algorithm for efficiently doing a stepwise PRESS selection of variables was given by Allen (4).
Since the models calculated using these stepwise procedures may contain k'<k nonzero coefficients, the degrees of freedom used in the calculation of SEE becomes n-k', i.e. ##EQU14## where e are the residuals for the final model. Based on the model, the probability is α that the true value y_{u} lies in the range ##EQU15## where t' is the Student's t distribution value for probability α and n-k' degrees of freedom, and d^{2} is given by
d.sup.2 =v'.sub.u.sup.t v'.sub.u
where v' is a vector of dimension k' obtained from v by deleting the elements whose corresponding coefficients were set to zero during the regression.
1.4.3.2 Partial Least Squares Regression:
Partial Least Squares (PLS) Regression is similar to Principal Components Regression in that the f by n matrix X is decomposed into the product of orthogonal matrices, i.e.
X=LS.sup.t
where S is a n by k matrix, L is a f by k matrix. The matrices L and S have the following properties:
L.sup.t L=I
S.sup.t S=D
where D is a diagonal matrix. The dependent variable is regressed against one of these matrices, i.e.
y=Sb+e
PLS differs from PCR in the way that the matrices are determined. A step in the PLS regression proceeds by the following 10 calculations:
[1] A vector u_{1} of dimension n is set equal to y, u_{1} =y
[2] A weighting vector of dimension f is then calculated as
w'.sub.1 =Xu.sub.1
[3] The weighting vector is normalized
w.sub.1 =w'.sub.1 /∥w'.sub.1 ∥
[4] A scores vector s'_{1} of dimension n is calculated as
s'.sub.1 =X.sup.t w.sub.1
[5] A loadings vector of dimension f is calculated as
l'.sub.1 =Xs'.sub.1 /∥s'.sub.1 ∥
[6] The loadings vector is normalized as
l.sub.1 =l'.sub.1 /∥l'.sub.1 ∥
[7] The scores vector is scaled as
s.sub.1 =s'.sub.1 ∥l'.sub.1 ∥
[8] The residuals for the X matrix are calculated as
E.sub.1 =X-l.sub.1 s.sub.1.sup.t
[9] An estimate of the coefficient b_{1} is calculated from the relationship
y=s.sub.1 b.sub.1 +e
b̂.sub.1 =(s.sub.1.sup.t s.sub.1).sup.-1 s.sub.1.sup.t y
[10] The residuals from the y vector are calculated as
e.sub.1 =y-s.sub.1 b̂.sub.1.
At this point, a new vector u_{2} is defined as the y residuals from the first step
u.sub.2 =e.sub.1
and X is replaced by the X residuals from the first step, E_{1}. The vectors w_{2}, s_{2}, and l_{2} and the coefficient b_{2} are calculated in the same manner. The i^{th} step through the algorithm involves equating u_{i} to the residual vector from the previous step, e_{i-1}, substituting the residual matrix E_{i-1} for the X matrix, and calculating w_{i}, s_{i}, and l_{i}. The procedure continues until the y residuals are sufficiently small, i.e. until e_{i} ^{t} e_{i} reaches the desired level of error for the model. The matrices L and S are formed from the column vectors l_{i} and s_{i}, and the coefficient vector b from the scalars b_{i}.
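The ten calculations can be sketched as a single function that is applied repeatedly. This is an illustration under one stated assumption: in calculation [5] the sketch divides by the squared length s'^t s', the conventional NIPALS form, because with that choice the residual matrix satisfies E s=0 and successive scores vectors are orthogonal, as the property S^t S=D requires. The function names and the data are hypothetical.

```python
import math

def norm(v):
    return math.sqrt(sum(a * a for a in v))

def pls_step(X, u):
    """One PLS step on an f by n matrix X (a list of f rows) and an n-vector u.
    Returns the loadings l, scores s, coefficient b, X residuals E and
    u residuals e."""
    f, n = len(X), len(X[0])
    w = [sum(X[i][j] * u[j] for j in range(n)) for i in range(f)]      # [2] w' = X u
    w_len = norm(w)
    w = [a / w_len for a in w]                                         # [3] normalize w'
    s = [sum(X[i][j] * w[i] for i in range(f)) for j in range(n)]      # [4] s' = X^t w
    sts = sum(a * a for a in s)
    # [5] ASSUMPTION: divide by s'^t s' (the squared length of s').
    l = [sum(X[i][j] * s[j] for j in range(n)) / sts for i in range(f)]
    l_len = norm(l)
    l = [a / l_len for a in l]                                         # [6] normalize l'
    s = [a * l_len for a in s]                                         # [7] scale s'
    E = [[X[i][j] - l[i] * s[j] for j in range(n)] for i in range(f)]  # [8] E = X - l s^t
    b = sum(a * c for a, c in zip(s, u)) / sum(a * a for a in s)       # [9]
    e = [c - a * b for a, c in zip(s, u)]                              # [10]
    return l, s, b, E, e

X = [[1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 0.0, 1.0], [0.0, 1.0, 1.0, 3.0]]  # f = 3, n = 4
y = [1.0, 2.0, 2.0, 5.0]
l1, s1, b1, E1, e1 = pls_step(X, y)     # first step: u_1 = y
l2, s2, b2, E2, e2 = pls_step(E1, e1)   # second step: u_2 = e_1, X replaced by E_1
```

In this sketch each loadings vector has unit length, the two scores vectors are orthogonal to within rounding error, and the sum of squares of the y residuals does not increase from one step to the next.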
The residuals from the model are
e=y-ŷ=y-Sb̂=y-S(S.sup.t S).sup.-1 S.sup.t y
The Standard Error of Estimate is ##EQU16## For a new vector of independent variables x_{u}, the scalar estimate ŷ_{u} is obtained as
x.sub.u =Ls.sub.u →s.sub.u =(L.sup.t L).sup.-1 L.sup.t x.sub.u =L.sup.t x.sub.u
ŷ.sub.u =s.sub.u.sup.t b̂=s.sub.u.sup.t (S.sup.t S).sup.-1 S.sup.t y
Based on the model, the probability is α that the true value y_{u} lies in the range ##EQU17## where t is the Student's t distribution value for probability α and n-k degrees of freedom, and d^{2} is given by
d.sup.2 =s.sub.u.sup.t (S.sup.t S).sup.-1 s.sub.u
The PLS model can also be expressed in the form
ŷ=X.sup.t p̂
where the regression vector p̂ is
p̂=Lb̂=L(S.sup.t S).sup.-1 S.sup.t y=XS(S.sup.t S).sup.-2 S.sup.t y
The regression vector is a linear combination of the columns of X.
1.4.3.3 Cross Validation:
The number of Principal Components or latent variables (i.e. the k dimension of the matrices U, Σ and V for PCR and of L and S for PLS) which are to be used in PCR or PLS models can be tested using a cross validation procedure. The X matrix is split into two matrices X_{1} and X_{2} which are of dimension f by n-j and f by j, respectively, where 1≦j≦n/2. y is similarly split into two corresponding vectors y_{1} (dimension n-j) and y_{2} (dimension j), e.g. ##EQU18## would be three possible divisions of the matrices. Various models are built using X_{1} and y_{1}, i.e. for PCR
b̂=V.sub.1.sup.t y.sub.1 where X.sub.1 =U.sub.1 Σ.sub.1 V.sub.1.sup.t
and for PLS
b̂=(S.sub.1.sup.t S.sub.1).sup.-1 S.sub.1.sup.t y.sub.1 where X.sub.1 =L.sub.1 S.sub.1.sup.t
The number of columns retained in the V_{1} and S_{1} matrices in the models is varied over the range to be tested. The models are used to estimate the values for y_{2}. For PCR, a matrix V_{2} is calculated from
X.sub.2 =U.sub.1 Σ.sub.1 V.sub.2.sup.t, V.sub.2 =X.sub.2.sup.t U.sub.1 Σ.sub.1.sup.-1
and the estimates are made as
ŷ.sub.2 =V.sub.2 b̂
For PLS, a matrix S_{2} is calculated from
X.sub.2 =L.sub.1 S.sub.2.sup.t, S.sub.2 =X.sub.2.sup.t L.sub.1
and the estimates are made as
ŷ.sub.2 =S.sub.2 b̂
The prediction residuals are calculated as
e.sub.p =y.sub.2 -ŷ.sub.2
The Predictive Residual Sum of Squares for this step is
PRESS=e.sub.p.sup.t e.sub.p
The procedure is repeated with different splits of X and y until each column of X and corresponding element in y have been in X_{2} and y_{2} once, e.g. using all three of the divisions of X and y shown above. The total PRESS is the sum of the PRESS values calculated for each division. The optimum number of vectors to use in the model is that which minimizes the total PRESS.
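The bookkeeping of the cross validation procedure can be sketched independently of the model type. In the sketch below the divisions are contiguous blocks of j columns, which is one possible choice. The fit_predict callback is a hypothetical placeholder for building a PCR or PLS model on X_1 and y_1 and estimating y_2, and the trivial mean model is used only to keep the example self-contained.

```python
def cv_divisions(n, j):
    """Yield (kept, held_out) index lists so that every column index of X
    appears in the held-out block (X_2, y_2) exactly once."""
    for start in range(0, n, j):
        held_out = list(range(start, min(start + j, n)))
        kept = [i for i in range(n) if i < start or i >= start + j]
        yield kept, held_out

def total_press(y, j, fit_predict):
    """Sum of e_p^t e_p over all divisions. fit_predict(kept, held_out) is a
    hypothetical callback returning the estimates of y_2 for one division."""
    total = 0.0
    for kept, held_out in cv_divisions(len(y), j):
        y2_hat = fit_predict(kept, held_out)
        total += sum((y[i] - yh) ** 2 for i, yh in zip(held_out, y2_hat))
    return total

y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

def mean_model(kept, held_out):
    """Placeholder model: estimate every held-out value by the mean of y_1."""
    mean = sum(y[i] for i in kept) / len(kept)
    return [mean] * len(held_out)

press = total_press(y, 2, mean_model)   # three divisions of two samples each
```

In practice total_press would be evaluated once for each candidate number of retained vectors, and the number giving the smallest total would be selected.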
1.4.3.4 Weighted Regression:
In applying linear least squares regression techniques to solve the regression equation
y=X.sup.t p+e
it is assumed that the elements of the residual vector are independent and normally distributed. In some applications, this assumption may not hold. For instance, if the elements y_{i} of y are measured quantities, the error associated with the measurement of y_{i} may depend on the magnitude of y_{i}. Alternatively, some of the individual y_{i} values in y may represent the average of several measurements. In these cases, a weighted linear least square regression is used to develop the regression model.
If the quantity y is measured n times, then the average value ȳ is
ȳ=(y.sub.1 +y.sub.2 + . . . +y.sub.n)/n
The quantity s^{2},
s.sup.2 =[(y.sub.1 -ȳ).sup.2 +(y.sub.2 -ȳ).sup.2 + . . . +(y.sub.n -ȳ).sup.2 ]/(n-1)
is an unbiased estimate of the variance in the measurement of y. s is the standard deviation in the measurement. As n approaches ∞, s^{2} approaches the true variance.
If each of the y_{i} elements in vector y was measured once, and the measurement of y_{i} is known to have variance s_{i} ^{2}, then the regression equation relating y (dimension n) to V (dimension n by k) that provides for normally distributed residuals is given by
Wy=WVb+e
where W is a n by n diagonal matrix whose elements are
w.sub.ii =1/s.sub.i
The coefficients are estimated as
b̂=(V.sup.t W.sup.2 V).sup.-1 V.sup.t W.sup.2 y
The estimate ŷ is
ŷ=Vb̂=V(V.sup.t W.sup.2 V).sup.-1 V.sup.t W.sup.2 y
The residuals are
e=y-ŷ=y-V(V.sup.t W.sup.2 V).sup.-1 V.sup.t W.sup.2 y=(I-V(V.sup.t W.sup.2 V).sup.-1 V.sup.t W.sup.2)y
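A minimal sketch of the weighted estimate for a model with a single column in V, so that V^t W^2 V is a scalar. The weights are w_ii=1/s_i as defined above. The function name and the data are hypothetical.

```python
def weighted_coefficient(v, y, s):
    """b_hat = (V^t W^2 V)^-1 V^t W^2 y for a single column V, where W is
    diagonal with elements w_ii = 1/s_i."""
    w2 = [1.0 / (s_i * s_i) for s_i in s]               # squared weights
    num = sum(w * a * c for w, a, c in zip(w2, v, y))   # V^t W^2 y
    den = sum(w * a * a for w, a in zip(w2, v))         # V^t W^2 V
    return num / den

v = [1.0, 2.0, 3.0]
y = [1.0, 2.2, 2.7]
s = [0.1, 0.2, 0.3]   # standard deviation of each measurement of y_i

b_weighted = weighted_coefficient(v, y, s)
b_unweighted = weighted_coefficient(v, y, [1.0, 1.0, 1.0])
```

For these values the weighted coefficient is approximately 1.0 and the unweighted coefficient is approximately 0.964, because the weighting gives the most precisely measured value the largest influence.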
If the variance in the measurement of all individual y_{i} is the same, and y is a vector of dimension n whose individual elements y_{i} represent the average of n_{i} measurements, then the weighting matrix for weighted least squares regression of y against V is given by ##EQU19## If the variance of an individual measurement of element y_{i} is s_{i} ^{2}, then the weighting matrix for weighted least squares is defined by ##EQU20##
2.0 Definitions Relating to Quantitative Spectral Measurements:
If i_{o} (f) is the intensity of electromagnetic radiation at frequency (wavelength) f which is incident on a sample, and i(f) is the intensity at the same frequency (wavelength) that is transmitted through the sample, then t(f), the transmission of the sample at frequency f is given by
t(f)=i(f)/i.sub.o (f)
The absorbance of the sample at frequency f is
a(f)=-log.sub.10 [t(f)]
The absorption spectrum of a sample will be represented by a vector x of dimension f whose individual elements are the absorbance of the sample at each of f frequencies (wavelengths). For convenience, the elements in x are generally ordered in terms of increasing or decreasing frequency.
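As an illustration, an absorption spectrum vector x can be computed from paired intensity measurements using the conventional definition a(f)=-log10[t(f)]. The function name and the intensity values are hypothetical.

```python
import math

def absorbance_spectrum(incident, transmitted):
    """Return the absorbance at each frequency from the incident intensities
    i_o(f) and the transmitted intensities i(f)."""
    if len(incident) != len(transmitted):
        raise ValueError("intensity lists must have the same length")
    x = []
    for i0, i in zip(incident, transmitted):
        t = i / i0                  # transmission t(f) = i(f) / i_o(f)
        x.append(-math.log10(t))    # absorbance a(f) = -log10[t(f)]
    return x

incident = [100.0, 100.0, 100.0]
transmitted = [10.0, 50.0, 100.0]
x = absorbance_spectrum(incident, transmitted)
```

A sample that transmits one tenth of the incident intensity has an absorbance of 1, and a sample that transmits all of it has an absorbance of 0.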
Quantitative spectral analysis is based on Beer's Law (4). For a mixture of chemical components, Beer's Law can be represented by the matrix equation
x=Act or x/t=Ac
where A is an f by c matrix whose columns are the absorption spectra a_{i} of the c individual pure components (measured at unit pathlength), where c is a vector whose elements are the concentrations of the components in the mixture, and where t is a scalar representing the pathlength (thickness) of the sample.
Beer's Law implies a linear relationship between the measured absorption spectrum, and the concentrations of mixture components. If the spectra of the components are known, the equation for Beer's Law can be solved for the component concentrations, i.e.
(A.sup.t A).sup.-1 A.sup.t x/t=c
If y is a property of the sample that depends linearly on the concentrations of the sample components, then
y=r.sup.t c=r.sup.t (A.sup.t A).sup.-1 A.sup.t x/t
where r is the vector of coefficients that define the linear dependence. This expression indicates that there is a linear relationship between the property and the absorption spectrum of the sample.
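The solution of Beer's Law for the concentrations can be sketched for a mixture of two components, where the inverse of A^t A has a closed form. The pure component spectra in A, the concentrations, the pathlength and the coefficients r are hypothetical values.

```python
def solve_concentrations(A, x, t):
    """c = (A^t A)^-1 A^t x / t for an f by 2 matrix A whose columns are the
    pure component spectra, a spectrum x and a pathlength t."""
    f = len(A)
    # Elements of the symmetric 2 by 2 matrix A^t A and of the vector A^t x / t.
    a = sum(A[i][0] * A[i][0] for i in range(f))
    b = sum(A[i][0] * A[i][1] for i in range(f))
    d = sum(A[i][1] * A[i][1] for i in range(f))
    g0 = sum(A[i][0] * x[i] / t for i in range(f))
    g1 = sum(A[i][1] * x[i] / t for i in range(f))
    det = a * d - b * b
    if det == 0:
        raise ValueError("A^t A is singular; the spectra are not independent")
    return [(d * g0 - b * g1) / det, (a * g1 - b * g0) / det]

A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # f = 3 frequencies, 2 components
c_true = [0.2, 0.5]
t = 2.0
# Mixture spectrum generated from Beer's Law, x = A c t.
x = [t * (A[i][0] * c_true[0] + A[i][1] * c_true[1]) for i in range(3)]

c = solve_concentrations(A, x, t)
r = [10.0, 4.0]                    # coefficients of a property linear in c
y = r[0] * c[0] + r[1] * c[1]      # y = r^t c
```

The recovered concentrations match c_true to within rounding error, and the property y follows from them by a single dot product.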
In general, the vectors in the matrix A are not known, and the relationship between component concentrations and/or properties and the sample spectrum is represented by
y=p.sup.t x=x.sup.t p
where p is a vector of dimension f that relates the spectrum to the desired concentration or property. The objective of quantitative spectral analysis is to define the vector p.
2.1 Calibration:
Calibration is the process of creating a regression model to relate the component concentration and/or property data to the absorption spectra. The calibration of a spectral model will generally consist of the following steps:
[1] n representative samples of the materials for which the model is to be developed will be collected, their absorption spectra (x_{i}, i=1,n) will be obtained, and the component concentrations and/or property data (y_{i}, i=1,n) will be measured. The samples that are used in the calibration are referred to as references, and the spectra of these samples are referred to as reference spectra.
[2] A regression model of the form
$$y = X^t p + e$$
will be constructed for each component and/or property which is to be modeled, where y is the vector of dimension n containing the individual concentration or property values for the n reference samples, and X is an f by n matrix whose columns are the spectra of the n reference samples measured at f frequencies (wavelengths). The mathematical method that is used in calculating the regression model will depend on the number of reference samples relative to the number of frequencies (wavelengths).
2.1.1 Calibration Models for f≦n:
If the number of reference samples, n, used in the calibration exceeds the number of frequencies per spectrum, f, then the calibration model can be obtained via linear least square regression (section 1.4.2).
$$y = X^t p + e \rightarrow (XX^t)^{-1}Xy = (XX^t)^{-1}XX^t p + (XX^t)^{-1}Xe = p + (XX^t)^{-1}Xe$$
The regression model could also be obtained using PCR or PLS, but in this case, these methods aren't required since the (XX^{t}) matrix can be inverted.
If the number of frequencies (wavelengths) per spectrum, f, exceeds the number of reference samples, n, then a subset of f' frequencies can be chosen such that f'≦n, and linear least square regression can be used to develop the calibration model
$$y = X'^{t} p + e \rightarrow (X'X'^{t})^{-1}X'y = p + (X'X'^{t})^{-1}X'e$$
where X' is the f' by n matrix obtained from X by deleting the f − f' rows. The choice of the frequencies (wavelengths) that are to be used in developing the regression model can be based on prior knowledge of the component absorptions present in the spectra (i.e. spectroscopy expertise), or they can be chosen using a statistical wavelength (frequency) selection algorithm (5).
If the concentration and/or property data in y does not have uniform measurement variance, and/or represents the average of differing numbers of experimental measurements, then the calibration model can be developed using a weighted linear least squares regression (section 1.4.3.4).
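A minimal sketch of the f≦n case follows, using synthetic, noise-free data so that the recovered regression vector can be compared with the one used to generate y. All values are invented for illustration; in practice `np.linalg.lstsq` on X^t is usually a numerically safer way to obtain the same estimate than forming X X^t explicitly.

```python
import numpy as np

rng = np.random.default_rng(1)
f, n = 4, 12                                  # fewer frequencies than samples
X = rng.normal(size=(f, n))                   # columns are reference spectra
p_true = np.array([1.0, -2.0, 0.5, 3.0])      # hypothetical regression vector
y = X.T @ p_true                              # noise-free property values

# Linear least squares estimate: p = (X X^t)^{-1} X y
p = np.linalg.solve(X @ X.T, X @ y)

# Analysis of a new spectrum: y_u = x_u^t p
x_u = rng.normal(size=f)
y_u = x_u @ p
```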
2.1.2 Calibration Models for f≧n:
If the number of frequencies per spectrum, f, exceeds the number of reference spectra used in the calibration, n, then the calibration model can be built using either Principal Components Regression (section 1.4.3.1)
$$y = X^t p + e \rightarrow U\Sigma^{-1}V^t y = U\Sigma^{-1}V^t X^t p + U\Sigma^{-1}V^t e = p + U\Sigma^{-1}V^t e$$
or via Partial Least Square Regression (section 1.4.3.2)
$$y = X^t p + e \rightarrow X = LS^t,\; b = (S^t S)^{-1}S^t y \rightarrow p = Lb$$
For Principal Components Regression, the model may be developed using a two step calculation
$$X = U\Sigma V^t,\; y = Vb \rightarrow p = U\Sigma^{-1}V^t y = XV\Sigma^{-2}V^t y = XV\Sigma^{-2}b$$
where the nonzero coefficients, b_{i}, which are used in the model may be determined using stepwise or PRESS based regressions (section 1.4.3.2). If the individual component concentrations and/or properties in y are of unequal measurement variance, or represent the average of unequal numbers of experimental determinations, then the stepwise or PRESS based regression may be conducted using the matrices from the weighted least squares regression (section 1.4.3.4).
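The two step Principal Components Regression calculation can be sketched as follows for f>n. The data are synthetic and of exact rank k, and the number of retained components k is simply set to that rank; stepwise or PRESS based selection of coefficients is not shown.

```python
import numpy as np

rng = np.random.default_rng(2)
f, n, k = 50, 8, 3                            # frequencies, samples, components kept
A = rng.uniform(size=(f, k))                  # hypothetical component spectra
C = rng.uniform(size=(k, n))                  # hypothetical concentrations
X = A @ C                                     # f by n reference spectra (rank k)
y = C.T @ np.array([1.0, 2.0, 3.0])           # property linear in concentrations

# Step 1: singular value decomposition X = U Sigma V^t, truncated to k components.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
U, s, V = U[:, :k], s[:k], Vt[:k].T

# Step 2: regress y on the scores, b = V^t y, then p = U Sigma^{-1} b.
b = V.T @ y
p = U @ (b / s)

y_hat = X.T @ p                               # fitted values for the references
```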
2.2 Analysis of Unknowns:
All of the above mentioned calibration models can be expressed in the form
y=X.sup.t p
The models differ in the means used to estimate the regression vector p. If x_{u} represents the spectrum of a new, unknown material, then the component concentrations and/or properties estimated using the model are
y.sub.u =x.sub.u.sup.t p
The use of the calibration model to estimate component concentrations and/or properties is referred to as analysis.
3.0 Constrained Spectral Analysis:
The spectrum measured for a sample, x, generally represents the superposition of the absorptions due to the sample components, x_{c}, and signals arising from the process involved in making the spectral measurement, m.
x=x.sub.c +m
The signal arising from the measurement process may be the sum of various individual signals, m_{i}
x=x.sub.c +m.sub.1 +m.sub.2 + . . . +m.sub.s
Examples of these measurement process signals would include, but not be limited to,
[1] reflection and/or scattering losses from the windows of the cell used to contain the sample, or in the case of freestanding samples, from the front surface of the sample;
[2] variations in the spectral baseline arising from variations in the intensity (temperature) of the spectrometer source and/or the responsivity of the spectrometer detector;
[3] absorption signals arising from gas phase components (e.g. water vapor and carbon dioxide) that are present in the light path through the spectrometer, but are not sample components.
Since the regression vectors calculated from the calibration are in all cases linear combinations of the original spectra, i.e.
for linear least squares, $p = (XX^t)^{-1}Xy$
for PCR, $p = U\Sigma^{-1}V^t y = XV\Sigma^{-2}V^t y$
and for PLS, $p = Lb = XS(S^t S)^{-1}b$
they will include contributions from the measurement process signals, m. As such, the estimation of component concentrations and/or property values based on
y.sub.u =x.sub.u.sup.t p
will vary not only with changes in the absorptions due to real sample components, but also with variations in the signals arising from the measurement process. The calibration models are thus not robust relative to variations in the signals arising from the measurement process.
The estimation of component concentrations and/or properties can be expressed as
y.sub.u =x.sub.u.sup.t p=x.sub.c.sup.t p+ m.sup.t p
The estimation will be insensitive to variations in the signals arising from the measurement process if, and only if, the second term, m^{t} p, is equal to zero, i.e. if the regression vector is orthogonal to the signals arising from the measurement process.
2.1 Constrained Linear Least Squares:
If X is an f by n matrix containing the spectra of the reference samples, X can be represented as the sum
X=X.sub.c +MN.sup.t
where M is an f by s matrix containing the individual signals arising from the measurement process as columns, and N is an n by s matrix containing the scalar elements n_{ij} which correspond to the level of the j^{th} measurement process signal for the i^{th} reference sample. If f<n (or if f'<n frequencies are chosen), then the solution to the calibration model
$$y = X^t p + e$$
which yields a regression vector p which is orthogonal to the measurement process signals M is derived via a constrained least squares regression, where the constraint equation is
$$M^t p = 0$$
where 0 is a zero vector. X' is defined as the f by n+s matrix obtained by adding the s columns of M to X, and y' is the vector of dimension n+s obtained by adding a corresponding s zero elements to y. The constrained least square solution is then
$$p = (X'X'^{t})^{-1}X'y' - (X'X'^{t})^{-1}M(M^t M)^{-1}M^t X'y'$$
$$p = (X'X'^{t})^{-1}[I - M(M^t M)^{-1}M^t]X'y'$$
$$p = [I - M(M^t M)^{-1}M^t](X'X'^{t})^{-1}X'y'$$
The estimate $\hat{y}$ is then
$$\hat{y} = X^t p = X_c^t p + NM^t p$$
$$\hat{y} = X_c^t[I - M(M^t M)^{-1}M^t](X'X'^{t})^{-1}X'y' + NM^t[I - M(M^t M)^{-1}M^t](X'X'^{t})^{-1}X'y'$$
$$\hat{y} = X_c^t[I - M(M^t M)^{-1}M^t](X'X'^{t})^{-1}X'y' + NM^t(X'X'^{t})^{-1}X'y' - NM^t M(M^t M)^{-1}M^t(X'X'^{t})^{-1}X'y'$$
$$\hat{y} = X_c^t[I - M(M^t M)^{-1}M^t](X'X'^{t})^{-1}X'y' + NM^t(X'X'^{t})^{-1}X'y' - NM^t(X'X'^{t})^{-1}X'y'$$
$$\hat{y} = X_c^t[I - M(M^t M)^{-1}M^t](X'X'^{t})^{-1}X'y'$$
The estimate $\hat{y}$ is constrained to be independent of the presence of the measurement process signals. Similarly, if x_{u} is a spectrum of a new unknown sample, such that
$$x_u = x_c + Mn$$
then the estimation of y_{u} is
$$y_u = x_u^t p = (x_c + Mn)^t p$$
$$y_u = x_c^t[I - M(M^t M)^{-1}M^t](X'X'^{t})^{-1}X'y' + n^t M^t[I - M(M^t M)^{-1}M^t](X'X'^{t})^{-1}X'y'$$
$$y_u = x_c^t[I - M(M^t M)^{-1}M^t](X'X'^{t})^{-1}X'y'$$
i.e. the estimate is independent of the presence of the measurement process signals in the spectrum x_{u}. The residuals from the model are
$$e = y - \hat{y} = y - X^t[I - M(M^t M)^{-1}M^t](X'X'^{t})^{-1}X'y'$$
The Standard Error of Estimate is ##EQU21## Based on the model, the probability is α that the true value y_{u} lies in range ##EQU22## where t is the Student's t distribution value for probability α and n − f degrees of freedom, and d^{2} is given by
$$d^2 = x_u^t (XX^t)^{-1} M[M^t (M^t M)^{-1} M]^{-1} M^t x_u$$
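The effect of the constraint M^t p = 0 can be illustrated numerically. The sketch below uses the standard textbook Lagrange-multiplier form of equality-constrained least squares, which is an assumption of this example and is not claimed to be term-by-term identical to the expressions above; the baseline signal, data, and noise level are synthetic.

```python
import numpy as np

rng = np.random.default_rng(3)
f, n, s = 5, 30, 1                            # f < n, one measurement process signal
X_c = rng.normal(size=(f, n))                 # component absorptions
M = np.ones((f, s))                           # hypothetical constant baseline offset
N = rng.normal(size=(n, s))                   # offset level in each reference
X = X_c + M @ N.T                             # X = X_c + M N^t

p_true = np.array([1.0, -1.0, 2.0, -2.0, 0.0])
y = X_c.T @ p_true + rng.normal(scale=0.01, size=n)

# Unconstrained least squares, then projection onto the constraint M^t p = 0.
G = np.linalg.inv(X @ X.T)
p0 = G @ X @ y
lam = np.linalg.solve(M.T @ G @ M, M.T @ p0)
p = p0 - G @ M @ lam

# An unknown with and without a large added baseline gives the same estimate.
x_c = rng.normal(size=f)
y_clean = x_c @ p
y_contaminated = (x_c + M @ np.array([5.0])) @ p
```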
2.2 Constrained Principal Components Regression:
If f>n, then the constrained linear least squares solution in section 2.1 cannot be used since the matrix XX^{t} is at most of rank n, is singular, and cannot be directly inverted. It is possible to develop a constrained principal components model by following a similar methodology. If X' again represents the f by n+s matrix whose first s columns are measurement process signals, M, and whose last n columns are the reference spectra, then the singular value decomposition of X' is
X'=U'Σ'V'.sup.t
where U' is an f by k matrix, Σ' is the k by k matrix of singular values and V' is an n+s by k matrix. The s by k matrix V_{m} corresponds to the first s rows of V', i.e. to the right eigenvectors associated with the signals due to the measurement process. The measurement process signals can be expressed as
$$M = U'\Sigma' V_m^t$$
If y' is the vector containing the s zeros and the n concentration or property values, then the regression
$$y' = V'b + e$$
subject to the constraint
$$V_m b = 0$$
yields estimated coefficients of the form
$$b = [I - V_m^t (V_m V_m^t)^{-1} V_m](V'^{t} V')^{-1} V'^{t} y' = [I - V_m^t (V_m V_m^t)^{-1} V_m]V'^{t} y'$$
The regression vector is then
$$p = U'\Sigma'^{-1}[I - V_m^t (V_m V_m^t)^{-1} V_m]V'^{t} y'$$
$$p = U'\Sigma'^{-1}V'^{t} y' - U'\Sigma'^{-1}V_m^t (V_m V_m^t)^{-1} V_m V'^{t} y'$$
The estimate $\hat{y}$ for the model is
$$\hat{y} = X^t p = X_c^t p + NM^t p$$
$$\hat{y} = X_c^t[U'\Sigma'^{-1}V'^{t} y' - U'\Sigma'^{-1}V_m^t (V_m V_m^t)^{-1} V_m V'^{t} y'] + NM^t[U'\Sigma'^{-1}V'^{t} y' - U'\Sigma'^{-1}V_m^t (V_m V_m^t)^{-1} V_m V'^{t} y']$$
$$\hat{y} = X_c^t[U'\Sigma'^{-1}V'^{t} y' - U'\Sigma'^{-1}V_m^t (V_m V_m^t)^{-1} V_m V'^{t} y'] + NV_m\Sigma' U'^{t}[U'\Sigma'^{-1}V'^{t} y' - U'\Sigma'^{-1}V_m^t (V_m V_m^t)^{-1} V_m V'^{t} y']$$
$$\hat{y} = X_c^t[U'\Sigma'^{-1}V'^{t} y' - U'\Sigma'^{-1}V_m^t (V_m V_m^t)^{-1} V_m V'^{t} y'] + NV_m\Sigma' U'^{t}U'\Sigma'^{-1}V'^{t} y' - NV_m\Sigma' U'^{t}U'\Sigma'^{-1}V_m^t (V_m V_m^t)^{-1} V_m V'^{t} y'$$
$$\hat{y} = X_c^t[U'\Sigma'^{-1}V'^{t} y' - U'\Sigma'^{-1}V_m^t (V_m V_m^t)^{-1} V_m V'^{t} y'] + NV_m V'^{t} y' - NV_m V_m^t (V_m V_m^t)^{-1} V_m V'^{t} y'$$
$$\hat{y} = X_c^t[U'\Sigma'^{-1}V'^{t} y' - U'\Sigma'^{-1}V_m^t (V_m V_m^t)^{-1} V_m V'^{t} y']$$
The estimated concentration and property values are constrained to be independent of the signals that are due to the measurement process.
For an unknown material with spectrum x_{u},
$$x_u = x_c + Mn = x_c + U'\Sigma' V_m^t n \rightarrow y_u = x_u^t p = (x_c + Mn)^t p$$
$$y_u = x_c^t[U'\Sigma'^{-1}V'^{t} y' - U'\Sigma'^{-1}V_m^t (V_m V_m^t)^{-1} V_m V'^{t} y'] + n^t M^t[U'\Sigma'^{-1}V'^{t} y' - U'\Sigma'^{-1}V_m^t (V_m V_m^t)^{-1} V_m V'^{t} y']$$
$$y_u = x_c^t[U'\Sigma'^{-1}V'^{t} y' - U'\Sigma'^{-1}V_m^t (V_m V_m^t)^{-1} V_m V'^{t} y'] + n^t V_m\Sigma' U'^{t}U'\Sigma'^{-1}V'^{t} y' - n^t V_m\Sigma' U'^{t}U'\Sigma'^{-1}V_m^t (V_m V_m^t)^{-1} V_m V'^{t} y'$$
$$y_u = x_c^t[U'\Sigma'^{-1}V'^{t} y' - U'\Sigma'^{-1}V_m^t (V_m V_m^t)^{-1} V_m V'^{t} y']$$
The estimated concentrations and properties are independent of the measurement process signal. The residuals from the model are
$$e = y - \hat{y} = y - X^t p = y - X^t U'\Sigma'^{-1}[I - V_m^t (V_m V_m^t)^{-1} V_m]V'^{t} y'$$
The Standard Error of Estimate is ##EQU23## Based on the model, the probability is α that the true value y_{u} lies in range ##EQU24## where t is the Student's t distribution value for probability α and n − f degrees of freedom, and d^{2} is given by
$$d^2 = v_u[I - V_m^t (V_m V_m^t)^{-1} V_m]v_u^t$$
The Constrained Principal Components Regression described above will yield a stable calibration model that is robust to variations in the intensity of signals arising from the measurement process provided that the scaling of the measurement process signals in M is sufficiently large that principal components corresponding to these measurement process signals are retained in the first k principal components that are used in the model. If the scaling of the signals in M is too small, then the matrix V_{m} V_{m} ^{t} will be singular and cannot be inverted.
An alternative, equivalent, and preferred method for developing a Constrained Principal Components model involves orthogonalizing the spectra in X to the measurement process signals in M prior to the development of the model. X is an f by n matrix containing the spectra of the n reference samples at f frequencies. M is an f by s matrix containing the spectra of the s measurement process signals at the f frequencies. The first step in the process is to derive a set of orthogonal vectors, the columns of the matrix U_{m}, that represent the original measurement process signals in M. This can be accomplished by either using a Gram-Schmidt orthogonalization procedure, a singular value decomposition of M, or some combination of the two. For example, the singular value decomposition of M would yield
$$M = U_m \Sigma_m V_m^t$$
where U_{m} is f by s, Σ_{m} is s by s, and V_{m} is s by s. Each spectrum in X is then orthogonalized to the columns of U_{m} to obtain the corrected spectra, X_{c}. If x_{1} is the first spectrum in X, and u_{1}, u_{2}, . . . u_{s} are the orthogonalized measurement process signals, then the first corrected spectrum, c_{1}, is given by
$$c_1 = x_1 - u_1(u_1^t x_1) - u_2(u_2^t x_1) - \ldots - u_s(u_s^t x_1)$$
The matrix form of this orthogonalization is
$$X_c = X - U_m U_m^t X$$
where the individual corrected spectra, c_{i}, are the columns of the X_{c} matrix. The original spectral matrix is given by
$$X = X_c + U_m U_m^t X$$
In the next step, the singular value decomposition of X_{c} is then obtained as
X.sub.c =U.sub.c Σ.sub.c V.sub.c.sup.t
The regression model can then be obtained directly as
$$y = X_c^t p + e \rightarrow p = U_c \Sigma_c^{-1} V_c^t y$$
or indirectly by using the relationship
y=V.sub.c b+e→b=V.sub.c.sup.t y
The estimate $\hat{y}$ from the model is
$$\hat{y} = X^t p = X^t U_c \Sigma_c^{-1} V_c^t y = (X_c + U_m U_m^t X)^t U_c \Sigma_c^{-1} V_c^t y$$
$$\hat{y} = X_c^t U_c \Sigma_c^{-1} V_c^t y + X^t U_m U_m^t U_c \Sigma_c^{-1} V_c^t y = X_c^t U_c \Sigma_c^{-1} V_c^t y$$
The term containing the product U_{m} ^{t} U_{c} is zero since the columns of U_{c} are linear combinations of the columns of X_{c}, which were generated to be orthogonal to the columns of U_{m}. The estimate is constrained to be insensitive to the intensity of the measurement process signals in X. The residuals for the model are
$$e = y - \hat{y} = y - X_c^t U_c \Sigma_c^{-1} V_c^t y$$
The Standard Error of Estimate for the model is ##EQU25## For a new unknown sample with spectrum x_{u} which contains unknown measurement process signals Mn, the estimate of y_{u} is
$$y_u = x_u^t p = (x_c + Mn)^t U_c \Sigma_c^{-1} V_c^t y = (x_c + U_m \Sigma_m V_m^t n)^t U_c \Sigma_c^{-1} V_c^t y$$
$$y_u = x_c^t U_c \Sigma_c^{-1} V_c^t y + n^t V_m \Sigma_m U_m^t U_c \Sigma_c^{-1} V_c^t y = x_c^t U_c \Sigma_c^{-1} V_c^t y = v_c^t V_c^t y$$
where
$$v_c^t = x_c^t U_c \Sigma_c^{-1}$$
The estimate depends only on the portions of the spectra arising from the component absorptions, x_{c}, and is independent of the measurement process signals. Based on the model, the probability is α that the true value y_{u} lies in range ##EQU26## where t is the Student's t distribution value for probability α and n − f degrees of freedom, and d^{2} is given by
$$d^2 = v_c^t v_c$$
Individual b_{i} used in the regression of y versus V_{c} can be set to zero based on the results of a stepwise or PRESS based regression if doing so does not result in a statistically significant increase in the residual sum of squares. If the individual elements of y represent measurements having differing measurement variance, or are the averages of differing numbers of measurements, the regression of y versus V_{c} can be accomplished using a weighted regression. In the weighted regression, a stepwise or PRESS based algorithm can again be employed to test whether individual coefficients can be set to zero.
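The preorthogonalization route to a Constrained Principal Components model can be sketched as below. The spectra, the two baseline signals (an offset and a slope), and the choice of k are synthetic assumptions made for illustration; coefficient selection by stepwise or PRESS methods and weighting are omitted.

```python
import numpy as np

rng = np.random.default_rng(4)
f, n, k = 40, 10, 3
A = rng.uniform(size=(f, k))                  # hypothetical component spectra
C = rng.uniform(size=(k, n))                  # hypothetical concentrations
freq = np.linspace(0.0, 1.0, f)
M = np.column_stack([np.ones(f), freq])       # offset and sloping baseline signals
N = rng.normal(size=(n, 2))                   # baseline levels per reference
X = A @ C + M @ N.T                           # measured reference spectra
y = C.T @ np.array([1.0, 2.0, 3.0])

# Orthonormal basis U_m for the measurement process signals (SVD of M).
U_m, _, _ = np.linalg.svd(M, full_matrices=False)

# Corrected spectra: X_c = X - U_m U_m^t X
X_c = X - U_m @ (U_m.T @ X)

# SVD of X_c, truncated to k components, then p = U_c Sigma_c^{-1} V_c^t y
U, sv, Vt = np.linalg.svd(X_c, full_matrices=False)
U_c, s_c, V_c = U[:, :k], sv[:k], Vt[:k].T
p = U_c @ ((V_c.T @ y) / s_c)

y_hat = X.T @ p                               # estimates from the uncorrected spectra
```

Since p lies in the span of U_c, it is orthogonal to every column of M, so applying it to the raw spectra X gives the same estimates as applying it to X_c.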
The equivalence of the two methods of calculating a Constrained Principal Components Regression model can be demonstrated as follows: If the first s columns of X' that correspond to the measurement process signals M are scaled by a sufficiently large value such that the variance due to M (the sum of the diagonal elements of M^{t} M) is much larger than the variance due to the reference spectra (the sum of the diagonal elements of X^{t} X) then the first s Principal Components of X' will be U_{m}, and the remaining k Principal Components will be U_{c}, i.e.
$$X' = U'\Sigma' V'^{t} = (U_m, U_c)(\Sigma_m, \Sigma_c)(V_m, V_c)^t$$
where (U_{m},U_{c}) is an f by s+k matrix whose first s columns are U_{m} and whose last k columns are U_{c}, (Σ_{m},Σ_{c}) is an s+k by s+k matrix whose first s diagonal elements are from Σ_{m} and whose last k diagonal elements are from Σ_{c}, and (V_{m},V_{c}) is an s+k by s+n matrix whose first s columns are V_{m} and whose last n columns are V_{c}. The regression vector is then
$$p = U'\Sigma'^{-1}V'^{t} y' - U'\Sigma'^{-1}V_m^t (V_m V_m^t)^{-1} V_m V'^{t} y'$$
$$p = (U_m, U_c)(\Sigma_m, \Sigma_c)^{-1}(V_m, V_c)^t y' - (U_m, U_c)(\Sigma_m, \Sigma_c)^{-1}V_m^t (V_m V_m^t)^{-1} V_m (V_m, V_c)^t y'$$
$$p = U_m \Sigma_m^{-1} V_m^t y_m + U_c \Sigma_c^{-1} V_c^t y - U_m \Sigma_m^{-1} V_m^t (V_m V_m^t)^{-1} V_m^t y_m - U_c \Sigma_c^{-1} V_m^t (V_m V_m^t)^{-1} V_m V_c^t y'$$
$$p = U_m \Sigma_m^{-1} V_m^t y_m + U_c \Sigma_c^{-1} V_c^t y - U_m \Sigma_m^{-1} V_m^t y_m = U_c \Sigma_c^{-1} V_c^t y$$
where the term U_{c} Σ_{c} ^{-1} V_{m} ^{t} (V_{m} V_{m} ^{t})^{-1} V_{m} V_{c} ^{t} y' is zero since the columns of V_{m} and V_{c} are orthogonal. The Constrained Principal Components model obtained using the constrained regression method is equivalent to that obtained using the preorthogonalization method provided the scaling of the measurement process signals is large enough to ensure that they are included in the Principal Components that are retained from the singular value decomposition of X'. The preorthogonalization method is preferred since it avoids the necessity of determining an appropriate scaling for the measurement process signals, and it avoids potential computer round off errors which can result if the scaling of the measurement process signals is too large relative to the scaling of the spectra.
Constrained Principal Components Regression can be employed in the case where f≦n, but it is not required since the matrices required for solving the constrained linear least squares model can be inverted.
2.2 Constrained Partial Least Squares Regression:
If f>n, then the constrained linear least squares solution in section 2.1 cannot be used since the matrix XX^{t} is at most of rank n, is singular, and cannot be directly inverted. It is possible to develop a constrained partial least squares model by following a similar methodology. X is an f by n matrix containing the spectra of the n reference samples at f frequencies. M is an f by s matrix containing the spectra of the s measurement process signals at the f frequencies. The first step in the development of the constrained partial least square model is to derive a set of orthogonal vectors, the columns of the matrix U_{m}, that represent the original measurement process signals in M. This can be accomplished by either using a Gram-Schmidt orthogonalization procedure, a singular value decomposition of M, or some combination of the two. For example, the singular value decomposition of M would yield
$$M = U_m \Sigma_m V_m^t$$
where U_{m} is f by s, Σ_{m} is s by s, and V_{m} is s by s. Each spectrum in X is then orthogonalized to the columns of U_{m} to obtain the corrected spectra, X_{c}. If x_{1} is the first spectrum in X, and u_{1}, u_{2}, . . . u_{s} are the orthogonalized measurement process signals, then the first corrected spectrum, c_{1}, is given by
$$c_1 = x_1 - u_1(u_1^t x_1) - u_2(u_2^t x_1) - \ldots - u_s(u_s^t x_1)$$
The matrix form of this orthogonalization is
$$X_c = X - U_m U_m^t X$$
where the individual corrected spectra, c_{i}, are the columns of the X_{c} matrix. The original spectral matrix is given by
$$X = X_c + U_m U_m^t X$$
A partial least squares model is then developed to relate X_{c} and y
X.sub.c =L.sub.c S.sub.c.sup.t →y=S.sub.c b+e
where S_{c} is an n by k matrix and L_{c} is an f by k matrix. The model is developed using the methodology described in section 1.4.3.2. The regression vector obtained is
$$p = L_c b = X_c S_c (S_c^t S_c)^{-1} S_c^t y$$
The estimate $\hat{y}$ based on the model is
$$\hat{y} = X^t p = (X_c + MN)^t p = (X_c + U_m \Sigma_m V_m^t N)^t p = X_c^t p + (U_m \Sigma_m V_m^t N)^t p$$
$$\hat{y} = X_c^t X_c S_c (S_c^t S_c)^{-1} b + N^t V_m \Sigma_m U_m^t X_c S_c (S_c^t S_c)^{-1} b = X_c^t X_c S_c (S_c^t S_c)^{-1} b$$
where the term N^{t} V_{m} Σ_{m} U_{m} ^{t} X_{c} S_{c} (S_{c} ^{t} S_{c})^{-1} b is zero since the columns of U_{m} are orthogonal to the columns of X_{c}. The estimate $\hat{y}$ is constrained to be independent of the measurement process signals. The residuals from the model are
$$e = y - \hat{y} = y - X_c^t X_c S_c (S_c^t S_c)^{-1} b$$
The Standard Error of Estimate for the model is ##EQU27## For a new spectrum, x_{u}, which contains component absorptions x_{c} and measurement process signals Mn, the scalar estimate y_{u} is obtained as
$$s_u = (L_c^t L_c)^{-1} L_c^t x_u = (L_c^t L_c)^{-1} L_c^t (x_c + Mn) = (L_c^t L_c)^{-1} L_c^t (x_c + U_m \Sigma_m V_m^t n)$$
$$s_u = (L_c^t L_c)^{-1} L_c^t x_c = L_c^t x_c$$
$$y_u = s_u^t b = x_c^t L_c (S_c^t S_c)^{-1} S_c^t y$$
The estimate of y_{u} is constrained so as to be independent of any measurement process signals in x_{u}. Based on the model, the probability is α that the true value y_{u} lies in range ##EQU28## where t is the Student's t distribution value for probability α and n − k degrees of freedom, and d^{2} is given by
$$d^2 = s_u^t (S_c^t S_c)^{-1} s_u$$
3.0 Pathlength and Scaling Constraints:
The matrix form of Beer's Law can be represented by the equation
$$X = ACT \;\text{ or }\; XT^{-1} = AC \;\text{ or }\; (A^t A)^{-1}A^t XT^{-1} = C \rightarrow C^t = T^{-1}X^t A(A^t A)^{-1}$$
where X is the f by n matrix containing the reference spectra used in the development of spectral calibrations, A is an f by c matrix whose columns are the absorption spectra a_{i} of the c individual components (measured at or normalized to unit pathlength), where C is a c by n matrix whose general element c_{ij} is the concentration of the i^{th} component in the j^{th} reference sample, and where T is a diagonal n by n matrix containing the pathlengths for the n samples.
If y is a component concentration or a property that depends linearly on the concentrations of all the sample components, then
y=C.sup.t r
where r is a vector of dimension c which contains the coefficients relating y to the concentrations of the individual components. If y is the concentration of a single component, then only the one corresponding element of r will be nonzero. If y is the concentration of a "lumped" component (e.g. the total aromatics content of a hydrocarbon which equals the sum of the concentrations of the individual aromatic components), then the elements of r which correspond to the components that were "lumped" will be one, and the remaining elements of r will be zero. If y is a physical property (i.e. a property that depends on the physical state of the samples, e.g. the sample density measured at a specified temperature) or a performance property (i.e. a property that is defined by the use of the sample in an application, e.g. the research octane number of a gasoline as defined by ASTM 2699-84 (6), or the reactivity of the sample under a defined set of conditions), then r contains the coefficients that relate the property to all of the individual components. The regression models that are developed above have the form
$$y = C^t r + e = T^{-1}X^t A(A^t A)^{-1} r + e = T^{-1}X^t p + e \rightarrow A(A^t A)^{-1} r = p$$
If all the reference spectra are collected at the same pathlength, then T and T^{-1} are the products of scalars times the unit matrix I, and the regression equation becomes
$$y = T^{-1}X^t A(A^t A)^{-1} r + e = IX^t A(A^t A)^{-1} r/t + e = X^t p + e \rightarrow A(A^t A)^{-1} r/t = p$$
where t is the pathlength used in all the measurements. In this case, the scalar t is lumped into the overall regression vector that is determined in the calibration. During the analysis, the unknown spectrum must be collected at the same pathlength, or if it is collected at a different pathlength t', either the spectrum x_{u} or the estimated value y_{u} must be scaled by the ratio t/t'.
If all the reference spectra are not collected at the same pathlength, then the spectra can be normalized prior to the calibration,
$$X_n = XT^{-1}$$
and the regression model can be built relating the normalized data X_{n} to y. In this case, during the analysis, either the unknown spectrum x_{u} or the initial estimate of y_{u} must be scaled by 1/t' (where t' is the pathlength used in the collection of x_{u}) to obtain the final estimate.
The sum of all the individual components in the sample must equal unity, i.e. they must account for the total sample. If r is a vector containing all ones, then y is also
y=C.sup.t r=C.sup.t 1=1
where 1 indicates a vector containing all ones. In this case, a regression can be constructed of the form
$$y = 1 = T^{-1}X^t p_t \rightarrow T1 = X^t p_t \rightarrow t = X^t p_t$$
where t is a vector containing the pathlengths used in the measurement of the n reference samples, and p_{t} is a regression vector that can be used to estimate the pathlength used in the measurement of the spectrum of an unknown, i.e.
t=x.sub.u.sup.t p.sub.t
The use of a regression model to estimate the pathlength at which a spectrum was collected assumes that all the components in the sample contribute a measurable absorption to the spectrum. If a component possesses no absorptions in the spectral range used in the calibration, then an increase or decrease in the concentration of the component produces at most a change in the scaling of the spectrum (via changing the concentrations of the other components, since the component concentrations sum to unity, i.e. dilution effects), and cannot be distinguished from a scaling change associated with varying the pathlength. The assumption required for use of a pathlength regression is valid for mid- and near-infrared spectroscopies where all typical sample components are expected to give rise to some vibrational absorption, but it would not hold for ultraviolet-visible spectroscopy where some components might possess no measurable absorption. If reference and unknown samples are diluted in solvent for the collection of the IR spectra, the pathlength regression can still be applied since the solvent can be considered as an additional sample component.
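A pathlength regression of this kind can be sketched with synthetic data in which every component absorbs and the concentrations in each sample sum to one. The spectra, concentrations, and pathlengths below are invented for the example, and a minimum-norm least squares solve stands in for whichever regression method is actually chosen.

```python
import numpy as np

rng = np.random.default_rng(5)
f, c, n = 30, 3, 12
A = rng.uniform(0.1, 1.0, size=(f, c))        # every component absorbs
C = rng.dirichlet(np.ones(c), size=n).T       # c by n; each column sums to 1
t = rng.uniform(0.5, 2.0, size=n)             # reference pathlengths
X = A @ C * t                                 # X = A C T (column j scaled by t[j])

# Regress the pathlengths on the spectra: t = X^t p_t
p_t, *_ = np.linalg.lstsq(X.T, t, rcond=1e-10)

# Estimate the pathlength of an unknown from its spectrum alone.
c_u = np.array([0.2, 0.5, 0.3])               # sums to 1
t_u = 1.3
x_u = A @ c_u * t_u
t_est = x_u @ p_t
```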
The results of a pathlength regression model can be incorporated into the calibration models built for the estimation of component concentrations and/or properties. If X is the matrix containing reference spectra collected at pathlengths t, then a regression model can be constructed to estimate t
$$t = X^t p_t + e \rightarrow \hat{t} = X^t p_t$$
where p_{t} can be estimated using Linear Least Square Regression, Principal Components Regression, Partial Least Squares Regression, or more preferably Constrained Linear Least Squares Regression, Constrained Principal Components Regression or Constrained Partial Least Squares Regression. The matrix T is defined as a diagonal matrix whose element t_{ii} is the estimate $\hat{t}_i$ from $\hat{t}$, i.e. t_{ii} is the estimated pathlength for the i^{th} sample. Regression models for component concentrations and/or properties can be developed using the equation
$$y = T^{-1}X^t p + e = X_t^t p + e$$
where X_{t} is the reference spectral data scaled by the estimated pathlengths.
For Linear Least Squares Regression, or Constrained Linear Least Squares Regression, the incorporation of the pathlength regression into the calibration is relatively straightforward. A model is built based on the raw reference spectra, X, and the pathlengths, t. The model is applied to the reference spectra to estimate the pathlengths, t. The estimated pathlengths are used to calculate T and to rescale the data to obtain X_{t}. Models are then built for all the components and/or properties y based on the rescaled spectral data, X_{t}. During the analysis, the pathlength used in the collection of the spectrum of the unknown, t_{u}, is estimated, and either the spectrum of the unknown, or the initial estimates of the component concentrations and/or properties are scaled by 1/t_{u}.
In Partial Least Squares Regression and Constrained Partial Least Squares regression, the pathlength, component concentration, or property values are used in the calculation of the decomposition of X. The matrices L and S which are calculated are different for each component, property and for pathlength. Separate models must thus be constructed for pathlength and for components and properties. A general calibration procedure would involve developing a model for the pathlengths t based on the raw reference spectra X, using the estimates of t to construct the T matrix and scaling X to obtain X_{t}, and then building regression models for each component and/or property y based on X_{t}. In the analysis of a spectrum x_{u} of an unknown, the pathlength t_{u} would first be estimated, and then either the spectrum or the initial estimates of the components and/or properties would be scaled by 1/t_{u}.
For Principal Components Regression and Constrained Principal Components regression, a more computationally efficient procedure is preferred. The matrix X (or X_{c} for Constrained PCR) is decomposed as
X = U Σ V^{t}
The pathlengths are then regressed as
t = V b_{t} + e → b_{t} = V^{t} t → t = V b_{t}
The matrix T is formed by placing the elements of t on the diagonal, and the component concentrations and/or properties are regressed as
y = T^{-1} V b + e → b = (V^{t} T^{-2} V)^{-1} V^{t} T^{-1} y
For an unknown material with spectrum x_{u}, the pathlength used in collecting the spectrum is estimated as
x_{u} = U Σ v_{c} → t_{u} = v_{c}^{t} b_{t}
The component concentrations and/or properties are then estimated as
y_{u} = v_{c}^{t} b / t_{u}
The alternative to the above method would involve calculating the singular value decomposition of X_{t} (the spectra scaled to the estimated pathlengths) to build regression models for y. Since the matrices produced by the singular value decomposition of X (or X_{c}) are required for the pathlength estimation, the singular value decomposition of X_{t} would produce a second set of matrices which would have to be stored and recalled for the estimation of components and/or properties. By conducting only one singular value decomposition, the above method reduces the computer memory, disk storage, input/output requirements and computation time for conducting the calibrations and analyses.
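A rough sketch of the single-decomposition procedure for Principal Components Regression is given below. It assumes X is an f-by-n array with one spectrum per column and that the number of retained components k is chosen by the user; the function names and the truncation to k components are assumptions made for illustration.

```python
import numpy as np

def pcr_calibrate(X, t, y, k):
    """Sketch: pathlength and property models from one SVD of X.

    X : (f, n) reference spectra, one per column (assumed layout)
    t : (n,) pathlengths, y : (n,) property values, k : components kept
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)   # X = U Sigma V^t
    U, s, V = U[:, :k], s[:k], Vt[:k].T
    b_t = V.T @ t                      # b_t = V^t t (columns of V are orthonormal)
    t_hat = V @ b_t                    # estimated pathlengths, diagonal of T
    A = V / t_hat[:, None]             # T^{-1} V
    # b = (V^t T^{-2} V)^{-1} V^t T^{-1} y
    b = np.linalg.solve(A.T @ A, A.T @ y)
    return U, s, b_t, b

def pcr_analyze(x_u, U, s, b_t, b):
    """Estimate pathlength and property for an unknown spectrum x_u."""
    v_u = (U.T @ x_u) / s              # from x_u = U Sigma v  ->  v = Sigma^{-1} U^t x_u
    t_u = v_u @ b_t                    # estimated pathlength
    y_u = (v_u @ b) / t_u              # property estimate scaled by 1/t_u
    return y_u, t_u
```

Only U, the singular values, and the two coefficient vectors need to be stored, which reflects the storage saving that the text attributes to performing a single decomposition.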
The regression for the pathlength can be conducted using a stepwise or PRESS based method to detect coefficients that are statistically equivalent to zero. A weighted regression may be employed in the development of the regression models for the component concentrations and/or properties in the case where the individual elements of y are of unequal measurement variance, and/or represent the average of differing numbers of individual experimental determinations. A stepwise or PRESS based method can also be employed to conduct these regressions.
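One common way to carry out such a weighted regression is sketched below. The weighting scheme is an assumption: each element of y is weighted by n_i/σ_i², the inverse variance of a mean of n_i determinations each having variance σ_i². The text does not specify the weights, and the function name is hypothetical.

```python
import numpy as np

def weighted_lstsq(A, y, variances, counts):
    """Weighted least squares sketch.

    A         : (n, k) regressors (for example the rows of T^{-1} V)
    y         : (n,) reference values
    variances : (n,) variance of a single determination for each sample
    counts    : (n,) number of determinations averaged into each y_i
    """
    w = np.asarray(counts, dtype=float) / np.asarray(variances, dtype=float)
    sw = np.sqrt(w)
    # Scaling the rows by sqrt(w) turns the weighted problem into ordinary least squares.
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef
```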
4.0 Preferred Method for Calculating the Measurement Process Signal Matrix:
The matrix U_{m} used in calculation of the Constrained Principal Components Regression and the Constrained Partial Least Squares Regression is derived from a matrix M which contains examples of the signals arising from the measurement process. The columns of M will generally consist of two types: calculated spectra which are used to simulate certain types of measurement process signals, and experimental spectra collected so as to provide an example of one or more measurement process signals. The calculated spectra could include, but not be limited to, frequency (or wavelength) dependent polynomials which are used to approximate variations in the baseline of the absorption spectra. Such baseline variations can result from variations in the reflection and/or scattering from the front face of the sample or cell, from variations in the response of the spectrometer detector, or from variations in the intensity of the spectrometer source. An example of the experimental spectra which can be used would include, but not be limited to, the spectra of atmospheric water vapor and/or carbon dioxide. These types of experimental spectra are referred to as correction spectra.
The following is an example of one computational method for arriving at the U_{m} matrix. If the reference spectra X are collected over a frequency (wavelength) range from ν_{1} to ν_{2} at interval Δν, then a set of orthogonal polynomials over the spectral range can be constructed using Legendre polynomials ##EQU29## where ξ is defined as
ξ = (ν - ν_{2})/(ν_{1} - ν_{2})
The elements u_{ij} of the U_{p} polynomial vectors are calculated as ##EQU30## where j is equal to k+1 and the √f is required so that the columns of U_{p} are normalized. The elements of the first column of U_{p} correspond to P_{0}(ξ), and are constants of value 1/√f. The second column of U_{p} depends linearly on frequency, the third has a quadratic dependence on frequency, etc. The column vectors of U_{p} form an orthonormal set, i.e.
U_{p}^{t} U_{p} = I
The matrix U_{p} is of dimension f by p, where p is the number of polynomial terms being used to model background variations, and p-1 is the highest degree of the polynomial.
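A polynomial background matrix with the properties described above (f by p, orthonormal columns, constant first column of value 1/√f) can be sketched as follows. Two assumptions are made that are not taken from the text: the frequency grid is mapped onto [-1, 1], the interval on which Legendre polynomials are orthogonal, and a QR factorization is used to make the sampled polynomials exactly orthonormal on the discrete grid. The function name is hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre

def polynomial_basis(f, p):
    """Return an f-by-p matrix of orthonormal polynomial columns (degree 0..p-1)."""
    xi = np.linspace(-1.0, 1.0, f)          # assumed mapping of the spectral grid
    P = legendre.legvander(xi, p - 1)       # column k holds P_k(xi)
    Q, R = np.linalg.qr(P)                  # orthonormalize on the discrete grid
    # QR leaves the sign of each column arbitrary; make it match the polynomial.
    return Q * np.sign(np.diag(R))
```

Because QR orthogonalizes the columns in order, column j remains a polynomial of degree j-1, so the second column is linear in frequency and the third is quadratic.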
Z is an f by z matrix containing z experimental spectra representing measurement process signals for which constraints are to be developed. Z may contain multiple examples of measurement process signals, e.g. several spectra of air with varying water vapor and carbon dioxide levels. The matrix Z is first orthogonalized relative to U_{p}.
Z' = Z - U_{p} (U_{p}^{t} Z)
The singular value decomposition of Z' is then calculated
Z' = U_{z} Σ_{z} V_{z}^{t}
If z' is the number of experimental measurement process signals being modeled, then the first z' columns of U_{z} are combined with U_{p} to form the f by p+z' matrix U_{m}. Note, for example, that if z=10 spectra of air with varying levels of water vapor and carbon dioxide were used in Z, then only z'=2 columns of U_{z} would be used, since only two different process signals are being modeled.
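The construction of the combined measurement process matrix can be sketched as below. The sketch assumes U_p already has orthonormal columns, implements the orthogonalization as the projection Z' = Z - U_p (U_p^t Z), and calls the resulting f by p+z' matrix `U_m`; the function name and the argument `n_signals` (standing for z') are hypothetical.

```python
import numpy as np

def measurement_basis(U_p, Z, n_signals):
    """Combine polynomial and experimental correction signals.

    U_p       : (f, p) orthonormal polynomial columns
    Z         : (f, z) experimental correction spectra
    n_signals : z', the number of distinct signals represented in Z
    """
    Z_prime = Z - U_p @ (U_p.T @ Z)                    # remove the polynomial part
    U_z, s_z, _ = np.linalg.svd(Z_prime, full_matrices=False)
    U_m = np.hstack([U_p, U_z[:, :n_signals]])         # f by (p + z')
    return U_m
```

With ten air spectra that differ only in their water vapor and carbon dioxide content, `n_signals=2` would be used, matching the example in the text.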
5.0 Spectral Ranges:
The spectral data used in developing a Constrained Spectral Regression model may include all of the data points (frequencies or wavelengths) between the normal starting and ending points of a spectrometer scan, or it may be limited to subregions of the spectra, so as to exclude portions of the data. For many spectrometer systems, absorbances above a certain nominal value do not scale linearly with pathlength or component concentration. Since the regression models assume linear response, regions where the absorbance exceeds the range of linear spectrometer response will generally be excluded from the data prior to the development of the model. Subregions can also be excluded to avoid measurement process signals that occur in narrow spectral ranges. For example, in mid-infrared data where sample components are expected to absorb minimally in the 2300-2400 cm^{-1} range, this spectral range may be excluded to avoid the necessity of adding a carbon dioxide spectrum as a correction spectrum.
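The selection of spectral subregions can be illustrated with a simple mask over the frequency axis. This is a sketch under assumptions: the absorbance limit is supplied by the user, a data point is dropped if any reference spectrum exceeds that limit, and the function name and argument names are hypothetical.

```python
import numpy as np

def select_points(freqs, X, max_absorbance, excluded_ranges=()):
    """Return a boolean mask of the data points to keep.

    freqs           : (f,) frequency (or wavelength) of each data point
    X               : (f, n) reference spectra, one per column
    max_absorbance  : upper limit of the linear response range (assumed known)
    excluded_ranges : iterable of (low, high) ranges to drop, e.g. (2300, 2400)
    """
    keep = (X <= max_absorbance).all(axis=1)        # linear-response points only
    for low, high in excluded_ranges:
        keep &= ~((freqs >= low) & (freqs <= high)) # drop narrow interference regions
    return keep
```

The mask would be applied to the reference spectra before calibration (`X[keep]`) and to each unknown spectrum before analysis, so that both use the same data points.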
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title
--- | --- | --- | ---
US59643590 | 1990-10-12 | 1990-10-12 |
US99071592 | 1992-12-15 | 1992-12-15 |
US08300016 US5446681A (en) | 1990-10-12 | 1994-09-02 | Method of estimating property and/or composition data of a test sample
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title
--- | --- | --- | ---
US08300016 US5446681A (en) | 1990-10-12 | 1994-09-02 | Method of estimating property and/or composition data of a test sample
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date
--- | --- | --- | ---
US99071592 (Continuation) | | 1992-12-15 | 1992-12-15
Publications (1)
Publication Number | Publication Date
--- | ---
US5446681A (en) | 1995-08-29
Family
ID=24387259
Family Applications (1)
Application Number | Title | Priority Date | Filing Date
--- | --- | --- | ---
US08300016 Expired - Lifetime US5446681A (en) | Method of estimating property and/or composition data of a test sample | 1990-10-12 | 1994-09-02
Country Status (6)
Country | Link
--- | ---
US (1) | US5446681A (en)
EP (1) | EP0552291B1 (en)
JP (1) | JP3130931B2 (en)
CA (1) | CA2093015C (en)
DE (2) | DE69128357T2 (en)
WO (1) | WO1992007326A1 (en)
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title
--- | --- | --- | --- | ---
US4864842A (en) * | 1988-07-29 | 1989-09-12 | Troxler Electronic Laboratories, Inc. | Method and system for transferring calibration data between calibrated measurement instruments
US4866644A (en) * | 1986-08-29 | 1989-09-12 | Shenk John S | Optical instrument calibration system
US4959796A (en) * | 1987-11-10 | 1990-09-25 | Konica Corporation | Method of producing analytical curve
US5121337A (en) * | 1990-10-15 | 1992-06-09 | Exxon Research And Engineering Company | Method for correcting spectral data for data due to the spectral measurement process itself and estimating unknown property and/or composition data of a sample using such method
US5243546A (en) * | 1991-01-10 | 1993-09-07 | Ashland Oil, Inc. | Spectroscopic instrument calibration
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title
--- | --- | --- | --- | ---
JPH0141936B2 (en) * | 1983-11-16 | 1989-09-08 | Ube Industries |
US4766551A (en) * | 1986-09-22 | 1988-08-23 | Pacific Scientific Company | Method of comparing spectra to identify similar materials
US4802102A (en) * | 1987-07-15 | 1989-01-31 | Hewlett-Packard Company | Baseline correction for chromatography
US5014217A (en) * | 1989-02-09 | 1991-05-07 | S C Technology, Inc. | Apparatus and method for automatically identifying chemical species within a plasma reactor environment
US4975581A (en) * | 1989-06-21 | 1990-12-04 | University Of New Mexico | Method of and apparatus for determining the similarity of a biological analyte from a model constructed from known biological fluids
US6549861B1 (en)  20000810  20030415  EuroCeltique, S.A.  Automated system and method for spectroscopic analysis 
US6675030B2 (en)  20000821  20040106  EuroCeltique, S.A.  Near infrared blood glucose monitoring system 
WO2002051359A1 (en) *  20001227  20020704  Haarmann & Reimer Gmbh  Method for selecting cosmetic adjuvants 
US20050075796A1 (en) *  20010111  20050407  Chenomx, Inc.  Automatic identification of compounds in a sample mixture by means of NMR spectroscopy 
US7181348B2 (en)  20010115  20070220  Chenomx, Inc.  Automatic identification of compounds in a sample mixture by means of NMR spectroscopy 
US20040058386A1 (en) *  20010115  20040325  Wishart David Scott  Automatic identificaiton of compounds in a sample mixture by means of nmr spectroscopy 
US7191069B2 (en)  20010115  20070313  Chenomx, Inc.  Automatic identification of compounds in a sample mixture by means of NMR spectroscopy 
US20050143849A1 (en) *  20010420  20050630  Shunji Hayashi  Controlling method for manufacturing process 
US7151974B2 (en)  20010420  20061219  Oki Electric Industry Co., Ltd.  Controlling method for manufacturing process comprising determining a priority of manufacturing process needing recovery in response to degree of risk 
US6862484B2 (en)  20010420  20050301  Oki Electric Industry Co., Ltd.  Controlling method for manufacturing process 
US6895340B2 (en)  20010425  20050517  BristolMyers Squibb Company  Method of molecular structure recognition 
US6947913B1 (en)  20010823  20050920  Lawrence Technologies, Llc  Systems and methods for generating string correlithm objects 
US6774632B2 (en) *  20010914  20040810  Ge Medical Systems Global Technology Company, Llc  Failure prediction apparatus for superconductive magnet and magnetic resonance imaging system 
US20030052681A1 (en) *  20010914  20030320  Kazuhiro Kono  Failure prediction apparatus for superconductive magnet and magnetic resonance imaging system 
US6662116B2 (en) *  20011130  20031209  Exxonmobile Research And Engineering Company  Method for analyzing an unknown material as a blend of known materials calculated so as to match certain analytical data and predicting properties of the unknown based on the calculated blend 
US20090163369A1 (en) *  20020110  20090625  Chemlmage Corporation  Detection of Pathogenic Microorganisms Using Fused Sensor Data 
US7945393B2 (en)  20020110  20110517  Chemimage Corporation  Detection of pathogenic microorganisms using fused sensor data 
US6875414B2 (en)  20020114  20050405  American Air Liquide, Inc.  Polysulfide measurement methods using colormetric techniques 
US20030158850A1 (en) *  20020220  20030821  Lawrence Technologies, L.L.C.  System and method for identifying relationships between database records 
US20060010144A1 (en) *  20020220  20060112  Lawrence Technologies, Llc  System and method for identifying relationships between database records 
US20060123036A1 (en) *  20020220  20060608  Lawrence Technologies, Llc  System and method for identifying relationships between database records 
US7246129B2 (en)  20020220  20070717  Lawrence P Nick  System and method for identifying relationships between database records 
US7031969B2 (en)  20020220  20060418  Lawrence Technologies, Llc  System and method for identifying relationships between database records 
US7349928B2 (en)  20020220  20080325  Syngence Corporation  System and method for identifying relationships between database records 
US6807513B2 (en) *  20020227  20041019  Earth Resource Management (Erm.S)  Method for determining a spatial quality index of regionalized data 
US20030163284A1 (en) *  20020227  20030828  Luc Sandjivy  Method for determining a spatial quality index of regionalised data 
US20060158183A1 (en) *  20020329  20060720  Butters Bennett M  System and method for characterizing a sample by lowfrequency spectra 
US7081747B2 (en)  20020329  20060725  Nativis, Inc.  System and method for characterizing a sample by lowfrequency spectra 
US6995558B2 (en)  20020329  20060207  Wavbank, Inc.  System and method for characterizing a sample by lowfrequency spectra 
US20050030016A1 (en) *  20020329  20050210  Butters John T.  System and method for characterizing a sample by lowfrequency spectra 
US20040183530A1 (en) *  20020329  20040923  Butters Bennett M.  System and method for characterizing a sample by lowfrequency spectra 
US20040174154A1 (en) *  20020419  20040909  Butters Bennett M.  System and method for sample detection based on lowfrequency spectral components 
US7412340B2 (en)  20020419  20080812  Nativis, Inc.  System and method for sample detection based on lowfrequency spectral components 
US6952652B2 (en) *  20020419  20051004  Wavbank, Inc.  System and method for sample detection based on lowfrequency spectral components 
US20050176391A1 (en) *  20020419  20050811  Butters Bennett M.  System and method for sample detection based on lowfrequency spectral components 
US20050154539A1 (en) *  20020522  20050714  Matthew Butler  Processing system for remote chemical identification 
US20040033617A1 (en) *  20020813  20040219  Sonbul Yaser R.  Topological near infrared analysis modeling of petroleum refinery products 
US6897071B2 (en)  20020813  20050524  Saudi Arabian Oil Company  Topological near infrared analysis modeling of petroleum refinery products 
WO2004023112A1 (en) *  20020906  20040318  Institut Des Communications Graphiques Du Quebec  Printing medium evaluation method and device 
US20050030546A1 (en) *  20020906  20050210  Robert StAmour  Printing medium evaluation method and device 
US7304311B2 (en)  20020906  20071204  Institut Des Communications Graphiques Du Quebec  Whole printing medium evaluation method and device 
WO2005066875A1 (en) *  20020920  20050721  General Electric Company  Systems and methods for developing a predictive continuous product space from an existing discrete product space 
US20040059560A1 (en) *  20020920  20040325  Martha Gardner  Systems and methods for developing a predictive continuous product space from an existing discrete product space 
US7295954B2 (en) *  20020926  20071113  Lam Research Corporation  Expert knowledge methods and systems for data analysis 
US20040064465A1 (en) *  20020926  20040401  Lam Research Corporation  Expert knowledge methods and systems for data analysis 
US7653515B2 (en) *  20021220  20100126  Lam Research Corporation  Expert knowledge methods and systems for data analysis 
US20070124118A1 (en) *  20021220  20070531  Lam Research Corporation  Expert knowledge methods and systems for data analysis 
US7238847B2 (en)  20021223  20070703  Shell Oil Company  Apparatus and method for determining and controlling the hydrogentocarbon ratio of a pyrolysis product liquid fraction 
US20040122276A1 (en) *  20021223  20040624  Ngan Danny YukKwan  Apparatus and method for determining and contolling the hydrogentocarbon ratio of a pyrolysis product liquid fraction 
US20050033127A1 (en) *  20030130  20050210  EuroCeltique, S.A.  Wireless blood glucose monitoring system 
US7253619B2 (en) *  20030404  20070807  Siemens Aktiengesellschaft  Method for evaluating magnetic resonance spectroscopy data using a baseline model 
US20050137476A1 (en) *  20030404  20050623  Elisabeth Weiland  Method for evaluating magnetic resonance spectroscopy data using a baseline model 
US6992768B2 (en)  20030522  20060131  Schlumberger Technology Corporation  Optical fluid analysis signal refinement 
US20040233446A1 (en) *  20030522  20041125  Chengli Dong  [optical fluid analysis signal refinement] 
US7194359B2 (en) *  20040629  20070320  Pharmix Corporation  Estimating the accuracy of molecular property models and predictions 
US20050288871A1 (en) *  20040629  20051229  Duffy Nigel P  Estimating the accuracy of molecular property models and predictions 
US20080172141A1 (en) *  20040708  20080717  Simpson Michael B  Chemical Mixing Apparatus, System And Method 
US20060080041A1 (en) *  20040708  20060413  Anderson Gary R  Chemical mixing apparatus, system and method 
US20060009875A1 (en) *  20040709  20060112  Simpson Michael B  Chemical mixing apparatus, system and method 
US7281840B2 (en)  20040709  20071016  TresArk, Inc.  Chemical mixing apparatus 
US20090156659A1 (en) *  20040727  20090618  Butters John T  System and method for collecting, storing, processing, transmitting and presenting very low amplitude signals 
US20070231872A1 (en) *  20040727  20071004  Nativis, Inc.  System and Method for Collecting, Storing, Processing, Transmitting and Presenting Very Low Amplitude Signals 
US9417257B2 (en)  20040727  20160816  Nativis, Inc.  System and method for collecting, storing, processing, transmitting and presenting very low amplitude signals 
US20060190137A1 (en) *  20050218  20060824  Steven W. Free  Chemometric modeling software 
US7127372B2 (en) *  20050224  20061024  Itt Manufacturing Enterprises, Inc.  Retroregression residual remediation for spectral/signal identification 
US20060190216A1 (en) *  20050224  20060824  Boysworth Marc K  Retroregression residual remediation for spectral/signal identification 
US20070043518A1 (en) *  20050419  20070222  Nicholson Jeremy K  Method for the identification of molecules and biomarkers using chemical, biochemical and biological data 
US7373256B2 (en) *  20050419  20080513  Nicholson Jeremy K  Method for the identification of molecules and biomarkers using chemical, biochemical and biological data 
US20100148973A1 (en) *  20050525  20100617  Tolliver Charlie L  System, apparatus and method for detecting unknown chemical compounds 
US20060266102A1 (en) *  20050525  20061130  Tolliver Charlie L  System, apparatus and method for detecting unknown chemical compounds 
US20080300826A1 (en) *  20050609  20081204  Schweitzer Robert C  Forensic integrated search technology with instrument weight factor determination 
US8112248B2 (en)  20050609  20120207  Chemimage Corp.  Forensic integrated search technology with instrument weight factor determination 
US20070192035A1 (en) *  20050609  20070816  Chem Image Corporation  Forensic integrated search technology 
US20070043473A1 (en) *  20050708  20070222  Anderson Gary R  Pointofuse mixing method with standard deviation homogeneity monitoring 
US7363114B2 (en)  20050708  20080422  TresArk, Inc.  Batch mixing method with standard deviation homogeneity monitoring 
US20070043471A1 (en) *  20050708  20070222  Anderson Gary R  Batch mixing method with standard deviation homogeneity monitoring 
US7363115B2 (en)  20050708  20080422  TresArk, Inc.  Batch mixing method with first derivative homogeneity monitoring 
US20070106425A1 (en) *  20050708  20070510  Anderson Gary R  Pointofuse mixing method with first derivative homogeneity monitoring 
US20070043472A1 (en) *  20050708  20070222  Anderson Gary R  Batch mixing method with first derivative homogeneity monitoring 
US9201053B2 (en)  20050901  20151201  Kuwait University  Method for measuring the properties of petroleum fuels by distillation 
US8645079B2 (en)  20050901  20140204  Kuwait University  Method for measuring the properties of petroleum fuels by distillation 
US20070050154A1 (en) *  20050901  20070301  Albahri Tareq A  Method and apparatus for measuring the properties of petroleum fuels by distillation 
US20100204925A1 (en) *  20050901  20100812  Tareq Abduljalil Albahri  Method for measuring the properties of petroleum fuels by distillation 
US7940043B2 (en) *  20060313  20110510  William Marsh Rice University  NMR method of detecting precipitants in a hydrocarbon stream 
US20090256562A1 (en) *  20060313  20091015  Shuqiang Gao  Nmr method of detecting precipitants in a hydrocarbon stream 
US20110237446A1 (en) *  20060609  20110929  Chemlmage Corporation  Detection of Pathogenic Microorganisms Using Fused Raman, SWIR and LIBS Sensor Data 
US20080077644A1 (en) *  20060710  20080327  Visx, Incorporated  Systems and methods for wavefront analysis over circular and noncircular pupils 
US8983812B2 (en)  20060710  20150317  Amo Development, Llc  Systems and methods for wavefront analysis over circular and noncircular pupils 
US8504329B2 (en) *  20060710  20130806  Amo Manufacturing Usa, Llc.  Systems and methods for wavefront analysis over circular and noncircular pupils 
US8158945B2 (en) *  20070502  20120417  Siemens Aktiengesellschaft  Detector arrangement for a nondispersive infrared gas analyzer and method for the detection of a measuring gas component in a gas mixture by means of such a gas analyzer 
US20100134784A1 (en) *  20070502  20100603  Ralf Bitter  Detector Arrangement for a Nondispersive Infrared Gas Analyzer and Method for the Detection of a Measuring Gas Component in a Gas Mixture by Means of Such a Gas Analyzer 
US20100127217A1 (en) *  20070615  20100527  David Lightowlers  Method for the online analysis of a vapour phase process stream 
US20090192340A1 (en) *  20071101  20090730  Robert Dielman Culp  Alkylaromatic dehydrogenation system and method for monitoring and controlling the system 
US20090144022A1 (en) *  20071203  20090604  Smiths Detection Inc.  Mixed statistical and numerical model for sensor array detection and classification 
US7672813B2 (en)  20071203  20100302  Smiths Detection Inc.  Mixed statistical and numerical model for sensor array detection and classification 
US20090198463A1 (en) *  20080131  20090806  Kumiko Kamihara  Automatic analzyer 
US8150645B2 (en) *  20080131  20120403  Hitachi HighTechnologies Corporation  Automatic analzyer 
US20100305872A1 (en) *  20090531  20101202  University Of Kuwait  Apparatus and Method for Measuring the Properties of Petroleum Factions and Pure Hydrocarbon Liquids by Light Refraction 
EP2480875A4 (en) *  20090924  20171011  Commw Scient Ind Res Org  Method of contaminant prediction 
US20110153035A1 (en) *  20091222  20110623  Caterpillar Inc.  Sensor Failure Detection System And Method 
CN103210301B (en) *  20100913  20150729  Mks仪器股份有限公司  Chemical compounds monitoring, detection, and quantification of the sample 
CN103210301A (en) *  20100913  20130717  Mks仪器股份有限公司  Monitoring, detecting and quantifying chemical compounds in sample 
US20120116731A1 (en) *  20101104  20120510  Charles David Eads  Multidimensional relaxometry methods for consumer goods 
US20120227043A1 (en) *  20110303  20120906  Mks Instruments, Inc.  Optimization of Data Processing Parameters 
US8725469B2 (en) *  20110303  20140513  Mks Instruments, Inc.  Optimization of data processing parameters 
US20130325418A1 (en) *  20120530  20131205  Exxonmobil Research And Engineering Company  System and method to generate moledular formula distributions beyond a predetermined threshold for a petroleum stream 
US9665693B2 (en) *  20120530  20170530  Exxonmobil Research And Engineering Company  System and method to generate molecular formula distributions beyond a predetermined threshold for a petroleum stream 
US20160123872A1 (en) *  20141029  20160505  Chevron U.S.A. Inc.  Method and system for nir spectroscopy of mixtures to evaluate composition of components of the mixtures 
US9678002B2 (en) *  20141029  20170613  Chevron U.S.A. Inc.  Method and system for NIR spectroscopy of mixtures to evaluate composition of components of the mixtures 
Also Published As
Publication number  Publication date  Type 

JP3130931B2 (en)  20010131  grant 
JPH06502247A (en)  19940310  application 
CA2093015A1 (en)  19920413  application 
DE69128357T2 (en)  19980716  grant 
EP0552291A1 (en)  19930728  application 
WO1992007326A1 (en)  19920430  application 
CA2093015C (en)  19991221  grant 
EP0552291A4 (en)  19941026  application 
EP0552291B1 (en)  19971203  grant 
DE69128357D1 (en)  19980115  grant 
Legal Events
Date  Code  Title  Description 

AS  Assignment  Owner name: EXXON RESEARCH & ENGINEERING CO., NEW JERSEY; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: GETHNER, JON S.; TODD, TERRY R.; BROWN, JAMES MILTON; REEL/FRAME: 007326/0147; Effective date: 19901005
FPAY  Fee payment  Year of fee payment: 4
FPAY  Fee payment  Year of fee payment: 8
FPAY  Fee payment  Year of fee payment: 12