MXPA98001056A - Analysis of a biological fluid using the detection of results in isolated intervals - Google Patents

Analysis of a biological fluid using the detection of results in isolated intervals

Info

Publication number
MXPA98001056A
Authority
MX
Mexico
Prior art keywords
calibration
group
sample
data
samples
Prior art date
Application number
MXPA/A/1998/001056A
Other languages
Spanish (es)
Other versions
MX9801056A (en)
Inventor
F Price John
R Long James
Original Assignee
Boehringer Mannheim Corporation
Priority date
Filing date
Publication date
Priority claimed from US08/587,017 (US5606164A)
Priority claimed from PCT/US1996/012625 (WO1997006418A1)
Application filed by Boehringer Mannheim Corporation
Publication of MX9801056A
Publication of MXPA98001056A

Abstract

A method and an apparatus are disclosed for measuring the concentration of an analyte present in a biological fluid. The method includes the steps of applying near-infrared (NIR) radiation to calibration samples to produce calibration data, analyzing the calibration data to identify and eliminate outliers (isolated results), constructing a calibration model, collecting and analyzing unknown samples to identify and eliminate outliers, and predicting the analyte concentration of the non-outliers with the calibration model. The analysis of the calibration data includes data pretreatment, data decomposition to eliminate redundant data, and identification and elimination of outliers using generalized distances (generalized intervals). The apparatus (100) includes a pump (102) which circulates a sample through a tube (104) to fill a flow cell (106). Light from an NIR source (114) is modulated in synchronization with a detector (110), which facilitates light and dark measurements, and passes through a monochromator (120) and the flow cell (106) to the detector (110), where the radiation transmitted through the sample is measured.

Description

ANALYSIS OF A BIOLOGICAL FLUID USING THE DETECTION OF RESULTS IN ISOLATED INTERVALS
BACKGROUND OF THE INVENTION
Spectral analysis is widely used for identifying and quantifying analytes (substances to be detected or measured) in a sample of a material. One form of spectral analysis measures the amount of electromagnetic radiation that is absorbed by a sample. For example, an infrared spectrophotometer directs a beam of infrared radiation at a sample and then measures the amount of radiation absorbed by the sample over a range of infrared wavelengths. An absorbance spectrum can then be plotted which represents the absorbance of the sample as a function of wavelength. The shape of the absorbance spectrum, which includes the relative magnitudes and wavelengths of the absorbance peaks, serves as a characteristic "fingerprint" of particular analytes in the sample. The absorbance spectrum can therefore give useful information for identifying analytes in a sample. In addition, the absorbance spectrum can be used for quantitative analysis of the analyte concentration in the sample. In many cases, the absorbance of an analyte in a sample is approximately proportional to the concentration of the analyte in the sample. In those cases where an absorbance spectrum represents the absorbance of a single analyte in a sample, the concentration of the analyte can be determined by comparing the absorbance of the sample with the absorbance of a reference sample at the same wavelength, where the reference sample contains a known concentration of the analyte. A fundamental requirement of a near-infrared spectroscopic method for measuring biological fluid analyte concentrations, such as blood glucose levels, is the collection of high-quality data.
Although great care can be taken to ensure reliable measurements through consistent sample preparation and data acquisition, the data generated by clinical reference tests and instrumentation, like all data, are susceptible to errors from a number of sources. In large data sets, it is not uncommon to have a number of measurements that deviate extremely from the expected distribution of measurements; these are commonly referred to as outliers (isolated results). Whether the outliers result from random (statistical) errors or systematic errors, outlier detection identifies samples containing such errors with sufficient confidence that such samples can be considered unique with respect to the sampled population. The inclusion of a small number of outliers within a set of measurements can degrade or destroy a calibration model that could otherwise be obtained from the measurements.
With reference to the method and apparatus of the present invention, there are at least four potential sources of error in a chemometric analysis for biological fluid analyte measurements such as measurements of blood glucose levels. A first source of error is related to the preparation of the sample. Blood serum samples require a large amount of preparation before the chemometric analysis. During this preparation, a number of factors can affect the sample. For example, the amount of time that blood samples are allowed to coagulate can affect the sample in terms of fibrinogen content. The degree of coagulation also affects the quality of centrifugation and, finally, the decanting of the serum from the cells. The preparation of the samples for the clinical tests determines the quality of the data used for reference and calibration, so great care must be exercised with the samples, since these data ultimately define the limit of the prediction capabilities. A second source of error may result from the spectral measurement process.
For example, the use of a flow cell to contain the sample during data acquisition is susceptible to problems such as bubbles in the optical path as well as dilution effects from a saline reference solution. These dilution effects are usually negligible, but bubbles in the optical path are not uncommon and have a severe impact on the quality of the data. In addition, errors caused by mechanical or electronic problems in the analysis instrumentation can have important effects on the quality of the data. A third source of error is related to the reference tests. Errors due to out-of-specification instrument controls and a low sample volume during the clinical test have effects similar to the errors related to sample preparation described above. A fourth source of error, and probably the most difficult to identify and control, is related to the sources of the samples, that is, the individuals who provide the biological fluids. A sample taken from an individual may at first sight appear completely unique with respect to a previously sampled population, but may in fact be an ordinary sample when a larger sample population is considered; that is, a seemingly unique sample may be only a subsampling artifact.
All these errors, alone or in combination, can lead to calculated values of the biological fluid analyte concentration that vary greatly from measurements of samples taken from the same individual at approximately the same time. These extremely deviant values, which can be orders of magnitude larger or smaller than a predicted mean value, are outliers that must be identified before constructing a model to predict the concentration of biological fluid analytes.
The elimination of outliers from a data set can be done in a qualitative, subjective sense by graphical inspection of the plotted data in those cases where the dimensionality is low, that is, when the number of data points associated with each of the measurements is small. In those cases where the number of data points associated with each of the measurements is large, however, outlier detection can be performed faster and more efficiently by a number of automated procedures such as residual analysis. Such procedures are nevertheless frequently subject to a number of errors, or at least to errors in interpretation, especially in the relatively high-dimensional spaces that are typically associated with multifactor chemometric analyses.
BRIEF DESCRIPTION OF THE INVENTION
To ensure accurate and consistent results, chemometric applications for measuring biological fluid analytes, such as the determination of glucose concentration, require multiple measurements taken from a number of test subjects over a period of time. However, even with consistent sample preparation and data acquisition, natural variations in the samples and unexpected errors can decrease the accuracy of the results. In addition, these errors are magnified by the relatively small number of biological fluid samples that can be economically taken and tested. Automated outlier detection techniques are necessary to evaluate the quality of all samples acquired both during the research phase and in end use. The quality of the data during clinical studies can define the calibration models, and the direction of subsequent research depends on the results. In end use, visual inspection of the acquired data may or may not be possible. Even if inspection of the data is possible, objective, independent methods of determination that are not susceptible to subjective influences are necessary.
To aid in the understanding of the present invention, it can be stated in essentially summary form that it is directed to a method and an apparatus for measuring biological fluid analyte concentrations using outlier identification and elimination based on generalized distances (generalized intervals). The present invention improves the accuracy of the determination of biological fluid analyte concentrations by identifying outliers and eliminating them from the data before the formation of a calibration model.
The present invention provides a method and an apparatus by which the concentration of an analyte in a sample of a biological fluid can be determined by spectral analysis of electromagnetic radiation applied to the sample, which includes collecting calibration data, analyzing the calibration data to identify and eliminate outliers, constructing a calibration model, analyzing the data of unknown samples to identify and eliminate outliers, and predicting the analyte concentrations of the non-outliers in the unknown sample data using the calibration model. Analysis of the calibration data set can include data pretreatment, data decomposition to eliminate redundant data, and identification and elimination of outliers that have a low probability of class membership, using generalized distance methods. The construction of a calibration model can use principal component regression, partial least squares, multiple linear regression, or artificial neural networks, whereby the calibration data set can be reduced to significant factors using principal component analysis or partial least squares scores, allowing the calculation of regression coefficients or artificial neural network weights. Unknown sample data can be analyzed using data pretreatment, followed by projection into the space defined by the calibration model, and identification and elimination of outliers in the unknown sample data that have a low probability of class membership. Predicting the analyte concentration of an unknown sample can include projecting data from the unknown sample into the space defined by the calibration model, thereby enabling the determination of the analyte concentration.
A first embodiment of the apparatus of the present invention includes a pump into which a sample is introduced; the pump circulates the sample through a tube to fill a flow cell and is capable of both continuous-flow and stopped-flow operation. A sample compartment containing the flow cell and a detector is temperature-controlled by a temperature control unit. Light from a near-infrared source of relatively broad bandwidth is directed through a chopper wheel, and the chopper wheel is synchronized with the detector by a chopper synchronization unit, which enables the apparatus to perform both light and dark measurements to substantially eliminate electronic noise. The modulated light then passes through a monochromator, which allows the wavelength of the radiation to be varied continuously over an appropriate range. The monochromatic light passes through the flow cell and on to the detector, whereby the amount of light transmitted through the sample is measured. The measurement data are stored in a general-purpose programmable computer having a general-purpose microprocessor, suitable for further processing according to the present invention. In addition, the computer can control the operation of the pump, the temperature control unit, the chopper synchronization unit, the chopper wheel, and the monochromator.
In a second embodiment of the apparatus of the present invention, the light from a light source of relatively broad bandwidth is directed through the chopper wheel, and the modulated light is then passed through a filter wheel, by means of which discrete wavelengths of radiation can be selected and transmitted to the flow cell.
In a third embodiment of the apparatus of the present invention, a plurality of near-infrared sources of relatively narrow bandwidth, such as a plurality of laser diodes, is provided to produce near-infrared radiation at a preselected plurality of wavelengths. The light from a selected narrow-bandwidth near-infrared source can be pulsed by a driver in synchronization with the detector and directed into the flow cell. The synchronization of the narrow-bandwidth near-infrared source and the detector allows the apparatus to perform both light and dark measurements, thereby substantially eliminating electronic noise. The narrow-bandwidth near-infrared sources can be selected to emit light into the flow cell in any convenient order, for example in order of increasing or decreasing wavelength, by configuring the computer to pulse each of the narrow-bandwidth near-infrared sources sequentially.
In the computer implementation of the method and apparatus of the present invention, variations in the intensity of the transmitted light as a function of wavelength are converted into digital signals by the detector, with the magnitude of each digital signal determined by the intensity of the radiation transmitted at the wavelength assigned to that particular signal. The digital signals are then placed in the computer's memory for processing as described.
The steps of the method of the present invention include, as a first step, the collection of data to be used in the construction of a calibration model. After the calibration data have been collected, data pretreatment can be performed in order to eliminate or compensate for spectral artifacts such as scattering (multiplicative) effects, baseline shifts, and instrumental noise.
The pretreatment of the calibration data can be selected from the group of techniques that includes calculating nth-order derivatives of the spectral data, multiplicative scatter correction, n-point smoothing, mean centering, variance scaling, and the ratiometric method. Once data pretreatment, if any, has been performed on the raw calibration data, a calibration model can be formed.
Because the variables of near-infrared spectral data are highly correlated, the data can be decomposed to reduce the level of redundant information present. The near-infrared spectral calibration data can be formed into an n × p matrix representing n samples each measured at p wavelengths. The n × p matrix can be decomposed by principal component analysis into a set of n, n-dimensional score vectors formed into a score matrix, and a set of n, p-dimensional loading vectors formed into an n × p loading matrix. The score vectors are orthogonal and represent projections of the n spectral samples into the space defined by the loading vectors, which capture the major sources of variation. Principal component analysis generates a set of n eigenvectors and a set of eigenvalues, $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$. The eigenvalues represent the variance explained by the associated eigenvectors and can be divided into two groups. The first q eigenvalues are the primary eigenvalues, $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_q$, and account for the significant sources of variation within the data. The remaining n-q secondary (error) eigenvalues, $\lambda_{q+1} \geq \lambda_{q+2} \geq \cdots \geq \lambda_n$, account for the residual variance or measurement noise. The number of primary eigenvalues q can be determined by an iterative method that compares the variance of the q-th eigenvalue with the variance of the pooled error eigenvalues by means of an F-test. In addition, reduced eigenvalues can be used, which weight the eigenvalues by an amount proportional to the information explained by the associated eigenvectors.
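The decomposition described above can be sketched in a few lines. This is a minimal illustration, assuming NumPy, mean centering as the only pretreatment, and a singular value decomposition as the way to obtain scores, loadings, and eigenvalues; the function name `pca_decompose`, the synthetic spectra, and the (n - 1) eigenvalue scaling are assumptions, not taken from the patent. The F-test for choosing q is not shown.

```python
import numpy as np

def pca_decompose(X):
    """Mean-center an n x p spectral matrix and decompose it by SVD."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)                     # mean spectrum (mean centering)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    scores = U * s                            # one row of scores per sample
    loadings = Vt                             # one loading vector per row
    # Variance explained by each component, in descending order
    # (assumed scaling: covariance eigenvalues, divisor n - 1).
    eigenvalues = s**2 / (X.shape[0] - 1)
    return mean, scores, loadings, eigenvalues

# Synthetic stand-in for calibration spectra: 20 samples, 50 wavelengths,
# two underlying sources of variation plus a small amount of noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2)) @ rng.normal(size=(2, 50))
X += 0.01 * rng.normal(size=X.shape)

mean, scores, loadings, eigenvalues = pca_decompose(X)
explained = eigenvalues / eigenvalues.sum()   # fraction of variance per factor
```

In this synthetic case the first two entries of `explained` carry almost all the variance, which is the situation in which q = 2 primary eigenvalues would be retained.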
The q score values for each sample are used to represent the original data during outlier detection, with the original spectra projected as an n × q score matrix into the principal component subspace defined by the loading matrix. Outliers can be identified using generalized distances (generalized intervals) such as the Mahalanobis distance or the Robust distance. A generalized distance between a sample and the centroid defined by a set of samples can be determined using the variance-covariance matrix of the set of samples. Where the true variance-covariance matrix and the true centroid of a complete set of samples are unknown, a subset of the whole set can be used to form an approximation of the variance-covariance matrix and an approximate centroid. In addition, by using principal component scores to represent the spectral data for each sample, independent variables that maximize the information content can be obtained, which ensures that the approximate variance-covariance matrix is invertible. With respect to the Mahalanobis distance, an approximate centroid can be determined as the centroid of a multivariate normal distribution of the set of calibration samples, whereby approximate Mahalanobis distances can be found in units of standard deviations measured between the centroid and each calibration sample. With respect to the Robust distance, a robust estimate of the variance-covariance matrix and an approximate centroid can be obtained by using the minimum volume ellipsoid (MVE) estimator. Alternatively, a projection algorithm can be used to determine the Robust distance for each calibration sample. After the generalized distances of the calibration samples have been determined, the probability of class membership can be determined by a number of techniques, including evaluation of a chi-square distribution function or use of Hotelling's T-statistic.
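As a rough sketch of this step, the following computes squared Mahalanobis distances of score vectors from their centroid and converts them into class membership probabilities with a chi-square survival function. The names `mahalanobis_sq` and `chi2_sf_even`, the toy score data, the use of q degrees of freedom, and the 0.0027 cutoff (roughly a two-sided 3σ level) are all assumptions made for illustration; the closed-form survival function used here is valid only for an even number of degrees of freedom.

```python
import math
import numpy as np

def mahalanobis_sq(scores):
    """Squared Mahalanobis distance of each row from the centroid of all rows."""
    scores = np.asarray(scores, dtype=float)
    centroid = scores.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(scores, rowvar=False))
    diff = scores - centroid
    # diff_i^T * cov_inv * diff_i for every sample i
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

def chi2_sf_even(x, dof):
    """Chi-square survival function; closed form for even dof only."""
    if dof <= 0 or dof % 2:
        raise ValueError("closed form below holds only for even, positive dof")
    half = x / 2.0
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(dof // 2))

# Toy scores on q = 2 factors: a 5 x 5 grid of ordinary samples plus one
# sample far from the rest (index 25).
grid = [(x, y) for x in range(-2, 3) for y in range(-2, 3)]
scores = np.array(grid + [(12, -12)], dtype=float)

d2 = mahalanobis_sq(scores)
p_member = np.array([chi2_sf_even(v, dof=2) for v in d2])
outliers = np.flatnonzero(p_member < 0.0027)   # assumed ~3-sigma cutoff
```

Because the centroid and covariance are estimated from data that include the suspect sample, a single extreme point inflates the covariance and shrinks its own distance, which is one motivation for the Robust distance mentioned in the text.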
Outliers are identified as samples having relatively large generalized distances that result in a low probability of class membership. Samples whose class membership can be rejected at a confidence level greater than approximately 3-5σ can be considered outliers. Following identification, the outliers can be eliminated from the calibration samples. The generalized distances of the outliers of the calibration samples can be examined to determine whether additional data pretreatment is necessary. In the event that a relatively large number of outliers have large generalized distances, further pretreatment of the calibration data may be indicated. After such additional pretreatment, the calibration data can again be subjected to analysis. On the other hand, if a relatively large number of outliers do not have large generalized distances, then additional data pretreatment may not be appropriate.
A calibration model can then be constructed using any of a number of techniques, which include principal component regression (PCR), partial least squares (PLS), multiple linear regression (MLR), and artificial neural networks (ANN). The calibration model attempts to correlate a set of independent variables that represent the absorbance values of n samples, each measured at p wavelengths, with a set of dependent or response variables that represent the concentration of an analyte in each of the n samples, by using a p-dimensional regression coefficient vector. Calibration determines the regression coefficient vector, which is then used to predict the concentration of the analyte in other samples given only the absorbances at the p wavelengths. As noted, the variables of near-infrared spectral data are highly correlated. Although judicious selection of the wavelengths can reduce such problems, the spectral regions of interest can suffer from severe overlap, and a large number of wavelengths are needed to model a multicomponent system. Data compression can be used to address the collinearity problems that arise in determining the regression coefficient vector, so that the redundant data are reduced to significant factors. Principal component regression is a technique that incorporates a data compression method. The partial least squares technique can also be used to address the problem of redundant data.
With respect to both principal component regression and partial least squares, a determination is made of the appropriate number of scores or factors to be included in a calibration model that adequately represents the calibration data. The objective of selecting the optimal number of regression factors is to obtain parsimonious models with strong predictive ability. Including too few factors degrades the performance of the model because of inadequate information during calibration; including too many factors can also degrade performance. The principal components are normally sorted so that the amount of variation explained by each principal component decreases monotonically. Later-ordered principal components associated with small eigenvalues can be regarded as containing measurement noise. By using only the first q factors and omitting the remaining factors, a form of noise rejection is incorporated into principal component regression. The number of principal component analysis factors or partial least squares scores to be used during the regression step can be determined using the standard error of prediction, a measure of the error associated with each set of predictions. By plotting the standard error of prediction against the number of factors used in each of the respective prediction sets, a piecewise continuous graph can be obtained and used to determine the number of factors to retain.
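To make the principal component regression step concrete, here is a minimal sketch, assuming NumPy, mean centering, and regression of the centered concentrations on the first q score vectors obtained by SVD. The names `pcr_fit` and `pcr_predict` and the noise-free synthetic spectra are hypothetical; partial least squares, multiple linear regression, and neural networks are not shown.

```python
import numpy as np

def pcr_fit(X, y, q):
    """Principal component regression keeping only the first q factors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    U, s, Vt = np.linalg.svd(X - x_mean, full_matrices=False)
    # Least squares on the first q scores (T = U_q * s_q), mapped back to a
    # p-dimensional regression coefficient vector: b = V_q S_q^-1 U_q^T y_c.
    coef = Vt[:q].T @ ((U[:, :q].T @ (y - y_mean)) / s[:q])
    return x_mean, y_mean, coef

def pcr_predict(model, X):
    """Predict concentrations from absorbances at the p wavelengths."""
    x_mean, y_mean, coef = model
    return (np.asarray(X, dtype=float) - x_mean) @ coef + y_mean

# Synthetic, noise-free spectra: analyte concentration c and an interferent d,
# each with its own (made-up) absorbance profile over 8 wavelengths.
c = np.linspace(50.0, 300.0, 12)
d = np.cos(np.arange(12.0))
p1 = np.linspace(0.1, 1.0, 8)
p2 = np.linspace(1.0, 0.2, 8) ** 2
X = np.outer(c, p1) + np.outer(d, p2)

model = pcr_fit(X, c, q=2)
pred = pcr_predict(model, 120.0 * p1 + 0.3 * p2)   # unseen sample, c = 120
```

With two independent sources of variation and q = 2, the model recovers the concentration of the unseen sample; with real, noisy spectra the choice of q would follow the standard error of prediction as described above.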
One criterion for factor selection is to choose the first local minimum. Another technique for factor selection uses an F-test to compare the standard errors of prediction of models that use different numbers of factors. In certain cases, the data being analyzed may not lend themselves to being divided into a calibration (training) set and a validation (test) set. The reason may be a limited number of suitable samples, or that upon dividing the data into two sets, one or both of the resulting sets do not adequately represent the sample population. In such a situation, the iterative technique of leave-one-out cross-validation can be used, in which, during each iteration, one sample is excluded from the calibration set and used as a test sample. Prediction models using factors determined from the remaining calibration samples are then used to make a prediction for the test sample. The test sample is then returned to the calibration set and another sample is excluded. The same process is repeated until every sample has been excluded from the calibration set and predicted by models generated from the remaining calibration samples. All predictions are accumulated to give a standard error of validation.
After the number of significant factors has been determined, the data set for the calibration model can be reduced to the significant factors, and the regression coefficients for the calibration model can be determined. After the calibration model has been constructed, it can be applied to data collected from samples in which the concentration of the analyte of interest is unknown. The data from the unknown samples can be pretreated appropriately and then projected into the principal component space defined by the calibration model.
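The leave-one-out procedure and the first-local-minimum criterion can be sketched as follows. This is an illustration only: the helper names, the root-mean-square definition of the standard error of validation, the straight-line model used as a stand-in for a PCR or PLS model, and the error values passed to `first_local_minimum` are all assumptions.

```python
import math

def leave_one_out_sev(xs, ys, fit, predict):
    """Standard error of validation from leave-one-out cross-validation."""
    squared_error = 0.0
    for i in range(len(xs)):
        # Exclude sample i from the calibration set and use it as the test sample.
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        squared_error += (predict(model, xs[i]) - ys[i]) ** 2
    return math.sqrt(squared_error / len(xs))

def first_local_minimum(errors):
    """Index of the first error that is smaller than the one following it."""
    for i in range(len(errors) - 1):
        if errors[i] < errors[i + 1]:
            return i
    return len(errors) - 1

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (stand-in model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept

# Made-up absorbances and concentrations lying exactly on a line.
sev = leave_one_out_sev([0.1, 0.2, 0.3, 0.4, 0.5],
                        [50.0, 100.0, 150.0, 200.0, 250.0],
                        fit_line, predict_line)

# Made-up validation errors for models with 1..5 factors.
n_factors = first_local_minimum([9.0, 6.0, 4.5, 4.8, 4.2]) + 1
```

In the second example the error rises again at four factors, so the first local minimum selects three factors even though five factors give a slightly lower error.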
Next, the generalized distances for the set of unknown sample data can be found using, for example, either the Mahalanobis distance or the Robust distance as used with the calibration data, and the probability of class membership can be estimated using the techniques described above, which include evaluation of the chi-square distribution function or use of Hotelling's T-statistic. The outliers in the unknown sample data are then identified based on the rejection of class membership at a confidence level greater than approximately 3-5σ. As the final steps of the present invention, in the event that an unknown sample is not an outlier, the sample is projected into the space defined by the calibration model, and a prediction of the analyte concentration is made. On the other hand, if the unknown sample is an outlier, the unknown sample can be rejected and no prediction of the analyte concentration is made, although, if possible, the unknown sample can be measured again to verify that it is an outlier.
With respect to the apparatus of the present invention, the steps described above with respect to the method of the present invention can be configured in the general-purpose microprocessor of the computer by using computer program code segments corresponding to each of the steps. As those skilled in the art will appreciate, the present invention is intended to encompass, without limitation, a range of embodiments, which can be better understood with reference to the drawings and the following detailed description of the preferred embodiments of the invention.
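The final screening and prediction steps might be combined as in the sketch below. It assumes NumPy and a calibration model summarized by a mean spectrum, q loading vectors, the q eigenvalues of the calibration scores, a regression coefficient vector, and a mean concentration. Because principal component scores are uncorrelated, the squared Mahalanobis distance reduces to a sum of squared scores divided by their eigenvalues. The function name, the toy model values, and the distance limit of 11.83 (the chi-square value for 2 degrees of freedom at a tail probability of about 0.0027) are assumptions.

```python
import numpy as np

def screen_and_predict(x, x_mean, loadings, eigenvalues, coef, y_mean, d2_limit):
    """Return the predicted concentration, or None if x is flagged as an outlier."""
    xc = np.asarray(x, dtype=float) - x_mean
    t = loadings @ xc                          # scores in the calibration space
    d2 = float(np.sum(t**2 / eigenvalues))     # squared Mahalanobis distance
    if d2 > d2_limit:
        return None                            # rejected: no prediction is made
    return float(xc @ coef + y_mean)

# Toy calibration model with p = 3 wavelengths and q = 2 factors (made up).
x_mean = np.array([1.0, 1.0, 1.0])
loadings = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]])
eigenvalues = np.array([4.0, 1.0])
coef = np.array([10.0, 0.0, 0.0])
y_mean = 100.0
limit = 11.83                                  # assumed ~3-sigma cutoff, 2 dof

ordinary = screen_and_predict([2.0, 1.5, 1.0], x_mean, loadings,
                              eigenvalues, coef, y_mean, limit)
unusual = screen_and_predict([9.0, 1.0, 1.0], x_mean, loadings,
                             eigenvalues, coef, y_mean, limit)
```

Returning `None` rather than a number keeps a rejected sample from being mistaken for a valid prediction; a caller could use it as the trigger for measuring the sample again.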
BRIEF DESCRIPTION OF THE DRAWINGS
FIG 1 is a schematic block diagram of a first preferred embodiment of the apparatus for measuring the concentration of a biological fluid analyte according to the present invention.
FIG 2 is a schematic block diagram of a second preferred embodiment of the apparatus for measuring the concentration of a biological fluid analyte according to the present invention.
FIG 3 is a schematic block diagram of a third preferred embodiment of the apparatus for measuring the concentration of a biological fluid analyte according to the present invention.
FIG 4 is a flow diagram depicting the initial steps of the method for measuring biological fluid analyte concentrations according to the present invention.
FIG 5 is a flow diagram depicting the intermediate steps of the method for measuring biological fluid analyte concentrations according to the present invention.
FIG 6 is a flow diagram depicting the final steps of the method for measuring biological fluid analyte concentrations according to the present invention.
FIG 7 is a scatter plot of principal component 2 against principal component 1 of the near-infrared spectra of 111 blood glucose samples in the range of 1580 nm to 1848 nm.
FIG 8 is a scatter plot of principal component 2 against principal component 1 of the near-infrared spectra of 111 blood glucose samples in the range of 2030 nm to 2398 nm.
FIG 9 is a scatter plot of principal component 3 against principal component 2 of the near-infrared spectra of 111 blood glucose samples in the range of 2030 nm to 2398 nm.
FIG 10 is a bar graph of the Mahalanobis distances calculated for 103 blood glucose samples in the range of 1100 nm to 2398 nm, taken from the data shown in FIGS 7-9.
FIG 11 is a scatter plot of blood glucose concentrations of 103 samples using data from 2030 nm to 2398 nm, generated from an optimized partial least squares model with twenty factors achieving a standard error of validation of 64.10 mg/dL, against actual blood glucose concentrations.
FIG 12 is a scatter plot of blood glucose concentrations of 100 samples using data from 2030 nm to 2398 nm, generated from an optimized partial least squares model with twenty factors achieving a standard error of validation of 27.43 mg/dL, against actual blood glucose concentrations.
FIG 13 is a bar graph of the Mahalanobis distances calculated for 100 blood glucose samples in the range of 1580 nm to 1848 nm, taken from the data shown in FIGS 7-9.
FIG 14 is a bar graph of the Mahalanobis distances calculated for 100 blood glucose samples in the range of 2030 nm to 2398 nm, taken from the data shown in FIGS 7-9.
FIG 15 is a scatter plot of blood glucose concentrations of 95 samples using data from 2030 nm to 2398 nm, generated from an optimized partial least squares model with twenty factors achieving a standard error of validation of 26.97 mg/dL, against actual blood glucose concentrations.
FIG 16 is a table summarizing the outlier detection results for 111 blood glucose samples over the spectral ranges 1580 nm to 1848 nm and 2030 nm to 2398 nm using the present invention, and indicating the possible causes of the error in each sample.
FIG 17 is a graph of the standard error of prediction against the number of factors used during the regression.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
The following parts of the specification, taken in conjunction with the drawings, set forth the preferred embodiments of the present invention. The embodiments of the invention disclosed here are the best modes contemplated by the inventors for carrying out their invention in a commercial environment, although it should be understood that various modifications can be made within the parameters of the present invention.
Referring now to the drawings for a detailed description of the present invention, reference is first made to FIG 1, which depicts a first preferred embodiment of an apparatus for measuring the concentration of a biological fluid analyte. In an apparatus 100, a sample of biological fluid can be introduced into a pump 102, which circulates the sample through a tube 104 to fill a flow cell 106. The pump 102 may be capable of both continuous-flow and stopped-flow operation. A sample compartment containing the flow cell 106 and a detector 110 is temperature-controlled by a temperature control unit 112. The light from a near-infrared source 114 of relatively broad bandwidth is directed through a chopper wheel 116. The chopper wheel 116 is synchronized with the detector 110 by a chopper synchronization unit 118, which enables the apparatus 100 to perform both light and dark measurements to substantially eliminate electronic noise. The modulated light then passes through a monochromator 120, which allows the wavelength of the radiation to be varied over an appropriate range. The monochromatic light passes through the flow cell 106 and on to the detector 110. The detector 110 measures the amount of light transmitted through the sample.
The measurement data are stored in a general-purpose programmable computer 124 having a general-purpose microprocessor, suitable for further processing as described below. In addition, the computer 124 can also control the operation of the pump 102, the temperature control unit 112, the chopper synchronization unit 118, the chopper wheel 116, and the monochromator 120. In a second embodiment of the apparatus 100, as depicted in FIG. 2, light from the relatively broad-bandwidth light source 114 is directed through the chopper wheel 116, and the modulated light is then passed through a filter wheel 130, by means of which wavelengths of the radiation can be selected and transmitted to the flow cell 106. In a third embodiment of the apparatus 100 of the present invention, as depicted in FIG. 3, a plurality of narrow-bandwidth near-infrared sources 134, such as a plurality of laser diodes, is provided to produce near-infrared radiation at a preselected plurality of wavelengths. Light from a selected narrow-bandwidth near-infrared source 134 may be pulsed by a driver 138 in synchronization with the detector 110 and directed into the flow cell 106. Synchronizing the selected narrow-bandwidth near-infrared source 134 with the detector 110 allows the apparatus 100 to perform both light and dark measurements, thereby substantially eliminating electronic noise. The narrow-bandwidth near-infrared sources 134 can be selected to emit light into the flow cell 106 in any convenient order, for example in order of increasing or decreasing wavelength, by configuring the computer 124 to pulse each of the narrow-bandwidth near-infrared sources sequentially.
Referring to FIGS. 1-3, in the computer implementation of the method and apparatus of the present invention, variations in the intensity of the transmitted light as a function of wavelength are converted into digital signals by the detector, with the magnitude of each digital signal determined by the intensity of the radiation transmitted at the wavelength assigned to that particular signal. The digital signals are then placed in the memory of the computer 124 for processing as described below.
As symbolically represented in FIG. 4, step 1 of the method of the present invention relates to the collection of data to be used in calibration and, thereafter, in the construction of a calibration model. After the calibration data have been collected, the data pretreatment of step 2 can be performed. It is often necessary to pretreat the raw spectral data before analyzing the data or constructing the calibration model in order to eliminate or compensate for spectral artifacts such as (multiplicative) scattering effects, baseline shifts, and instrumental noise. The pretreatment of the calibration data can be selected from the group of techniques that includes calculating nth-order derivatives of the spectral data, multiplicative scatter correction, n-point smoothing, mean centering, variance scaling, and the ratiometric method. Once pretreatment, if any, has been performed on the raw calibration data, the steps directed to forming a calibration model can be taken. With reference to step 3 as shown in FIG. 4, the variables of the near-infrared spectral data are highly correlated. To reduce the level of redundant information present, the near-infrared spectral calibration data can be formed into an n × p matrix X representing n samples each measured at p wavelengths, which can be decomposed by principal component analysis into a group of n, n-dimensional score vectors formed into an n × n score matrix T, and a group of n, p-dimensional loading vectors formed into an n × p loading matrix L, with

$$X = TL' \tag{1}$$

In most spectroscopic applications p > n, so that the decomposition can be regarded as the decomposition of the rank-n matrix X into a sum of n rank-1 matrices. The score vectors represent the projections of the n sample spectra in X onto the space defined by the loading vectors.
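The decomposition of equation 1 can be illustrated with a minimal sketch, assuming mean centering as the only pretreatment and a singular value decomposition as the means of computing the principal components. The function name `pca_decompose` and the synthetic data are hypothetical and are not taken from the specification; here the loading vectors are stored as the columns of `L`.

```python
import numpy as np

def pca_decompose(X):
    """Mean-center X (n samples x p wavelengths) and return the centered
    data, the score matrix T, the loading matrix L, and the eigenvalues."""
    Xc = X - X.mean(axis=0)                    # mean centering (one pretreatment option)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U * s                                  # score vectors as columns (orthogonal)
    L = Vt.T                                   # loading vectors as columns
    eigvals = s ** 2 / (X.shape[0] - 1)        # variance explained by each component
    return Xc, T, L, eigvals

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 20))                   # synthetic data: n = 6, p = 20
Xc, T, L, eigvals = pca_decompose(X)
```

In this sketch `T @ L.T` reproduces the centered data, and the eigenvalues come out already sorted in decreasing order, which is the ordering the later steps rely on.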
The score matrix T represents the largest sources of variation found within X, and the column vectors of T are orthogonal. Referring to steps 4 and 5 as depicted in FIG. 4, the principal component analysis generates a group of n eigenvectors and a group of n eigenvalues, $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n$. The eigenvalues represent the variance explained by the associated eigenvectors and can be divided into two groups. The first q eigenvalues are the primary eigenvalues, $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_q$, and account for the significant sources of variation within the data. The remaining n − q secondary (error) eigenvalues, $\lambda_{q+1} \ge \lambda_{q+2} \ge \dots \ge \lambda_n$, account for residual variance or measurement noise. With reference to steps 6 and 7 of FIG. 4, the number of primary eigenvalues q can be determined by an iterative method that compares the variance of the q-th eigenvalue to the variance of the pooled error eigenvalues by means of an F-test,

$$F(1, n-q) = \frac{\lambda_q}{\sum_{j=q+1}^{n} \lambda_j / (n-q)} \tag{2}$$

In addition, reduced eigenvalues, which weight each eigenvalue by an amount proportional to the information explained by the associated eigenvector, can be used, with the reduced eigenvalue defined as

$$\lambda_q^{*} = \frac{\lambda_q}{(n-q+1)(p-q+1)} \tag{3}$$

so that equation 2 can be restated with the reduced eigenvalues in place of the eigenvalues (equation 4). The i-th sample in the principal component subspace is represented by its q score values $t_i$. The score values for each sample are used to represent the original data during the detection of isolated results (outliers). In this way, the original spectra are projected into the n × q dimensional principal component subspace defined by the loading matrix L. As symbolically represented in steps 8 and 9 of FIG. 4, isolated results can be identified using generalized intervals (generalized distances), such as the Mahalanobis interval or the Robust interval.
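The comparison of a candidate eigenvalue against the pooled error eigenvalues can be sketched as follows. This is only an illustration under stated assumptions: the reduced eigenvalue is taken to be the j-th eigenvalue divided by (n − j + 1)(p − j + 1), the test statistic is taken to be the q-th value divided by the mean of the values that follow it, and the eigenvalues are made up. The names `reduced_eigenvalues` and `f_ratio` are hypothetical, and the comparison against a critical F value is left out.

```python
import numpy as np

def reduced_eigenvalues(eigvals, n, p):
    """Divide the j-th eigenvalue by (n - j + 1)(p - j + 1), j = 1, 2, ..."""
    eigvals = np.asarray(eigvals, dtype=float)
    j = np.arange(1, eigvals.size + 1)
    return eigvals / ((n - j + 1) * (p - j + 1))

def f_ratio(values, q):
    """q-th value divided by the mean of the values after it (the pooled error set)."""
    values = np.asarray(values, dtype=float)
    return float(values[q - 1] / values[q:].mean())

# Made-up eigenvalues for n = 6 samples measured at p = 20 wavelengths.
eigvals = [50.0, 20.0, 0.72, 0.51, 0.32, 0.15]
rev = reduced_eigenvalues(eigvals, n=6, p=20)
ratios = {q: f_ratio(rev, q) for q in range(1, 6)}
```

With these made-up numbers the ratio is large for q = 2 and close to one for q = 3, which is the pattern that would lead an iterative test to retain two primary eigenvalues.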
A generalized interval between a centroid μ of a group of samples and the i-th sample $x_i$ can be determined from

$$D_i^2 = (x_i - \mu)' \Sigma^{-1} (x_i - \mu) \tag{5}$$

where Σ is the variance-covariance matrix of the group of samples. Where the true variance-covariance matrix and the true centroid of an entire group of samples are not determinable, a subgroup of the entire group can be used to form an approximate variance-covariance matrix and an approximate centroid. In addition, by using principal component scores to represent the spectral data for each sample, the independent variables are orthogonal, thus maximizing the information content and ensuring that the approximate variance-covariance matrix is invertible. The generalized intervals can be Mahalanobis intervals as described in step 10a of FIG. 4, where an approximate centroid $\bar{x}$ can be determined as the centroid of a multivariate normal distribution of the group of calibration samples, together with an approximate variance-covariance matrix S of the group of calibration samples. Approximate Mahalanobis intervals $MD_i$, in units of standard deviations, measured between the centroid and the i-th calibration sample $x_i$ can thus be determined from

$$MD_i^2 = (x_i - \bar{x})' S^{-1} (x_i - \bar{x}) \tag{6}$$

where

$$S = \frac{(X - \bar{X})'(X - \bar{X})}{n - 1} \tag{7}$$

With respect to the Robust interval as described in step 10b of FIG. 4, by using a minimum volume ellipsoid (MVE) estimator, robust estimates of the variance-covariance matrix $S_{Robust}$ and of the centroid $x_{Robust}$ can be obtained, with the Robust interval $RD_i$ for the i-th calibration sample determined from

$$RD_i = \left[ (x_i - x_{Robust})' S_{Robust}^{-1} (x_i - x_{Robust}) \right]^{1/2} \tag{8}$$

Alternatively, a projection algorithm can be used to determine the Robust interval $RD_i$ for each i-th calibration sample from

$$RD_i = \max_{g} \frac{\left| x_i v_g' - L(x_1 v_g', \dots, x_n v_g') \right|}{Z(x_1 v_g', \dots, x_n v_g')} \tag{9}$$

for g = 1, ..., n, where the scale of the minimum volume ellipsoid is given by

$$Z(x_1 v_g', \dots, x_n v_g') = \left(1 + \frac{15}{n-p}\right) \left(x_j v_g' - x_{j-n/2} v_g'\right) \tag{10}$$

and the location of the minimum volume ellipsoid is given by

$$L(x_1 v_g', \dots, x_n v_g') = \frac{x_j v_g' + x_{j-n/2} v_g'}{2} \tag{11}$$

Here $x_i$ is a p-dimensional vector representing the i-th calibration sample, and $v_g$ is a p-dimensional vector for the g-th calibration sample defined by

$$v_g = x_g - M \tag{12}$$

where M is a p-dimensional vector such that the r-th component of M is given by the mean of the group formed by the r-th component of each of the n vectors $x_i$. For each value of g = 1, ..., n, the index j used in equations 10 and 11 is determined by

$$x_j v_g' - x_{j-n/2} v_g' = \min \left\{ x_{n/2+1} v_g' - x_1 v_g',\; x_{n/2+2} v_g' - x_2 v_g',\; \dots,\; x_n v_g' - x_{n/2} v_g' \right\} \tag{13}$$

where $x_1 v_g' \le x_2 v_g' \le \dots \le x_n v_g'$.

After determining the generalized intervals for the calibration samples, and referring to step 11 shown in FIG. 4, the probability of class membership can be estimated by a number of techniques, including evaluating the chi-squared distribution function or using the Hotelling T-statistic. As described in step 12, isolated results are identified as samples having relatively large generalized intervals, which result in a low probability of class membership. Generally speaking, samples whose class membership can be rejected at a confidence level in the range of approximately 3-5σ can be considered isolated results. Following identification, the isolated results can be eliminated from the calibration samples as described in step 13. Furthermore, as indicated in step 14, the generalized intervals of the isolated results among the calibration samples can be examined to determine whether additional data pretreatment is necessary. In the event that a relatively large number of isolated results have large generalized intervals, further pretreatment of the calibration data may be indicated.
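The Mahalanobis interval of step 10a can be sketched as below, assuming the samples are already represented by their principal component scores, that the variance-covariance matrix is the usual sample estimate with an n − 1 divisor, and that a fixed threshold of 3.0 standard deviations is used for flagging. The function name `mahalanobis_distances`, the threshold, and the synthetic scores with one planted gross outlier are hypothetical choices for illustration only.

```python
import numpy as np

def mahalanobis_distances(T):
    """Distance of each row of T (n samples x q scores) from the centroid,
    in units of standard deviations."""
    centered = T - T.mean(axis=0)                     # subtract the approximate centroid
    S = np.atleast_2d(np.cov(T, rowvar=False))        # q x q estimate, divides by n - 1
    S_inv = np.linalg.inv(S)
    d2 = np.einsum("ij,jk,ik->i", centered, S_inv, centered)
    return np.sqrt(d2)

rng = np.random.default_rng(1)
T = rng.normal(size=(50, 3))                          # synthetic scores, q = 3
T[7] = [12.0, -12.0, 12.0]                            # planted gross outlier
md = mahalanobis_distances(T)
flagged = np.flatnonzero(md > 3.0)                    # candidate isolated results
```

Because the planted sample also inflates the covariance estimate it is measured against, a non-robust estimate of this kind can understate how unusual a gross outlier is, which is one motivation for the Robust interval alternatives described above.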
If such additional pretreatment is indicated, then after such pretreatment the calibration data can again be subjected to the steps described above, beginning with step 2. On the other hand, if a relatively large number of isolated results do not have large generalized intervals, then additional data pretreatment may not be appropriate. After this, as indicated by step 15 shown in FIG. 5, a calibration model can be constructed using any of a number of techniques, including principal component regression (PCR), partial least squares (PLS), multiple linear regression (MLR), and artificial neural networks (ANN). The calibration model seeks to correlate a group of independent variables representing the absorbance values of n samples each measured at p wavelengths, represented symbolically by the n × p matrix X, with a group of dependent or response variables representing the concentration of an analyte in each of the n samples, represented symbolically by the vector y. Here y is an n-dimensional vector or, alternatively, can be considered an n × 1 matrix. After mean centering X and y, the relationship between X and y can be expressed as

$$y = Xb + e \tag{14}$$

where b represents a p-dimensional vector of regression coefficients (a p × 1 matrix) and e is an n-dimensional vector (an n × 1 matrix) representing the errors in y. Calibration of the model determines the vector b, using

$$b = (X'X)^{-1} X'y \tag{15}$$

Once known, b is used to predict the analyte concentration, y, in unknown samples, given only the absorbances at each of the p wavelengths. Referring to step 16, the determination of $(X'X)^{-1}$ can be difficult because collinearity is inherent in spectroscopic data. As described above, the variables of near-infrared spectral data are highly correlated. Although a judicious selection of the measurement wavelengths can minimize singularity problems, the spectral regions of interest can suffer from severe overlap, and a large number of wavelengths may be needed to model a multicomponent system.
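The difficulty with inverting X'X can be seen in a small numerical sketch, assuming synthetic data with fewer samples than wavelengths. The sizes and variable names are hypothetical; the point is only that the p × p matrix to be inverted in equation 15 cannot have rank greater than n.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 5, 12                        # fewer samples than wavelengths
X = rng.normal(size=(n, p))         # synthetic stand-in for absorbance data
XtX = X.T @ X                       # the p x p matrix that equation 15 must invert
rank = np.linalg.matrix_rank(XtX)   # cannot exceed n, so XtX is singular when p > n
```

Since the rank is at most n, the inverse in equation 15 does not exist whenever p > n, which is why the data compression techniques that follow are used.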
Data compression can be used to address the collinearity problems in determining the regression coefficient vector b, so that redundant data are reduced to significant factors. Principal component regression is a technique for determining the vector b that incorporates a data compression method. The first step in principal component regression is to perform principal component analysis on the calibration data formed into the matrix X. The score matrix T represents the largest sources of variation found in X, and the column vectors of T are orthogonal. As a result, in the next step of principal component regression, T is used in place of X, whereby an approximate value of b is obtained, since (T'T) is invertible:

$$b = (T'T)^{-1} T'y \tag{16}$$

Partial least squares techniques can also be used to address the problem of redundant data. One difference between partial least squares and principal component regression is the means by which the score matrix T and the loading matrix L are generated. In principal component regression using nonlinear iterative partial least squares (NIPALS), the loading vectors are extracted one at a time in the order of their contribution to the variance in X. As each loading vector is determined, it is removed from X and the next loading vector is determined. This process is repeated until n loadings have been determined. In partial least squares, the concentration (y-block) information is used during the iterative decomposition of X. With concentration information incorporated into L, the T values are related to concentration as well, so that useful predictive information is placed in earlier factors than in principal component regression.
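Principal component regression as outlined above can be sketched as follows, assuming mean centering, a singular value decomposition to obtain the scores (rather than NIPALS), and a final step that maps the score-space coefficients of equation 16 back to wavelength space through the loadings. The function names `pcr_fit` and `pcr_predict` and the noise-free synthetic data are hypothetical.

```python
import numpy as np

def pcr_fit(X, y, q):
    """Fit a q-factor principal component regression model."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :q] * s[:q]                            # n x q score matrix
    L = Vt[:q].T                                    # p x q loading matrix
    b_scores = np.linalg.solve(T.T @ T, T.T @ yc)   # equation 16, in score space
    b = L @ b_scores                                # coefficients per wavelength
    return b, x_mean, y_mean

def pcr_predict(X_new, b, x_mean, y_mean):
    return (X_new - x_mean) @ b + y_mean

# Synthetic example: 3 latent sources observed at 40 wavelengths, no noise.
rng = np.random.default_rng(3)
Z = rng.normal(size=(30, 3))
X = Z @ rng.normal(size=(3, 40))
y = Z @ np.array([1.0, 2.0, 3.0]) + 5.0
b, x_mean, y_mean = pcr_fit(X, y, q=3)
y_hat = pcr_predict(X, b, x_mean, y_mean)
```

Because the synthetic data have exactly three underlying sources, three factors reproduce y; with real spectra the number of factors has to be chosen, as discussed next.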
With respect to both principal component regression and partial least squares, a determination must be made of the appropriate number of factors, or score vectors, to include in a calibration model that adequately represents the calibration data. The objective in selecting the optimal number of regression factors is to obtain parsimonious models with strong predictive ability. Including too few factors causes model performance to suffer because of inadequate information during calibration, while including too many factors can also degrade performance. The principal components are normally sorted so that the amount of variance explained by each principal component decreases monotonically. Later principal components, associated with small eigenvalues, can therefore be considered to contain measurement noise. By using only the first q factors and omitting the remaining factors, a type of noise filtering is incorporated into principal component regression. The number of principal component analysis factors or partial least squares scores, q, to be used during the regression step can be determined as follows. In the case of a matrix X of rank n, n preliminary calibration models are constructed. Each preliminary calibration model uses a different number of score vectors selected from the range of 1 to n score vectors. The predictions of the n preliminary calibration models are then evaluated using the standard error of prediction technique.
The standard error of prediction (SEP) is a measure of the error associated with a group of predictions and is given by

$$SEP = \sqrt{\frac{\sum_{i=1}^{n} e_i^2}{n}} \tag{17}$$

where the number of samples in the test group is given by n and

$$e_i = \hat{y}_i - y_i \tag{18}$$

with $\hat{y}_i$ the predicted and $y_i$ the reference concentration of the i-th test sample. By plotting the standard error of prediction against the number of factors (score vectors) used in each of the respective groups of predictions, a piecewise continuous graphical representation can be obtained, FIG. 17, and used to determine the number of factors to retain. One criterion for factor selection is to take the first local minimum. Applying the first-local-minimum criterion to the data plotted in FIG. 17, eight factors would be selected for the calibration model. A general interpretation of FIG. 17 is that significant information is incorporated into the calibration model by factors one through six. As factors seven and eight are included, finer detail in the data is modeled. For factors nine through fifteen, variations or measurement noise specific to the calibration group are modeled, so the error increases. Another technique for factor selection uses an F-test to compare the standard error of prediction of models that use different numbers of factors. An F-test factor optimization would find that the standard error of prediction of an eight-factor model does not differ significantly from the standard error of prediction of a six-factor model, so that six factors would be taken as optimal. In certain instances, the data being analyzed may not lend themselves to division into a calibration (training) group and a validation (test) group. The reason may be a limited number of suitable samples, or that when the data are divided into two groups, one or both of the resulting groups do not adequately represent the sample population.
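The first-local-minimum criterion can be sketched as below, assuming the standard error of prediction is the root mean square of the prediction errors. The function names `sep` and `first_local_minimum` are hypothetical, and the error curve is made up for illustration; it is not the data of FIG. 17.

```python
import numpy as np

def sep(y_true, y_pred):
    """Root mean square of the prediction errors over the test group."""
    e = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.sqrt(np.mean(e ** 2)))

def first_local_minimum(errors):
    """Return the 1-based number of factors at the first local minimum."""
    for k in range(len(errors) - 1):
        if errors[k] < errors[k + 1]:      # error starts rising after k + 1 factors
            return k + 1
    return len(errors)                     # monotonically non-increasing curve

# Made-up SEP values for models with 1, 2, ..., 10 factors.
curve = [60.0, 45.0, 38.0, 33.0, 30.0, 28.5, 28.2, 27.4, 29.0, 30.0]
n_factors = first_local_minimum(curve)
example_sep = sep([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

On this made-up curve the criterion selects eight factors even though the improvement beyond six factors is small, which is the situation in which an F-test comparison of the kind described above may favor the smaller model.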
In such a situation, the iterative technique of leave-one-out cross-validation can be used, in which, during each iteration, one sample is excluded from the calibration group and used as a test sample. Prediction models using 1 to n − 1 factors determined from the n − 1 calibration samples are then used to make predictions for the test sample. The test sample is then returned to the calibration group and another sample is excluded. The same process is repeated until all n samples have been excluded from the calibration group and predicted by models generated from the other n − 1 calibration samples. All predictions are accumulated to give a standard error of validation (SEV) given by

$$SEV = \sqrt{\frac{\sum_{i=1}^{n} \left(\hat{y}_{(i)} - y_i\right)^2}{n}} \tag{19}$$

where the subscript (i) denotes the i-th leave-one-out iteration, which omits the i-th sample, with the standard error of validation then treated as a standard error of prediction. Referring to step 17, as depicted in FIG. 5, after determining the number of significant factors, the data for the calibration model can be reduced to the significant factors, and the regression coefficients for the calibration model can be determined. After the calibration model has been constructed as described above, it can be applied to data collected from samples in which the concentration of the analytes of interest is unknown, as symbolically indicated in FIG. 6 by step 18. The data from the unknown samples can be appropriately pretreated, as indicated in step 19, with techniques similar to those described above with respect to the pretreatment of the calibration data. Upon completion of pretreatment, the sample data can be projected into the principal component space previously defined by the calibration model, as indicated in step 20. In step 21, the generalized intervals are found for the group of data from the unknown samples, such as the Mahalanobis or Robust intervals used with the calibration data.
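The leave-one-out procedure can be sketched as follows, assuming principal component regression as the prediction model and the root mean square of the accumulated prediction errors as the standard error of validation. The function names `pcr_predict_one` and `loo_sev` and the synthetic data are hypothetical.

```python
import numpy as np

def pcr_predict_one(X_cal, y_cal, x_test, q):
    """Fit a q-factor PCR model on the calibration rows and predict one sample."""
    x_mean, y_mean = X_cal.mean(axis=0), y_cal.mean()
    U, s, Vt = np.linalg.svd(X_cal - x_mean, full_matrices=False)
    T, L = U[:, :q] * s[:q], Vt[:q].T
    b = L @ np.linalg.solve(T.T @ T, T.T @ (y_cal - y_mean))
    return float((x_test - x_mean) @ b + y_mean)

def loo_sev(X, y, q):
    """Exclude each sample in turn, predict it from the other n - 1 samples,
    and accumulate the errors into a standard error of validation."""
    n = X.shape[0]
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i               # sample i is the test sample
        preds[i] = pcr_predict_one(X[keep], y[keep], X[i], q)
    return float(np.sqrt(np.mean((preds - y) ** 2)))

# Synthetic data: 3 latent sources, 40 wavelengths, a little measurement noise.
rng = np.random.default_rng(3)
Z = rng.normal(size=(30, 3))
X = Z @ rng.normal(size=(3, 40)) + 0.01 * rng.normal(size=(30, 40))
y = Z @ np.array([1.0, 2.0, 3.0])
sev = {q: loo_sev(X, y, q) for q in (1, 2, 3)}
```

Each sample is predicted only by models that never saw it, so the resulting values can be plotted against the number of factors in the same way as a standard error of prediction.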
The probability of class membership can be estimated using the techniques described above, which include evaluating the chi-squared distribution function or using the Hotelling T-statistic. Referring next to step 22, isolated results among the unknown samples are then identified based on rejection of class membership at a confidence level greater than a level in the range of about 3-5σ. In the event that an unknown sample is not an isolated result, as in step 23a, the unknown sample can be projected into the space defined by the calibration model, and a prediction of the analyte concentration can be made. Conversely, if the unknown sample is an isolated result, as in step 23b, the unknown sample should be rejected and no prediction of the analyte concentration made, although, if possible, the unknown sample can be measured again to verify that it truly is an isolated result. With respect to the apparatus of the present invention, it is to be understood that the steps described above with respect to the method of the present invention can be configured in the general-purpose microprocessor of the computer 124 by employing computer program code segments corresponding to each of the steps.
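Steps 20 through 23 can be sketched as a screening function, assuming a Mahalanobis interval computed in the calibration score space and a fixed rejection threshold of 3.0 standard deviations. The function names `build_screen` and `screen_unknown`, the threshold, and the synthetic spectra are hypothetical; the prediction step for accepted samples is omitted.

```python
import numpy as np

def build_screen(X_cal, q):
    """Derive the mean spectrum, loadings, and inverse score covariance
    from the calibration data."""
    x_mean = X_cal.mean(axis=0)
    U, s, Vt = np.linalg.svd(X_cal - x_mean, full_matrices=False)
    L = Vt[:q].T                                   # p x q loading matrix
    T = (X_cal - x_mean) @ L                       # calibration scores (centroid at 0)
    S_inv = np.linalg.inv(np.atleast_2d(np.cov(T, rowvar=False)))
    return x_mean, L, S_inv

def screen_unknown(x, x_mean, L, S_inv, threshold=3.0):
    """Project an unknown spectrum and test its interval against the threshold."""
    t = (x - x_mean) @ L                           # step 20: projection
    d = float(np.sqrt(t @ S_inv @ t))              # step 21: generalized interval
    return d, d > threshold                        # step 22: True means reject

rng = np.random.default_rng(4)
P = rng.normal(size=(3, 25))
X_cal = rng.normal(size=(40, 3)) @ P + 0.01 * rng.normal(size=(40, 25))
x_mean, L, S_inv = build_screen(X_cal, q=3)
d_typ, reject_typ = screen_unknown(np.array([0.1, -0.2, 0.3]) @ P, x_mean, L, S_inv)
d_odd, reject_odd = screen_unknown(np.array([10.0, 10.0, 10.0]) @ P, x_mean, L, S_inv)
```

A sample for which the second return value is False would go on to concentration prediction (step 23a), while a rejected sample would be set aside for possible re-measurement (step 23b).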
In use, the method and apparatus of the present invention were applied to glucose concentration data obtained from samples from 111 individuals. Six of the samples did not have enough serum to collect a near-infrared spectrum, so vectors of zeros were used to fill their positions in the data matrix in order to maintain the integrity of the numbering sequence during data manipulation. The six samples and the associated reference assays were omitted from further analyses. Two other samples were associated with reference assay errors and were omitted, leaving 103 samples in the data set.
Potential isolated results were identified through visual inspection of two-dimensional and three-dimensional scatter plots of principal component scores. FIGS. 7-9 represent separate principal component analyses performed on two spectral regions. The vectors of zeros, indicated by reference numeral 200, lie apart from the main group of data, as expected. The near-infrared spectra of three samples, indicated by reference numerals 23, 67, and 83, each exhibit indications of noise due to bubbles in the optical path of the flow cell. As shown in FIGS. 7-9, such noise was present across the spectrum used, as shown by the intervals separating samples 23, 67, and 83 from the main group. In FIG. 7, samples 28 and 44 are seen as potential isolated results, as are samples 3 and 4 in FIG. 8. In FIG. 9, samples 3, 4, and 44 are potential isolated results. The Mahalanobis intervals of the 103 samples were calculated, as shown in FIG. 10, where samples 23, 67, and 83 are seen to have Mahalanobis intervals much larger than those of the other samples. In addition, in FIGS. 10, 13, and 14, the omitted samples are represented as having a Mahalanobis interval of zero. A number of additional samples appear in FIG. 10 as candidate isolated results, including samples 3, 4, and 44. The data were subjected to further analysis, as described below, with samples 23, 67, and 83 omitted, leaving 100 samples in the data group.

The detrimental impact of including isolated results in the data set is illustrated in FIGS. 11 and 12. FIG. 11 represents a scatter plot of estimated blood glucose concentrations for the 103 samples, using derivative data from 2030 nm to 2398 nm, generated from an optimized partial least squares model with twenty factors yielding a standard error of validation of 64.10 mg/dL, against the reference blood glucose concentrations. With samples 23, 67, and 83 eliminated, FIG. 12 represents a scatter plot of estimated blood glucose concentrations for the 100 samples, using derivative data from 2030 nm to 2398 nm, generated from an optimized partial least squares model with eight factors yielding a standard error of validation of 27.43 mg/dL, against the reference blood glucose concentrations. With the gross isolated results eliminated, the partial least squares technique used in the method of the present invention was able to make better predictions while using a less complex model, that is, a model using fewer factors. The sample shown in FIG. 11 with an estimated value of about 750 mg/dL corresponds to the sample indicated by reference numeral 83. If sample 83 in FIG. 11 is ignored and the rest of the samples in FIG. 11 are compared with those in FIG. 12, they appear much more widely scattered about the identity line in FIG. 11. These results illustrate the influence of a relatively small number of isolated results in seriously degrading the overall performance of the calibration model.

Two spectral regions of the 100 samples were tested separately for isolated results, with the Mahalanobis intervals for each region shown in FIGS. 13 and 14. Nine samples were flagged as possible isolated results in the region from 1580 nm to 1848 nm, and six samples were flagged in the region from 2030 nm to 2398 nm. As evidenced by a comparison of FIGS. 13 and 14, the flagged samples differed between the two spectral regions. Isolated results can be selected to be those flagged samples that are excluded as class members in either or both spectral ranges, at a confidence level selected to be in the range of 3-5σ. Four of the rejected samples were identified as possible isolated results from the principal component score plots, FIGS. 7-9. Identification of the fifth sample required examination in the higher-dimensional space associated with the Mahalanobis intervals.
FIG. 16 sets forth a summary of the 95 samples for both major spectral regions examined using the method and apparatus of the present invention. Estimating the blood glucose concentration using the 95-sample and 100-sample data groups over the same spectral regions produces very similar results. The results for the 95-sample group are depicted in FIG. 15 for the region of 2030 nm to 2398 nm, and show a slight reduction in the SEV estimate, to 26.97 mg/dL, relative to the 100-sample group depicted in FIG. 12, with the difference representing an error reduction of approximately 1%. An F-test at the 95% confidence level finds no significant difference. Comparison of partial least squares results from other spectral regions, with various forms of data processing, produced similar results. If a Mahalanobis interval threshold of 3.0 is used to determine isolated results, a group of 89 samples results. Using a partial least squares technique, leave-one-out cross-validation on the 89-sample group results in an SEV of 27.95 mg/dL, a negligible increase over the 100-sample and 95-sample groups. It was determined separately that the six samples omitted from the 89-sample group relative to the 95-sample group correspond to samples having a high triglyceride concentration, a high total protein value, or both. The presence of these six isolated results is an artifact of undersampling; that is, if a large number of representative samples with high concentrations of triglycerides or total protein had been present in the original group of samples, samples having high concentrations of triglycerides or total protein would be less likely to be flagged as isolated results. The sensitivity of isolated-result detection to triglycerides, or to any other analyte that affects the spectral response, can therefore be advantageous.
The spectral data can be divided such that samples with large amounts of triglycerides form a first calibration group and samples with low triglyceride content form a second calibration group, so that new samples can be tested with the method and apparatus of the present invention to determine whether the first or the second calibration group is representative of the new sample, thus allowing the selection of an estimation model determined from "similar" calibration spectra. The present invention having been described in its preferred embodiments, it is clear that it is susceptible to numerous modifications and embodiments within the ability of those skilled in the art and without the exercise of the inventive faculty. As will be appreciated by those skilled in the art, the method and apparatus of the present invention encompass alternative techniques for measuring biological fluid analytes, including deriving biological fluid analyte concentrations using light reflectance, light transmission, and other techniques used in conjunction with invasive, non-invasive, and in-vivo biological fluid analyte measurement techniques. In addition, the biological fluid analytes measured may also include triglycerides, cholesterol, and serum proteins, with detection of isolated results using the method and apparatus of the present invention.

Claims (42)

1. An improved method for forming a calibration model for use in determining the concentration of an analyte (substance to be measured) of a biological fluid of a mammal, characterized in that it comprises the steps of: collecting a group of calibration samples from a plurality of sources of the biological fluid; generating near-infrared electromagnetic radiation having a plurality of wavelengths; irradiating each of the calibration samples with the radiation so that a portion of the radiation at each of the wavelengths is transmitted through each of the calibration samples; measuring the intensity of the radiation transmitted through each of the calibration samples at each of the wavelengths, thereby forming a group of calibration data; processing the group of calibration data, which includes forming the group of calibration data into an n × p matrix that defines a space, where n is the number of calibration samples and p is the number of wavelengths at which the transmitted radiation intensity is measured, forming a subspace of the space in which relatively large sources of variation in the group of calibration data are represented, projecting the group of calibration data into the subspace, determining a generalized interval in the subspace between each calibration sample and a centroid of a distribution formed by the group of calibration samples, identifying as isolated calibration results those calibration samples having a generalized interval greater than a preselected magnitude, and forming a reduced group of calibration samples from the calibration samples remaining after eliminating the isolated calibration results; and constructing a calibration model from the reduced group of calibration samples to predict the concentration of the analyte in a sample of the biological fluid.
2. The method as set forth in claim 1, characterized in that: the step of forming a subspace includes decomposing the matrix by principal component analysis into an n × n dimensional score matrix and an n × p dimensional loading matrix, the principal component analysis generating a group of n eigenvectors and a group of n eigenvalues associated with the eigenvectors and arranged in decreasing order, dividing the group of eigenvalues into a group of q large, primary eigenvalues and a group of n − q small, error eigenvalues, whereby the primary eigenvalues are associated with relatively more significant sources of variation in the group of calibration data and the error eigenvalues are associated with relatively less significant sources of variation in the group of calibration data, and generating the subspace from the space defined by the loading matrix; and the step of constructing a calibration model includes forming a regression coefficient matrix that correlates the reduced group of calibration samples with the concentration of the analyte in the reduced group of calibration samples, whereby the regression coefficient matrix can be used to predict concentrations of the analyte in an unknown sample of the biological fluid given the intensity of the radiation transmitted at each of the wavelengths.
3. The method as set forth in claims 1 or 2, characterized in that each of the generalized intervals is a Mahalanobis interval determined from the following relation:

$$MD_i^2 = (x_i - \bar{x})' S^{-1} (x_i - \bar{x})$$

where $MD_i$ is the Mahalanobis interval between an i-th calibration sample $x_i$ and the centroid $\bar{x}$ of the group of calibration samples, $S^{-1}$ is the inverse of the variance-covariance matrix of the group of calibration data, and $(x_i - \bar{x})'$ is the transpose of $(x_i - \bar{x})$.
4. The method as set forth in claims 1 or 2, characterized in that each of the generalized intervals is a Robust interval determined using an algorithm selected from the group consisting of a minimum volume ellipsoid estimator and a projection algorithm.
5. The method as set forth in claims 1 or 2, characterized in that it also includes the step of pretreating the group of calibration data to eliminate and compensate for spectral artifacts prior to the step of processing the group of calibration data.
6. The method as set forth in claim 5, characterized in that the step of pretreating the group of calibration data is performed using an algorithm selected from the group consisting of nth-order derivatives, multiplicative scatter correction, n-point smoothing, mean centering, variance scaling, and the ratiometric method.
7. The method as set forth in claims 1 or 2, characterized in that it further includes the steps of: forming a ratio of the number of isolated calibration results to the number of calibration samples; determining whether the ratio is greater than a preselected ratio; and pretreating the group of calibration data to eliminate and compensate for spectral artifacts prior to the step of processing the group of calibration data if the ratio exceeds the preselected ratio.
8. The method as set forth in claims 1 or 2, characterized in that the step of identifying the isolated results includes selecting the magnitude by determining a probability of each member of the group of calibration samples belonging to a class defined by a preselected probability distribution function, according to which isolated calibration results are identified as those calibration samples whose membership in this class can be rejected at a confidence level greater than a preselected level.
9. The method as set forth in claim 8, characterized in that the probability distribution function is formed using an algorithm selected from the group consisting of the evaluation of the chi-square distribution function and the Hotelling T-statistical evaluation.
10. The method as set forth in claim 8, characterized in that the preselected level is in a range of about 3 to 5 standard deviations as defined by the probability distribution function.
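The screening described in claims 2, 3 and 8 to 10 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes NumPy, takes the first q load vectors from a singular value decomposition of the centered calibration matrix, and computes the Mahalanobis interval (the Mahalanobis distance, in conventional terms) of each sample from the centroid within that subspace. The function names, the argument `q` and the fixed `cutoff` are hypothetical choices made for the example; the claims instead derive the cutoff from a probability distribution function.

```python
import numpy as np


def mahalanobis_intervals(X, q):
    """Mahalanobis interval of each row of X from the centroid, in a q-dim subspace."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)                 # subtract the centroid
    # Rows of vt are the load vectors, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:q].T                  # n x q projection into the subspace
    cov = np.atleast_2d(np.cov(scores, rowvar=False))
    cov_inv = np.linalg.inv(cov)                  # S^-1 in the claim's notation
    # MD_i = sqrt(d_i S^-1 d_i^t), where d_i is the centered row of sample i
    return np.sqrt(np.einsum("ij,jk,ik->i", scores, cov_inv, scores))


def split_isolated_results(X, q, cutoff):
    """Return (keep_mask, intervals); keep_mask is False for isolated results."""
    md = mahalanobis_intervals(X, q)
    return md <= cutoff, md
```

Calling `split_isolated_results(X, q=2, cutoff=3.0)` on an n×p array of calibration spectra would return a boolean mask of the samples to retain together with their intervals. Because the covariance is estimated from the same samples that are being screened, a single extreme sample inflates it, which is one reason claim 4 offers robust estimators as an alternative.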
11. An improved method for determining the concentration of an analyte of a biological fluid of a mammal, characterized in that it comprises the steps of: collecting a group of calibration samples from a plurality of sources of the biological fluid and an unknown sample from an unknown source of the biological fluid; generating near-infrared electromagnetic radiation having a plurality of wavelengths; irradiating each of the calibration samples and the unknown sample with the radiation so that a part of the radiation at each of the wavelengths is transmitted through each of the calibration samples and the unknown sample; measuring the intensity of the radiation transmitted through each of the calibration samples at each of the wavelengths, thereby forming a group of calibration data, and through the unknown sample at each of the wavelengths, thereby forming a group of sample data; processing the group of calibration data, which includes forming the group of calibration data into an n×p matrix that defines a space, where n is the number of calibration samples and p is the number of wavelengths at which the transmitted radiation intensity is measured, forming a subspace of the space in which sources of relatively large variations in the group of calibration data are represented, projecting the group of calibration data into the subspace, determining a generalized interval in the subspace between each calibration sample and a centroid of a distribution formed by the group of calibration samples, identifying as isolated calibration results those calibration samples that have a generalized interval greater than a preselected magnitude, and forming a small group of calibration samples from the calibration samples remaining after eliminating the isolated calibration results; constructing a calibration model from the small group of calibration samples to predict the concentration of the analyte in a sample of a biological fluid; and applying the calibration model to the group of sample data, which includes projecting the group of sample data into the space defined by the model, determining a generalized interval of the unknown sample according to the model, identifying the unknown sample as an isolated sample result provided that the generalized interval of the unknown sample is greater than the preselected magnitude, and predicting the concentration of the analyte in the unknown sample according to the model provided that the generalized interval of the unknown sample is not greater than the preselected magnitude.
12. The method as set forth in claim 11, characterized in that: the step of forming a subspace includes decomposing the matrix by analysis of the main component into an n×n dimensional register matrix and an n×p dimensional load matrix, generating by the analysis of the main component a group of n eigenvectors and a group of n eigenvalues associated with the eigenvectors and arranged in decreasing order, dividing the group of eigenvalues into a group of q large, primary eigenvalues and a group of n-q small, error eigenvalues, according to which the primary eigenvalues are associated with relatively more significant sources of variation in the group of calibration data and the error eigenvalues are associated with relatively less significant sources of variation in the group of calibration data, and generating the subspace as an n×p dimensional main component subspace from the space defined by the load matrix; and the step of constructing a calibration model includes forming a matrix of regression coefficients that correlates the reduced group of calibration samples with the concentration of the analyte in the small group of calibration samples, according to which the matrix of regression coefficients can be used to predict concentrations of the analyte in an unknown sample of the biological fluid given the intensity of the radiation transmitted at each of the wavelengths.
13. The method as set forth in claims 11 or 12, characterized in that each of the generalized intervals of the group of calibration samples is a Mahalanobis interval determined from the following relation: $MD_i = [(x_i - \bar{x})\, S^{-1}\, (x_i - \bar{x})^t]^{1/2}$, where $MD_i$ is the Mahalanobis interval between an i-th calibration sample $x_i$ and the centroid $\bar{x}$ of the group of calibration samples, $S^{-1}$ is the inverted variance-covariance matrix of the group of calibration data, and $(x_i - \bar{x})^t$ is the transpose of $(x_i - \bar{x})$, and where the generalized interval of the unknown sample according to the model is a Mahalanobis interval determined by the following relation: $MD_{sample} = [(x_{sample} - \bar{x}_{model})\, S_{model}^{-1}\, (x_{sample} - \bar{x}_{model})^t]^{1/2}$, where $MD_{sample}$ is the Mahalanobis interval between the unknown sample and the centroid $\bar{x}_{model}$ of the group of calibration samples, $S_{model}^{-1}$ is the inverted variance-covariance matrix of the model, and $(x_{sample} - \bar{x}_{model})^t$ is the transpose of $(x_{sample} - \bar{x}_{model})$.
14. The method as set forth in claims 11 or 12, characterized in that each of the generalized intervals is a Robust interval determined using an algorithm selected from the group consisting of a minimum volume ellipsoid estimator and a projection algorithm.
15. The method as set forth in claims 11 or 12, characterized in that it further includes the steps of: forming a ratio of the number of isolated calibration results to the number of calibration samples; determining whether the ratio is greater than a preselected ratio; pretreating the group of calibration data to eliminate and compensate for spectral artifacts prior to the step of processing the group of calibration data if the ratio exceeds the preselected ratio; and pretreating the group of sample data to eliminate and compensate for spectral artifacts prior to the step of applying the calibration model to the group of sample data if the ratio exceeds the preselected ratio.
16. The method as set forth in claims 11 or 12, characterized in that it further includes the steps of: pretreating the group of calibration data to eliminate and compensate for spectral artifacts prior to the step of processing the group of calibration data; and pretreating the group of sample data to eliminate and compensate for spectral artifacts prior to the step of applying the calibration model to the group of sample data.
17. The method as set forth in claim 16, characterized in that the steps of pretreating the group of calibration data and pretreating the group of sample data are performed using an algorithm selected from the group consisting of n-th order derivatives, multiplicative scatter correction, n-point smoothing, mean centering, variance scaling, and the ratiometric method.
18. The method as set forth in claims 11 or 12, characterized in that the step of identifying isolated calibration results includes selecting the magnitude by determining a probability of each member of the group of calibration samples belonging to a class defined by a preselected probability distribution function, according to which isolated calibration results are identified as those calibration samples whose membership in this class can be rejected at a confidence level greater than a preselected level, and wherein the step of identifying an isolated sample result includes determining whether the membership of the unknown sample in this class can be rejected at a confidence level greater than the preselected level, according to the model.
19. The method as set forth in claim 18, characterized in that the probability distribution function is formed using an algorithm selected from the group consisting of the evaluation of the chi-square distribution function and the Hotelling T-statistical evaluation.
20. The method as set forth in claim 18, characterized in that the preselected level is in a range of about 3 to 5 standard deviations as defined by the probability distribution function.
21. The method as set forth in claim 12, characterized in that the unknown sample and each of the calibration samples include a second analyte having a concentration within a preselected range.
22. The method as set forth in claim 21, characterized in that the second analyte are triglycerides.
23. The method as set forth in claim 22, characterized in that the second analyte is total protein.
24. The method as set forth in claims 1, 2, 11 or 12, characterized in that the step of constructing a calibration model includes removing redundant data from the data corresponding to the small group of calibration samples.
25. The method as set forth in claims 1, 2, 11 or 12, characterized in that the step of constructing a calibration model is developed using an algorithm selected from the group consisting of regression of the main component, partial least squares, multiple linear regression and artificial neural networks.
26. The method as set forth in claims 1, 2, 11 or 12, characterized in that the step of constructing a calibration model is performed using an algorithm selected from the group consisting of regression of the main component, partial least squares and multiple linear regression, and includes selecting an optimal number of register vectors to use in the calibration model, according to which redundant data can be removed from the data corresponding to the small group of calibration samples.
27. The method as set forth in claim 26, characterized in that the step of selecting the optimal number of register vectors includes: constructing preliminary calibration models, each preliminary calibration model using a different number of register vectors selected from a range of 1 to n; determining a standard prediction error for each of the preliminary calibration models; and comparing the standard prediction errors of the preliminary calibration models to determine the optimal number of register vectors.
28. The method as set forth in claim 27, characterized in that comparing the standard prediction error is developed using an algorithm selected from the group consisting of the F-test and the local minimum determination.
29. The method as set forth in claims 2 or 12, characterized in that the step of dividing the group of eigenvalues includes determining the number of primary eigenvalues by an iterative method that compares the variance of the q-th eigenvalue with the variance of the pooled error eigenvalues using an F-test.
30. The method as set forth in claim 29, characterized in that the step of determining the number of primary eigenvalues includes weighting the eigenvalues by an amount proportional to the information explained by the associated eigenvectors to produce a group of reduced eigenvalues.
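The selection procedure of claims 26 to 28 can be sketched as follows, under stated assumptions. The sketch uses NumPy and regression of the main component (principal component regression), builds one preliminary model per candidate number of register vectors, and picks the first local minimum of the standard prediction error. The claims do not give a formula for that error, so the root-mean-square error on a separate validation set is used here as an assumption; all function and argument names are hypothetical.

```python
import numpy as np


def pcr_fit(X, y, k):
    """Fit a regression on the first k register (score) vectors of centered X."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    _, _, vt = np.linalg.svd(X - x_mean, full_matrices=False)
    loads = vt[:k].T                                  # p x k load vectors
    scores = (X - x_mean) @ loads                     # n x k register vectors
    b, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    return x_mean, y_mean, loads @ b                  # coefficients per wavelength


def pcr_predict(model, X):
    x_mean, y_mean, coef = model
    return (np.asarray(X, dtype=float) - x_mean) @ coef + y_mean


def standard_prediction_error(y_true, y_pred):
    # Assumption: root-mean-square prediction error on held-out samples.
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def first_local_minimum(values):
    """Index of the first value that is not larger than its successor."""
    for i in range(len(values) - 1):
        if values[i] <= values[i + 1]:
            return i
    return len(values) - 1


def select_num_register_vectors(X_cal, y_cal, X_val, y_val, max_k):
    errors = [
        standard_prediction_error(y_val, pcr_predict(pcr_fit(X_cal, y_cal, k), X_val))
        for k in range(1, max_k + 1)
    ]
    return first_local_minimum(errors) + 1, errors
```

Stopping at the first local minimum favors the smaller model when adding vectors no longer reduces the prediction error; claim 28 names an F-test as the other option for comparing the errors.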
31. An apparatus for determining the concentration of an analyte in an unknown sample of a biological fluid of a mammal, characterized in that it comprises: a placing unit capable of placing the unknown sample and each of a group of calibration samples of the biological fluid collected from a plurality of sources; a radiation emitter capable of emitting near infrared electromagnetic radiation at a preselected plurality of wavelengths, said radiation emitter positioned to sequentially direct radiation of each of the wavelengths into and partially through each of the calibration samples and the unknown sample; a near infrared electromagnetic radiation detector arranged to receive and sequentially measure the intensity of the radiation transmitted through each of the calibration samples at each of the wavelengths to form a group of calibration data, and through the unknown sample to form a group of sample data; and a computer connected to said detector and having a general-purpose microprocessor configured with computer program code to form the group of calibration data into a matrix defining a space, form a subspace of the space in which sources of relatively large variations in the group of calibration data are represented, project the group of calibration data into the subspace, determine a generalized interval in the subspace between each calibration sample and a centroid defined by a distribution formed by the group of calibration samples, identify as isolated calibration results those calibration samples having a generalized interval greater than a preselected magnitude, form a reduced group of calibration samples from the calibration samples remaining after elimination of the isolated calibration results, construct a calibration model from the reduced group of calibration samples to predict the concentration of the analyte in the unknown sample, project the group of sample data into a space defined by the model, determine a generalized interval of the unknown sample according to the model, identify the unknown sample as an isolated sample result provided that the generalized interval of the unknown sample is greater than the preselected magnitude, and predict the concentration of the analyte in the unknown sample according to the model provided that the generalized interval of the unknown sample is not greater than the preselected magnitude.
32. The apparatus of claim 31, characterized in that said placing unit comprises: a flow cell having an inlet orifice and an outlet orifice; and a pump arranged in fluid connection between said inlet orifice and said outlet orifice, according to which each of the group of calibration samples and the unknown sample can be circulated sequentially through said flow cell.
33. The apparatus of claim 31, characterized in that it further comprises a temperature controller capable of controlling the temperature of said placing unit and said detector.
34. The apparatus of claim 31, characterized in that each of the generalized intervals is a Mahalanobis interval determined from the following relation: $MD_i = [(x_i - \bar{x})\, S^{-1}\, (x_i - \bar{x})^t]^{1/2}$, where $MD_i$ is the Mahalanobis interval between an i-th calibration sample $x_i$ and the centroid $\bar{x}$ of the group of calibration samples, $S^{-1}$ is the inverted variance-covariance matrix of the calibration data group, and $(x_i - \bar{x})^t$ is the transpose of $(x_i - \bar{x})$.
35. The apparatus of claim 31, characterized in that each of the generalized intervals is a Robust interval determined using an algorithm selected from the group consisting of a minimum volume ellipsoid estimator and a projection algorithm.
36. The apparatus of claim 31, characterized in that it further comprises an interference reducer coupled to said radiation emitter and said detector, and capable of reducing interference in the intensity measurements of the part of the radiation transmitted through each of the calibration samples and the unknown sample.
37. An apparatus for determining the concentration of an analyte in an unknown sample of a biological fluid of a mammal, characterized in that it comprises: a placing unit capable of placing the unknown sample and each of a group of calibration samples of the biological fluid collected from a plurality of sources, including a flow cell having an inlet orifice and an outlet orifice, and a pump arranged in fluid connection between said inlet orifice and said outlet orifice, according to which each of the group of calibration samples and the unknown sample can be circulated sequentially through said flow cell; a radiation emitter capable of emitting near infrared electromagnetic radiation at a preselected plurality of wavelengths, said radiation emitter positioned to sequentially direct radiation of each of the wavelengths into and partially through each of the calibration samples and the unknown sample; a near infrared electromagnetic radiation detector arranged to receive and sequentially measure the intensity of the radiation transmitted through each of the calibration samples at each of the wavelengths to form a group of calibration data, and through the unknown sample to form a group of sample data; a temperature controller capable of controlling the temperature of said placing unit and said detector; an interference reducer coupled to said radiation emitter and said detector, and capable of reducing the interference in the intensity measurements of the part of the radiation transmitted through each of the calibration samples and the unknown sample; and a computer connected to said detector and having a general-purpose microprocessor configured with computer program code to form the group of calibration data into a matrix defining a space, form a subspace of the space in which sources of relatively large variations in the group of calibration data are represented, project the group of calibration data into the subspace, determine a generalized interval in the subspace between each calibration sample and a centroid defined by a distribution formed by the group of calibration samples, identify as isolated calibration results those calibration samples that have a generalized interval greater than a preselected magnitude, form a small group of calibration samples from the calibration samples remaining after the elimination of the isolated calibration results, construct a calibration model from the small group of calibration samples to predict the concentration of the analyte in the unknown sample, project the group of sample data into a space defined by the model, determine a generalized interval of the unknown sample according to the model, identify the unknown sample as an isolated sample result provided that the generalized interval of the unknown sample is greater than the preselected magnitude, and predict the concentration of the analyte in the unknown sample according to the model provided that the generalized interval of the unknown sample is not greater than the preselected magnitude.
38. The apparatus of claim 37, characterized in that each of the generalized intervals is a Mahalanobis interval determined from the following relation: $MD_i = [(x_i - \bar{x})\, S^{-1}\, (x_i - \bar{x})^t]^{1/2}$, where $MD_i$ is the Mahalanobis interval between an i-th calibration sample $x_i$ and the centroid $\bar{x}$ of the group of calibration samples, $S^{-1}$ is the inverted variance-covariance matrix of the calibration data group, and $(x_i - \bar{x})^t$ is the transpose of $(x_i - \bar{x})$.
39. The apparatus of claim 31, characterized in that each of the generalized intervals is a Robust interval determined using an algorithm selected from the group consisting of a minimum volume ellipsoid estimator and a projection algorithm.
40. The apparatus of claims 36, 38 or 39, characterized in that: said radiation emitter includes a source of near infrared electromagnetic radiation of relatively wide bandwidth and a monochrometer disposed between said source and said placing unit; and said interference reducer includes a chopper disposed between said source and said monochrometer, according to which the radiation of said source can be alternately blocked from transmission to said monochrometer, and a synchronizer operably connected to said chopper and said detector, according to which the signals produced in said detector when the radiation of said source is blocked by said chopper can be subtracted from the signals produced in said detector when the radiation of said source is not blocked by said chopper.
41. The apparatus of claims 36, 38 or 39, characterized in that: said radiation emitter includes a source of near infrared electromagnetic radiation of relatively wide bandwidth and a filter wheel disposed between said source and said placing unit; and said interference reducer includes a chopper disposed between said source and said monochrometer, according to which the radiation of said source can be alternately blocked from transmission to said monochrometer, and a synchronizer operably connected to said chopper and said detector, according to which the signals produced in said detector when the radiation of said source is blocked by said chopper can be subtracted from the signals produced in said detector when the radiation of said source is not blocked by said chopper.
42. The apparatus of claims 36, 38 or 39, characterized in that: said radiation emitter includes a plurality of near infrared electromagnetic radiation sources of relatively narrow bandwidth connected to said computer, according to which they can be activated in a preselected sequential order; and said interference reducer includes a pulse exciter operably connected to each of said sources and said detector, according to which the signals produced in said detector when the radiation of said sources is not pulsed by said exciter can be subtracted from the signals produced in said detector when the radiation of said sources is pulsed by said exciter.
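The interference reduction of claims 40 to 42 amounts to subtracting the detector signal recorded while the source is blocked (or not pulsed) from the signal recorded while it is not blocked (or pulsed). A minimal sketch in plain Python follows; it assumes the synchronizer supplies one boolean flag per detector reading, and the function and argument names are hypothetical. Averaging the readings in each state before subtracting is also an assumption, since the claims only state that the subtraction can be performed.

```python
def dark_corrected_intensity(readings, source_unblocked):
    """Mean light reading minus mean dark reading for one wavelength.

    readings: detector signals in acquisition order.
    source_unblocked: flags of the same length, True when the source
    reached the sample and False when it was blocked.
    """
    if len(readings) != len(source_unblocked):
        raise ValueError("readings and flags must have the same length")
    light = [r for r, lit in zip(readings, source_unblocked) if lit]
    dark = [r for r, lit in zip(readings, source_unblocked) if not lit]
    if not light or not dark:
        raise ValueError("need at least one light and one dark reading")
    # The dark signal carries detector offset and stray light only, so
    # subtracting it leaves the part of the signal due to the source.
    return sum(light) / len(light) - sum(dark) / len(dark)
```

For example, readings of 10.2, 0.2, 10.4 and 0.4 taken alternately with the source unblocked and blocked give a mean light signal of 10.3, a mean dark signal of 0.3, and a corrected intensity of about 10.0.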
MXPA/A/1998/001056A 1995-08-07 1998-02-06 Analysis of a biological fluid using the detection of results in aisla intervals MXPA98001056A (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US195095P 1995-08-07 1995-08-07
US08/587,017 US5606164A (en) 1996-01-16 1996-01-16 Method and apparatus for biological fluid analyte concentration measurement using generalized distance outlier detection
US08587017 1996-01-16
US60/001,950 1996-01-16
PCT/US1996/012625 WO1997006418A1 (en) 1995-08-07 1996-08-02 Biological fluid analysis using distance outlier detection

Publications (2)

Publication Number Publication Date
MX9801056A MX9801056A (en) 1998-05-31
MXPA98001056A true MXPA98001056A (en) 1998-10-23

Similar Documents

Publication Publication Date Title
US5606164A (en) Method and apparatus for biological fluid analyte concentration measurement using generalized distance outlier detection
CA2228844C (en) Biological fluid analysis using distance outlier detection
US6876931B2 (en) Automatic process for sample selection during multivariate calibration
EP0552291B1 (en) Method of estimating property and/or composition data of a test sample
EP0552300B1 (en) Spectral data measurement and correction
Xiaobo et al. Variables selection methods in near-infrared spectroscopy
US5121337A (en) Method for correcting spectral data for data due to the spectral measurement process itself and estimating unknown property and/or composition data of a sample using such method
US7038774B2 (en) Method of characterizing spectrometer instruments and providing calibration models to compensate for instrument variation
EP0419222A2 (en) Method for the prediction of properties of biological matter by analysis of the near-infrared spectrum thereof
US5641962A (en) Non linear multivariate infrared analysis method (LAW362)
EP0954744B1 (en) Calibration method for spectrographic analyzing instruments
JP3248905B2 (en) Method for analyzing biological substances having a water content
AU2620900A (en) System and method for noninvasive blood analyte measurements
Xu et al. Nondestructive detection of internal flavor in ‘Shatian’pomelo fruit based on visible/near infrared spectroscopy
Chia et al. Neural network and extreme gradient boosting in near infrared spectroscopy
Kawano Sampling and sample presentation
AU689016B2 (en) Non linear multivariate infrared analysis method
MXPA98001056A (en) Analysis of a biological fluid using the detection of results in aisla intervals
Idrus et al. Artificial neural network and savitzky golay derivative in predicting blood hemoglobin using near-infrared spectrum
Mello et al. Pruning neural network for architecture optimization applied to near-infrared reflectance spectroscopic measurements. Determination of the nitrogen content in wheat leaves
EP4083820A1 (en) System and method to identify and quantify individual components in a sample being analyzed
Ding et al. Efficient Sensor Calibration Via Machine Learning-Based Resampling Methods
Azcarate et al. Classification and Modeling Methods
Wu et al. Fast discrimination of juicy peach varieties by Vis/NIR spectroscopy based on bayesian-sda and pca