CA2694828A1 - Prototype of a computerized system to automatically acquire, statistically test and supply the parameters needed (standard deviation and percent recovery--obtained by running control samples) to compute 95% confidence intervals for chemical (and other) measurements contained in an organization's main database (or locally) and ultimately to unbias those measurements and their confidence intervals


Info

Publication number
CA2694828A1
Authority
CA
Canada
Prior art keywords
recited
measurement
standard
analytical
sample
Prior art date
Legal status
Withdrawn
Application number
CA2694828A
Other languages
French (fr)
Inventor
Gerald B. Buckler
Current Assignee
Individual
Original Assignee
Individual
Priority date
Filing date
Publication date
Application filed by Individual
Priority to CA2694828A
Publication of CA2694828A1
Legal status: Withdrawn


Classifications

    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C - COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C 20/00 - Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C 20/70 - Machine learning, data mining or chemometrics


Abstract

A prototype of a computerized system (which also could be operated from an internet website) has been developed to automatically acquire, statistically test and supply the parameters needed (standard deviation and percent recovery--obtained by running control samples) to compute 95% confidence intervals for chemical (and other) measurements contained in an organization's main database and ultimately to unbias those measurements and their confidence intervals. All sources of stochastic variation within the analytical methods are characterized, adjusted and corrected so as to provide parameters that are truly representative of the stochastic processes occurring in them as they are being done in a particular laboratory. This is done by running chemical, biological, microbiological or radiological control samples on specially designed computer forms that automatically test the data statistically as it is being accumulated. These forms and their control samples are often suitable for continued use for quality assurance purposes.

Description

Patent Application No. 2,694,828 by Gerald Buckler of 1205 Dorchester Avenue, Ottawa, Ontario
Title of Invention:

Prototype of a computerized system to automatically acquire, statistically test and supply the parameters needed (standard deviation and percent recovery--obtained by running control samples) to compute 95% confidence intervals for chemical (and other) measurements contained in an organization's main database (or locally) and ultimately to unbias those measurements and their confidence intervals.

Table of Contents:
Operational Description and Theory of Invention, page 1 to page 25.

Special Note to Patent Office: This file is to be included with Description-file1.pdf, Description-file3.pdf and Description-file4.pdf as previously submitted for this application. This particular file (Description-file2-corrected.pdf) contains a number of corrections and is being submitted to replace the file (Description-file2.pdf) as previously submitted.

Operational Description and Theory of Invention:
Some Basic Premises:

1) Quantitative chemical analysis is done according to high technological standards. The analytical methods are documented and often published. The analytical chemistry methods are given different identification numbers and are always followed to the letter for every analytical run. Chemists and Chemical Technicians know how to carry out their trade. They know how to do exact weighings on five decimal place high precision balances. They know how to quantitatively transfer substances in solution from one flask to another without losing any. They know how to prepare solutions to exact volumes and exact concentrations. They know the theory of matter, basic chemistry and physics. There is absolutely no reason whatsoever for one chemical analyst to get a different percent recovery or standard deviation for the same material sample under identical conditions. Great care is taken by laboratory staff to ensure that specific analytical methods can be repeated over and over again in an identical manner.
However, notwithstanding all of this, there will be random variation occurring in the various stages of the chemical processing, and in some of those stages small losses will also occur, which leads to one obtaining something less than 100% recovery. But the better methods have a greater number of stages in them to take care of all potential interferences. This leads to a slightly less than desirable percent recovery at times, but this can be offset by allowing the chemical measurements to be unbiased by the DBMS in the main database. This latter facility would be made transparent to all laboratory staff by the proposed computerized system. This has all been said to justify the making of the first premise: The within-run variances of the premeasurements and submeasurements of a particular analytical method in a particular laboratory can be considered to be more or less constant over the several ongoing analytical runs that are routinely being made in the laboratory even though different laboratory analysts are performing the analyses.
2) The second premise that needs to be made is that: All random variation that is present in analytical chemistry measurements comes from the various stages of the chemical processing that occurs when performing the analyses. The specific analytical chemistry method as it is being done in a particular laboratory is a specific stochastic process generator. Therefore only the particular stochastic characteristics of the particular chemistry method need be determined in order to obtain the standard deviations for all the measurements to be generated by the analytical method. This obviates the need to be continually determining confidence intervals from chemical measurement data.
3) A third premise that can be made is that: While obtaining the particular stochastic characteristics of a particular analytical method in a particular laboratory at a specific measurement level, the percent recovery can also often be concurrently determined from the same control sample data for that measurement level.
4) A fourth premise: The nature of the stochastic variation that occurs in each of the various stages of the chemical processing is well known and understood by professional chemists. If a modification to the method ever needs to be made, the determined stochastic characteristics can often be modified by careful thought, chemical process stage testing, such as of a new model of chemical instrumentation, and a minimum of re-running of control samples.
5) A fifth premise is that: Stochastic variation is inherited from one stage of the chemical processing to another in the analytical method in such stochastic manner so as to be effects-additive (a short illustration of this is sketched after this list of premises). Even the tolerances of standard laboratory labware such as volumetric flasks and pipettes are inherited in this stochastic manner for both regular samples and calibration standards. This means that the particular stochastic characteristics of a particular analytical chemistry method in a particular laboratory can be determined by running the appropriate control samples at specific measurement levels. As will be explained more fully later on, all of this control sampling can be routinely done at a leisurely pace over several analytical runs if only the collection of control sample data can be given its justifiable priority and initiated promptly by management officials. Often, standard deviations can be obtained from database records of regular sample duplicates.
6) A sixth premise is that: It can be shown that systematic error in chemical measurements cannot properly exist beyond the level at which the measurements are properly unbiased. The levels of concern are: (1) within analytical runs, (2) between analytical runs in the same laboratory, and (3) between laboratory biases. For reasons which are quite self-evident in light of the revelations having been made in this paper, the best level to unbias at is (2), the laboratory level. In other words, the measurements from the individual analytical methods in each particular laboratory would be unbiased in such manner as this computerized system is capable of doing, as has been explained, and if possible, this would be done by the DBMS in the main database, using the proper parameters that are supplied to it, but it can also be done within the laboratory, if necessary.
7) In paragraph (6), it was noted that the unbiasing of the chemical measurements is best done within the main database, but that it could be done within the laboratory, if necessary. A
particular example where this might find application could be that of a typical government research scientist. It is well known that research scientists almost invariably adopt the strategy of choosing the particular analytical methods they need in the beginning of their career and then keep them for the duration of their research tenure. This is done to overcome the problem of bias between analytical methods, but as is evident from this paper, there could be systematic error between analytical runs. In addition, they are often driven to produce reams of analytical data in order to obtain sufficiently high statistical sample sizes for statistical testing purposes and for comparison to the data of other scientists. Often, it is desired to obtain a high-degrees-of-freedom t-distribution confidence interval for publication. It can be shown that a t-distribution confidence interval is valid for significance testing but is useless and deceiving as a descriptive statistic. The proposed computerized system solves all these problems by determining the high-degrees-of-freedom standard deviations needed to obtain the proper confidence intervals from the very beginning of the research project, from the analytical method itself, rather than from the reams of data produced by it for each new data set, and the research scientists can now compare unbiased data and confidence intervals with each other, resulting in huge savings in time and money. This is the seventh premise, the research scientist functioning as the administrator of the computerized system.
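The following is a minimal illustration, assuming Python, of the effects-additive inheritance stated in premise (5): if each stage of the chemical processing contributes an independent random error, it is the variances (not the standard deviations) that add, so the overall standard deviation at M-level follows directly from the per-stage standard deviations. The stage names and numbers below are hypothetical, not taken from any particular analytical method.

    # Hypothetical per-stage standard deviations, all expressed in PPM at M-level.
    stage_sds = {
        "weighing":   0.02,
        "extraction": 0.15,
        "dilution":   0.05,
        "instrument": 0.10,
    }

    # Effects-additive inheritance: variances add across independent stages.
    overall_variance = sum(sd ** 2 for sd in stage_sds.values())
    overall_sd = overall_variance ** 0.5   # roughly 0.19 PPM for these hypothetical numbers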

Programming the DPSP:

Variances are never entered as predefined program variables into the DPSP, only their standard deviation counterparts (this helps to control the number of decimal places needed). With the exception of the standard deviation of the slope, which is entered in terms of (AU, XAU, or AREA) per PPM at Q1-level, all standard deviations must be entered into the DPSP in terms of PPM at Q2-level. This subsection and the next one deal with how to estimate the sample standard deviations of the premeasurement and submeasurement random variables that are inherent in almost every analytical chemistry method that is out there. First of all, it should be documented that the author is recommending that a minimum of 15 degrees of freedom be established as a minimum industry standard for these standard deviations before they can be thought of as being a substitute for their population parameter counterparts for routine applications and reports. It can be shown that, at a 95% confidence coefficient, the sample standard deviation at 15 degrees of freedom will be about 55% too high 2.5% of the time and 26% too low 2.5% of the time. What this translates into is that a 95% confidence interval for the mean of measurements calculated as plus or minus 2.0 sample standard deviations at 15 degrees of freedom will produce an actual confidence coefficient between 95% and 97.4% about 68% of the time and between 85.4% and 95% about 32% of the time. But this should be acceptable for routine applications and reports. It can be shown, using the theory of multiple tests, that a 95% minimum confidence coefficient confidence interval (MCCCI) for the population mean of measurements should be calculated as plus or minus 3.08 sample standard deviations at 15 degrees of freedom. This includes unbiasing of the sample standard deviation [7]. Such a confidence interval will be at 95% confidence coefficient, or above, all of the time and would therefore be more suitable for legal purposes such as court proceedings. Some chemical analysts may want to make do with some lesser number of degrees of freedom, say a minimum of ten, where, as in gas chromatography, it can take up to an hour to get a single reading on the gas chromatograph. In this case, it is presumed that an exception could be made. But the reliability of the 95% confidence intervals calculated as plus or minus 2.0 sample standard deviations at 10 degrees of freedom will be much less. However, the multiplier for the sample standard deviation could be increased as an expedient measure.
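As a check on the figures quoted above, the following is a minimal sketch, assuming Python with SciPy, of how the behaviour of a sample standard deviation at 15 degrees of freedom follows from the chi-square distribution, and of the actual coverage a "plus or minus 2.0 s" interval then attains. The printout is illustrative only.

    from scipy.stats import chi2, norm

    df = 15
    # 95% probability bounds on the ratio s/sigma at 15 degrees of freedom:
    high = (df / chi2.ppf(0.025, df)) ** 0.5   # about 1.55: s roughly 55% too high, 2.5% of the time
    low = (df / chi2.ppf(0.975, df)) ** 0.5    # about 0.74: s roughly 26% too low, 2.5% of the time

    # Actual confidence coefficient of a "plus or minus 2.0 s" interval when s/sigma equals r:
    def coverage(r):
        return 2 * norm.cdf(2 * r) - 1

    print(high, low)
    print(coverage(1.0), coverage(low))   # about 0.95 at r = 1, and roughly 0.86 at the low extreme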

Note that these standard deviations with this many degrees of freedom do not need to be determined in a single analytical run. They can be obtained at a leisurely pace by running the appropriate control samples as time and circumstances permit and the results entered into the appropriate PAF's. After a period of some weeks, months, or even in some cases, a couple of years, the estimates for these standard deviations, at the minimum standard of 15 degrees of freedom per measurement level, will be achieved. But the sooner one starts collecting the data, the better. Any authoritative reference on industrial quality control will specify that such control sampling must take place for some required period of time before legitimate quality control charting can begin. It is the same principle. In the meantime, before the required minimum standard is achieved, the regular measurements that are routinely being generated in the laboratory can be entered into the DPSP for the particular analytical chemistry method and from there eventually will be entered into the main database. From time to time, the DBMS of the main database will check the DPSP for each particular analytical chemistry method in each particular laboratory to see if the required minimum standard, standard deviations and percent recoveries, have been entered into the temporary database of the DPSP
alongside the identification numbers for the respective samples. When this happens, the required standard deviations and percent recoveries for the particular samples will be uploaded and entered into the main database. Of course, all of this, or any part of it, can be done manually with now commonplace computer spreadsheet technology. The minimum standard for the percent recovery is four recovery constants (RC) or four recovery samples (RS) per measurement level (one RC or one RS per measurement level per run) to be obtained over four analytical runs for each of the required measurement levels. If a recovery sample or recovery constant cannot be run, the developer of the analytical method will supply the estimate. The percent recovery is entered into the DPSP as a percentage (this is the most straightforward and intuitive way) for uploading into the main database where it is then converted by the DBMS to its decimal equivalent.
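The following is a minimal, hypothetical sketch, assuming Python, of the periodic check just described: the DBMS of the main database polls the temporary database of each DPSP and uploads the standard deviation and percent recovery for any sample whose parameters now meet the required minimum standard (15 degrees of freedom for the standard deviation, and four recovery constants or recovery samples per measurement level). All names, record fields and the main_db object are assumptions made for illustration.

    MIN_DEGREES_OF_FREEDOM = 15
    MIN_RECOVERY_RUNS = 4

    def poll_dpsp(dpsp_records, main_db):
        # One record per sample identifier held in the temporary database of the DPSP.
        for record in dpsp_records:
            if (record["sd_degrees_of_freedom"] >= MIN_DEGREES_OF_FREEDOM
                    and record["recovery_runs"] >= MIN_RECOVERY_RUNS):
                main_db.update(
                    sample_id=record["sample_id"],
                    standard_deviation=record["standard_deviation"],      # PPM at M-level
                    percent_recovery=record["percent_recovery"] / 100.0,  # converted to decimal by the DBMS
                )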

It should be noted that the primary standards for very new and exotic chemicals are often far from being ideally pure. If a recovery sample is run using a primary standard chemical of, say, 80% theoretical purity, and the same primary standard chemical is used to make up the calibration standards, and if, in neither case, the actual purity is taken into account in determining the theoretical weights of primary standard chemical required to make up the recovery sample and standards, then the percent recovery obtained is only for the chemical processing stages of the analytical method and not the whole method. The recovery run could turn out to be 100% in this hypothetical case (it is as though the primary standard chemical is being considered to be 100% pure). If, indeed, this were the case, then the measurements being produced by this method would be consistently 20% too high throughout the whole measurement spectrum (method bias). It is common practice in many laboratories to do a recovery run in just this way. That is why it is absolutely stipulated for the purposes of this system that the actual lot analysis or purity of the primary standard chemical, to three significant figures, always be taken into account in determining the theoretical weights used for the recovery sample run. Then the actual percent recovery for this hypothetical method will turn out to be 120%, when the actual lot analysis or purity of the primary standard chemical is taken into account in determining the theoretical weights used to make up the recovery sample but not the instrument calibration standards. This common practice with the instrument calibration standards does not matter to this computerized system. But when the system is implemented, a decision must be made whether or not to continue not taking into account the actual lot analysis or purity of the primary standard chemical in determining the theoretical weights used to make up the instrument calibration standards for all future analytical runs of the analytical chemistry method in the particular laboratory.

Generally speaking, though not always, the required minimum standard, standard deviations and percent recoveries, need to be determined at three different specific measurement levels, low, medium and high, before being entered into the DPSP for each particular analytical chemistry method being used in the laboratory. The DPSP has been programmed to adjust, usually by some form of interpolation or extrapolation to be explained later, the required minimum standard, standard deviations and percent recoveries, determined at low, medium and high measurement levels, that have been entered into it as predefined program variables, so that they can be applied to the routine overall measurements at M-level of the material samples at their various measurement levels. The DPSP has been programmed to further adjust the required minimum standard, standard deviations for application to the routine overall measurements at M-level of the material samples being analyzed according to the following data that is to be input into the data entry screen of the DPSP by the chemical analyst doing the particular analytical run:

1) the deviation of the material sample weight or volume of the sample or subsample replicates being analyzed from the standard nominal value required by the analytical chemistry method in the particular laboratory. A simple ratio, called an 'f' factor, is calculated by the chemical analyst and entered into the data entry screen of the DPSP. The 'f' factor is calculated as:

f = (standard nominal sample weight or volume) / ((actual or nominal) non-standard sample weight or volume)

2) the number of material sample subsample replicates being processed in the particular analytical run that is being entered into the data entry screen of the DPSP.

3) the number of reagent blanks being processed in the particular analytical run that is being entered into the data entry screen of the DPSP. This includes the number "zero" if there are no reagent blanks being processed. Alternatively, a different version of the program will not have a data entry column for this or it will be hidden.

4) the number of runs being made on the instrument calibration standards for the particular block of samples and/or subsample replicates that is going to be applied to them (by averaging the slopes, if necessary) that is being entered into the data entry screen of the DPSP. The possibilities are: one slope or two slopes (being averaged), if calibration standards are being run.
More than one run is sometimes made on the instrument calibration standards if there are any sensitivity changes occurring in the instrument during the course of reading all the sample or subsample extracts on the instrument.

5) any front-end or back-end dilutions or concentrations that are required for any individual samples or subsample replicates that are over and above all of those that are specified in the documented analytical chemistry method (that is, superimposed) for all samples or subsample replicates that are being entered into the data entry screen of the DPSP.

6) the number of replicate instrument readings that are being made on each individual sample extract and/or on each replicate subsample extract.

The computer data entry screen contains the following columns:
Column 1: the current date.

Column 2: the lab-method identifier. This identifies the particular analytical chemistry method being done in the particular analytical laboratory.

Column 3: the unique sample identifier.

Column 4: the single or average (if more than one subsample replicate was done) original measurement for the sample.

Column 5: the number of subsample replicates done on the sample, for the analytical run.
Column 6: the front-end overall superimposed (that is, over and above any dilutions/
concentrations specifically indicated to be done in the analytical method during the regular chemical processing) dilution/concentration factor for the sample.

Column 7: the back-end overall superimposed (that is, over and above any dilutions/
concentrations specifically indicated to be done in the analytical method during the regular chemical processing) dilution/concentration factor for the sample.

Column 8: the "f' factor for the sample, as explained above.

Column 9: the number of reagent blanks that were run for the block of samples or subsample replicates in the analytical run. This value can be "zero," if no reagent blanks have been included in the current analytical run.

Column 10: the number of calibration slopes (zero, one or two) that were run for the block of samples or subsample replicates in the analytical run. This value can be "zero," if no calibration standards are being used in the particular analytical chemistry method. Note that in a titrimetric analytical method, the titer [4] is equivalent to the value of the slope but it usually has no significant variance, so a "zero" should be entered into column (10) or else the standard deviation of the titer would have to be determined and entered into the DPSP and a "1"
entered into column (10).

Column 11: the number of replicate standard instrument readings that were made on each sample or subsample replicate being run. Note that all replicate instrument readings must consist of one or more (all to be averaged along with the original reading) standard readings which may already consist of one or more (averaged) standard sub-readings such as occurs with standards additions "at the instrument" or as an expedient (when the sub-readings are averaged) to help normalize the output of the instrument while reducing the variation thereof.
If multiple (averaged) instrument sub-readings are a part of standard processing conditions (that is, they are done on each regular or control sample extract, each replicate subsample extract, and each calibration standard), then these same multiple sub-readings must be done when determining the various standard deviations on all of the various PAF-forms, including the standard deviation of the instrument as it is being determined on the STAN-DUP, CAL-DUP or CAL-DATA
forms. In the latter case though, the standard deviation of the instrument could alternatively be determined as the parent random variable of the instrument (that is, considering each individual non-composite reading to be a single outcome from the instrument) and then the variance thereof (obtained from multiple consecutive individual non-composite instrument readings using a single sample extract or standard solution) can be adjusted so as to comply with the number of multiple sub-readings which are standard. Only the respective standard deviation determined from that variance so adjusted can be entered as an alternative predefined program variable into the DPSP
once it is converted to PPM at M-level by multiplying by the standard "c"
factor for the specific analytical chemistry method. On the other hand, the standard deviation of "y"
given "x" (also the standard deviation of the instrument response variable) determined from each run on the calibration standards, would normally be calculated from the standard number of instrument sub-readings already having been made on each calibration standard so that it would not normally need to be adjusted before entering it as a predefined program variable into the DPSP, it having been converted to PPM at M-level by multiplying by the standard "c"
factor for the specific analytical chemistry method.
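Before turning to how the adjustments are made, the following is a minimal sketch, assuming Python, of one row of the data entry screen described in columns (1) to (11) above. The field names and types are assumptions; the defaults mirror the pre-loaded values mentioned in the text ("1.00" for the superimposed dilution/concentration factors and the "f" factor, "1" for the replicate counts).

    from dataclasses import dataclass
    import datetime

    @dataclass
    class DataEntryRow:
        run_date: datetime.date            # column (1): the current date
        lab_method_id: str                 # column (2): lab-method identifier
        sample_id: str                     # column (3): unique sample identifier
        measurement: float                 # column (4): single or average original measurement
        n_subsamples: int = 1              # column (5): subsample replicates done on the sample
        front_factor: float = 1.00         # column (6): front-end overall superimposed factor
        back_factor: float = 1.00          # column (7): back-end overall superimposed factor
        f_factor: float = 1.00             # column (8): the "f" factor
        n_reagent_blanks: int = 0          # column (9): reagent blanks run ("zero" if none)
        n_slopes: int = 0                  # column (10): calibration slopes run (zero, one or two)
        n_instrument_readings: int = 1     # column (11): replicate instrument readings per (sub)sample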

All of the above adjustments to the required minimum standard, standard deviations and percent recoveries, for the particular DPSP that are to accompany the overall measurements at M-level as they are being generated by the particular analytical chemistry method as it is being done in a particular laboratory and entered into the main database, are pretty straightforward to program into the DPSP, although a lot of definitions had to be formulated in order to control the data entry process on behalf of chemical analysts performing the analyses. Insofar as the "calculations formula" of the particular analytical chemistry method is concerned, it is almost invariably made up of "statistical constants," including the required standard nominal sample weight or volume in the denominator thereof, so that the entire formula is almost invariably reducible to a single standard "c" factor. What is meant by "standard" here is that analytical chemistry methods generally call for a specific "nominal" sample weight or volume to be measured out for each sample or subsample replicate to be run. For example, this could be 10.0 grams of material sample homogenate. By "nominal" is meant that the chemical analyst could, for example, weigh out 9.88, 9.93, 10.03 and 10.11 grams for a group of four subsample replicates. In this case, the "f" factor, as explained above, would be equal to 1.00. The "f" factor column is therefore pre-loaded with "1.00" in every row for the convenience of the chemical analyst, to reduce data entry time and to help reduce data entry errors. But if only approximately 3.00 grams were available for analyses, most likely the actual sample weight, say 3.08 grams, would be used to calculate the "f" factor. But if approximately 3.00 grams of an SRM material were to be run as a control sample for every analytical run, on an ongoing basis, then even though actual weights would be used for each run, the number 3.00 would be used to calculate the "f" factor since 3.00 grams would be the "target weight" for each actual weighed-out portion of SRM material. This example is given here but there were other such definitions that had to be formulated.
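A minimal sketch, assuming Python, of the "f" factor rules worked through in the paragraph above; the function and argument names are assumptions, and the numbers are the ones used in the text (a standard nominal sample weight of 10.0 grams).

    def f_factor(standard_nominal_weight, weight_used_in_formula):
        # Ratio of the standard nominal sample weight (or volume) to the weight
        # (actual, nominal or target) that was used in the calculations formula.
        return standard_nominal_weight / weight_used_in_formula

    f_factor(10.0, 10.0)   # nominal weight used for the 9.88-10.11 g replicates -> 1.00
    f_factor(10.0, 3.08)   # only ~3 g available, actual weight 3.08 g used      -> ~3.25
    f_factor(10.0, 3.00)   # SRM control run at a 3.00 g target weight           -> ~3.33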

There are some more statements that are required about how to program the DPSP
to do the interpolation and extrapolation required in order to adjust the required minimum standard, standard deviations and percent recoveries, determined at low, medium and high measurement levels, that have been entered into the DPSP as predefined program variables, so that they can be applied, after being adjusted within the DPSP, to the routine overall measurements at M-level of all the material samples being done by the analytical method, at their various measurement levels. Originally, it was decided to also include the number of degrees of freedom as a separate adjusted parameter (for the adjusted standard deviations) to be included along with the adjusted standard deviations and percent recoveries which were to eventually be entered into the main database alongside the material sample and its measurement. But the approach taken in this paper is to establish a defined "minimum standard" for the number of degrees of freedom for the standard deviation, and statistical sample size for the percent recovery, eliminating the need for this option. But, of course, it can be done if desired. It should be noted here also that, although the interpolation and extrapolation techniques that are to be described here are in terms of the required minimum standard, standard deviations and percent recoveries, determined at low, medium and high measurement levels, there are many cases where only two or even one measurement level would suffice. For example, a particular analytical chemistry method may only be in need of standard deviations and percent recoveries for a particular restricted range of measurement levels, the ones being used, for example, to test for compliance of a certain food product to government-imposed standards and regulations. But, for the purpose of explaining the techniques, it will be assumed that there are three.

The main strategy used to describe the interpolation and extrapolation techniques will be to construct in one's imagination, a graph of the three plotted points using standard deviations or percent recoveries on the y-axis and measurement level on the x-axis. Taking the standard deviations first, linear interpolation would most likely be used, exclusively, to determine the adjusted standard deviations between points 1 (low measurement level), 2 (medium measurement level) and 3 (high measurement level). Subsequent adjustments will be made further on in the DPSP to the adjusted standard deviation determined here. Each of the three original plotted points must be for the standard deviation of a single analytical determination at M-level.
Originally, it was thought that a "computer table" of values of the standard deviations and percent recoveries would be needed, but it was found that simple mathematical formulas would suffice. Between point 1 and the origin (0,0) of the imagined graph, linear interpolation could be used, or a line constructed by holding the coefficient of relative variance (crv) of point 1 constant throughout the interval could be used, depending on what points 1, 2, and 3 are seen to be doing. Such a plot makes a very nice curved line passing through the origin in somewhat of a logarithmic fashion. The system administrator, laboratory supervisor or analytical chemistry method developer would be the one making the choices. For points above point 3, linear extrapolation could be used, or extrapolation by means of holding the coefficient of relative standard deviation (crsd) of point 3 constant could be used, or extrapolation by means of holding the "crv" of point 3 constant, as previously explained, could be used. Again, it depends on what the points 1, 2 and 3 are seen to be doing.
For the percent recoveries, the task is even easier. Only linear interpolation need be used from the origin through to point 3 and beyond that the value at point 3 is extrapolated as a maximum.

Note: Unlimited extrapolation for the standard deviation is allowed to be made for all measurement levels above the highest measurement level (point 3) where the standard deviations were determined and for the percent recovery, the value at this point is extrapolated as a maximum for all measurement levels above it. This is allowed for the purposes of the algorithm that is going to be used to determine the adjusted standard deviations and percent recoveries.
For example, the extrapolation may exceed the highest measurement level (point 3) by a factor of ten times, if there is a back-end overall superimposed dilution/concentration factor equal to ten.
This may not seem very reasonable but the limiting factor for the percent recovery and standard deviation is usually not the chemical processing stages themselves (overall measurement spectrum) but the limited measurement spectrum of the instrument.
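The following is a minimal sketch, assuming Python, of these interpolation and extrapolation rules. The function names are assumptions, and so is the exact definition of the "crv": it is taken here as the variance divided by the measurement level, which does give a curved line through the origin as described; the "crsd" is taken as the standard deviation divided by the measurement level.

    def interpolate_sd(level, points, low_mode="linear", high_mode="linear"):
        # points: [(level_1, sd_1), (level_2, sd_2), (level_3, sd_3)], levels ascending.
        (x1, s1), (x2, s2), (x3, s3) = points
        if level <= x1:
            if low_mode == "linear":
                return s1 * level / x1                      # straight line from the origin to point 1
            crv = s1 ** 2 / x1                              # hold the crv of point 1 constant
            return (crv * level) ** 0.5
        if level <= x2:
            return s1 + (s2 - s1) * (level - x1) / (x2 - x1)
        if level <= x3:
            return s2 + (s3 - s2) * (level - x2) / (x3 - x2)
        # Above point 3: unlimited extrapolation, by one of the three choices described.
        if high_mode == "linear":
            return s2 + (s3 - s2) * (level - x2) / (x3 - x2)
        if high_mode == "crsd":
            return (s3 / x3) * level                        # hold the crsd of point 3 constant
        return ((s3 ** 2 / x3) * level) ** 0.5              # hold the crv of point 3 constant

    def interpolate_recovery(level, points):
        # Piecewise-linear from the origin through points 1 to 3; the value at point 3
        # is extrapolated as a maximum for all measurement levels above it.
        path = [(0.0, 0.0)] + list(points)
        x3, r3 = points[-1]
        if level >= x3:
            return r3
        for (xa, ra), (xb, rb) in zip(path, path[1:]):
            if level <= xb:
                return ra + (rb - ra) * (level - xa) / (xb - xa)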

It is necessary, at this point, to fully describe how the algorithm is used to adjust the required minimum standard, standard deviations and percent recoveries, determined at low, medium and high measurement levels that have been entered into the DPSP as predefined program variables, so that they can be applied, after adjustment, to the routine overall measurements at M-level of all the material samples being done by the analytical method, at their various measurement levels. To understand this is to understand how the system works. First of all, it needs to be pointed out that along with each of the predefined program variables for the standard deviations and percent recoveries, determined at low, medium and high measurement levels, there are other predefined program variables entered in the DPSP that record the number of reagent blanks and slopes that were being run when the various PAF-forms were being used to determine the minimum standard, standard deviations and percent recoveries for the measurement levels. The standard deviation of the chemical instrumentation being used and the standard deviation of the slopes, both of which were determined under standard processing conditions, are also to be entered into the DPSP. These are obtainable from any of the STAN-DUP, CAL-DUP
or CAL-DATA forms. It is the responsibility of the system administrator, laboratory supervisor or analytical method developer, to enter all these predefined program variables into the DPSP.

Then, for the purpose of describing the algorithm below, it will be assumed that the chemical analyst will have also entered into the data entry screen of the DPSP, the required variables concerning each material sample or group of subsample replicates that have been run. The algorithm will be described in stepwise fashion with annotation:

Data Processing Algorithm:

Note: There are three possible "steps" that can be superimposed onto the standard chemical processing stages of the analytical chemistry method and each has its equivalent "factor" to be used in calculating the overall measurement. For example, there can be a front-end overall superimposed dilution/concentration giving rise to a front-end overall superimposed dilution/
concentration factor and there can be a back-end overall superimposed dilution/concentration giving rise to a back-end overall superimposed dilution/concentration factor.
In other words, a superimposed dilution/concentration factor is the reciprocal of the degree of superimposed dilution/concentration that was used for the sample. A non-standard sample weight or volume may also be used. An "f" factor has been created for the chemical analyst to enter into the data entry screen so that the standard deviations may be adjusted according to the ratio of the standard to non-standard sample weight or volume. It can be shown that the non-standard sample weight or volume and the front-end overall superimposed dilution/concentration both affect the input to the standard chemical processing stages while the back-end overall superimposed dilution/concentration only affects the output. It can further be shown that, for purposes of determining a mock measurement for entering the computer table at the correct g-amount of analyte flowing through the various standard chemical processing stages, the "f" factor and/or the front-end overall superimposed dilution/concentration factor should be removed from the original overall single or average measurement for the sample that has been entered into column (4). This is done by dividing by the respective "factors." The "f" factor is an implicit multiplicand in the calculations formula because the actual non-standard sample weight or volume will have been used in the denominator of the calculations formula instead of the standard sample weight or volume. The back-end overall superimposed dilution/concentration factor is not taken out in this manner because then the mock measurement would no longer be representative of the correct g-amount of analyte flowing through the various standard chemical processing stages of the analytical chemistry method. While the standard deviations of the various standard chemical processing stages of the analytical method are unaffected by this choice (the back-end dilution/concentration, a divisor, and the back-end dilution/concentration factor, a reciprocal multiplicand, cancel each other out), the standard deviation of the instrument can be magnified (or diminished) because it is only being multiplied by the back-end overall superimposed dilution/concentration factor and nothing is cancelling it out. A back-end superimposed dilution is usually not made unless the concentration of the sample extract is very high and above the range of the calibration standards. To compensate for this possibility, the DPSP does unlimited extrapolation above the highest measurement level at which the standard deviations for the DPSP were determined so that the standard deviation of the sample will continue to vary as it has been doing over the standard measurement levels.
Since the system administrator, laboratory supervisor or analytical method developer will have entered the standard deviation of the instrument and the standard deviation of the slopes into the DPSP as predefined program variables and the chemical analyst will have entered the number of replicate (and averaged) instrument readings that were made on each sample or subsample replicate along with the back-end overall superimposed dilution/concentration factor, the algorithm given below will be adjusted to deal with these possibilities.

1) Divide the single or average measurement for the sample that has been entered into column (4) by the "f" factor entered in column (8). Call this result "mock measurement-1" and store it in computer memory. The "f" factor would have been used implicitly as a multiplier in the traditional calculation procedure when an (actual or nominal) non-standard sample weight or volume was used in determining the single or average measurement for the sample that was entered into column (4). Thus, by this action, it is removed.

2) Divide the mock measurement-1 determined in step (1) by the front-end overall superimposed dilution/concentration factor from column (6). Call this result "mock measurement-2" and store it in computer memory. The front-end overall superimposed dilution/concentration factor would have been used as a multiplier in the traditional calculation procedure for determining the single or average measurement for the sample that was entered into column (4). Thus, by this action, it is removed. This mock measurement-2 is the most representative measurement for entering the computer table in order to determine the eventual standard deviation and the percent recovery for the single or average measurement that was entered into column (4).
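A minimal sketch, assuming Python, of steps (1) and (2); the variable names are assumptions and the numbers are hypothetical.

    measurement = 12.4            # column (4): single or average measurement, PPM at M-level
    f_factor = 10.0 / 3.08        # column (8): a non-standard 3.08 g sample weight was used
    front_factor = 2.0            # column (6): a two-fold front-end dilution superimposed on the method

    mock_measurement_1 = measurement / f_factor              # step (1): the "f" factor is removed
    mock_measurement_2 = mock_measurement_1 / front_factor   # step (2): the front-end factor is removed
    # mock_measurement_2 now represents the correct g-amount of analyte flowing through
    # the standard chemical processing stages, and is used to enter the interpolation.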

Note: If there have been no superimposed dilutions/concentrations, then "1.00"
will have been automatically entered into both column (6) for the front-end overall superimposed dilution/
concentration factor and column (7) for the back-end overall superimposed dilution/
concentration factor. This is also true for the "f" factor entered in column (8), if there have been no non-standard sample weights or volumes used. "n," the number of subsample replicates done on the sample (and averaged) for the current analytical run in column (5) and the number of replicate (and averaged) instrument readings that were made on the sample or on each subsample replicate in column (11) will also have been pre-set to the positive whole number "1."

Note: There may have been more than one back-end superimposed dilution/concentration. Thus, the word "overall" is used to reflect this.

Note: All subsample replicates must have the same degree of front-end and/or back-end overall superimposed dilutions/concentrations and their associated reagent blank or "reagent blanks" (to be averaged) must also (each of them) have the same degree of back-end overall superimposed dilutions/concentrations. In addition, all subsample replicates must also have the same number of replicate instrument readings. Note that each replicate instrument reading may consist of more than one standard sub-reading such as occurs with standards additions "at the instrument" or as an expedient (when the sub-readings are averaged) to help normalize the output of the instrument while reducing the variation thereof.

Note: For the rest of the algorithm, the word "sample" will refer to each material sample or group of subsample replicates that have been run for which a single or average measurement is to be calculated and entered into the main database. Also, the "columns" refer to the various data entry columns described above. The algorithm will be described as though a particular material sample or group of subsample replicates from a single sample homogenate has been processed for which the single or average measurement for the sample has been entered into column (4).
Note: At each step described in this algorithm, the intermediate calculated results are stored in a computer memory input and output grid for further computer data processing and error checking.
The details of where and how they are stored are not given.

3) Using the above described procedure for interpolation and extrapolation, the DPSP
determines the standard deviation and percent recovery for the exact measurement level given for mock measurement-2. This will be the true percent recovery for the original single or average measurement for the sample that has been entered into column (4). It is stored in the computer memory grid for output later on. Further data processing is done on the standard deviation. Call this standard deviation "mock standard deviation-1."

Note: All the standard deviations entered as predefined programmed variables in the DPSP are either BAV-standard deviations, corrected BAV-standard deviations or corrected WAV-standard deviations so that they only apply to single determinations at M-level. These standard deviations have been determined under the standard or "corrected to standard" processing conditions specified in the particular analytical chemistry method to which the particular DPSP program applies. The number of reagent blanks and/or slopes that were used to determine the standard deviations under these standard processing conditions have also been recorded in the DPSP as predefined program variables as well as the standard deviations for the parent random variables of the reagent blanks and the slopes. If they are BAV-standard deviations such as are determined on the RS-form, they will contain the correct proportion of all forms of between-run systematic error (BRSE), including any between-run systematic measurement error, BRSME
(RBV and/or SRLV), being generated by the WRME (RBV and/or SRLV) of the reagent blank(s) or slope(s) that are being run under standard conditions using the traditional calculation procedure. If they are WAV-standard deviations, such as are obtained on the SAM-DUP form, they will have been corrected by having had added to them the appropriate terms for the WRME (RBV and/or SRLV) variation of the reagent blank(s) and/or slope(s) that were being run to determine the standard deviations of the reagent blank(s) and/or slope(s) under standard conditions.

4) The mock standard deviation-1 determined in step (3) is squared giving mock variance-1.
5) If the number of reagent blanks that were being run when the standard deviations for the regular samples were being determined under standard processing conditions is not "zero," the standard deviation for the parent random variable of the reagent blanks determined under standard processing conditions is squared, giving the respective variance.
This variance for the parent random variable of the reagent blanks is then divided by the number of reagent blanks that were used to determine the standard deviation of the reagent blanks on the RB-DUP form under standard processing conditions and the result is subtracted from mock variance-1 giving mock variance-2. If the number of reagent blanks that were being run when the standard deviations for the regular samples were being determined under standard processing conditions is "zero," then the value in mock variance-1 is assigned to the memory location for mock variance-2.

Note: Not every analytical method runs a reagent blank or count blank as part of standard conditions. In this case, a "zero" would automatically be entered into column (9) and the column hidden on the data entry screen. The standard deviation for the reagent blanks (or count blanks) would automatically be set to "zero" as a predefined program variable in the DPSP.

6) If the number of calibration slopes (zero, one or two) that were being run when the standard deviations for the regular samples were being determined under standard processing conditions is not "zero," the standard deviation for the parent random variable of the slopes determined under standard processing conditions is squared, giving the respective variance. The variance for the parent random variable of the slopes is then divided by the number of slopes (one or two) that were used to determine the standard deviation of the slopes on the STAN-DUP, CAL-DUP or CAL-DATA forms under standard processing conditions. This result is then multiplied by the square of mock measurement-2. This, in turn, is divided by the square of the mean or grand mean of the slopes as determined under standard processing conditions.
Finally, this last result is subtracted from mock variance-2 giving mock variance-3. If the number of slopes that were being run when the standard deviations for the regular samples were being determined under standard processing conditions is "zero," then the value in mock variance-2 is assigned to the memory location for mock variance-3.

Note: Not every analytical method runs instrument calibration standards as a part of standard conditions. In this case, a "zero" would automatically be entered into column (10) and the column hidden on the data entry screen. The standard deviation for the slopes would then also be automatically set to "zero" as a predefined program variable in the DPSP as a precaution.
Another special case is with standard additions "at the instrument." In this case, the calibration standards (including a "zero" standard) are added on top of each sample or subsample replicate injection or else mixed with each sample or subsample extract before injection. Therefore, a run on the calibration standards is being done for each sample or subsample extract as a part of obtaining an overall individual instrument reading (one sub-reading from each injection) for each extract. In this case, the number of calibration slopes would also be automatically set to "zero"
in column (10) and the column hidden on the data entry screen, since the variation in the individual "standard additions" slopes for each overall reading per determination will be included (as inherited variation), in the standard deviation of the instrument as determined for, and/or corrected to, a single instrument reading (composed of more than one sub-reading). The "standard additions" technique is too complex to be described here but note that this computerized system is not applicable to doing standard additions "through the method," in which case, the overall standard deviation at M-level for each determination is obtainable from the technique itself and the overall recovery is normally 100%.

Note: Mock variance-3 is an unmixed variance, not containing any BRSME (RBV
and/or SRLV) that would have been generated by the WRME (RBV and/or SRLV) of the reagent blank(s) and/or slope(s) that were being run under standard processing conditions, nor likewise any variation from the WRME (RBV and/or SRLV) itself, when the standard deviations were being determined.
As a result, this variance can now be manipulated by standard statistical procedures. If it were BAV-standard deviations that had been entered as predefined program variables into the DPSP, there could be some other form of BRSE contained in this variance, but in theory, there shouldn't be any, and if there is, it has a right to be included as long as it is random. Any form of non-random BRSE should have been screened out on the respective PAF-forms. The basic idea is to remove, in steps (5) and (6), all variation due to the reagent blank(s) and/or slope(s) that were being run under standard processing conditions using the traditional calculation procedure when the standard deviations for the measurement levels were being determined for the DPSP.
In steps (13) and (15), the variance for the parent random variable of the reagent blanks divided by the number of reagent blanks that are entered into column (9) for the current analytical run and the variance term for the number of calibration slopes (one or two, as entered into column (10)) for the current analytical run, will be put back into the overall variance for the sample in their stead.

7) The standard deviation of the instrument, such as determined on the STAN-DUP, CAL-DUP
or CAL-DATA forms under standard processing conditions, must be for only a single standard instrument reading per sample or per subsample replicate with no back-end overall superimposed dilution/concentration factor employed. This standard deviation of the instrument, having been entered into the DPSP as a predefined program variable, is squared giving the respective variance of the instrument. This variance is then subtracted from mock variance-3 giving mock variance-4. Note that under standard processing conditions, each standard individual instrument reading may consist of more than one standard sub-reading such as occurs with standard additions "at the instrument" or as an expedient (when the sub-readings are averaged) to help normalize the output of the instrument while reducing the variation thereof.

8) The back-end overall superimposed dilution/concentration factor from column (7) is squared.

Note: Both the front-end overall superimposed dilution/concentration factor and the back-end overall superimposed dilution/concentration factor must be very clearly defined to the user, especially what is meant by "superimposed." A message concerning this must always be output to the user on the data entry screen. An example to explain this point is given here concerning the back-end overall superimposed dilution/concentration factor. Suppose an analytical chemistry method requires, under standard processing conditions, a concentration of 10 ml to 5 ml at the back-end of the analytical method. Then, a concentration factor of 0.5 will appear in the numerator of the calculations formula, as part of standard conditions.
Suppose that the chemical analyst decides to concentrate further, in the above mentioned step, down to 1 ml. This is an overall concentration of 10 ml to 1 ml and the overall concentration factor is 0.1. But only the 5 ml to 1 ml is "superimposed." Consequently, the chemical analyst should enter 0.2 into column (7) of the data entry screen as the back-end overall superimposed dilution/concentration factor. To check, 0.2 which is entered into column (7) times 0.5 which is in the numerator of the calculations formula, is equal to 0.1 which is the correct overall concentration factor.
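A minimal arithmetic check, assuming Python, of the worked example above; the variable names are assumptions.

    standard_factor = 5 / 10          # 0.5: the 10 ml to 5 ml concentration in the calculations formula
    overall_factor = 1 / 10           # 0.1: what was actually done, 10 ml concentrated down to 1 ml
    superimposed_factor = overall_factor / standard_factor   # 0.2: the value entered into column (7)
    assert abs(superimposed_factor * standard_factor - overall_factor) < 1e-12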

9) Multiply the variance of the instrument as determined in step (7) by the result from step (8) and divide this result by the number of replicate instrument readings that were made on the sample, or on each subsample replicate, that is entered into column (11) of the data entry screen.

Note: Each standard individual instrument reading may consist of more than one standard sub-reading such as occurs with standard additions "at the instrument" or as an expedient (when the sub-readings are averaged) to help normalize the output of the instrument while reducing the variation thereof. Only replicate instrument readings are being dealt with here, not sub-readings.
Refer to column (11) in the data entry section for an explanation of the number of replicate instrument readings that have been made on each sample or subsample replicate being run. See also the first note for step (6).

10) The result from step (9) is added to mock variance-4 from step (7), giving mock variance-5.
Note: Thus, the variance of the instrument is either put back into the overall variance for the sample the way it was, or as modified by steps (8), (9) and (10).
11) Mock variance-5 is then divided by "n," the number of subsample replicates done on the sample, for the current analytical run, as entered into column (5), provided "n" is a positive whole number greater than or equal to "1," giving mock variance-6.
12) If the number of reagent blanks entered into column (9) that were being run for the block of samples or subsample replicates in the current analytical run is not "zero,"
the variance for the parent random variable of the reagent blanks from step (5), is divided by the number of reagent blanks that are entered into column (9). Call this result the "blank variance correction term"
(BVCT). If the value entered into column (9) is "zero," then the value of "zero" is assigned to the BVCT.

Note: The same note as for step (5) applies to this step.
13) The BVCT, determined in step (12), is added to mock variance-6 giving mock variance-7.
14) If the number of calibration slopes (zero, one or two) entered into column (10) that were run for the block of samples or subsample replicates in the current analytical run is not "zero," the variance for the parent random variable of the slopes from step (6), is divided by the number of slopes (one or two) that are entered into column (10). This result is then multiplied by the square of mock measurement-2. This, in turn, is divided by the square of the mean or grand mean of the slopes as determined under standard processing conditions. Call this result the "slope variance correction term" (SVCT). If the value entered into column (10) is "zero," then the value of "zero" is assigned to the SVCT.

Note: The same notes as for step (6) apply to this step.

Note: In a titrimetric analytical method, the titer [4] is equivalent to the value of the slope but it usually has no significant variance, so a "zero" should be entered into column (10) or else the standard deviation of the titer would have to be determined and entered into the DPSP and a "1"
entered into column (10).
15) The SVCT, determined in step (14), is added to mock variance-7 giving mock variance-8.
16) Mock variance-8 is then converted to a standard deviation by taking the square root of it.
17) The result from step (16) is then multiplied by the "f" factor entered in column (8) and this result is further multiplied by the front-end overall superimposed dilution/concentration factor from column (6). This last result will then be the computed standard deviation at M-level for the original single or average (if more than one subsample replicate was done) measurement for the sample that was entered into column (4).
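Gathering steps (1) to (17) together, the following is a minimal end-to-end sketch, assuming Python. All names are assumptions; the interpolated standard deviation and percent recovery of step (3) are passed in as callables, and the memory-grid bookkeeping and error checking are not shown.

    from math import sqrt

    def compute_sd_and_recovery(
        measurement,        # column (4): single or average measurement, PPM at M-level
        n_subsamples,       # column (5)
        front_factor,       # column (6): front-end overall superimposed factor
        back_factor,        # column (7): back-end overall superimposed factor
        f_factor,           # column (8)
        n_blanks_run,       # column (9): reagent blanks in the current run
        n_slopes_run,       # column (10): calibration slopes in the current run (zero, one or two)
        n_readings,         # column (11): replicate instrument readings per sample or subsample replicate
        interp_sd,          # step (3): mock measurement-2 -> standard deviation at that level
        interp_recovery,    # step (3): mock measurement-2 -> percent recovery at that level
        sd_blank_parent,    # SD of the parent random variable of the reagent blanks (standard conditions)
        n_blanks_std,       # reagent blanks run when the standard deviations were determined
        sd_slope_parent,    # SD of the parent random variable of the slopes (standard conditions)
        n_slopes_std,       # slopes run when the standard deviations were determined
        slope_mean,         # mean or grand mean of the slopes under standard conditions
        sd_instrument,      # SD of the instrument for a single standard reading
    ):
        mock_1 = measurement / f_factor                       # step (1)
        mock_2 = mock_1 / front_factor                        # step (2)
        recovery = interp_recovery(mock_2)                    # step (3): true percent recovery
        var_1 = interp_sd(mock_2) ** 2                        # steps (3)-(4): mock variance-1
        # Steps (5)-(7): remove the blank, slope and instrument variation that was inherited
        # under standard processing conditions when the standard deviations were determined.
        var_2 = var_1 - (sd_blank_parent ** 2 / n_blanks_std if n_blanks_std else 0.0)
        var_3 = var_2 - ((sd_slope_parent ** 2 / n_slopes_std) * mock_2 ** 2 / slope_mean ** 2
                         if n_slopes_std else 0.0)
        var_4 = var_3 - sd_instrument ** 2
        # Steps (8)-(10): put the instrument variance back, scaled by the squared back-end
        # factor and divided by the number of replicate instrument readings.
        var_5 = var_4 + sd_instrument ** 2 * back_factor ** 2 / n_readings
        var_6 = var_5 / n_subsamples                          # step (11)
        bvct = sd_blank_parent ** 2 / n_blanks_run if n_blanks_run else 0.0   # step (12)
        var_7 = var_6 + bvct                                  # step (13)
        svct = ((sd_slope_parent ** 2 / n_slopes_run) * mock_2 ** 2 / slope_mean ** 2
                if n_slopes_run else 0.0)                     # step (14)
        var_8 = var_7 + svct                                  # step (15)
        sd_mock = sqrt(var_8)                                 # step (16)
        sd_m_level = sd_mock * f_factor * front_factor        # step (17)
        return sd_m_level, recovery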

Note: It would be prudent to remember at this point, the two basic assumptions that underlie this algorithm in its present form, which are that (1) any unbiasing that needs to be done will be done by the DBMS in the main database with the parameters that are supplied to it, and that (2) the percent recoveries that have been entered into the DPSP, being averages based on a minimum of four (and where possible, sixteen) recovery samples that have been run, one per run, over the required number of analytical runs, means that the random variation in these average percent recoveries can usually be ignored. If, in defiance of the first assumption, the unbiasing is to be optionally done within the DPSP, then the algorithm would have to be modified at this point to allow for it. But these two basic assumptions will be maintained for the purpose of the algorithm as it is being presented here, on the basis that a bias-error tolerance of approximately plus or minus 1.00% would likely be acceptable to the user. However, in defiance of the second assumption, it might be desirable to have the DBMS further adjust the unbiased standard deviation that was computed by it from the (possibly biased) standard deviation that was obtained from step (17) so that a further corrected version could be applied to determine the unbiased 95% confidence interval for the unbiased single or average measurement in such manner that would take into account the random variation in the average percent recovery. This would require that an additional parameter, the standard deviation of the average percent recovery, also called the standard error of the percent recovery, be entered as a predefined program variable in the DPSP and stored in the temporary database in the DPSP
to be uploaded along with the other two parameters that were determined by the algorithm for the particular sample. This would only be done in the event that the standard error of the percent recovery is unusually high and/or the degree of bias-error tolerance in the unbiased measurement is unacceptable. [The standard error of the percent recovery, as obtainable from either the RC-form or the RS-form, is not independent from the standard deviation of the measurement as calculated on these same forms, but, nevertheless, it should be suitable for the purpose of correcting the overall standard deviation of the single or average measurement. The standard error of the percent recovery will be independent if the data set obtained by running the recovery samples over several analytical runs is used to calculate the standard error independently from all other calculations.] This further adjustment of the unbiased standard deviation for the unbiased single or average measurement of the sample would then be done by the DBMS
according to the final term that is given below in the general equation for the overall variance of a single or average determination (including the dividing by the percent recovery in decimal form).

To summarize, adhering to the above two assumptions, the DBMS will calculate the (possibly biased) 95% confidence interval for the (possibly biased) single or average measurement for the sample as plus or minus two of the standard deviations that were determined in step (17). This (possibly biased) 95% confidence interval and (possibly biased) measurement data are to be maintained (not deleted) in separate columns in the main database (necessary for a variety of reasons) despite the unbiasing operation which is to be done next. The DBMS
will then unbias the (possibly biased) single or average measurement for the sample and the (possibly biased) standard deviation that was obtained from step (17) by dividing both of them by the uploaded percent recovery for the measurement level in decimal form (this uploaded percent recovery value can be equal to 100% depending on the analytical method). The resulting unbiased single or average measurement and unbiased standard deviation are then stored in separate and hidden password-protected columns. The DBMS will then calculate the unbiased 95%
confidence interval for the unbiased single or average measurement as plus or minus two of the unbiased standard deviations. The resulting unbiased 95% confidence interval will then be stored in a separate and hidden password-protected column in the main database. This unbiasing operation assumes, as already stated, that the uploaded percent recovery is regarded as being a statistical constant. If, in defiance of the second assumption above, the standard error of the percent recovery has also been uploaded, then the DBMS will further adjust the unbiased standard deviation so that a corrected version of it can be applied to determine a corrected version of the unbiased 95% confidence interval. The unbiased standard deviation will then be corrected according to the final term that is given below in the general equation for the overall variance of a single or average determination (including the dividing by the percent recovery in decimal form) so as to take into account the random variation in the standard error of the percent recovery. The resulting corrected unbiased standard deviation will then be stored in a separate and hidden password-protected column. The DBMS will then alternatively calculate the corrected unbiased 95% confidence interval for the unbiased single or average measurement that was calculated above as plus or minus two of the corrected unbiased standard deviations. The resulting corrected unbiased 95% confidence interval will then be stored in a separate and hidden password-protected column in the main database. As previously stated, all of the above unbiasing operations can be done within the DPSP if required.
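A minimal sketch of the unbiasing arithmetic just summarized is given below, assuming the percent recovery and, optionally, its standard error have been uploaded with the measurement. The function name and the numeric values are hypothetical, and the optional correction applies the final term of the general equation (given later in this description) as it is read here, namely Var(u-bar) * E(UMAC)² / E(u)² added to the unbiased variance.

```python
import math

def unbias(measurement, sd, pct_recovery, se_pct_recovery=None):
    """Divide the (possibly biased) measurement and standard deviation by the
    percent recovery in decimal form, compute the unbiased 95% confidence
    interval as plus or minus two standard deviations, and optionally correct
    the unbiased standard deviation for the random variation in the average
    percent recovery (sketch of the DBMS-side operations described above)."""
    u = pct_recovery / 100.0                      # recovery in decimal form
    unbiased_meas = measurement / u
    unbiased_sd = sd / u
    result = {"unbiased_measurement": unbiased_meas,
              "unbiased_sd": unbiased_sd,
              "ci_95": (unbiased_meas - 2 * unbiased_sd,
                        unbiased_meas + 2 * unbiased_sd)}
    if se_pct_recovery is not None:
        # Optional correction: add Var(u-bar) * E(UMAC)^2 / E(u)^2 to the
        # unbiased variance (one reading of the final term of the general
        # equation for the overall variance given later in this description).
        se_u = se_pct_recovery / 100.0
        corrected_sd = math.sqrt(unbiased_sd ** 2
                                 + (se_u ** 2) * (unbiased_meas ** 2) / (u ** 2))
        result["corrected_unbiased_sd"] = corrected_sd
        result["corrected_ci_95"] = (unbiased_meas - 2 * corrected_sd,
                                     unbiased_meas + 2 * corrected_sd)
    return result

# Hypothetical example: 100 PPM measured, S.D. 5.0 PPM, 98% recovery,
# standard error of the recovery 1.0%.
print(unbias(100.0, 5.0, 98.0, se_pct_recovery=1.0))
```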
18) The percent recovery value from step (3) and the computed standard deviation at M-level from step (17) are then output in the output screen to the user along with the original single or average (if more than one subsample replicate was done) measurement at M-level that was entered into column (4) and these values are stored (along with the sample identifier and other relevant data) in a temporary database in the DPSP for uploading into the main database when accessed by the DBMS--unless step (19) applies.
19) If the data processing that has just been done to determine the standard deviation and percent recovery for the original single or average (if more than one subsample replicate was done) measurement that was entered into column (4) applies to transformed biological, microbiological or radiological data, then a 95% confidence interval is calculated for the single or average measurement by the DPSP. The endpoints for this 95% confidence interval are then retransformed and output in the output screen to the user along with the original single or average (if more than one subsample replicate was done) measurement that was entered into column (4) and these values are stored (along with the sample identifier and other relevant data) in a temporary database in the DPSP for uploading into the main database when accessed by the DBMS. The percent recovery determined by the DPSP would normally always be set to 100% in this case or else this parameter is omitted altogether. The transformational and retransformational formulas would normally be entered into the DPSP by the user.
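For step (19), the following sketch illustrates one possible transform-and-retransform cycle. A log10 transformation is assumed purely for illustration; in the described system the transformational and retransformational formulas are whatever the user enters into the DPSP, and the function name is hypothetical.

```python
def retransformed_ci_log10(mean_transformed, sd_transformed):
    """Compute the 95% confidence interval on the transformed scale as plus or
    minus two standard deviations and retransform its endpoints (sketch only;
    a log10 retransformation is assumed here purely for illustration)."""
    lo_t = mean_transformed - 2 * sd_transformed
    hi_t = mean_transformed + 2 * sd_transformed
    # Retransform the endpoints back to the original (count) scale.
    return 10 ** lo_t, 10 ** hi_t

# Hypothetical example: mean log10(count) = 2.3, S.D. on the log scale = 0.1.
print(retransformed_ci_log10(2.3, 0.1))   # roughly (126, 316) counts
```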
The following are examples of experiments, framed in terms of a computer table:
Explanation of Computer Table Experiment 1:

1) Suppose the DPSP is using a computer table instead of simple formulas.

2) Suppose for purposes of checking the algorithm that the coefficient of relative standard deviation (crsd) is constant throughout the measurement spectrum.

3) Create two computer tables, one for a 10g sample and one for a 5g sample.

10g table ("c" factor = 0.1):

M-level Meas. (PPM)   M-level S.D. (PPM)   Q2-level µg-output   Q2-level S.D. (µg)   Q2-level %Rec
100                   5.0                  1000                 50                   99
50                    2.5                  500                  25                   98
25                    1.25                 250                  12.5                 97

5g table ("c" factor = 0.2):

M-level Meas. (PPM)   M-level S.D. (PPM)   Q2-level µg-output   Q2-level S.D. (µg)   Q2-level %Rec
100                   5.0                  500                  25                   98
50                    2.5                  250                  12.5                 97
25                    1.25                 125                  6.3                  96

4) The same µg-output at the back-end of the analytical chemistry method should give the same S.D. at Q2-level. The output and S.D. at Q2-level are given in µg instead of PPM for simplification. Therefore the "c" factors are purely hypothetical, but they are in the correct proportion for 5g and 10g in the denominator of the calculations formula.

5) There is only one computer table available and it is for 10g of sample, but there is only 5g of sample available to be run.

6) The overall measurement at M-level is 100 PPM for 5g of sample.

7) The "f' factor for 5g of sample, when lOg of sample is standard, is 2Ø

8) If the overall measurement at M-level (100 PPM) is divided by the "f" factor, a mock measurement of 50 PPM is obtained.

9) The table for 10g is accessed at 50 PPM, and a S.D. of 2.5 PPM is obtained. This is the correct S.D. at M-level for 500 µg of output at Q2-level in the 10g table. The percent recovery of 98% is also obtained at this time. If any adjustments need to be made to the S.D., they are done here at 2.5 PPM. It is assumed that none are needed.

10) The standard deviation (2.5 PPM) obtained in step (9) is multiplied by the "f" factor, giving a value of 5.0 PPM.

11) By inspection of the hypothetical 5g table, this is the correct S.D. for the 5g sample at M-level for 500 µg of output at Q2-level in the 5g table.

12) By inspection of the hypothetical 5g table, the percent recovery is also correct since, although the measurement is divided by the "f" factor before accessing the computer table in step (9), the percent recovery obtained is not multiplied by the "f" factor.

Explanation of Computer Table Experiment 2:

1) Suppose the DPSP is using a computer table instead of simple formulas.

2) Suppose for purposes of checking the algorithm that the standard deviation (S.D.) is constant at Q2-level throughout the measurement spectrum.

3) Create two computer tables, one for a 10g sample and one for a 5g sample.

10g table ("c" factor = 0.1):

M-level Meas. (PPM)   M-level S.D. (PPM)   Q2-level µg-output   Q2-level S.D. (µg)   Q2-level %Rec
100                   2.5                  1000                 25                   99
50                    2.5                  500                  25                   98
25                    2.5                  250                  25                   97

5g table ("c" factor = 0.2):

M-level Meas. (PPM)   M-level S.D. (PPM)   Q2-level µg-output   Q2-level S.D. (µg)   Q2-level %Rec
100                   5.0                  500                  25                   98
50                    5.0                  250                  25                   97
25                    5.0                  125                  25                   96

4) The same µg-output at the back-end of the analytical chemistry method should give the same S.D. at Q2-level. The output and S.D. at Q2-level are given in µg instead of PPM for simplification. Therefore the "c" factors are purely hypothetical, but they are in the correct proportion for 5g and 10g in the denominator of the calculations formula.

5) There is only one computer table available and it is for 10g of sample, but there is only 5g of sample available to be run.

6) The overall measurement at M-level is 100 PPM for 5g of sample.

7) The "f" factor for 5g of sample, when 10g of sample is standard, is 2.0.

8) If the overall measurement at M-level (100 PPM) is divided by the "f" factor, a mock measurement of 50 PPM is obtained.

9) The table for 10g is accessed at 50 PPM, and a S.D. of 2.5 PPM is obtained. This is the correct S.D. at M-level for 500 µg of output at Q2-level in the 10g table. The percent recovery of 98% is also obtained at this time. If any adjustments need to be made to the S.D., they are done here at 2.5 PPM. It is assumed that none are needed.

10) The standard deviation (2.5 PPM) obtained in step (9) is multiplied by the "f" factor, giving a value of 5.0 PPM.

11) By inspection of the hypothetical 5g table, this is the correct S.D. for the 5g sample at M-level for 500 µg of output at Q2-level in the 5g table.

12) By inspection of the hypothetical 5g table, the percent recovery is also correct since, although the measurement is divided by the "f" factor before accessing the computer table in step (9), the percent recovery obtained is not multiplied by the "f" factor.

Conclusion of experiments 1 and 2:

The correct standard deviation and percent recovery are obtained in both experiments. If any adjustments had been made, they would have been made in approximately the correct proportions for the final overall standard deviations. The only error remaining will be due to the uncertainty in the standard deviations themselves in the computer table. These two experiments only deal with the "f" factor but, for example, the "f" factor could have been replaced with the "f" factor times the front-end overall superimposed dilution/concentration factor.
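A minimal sketch of the lookup logic exercised by the two experiments is given below, using the hypothetical 10g table of Experiment 1. The table values and the "f" factor of 2.0 come from that experiment, while the function and variable names are hypothetical; an exact-match lookup is used here for brevity, whereas the DPSP would interpolate between measurement levels.

```python
# Hypothetical 10g computer table from Experiment 1:
# measurement at M-level (PPM) -> (S.D. at M-level in PPM, percent recovery).
TABLE_10G = {100.0: (5.0, 99), 50.0: (2.5, 98), 25.0: (1.25, 97)}

def lookup_sd_and_recovery(measurement_ppm, f_factor, table=TABLE_10G):
    """Divide the measurement by the "f" factor, access the table at the
    resulting mock measurement, then multiply the standard deviation (but
    not the percent recovery) back up by the "f" factor -- the sequence
    walked through in steps (8) to (12) of the experiments."""
    mock_measurement = measurement_ppm / f_factor     # step (8)
    sd, pct_recovery = table[mock_measurement]        # step (9)
    return sd * f_factor, pct_recovery                # steps (10) to (12)

# 100 PPM measured on 5g of sample when 10g is standard ("f" factor = 2.0).
print(lookup_sd_and_recovery(100.0, 2.0))             # expected: (5.0, 98)
```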

Some Statistical Formulas:
1) Sample variance of "x"

s²x = Σ(xi - x-bar)² / (k - 1)    (Formula-1)

"k" is the number of analytical runs.
The degrees of freedom (df) for the sample variance is equal to "k - 1."
2) (sx), the sample standard deviation, is equal to the square root of (s²x).
The degrees of freedom (df) for the sample standard deviation is equal to "k - 1."
3) Sample variance of "x"

s²x = Σ(d²) / (2k)    (Formula-2)

"d" is equal to (x1 - x2), the difference between the duplicate measurements.
"k" is the number of sample duplicates.
The degrees of freedom (df) for the sample variance is equal to "k."

4) (sx), the sample standard deviation, is equal to the square root of (s²x).
The degrees of freedom (df) for the sample standard deviation is equal to "k."

5) Sample pseudo-variance of "|d|"

s²|d| = Σ(d²) / k    (Formula-3)

"d" is equal to |x1 - x2|, the absolute value of the difference between the duplicate measurements (also called the range of duplicates).

"k" is the number of sample duplicates The degrees of freedom (df) for the sample pseudo-variance is equal to "k"

6) (s|d|), the sample pseudo-standard deviation, is equal to the square root of (s²|d|).
The degrees of freedom (df) for the sample pseudo-standard deviation is equal to "k."
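The three formulas translate directly into code. The following Python functions are a minimal sketch for illustration; the function names and the sample data are hypothetical.

```python
def formula_1(x):
    """Formula-1: sample variance of "x" over "k" analytical runs (df = k - 1)."""
    k = len(x)
    x_bar = sum(x) / k
    return sum((xi - x_bar) ** 2 for xi in x) / (k - 1)

def formula_2(pairs):
    """Formula-2: pooled variance for duplicates, sum(d^2) / (2k) (df = k)."""
    k = len(pairs)
    return sum((x1 - x2) ** 2 for x1, x2 in pairs) / (2 * k)

def formula_3(pairs):
    """Formula-3: pseudo-variance of |d|, sum(d^2) / k (df = k)."""
    k = len(pairs)
    return sum((x1 - x2) ** 2 for x1, x2 in pairs) / k

# Hypothetical duplicate measurements from three sample duplicates.
dups = [(10.2, 9.8), (10.5, 10.1), (9.9, 10.3)]
print(formula_2(dups), formula_3(dups))
```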

Notes:

(1) Both Formula-1 and Formula-2 can be utilized under either BAV or WAV statistical sampling conditions depending on the application.

(2) Formula-2 is easy to derive. Just let "d/2" = (xi - x-bar) in Formula-1, but with (n - 1) in the denominator instead of (k - 1). The sign of "d/2," of course, doesn't matter due to squaring. This yields the intermediate formula "(d²)/2" divided by (n - 1), which is the formula for determining the sample variance of "x" from two outcomes from a non-composite primary random variable "X" in terms of the "difference" between the two outcomes. "n" is always equal to "2," so the denominator is usually omitted, but it will be needed here. Plug this intermediate formula into the general formula for the pooled variance [8] using (n - 1) as the degrees of freedom in the denominators of the variances to be pooled, substituting "(d²)/2" divided by (n - 1) for each of the "k" variances in the numerator of the general pooled variance formula. In the denominator of the general pooled variance formula, we have "k" times (n - 1), which is equal to "k." By using a summation identity, "2/4 = 1/2" is factored entirely out of the numerator of the general pooled variance formula and placed to the left of the summation sign. This is then taken out of the numerator of the general pooled variance formula altogether by putting a "2" in the denominator.
This yields Formula-2, which is sometimes called the "pooled variance formula for duplicates."
It is an unbiased estimator of the population variance of "X" since it is the unbiased form of the general pooled variance formula that has been used to derive it. But it must be remembered that the sample variance is not for "d" but for "x," and the number of degrees of freedom for it is not "2k" but "k." Because of its ease of programming into the computer spreadsheets, Formula-2 is used to determine the WAV-variances and the WAV-standard deviations in all of the "duplicates" PAF-forms.

(3) Another strategy, used by Pearson and Hartley [3], to determine the probabilities for the range at "n = 2" from the standard normal probability table is a little more difficult to describe without a diagram, but it can be shown that these probabilities can be obtained from the right-hand side of the standard normal probability table. Basically, by taking the absolute values of the distribution of (x1 - x2), which is composed of equal frequencies of both positive and negative values, we get the distribution of |x1 - x2|, which is composed of only positive values.
The frequencies of the positive values are doubled but this doesn't affect the probabilities. The variance of (x1 - x2) is double the variance of "x," so the variance of "x," as defined in Formula-2, is multiplied by "2," cancelling off the "2" in the denominator.
This is the real variance of (x1 - x2) but not of |x1 - x2|, so it is called a pseudo-variance for the distribution of |x1 - x2|, and the square root of it is called a pseudo-standard deviation for the distribution of |x1 - x2|. Thus, the standard normal probability table can still be used to determine the probabilities for the distribution of |x1 - x2|. For example, 95% of outcomes from the distribution of (x1 - x2) will be between -2 and +2 standard deviations for (x1 - x2), and 95% of outcomes from the distribution of |x1 - x2| will be between "zero" and +2 pseudo-standard deviations for |x1 - x2|. The pseudo-variance of |x1 - x2| is shown above as Formula-3. The respective pseudo-standard deviation of |x1 - x2| is used to determine the control limits for the range charts in all of the "duplicates" PAF-forms.
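These probability statements, and the unbiasedness of Formula-2, can be checked numerically in the same spirit as the spreadsheet random-variate research mentioned in note (4) below. The following sketch uses an arbitrary seed and hypothetical population parameters; it is an illustration only, not part of the described system.

```python
import math
import random

random.seed(1)                     # arbitrary seed, for repeatability
MU, SIGMA, K = 50.0, 2.0, 100_000  # hypothetical population mean and S.D.

# Draw K duplicate pairs from the same normal population.
pairs = [(random.gauss(MU, SIGMA), random.gauss(MU, SIGMA)) for _ in range(K)]

# Formula-2: pooled variance for duplicates; should approximate SIGMA**2 = 4.
var_x = sum((a - b) ** 2 for a, b in pairs) / (2 * K)

# Formula-3: pseudo-variance of |d|; its square root is the pseudo-standard
# deviation used for the range-chart control limits.
pseudo_sd = math.sqrt(sum((a - b) ** 2 for a, b in pairs) / K)

# Fraction of ranges |x1 - x2| falling within two pseudo-standard deviations.
within = sum(abs(a - b) <= 2 * pseudo_sd for a, b in pairs) / K

print(round(var_x, 3), round(pseudo_sd, 3), round(within, 4))
# Expect var_x near 4.0, pseudo_sd near 2.83 and "within" near 0.95.
```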

(4) Another strategy adapted by the author is called "chain-link-sampling." To explain this, imagine three identical series of outcomes, labelled S1, S2 and S3, directly on top of one another, from the same non-composite primary random variable "X," the population mean of which can be premised to be absolutely constant. The members of each series, S1, S2 and S3, are labelled by subscripting "x" as a, b, c, d, e, f, g, and so on, say for about 500 outcomes. Then, referring to each of the outcomes by their subscripts, S1 and S2 will first be sampled according to the traditional sampling method: S1: a_b, c_d, e_f, and so on; S2: b_c, d_e, f_g, and so on.
Then the samples for S1 could be used to calculate a sample mean from the sets of pairs, averaging each pair and then averaging the individual averages, and likewise for S2. Then the two overall means could be averaged, giving a grand mean. Then, applying "chain-link-sampling" to S3, the sampling would be: S3: a_b, b_c, c_d, d_e, e_f, f_g, and so on. The overall mean calculated from the sets of pairs from S3, averaging each pair and then averaging the individual averages, will obviously be equal to the grand mean calculated from S1 and S2.
This is not "overlapped sampling." There is no overlapping of any of the means in each of the pairs from S3. Nor is it related in any way to any form of "re-sampling."

The same principle can be applied to sampling for the variance and standard deviation using Formula-2. In this case, the "difference between duplicates" is obtained from each pair and applied to Formula-2 to calculate a variance. Then the variances obtained from S1 and S2 could be pooled. It can be shown that the variances from S1 and S2 are not entirely independent. In fact, in the extreme hypothetical case, they are inversely correlated. But this is an advantage. If the variance from S1 is too small, then the variance from S2 will be too big.
But when the two variances are pooled, a better estimate is obtained with double the degrees of freedom. Of course, with random sampling, the two variances will be similar anyway.
Research, using the random generation capability of the computer spreadsheet to generate random normal variates, confirms these statements. Then it can be shown that the variance obtained by applying the "difference between duplicates" obtained from S3 to Formula-2 will give exactly the same variance as the former pooled variance from S1 and S2. The same justification applies to both the WAV-variances and the WAV-standard deviations determined on the "duplicates" PAF-forms.
This is not "overlapped sampling" nor any form of "re-sampling." There is no overlapping of any of the deviations inherent in each of the differences obtained from each of the pairs from S3.
Note that ANOVA cannot be done using the "chain-link-sampled" pairs from S3.
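The equivalence asserted in note (4) can be demonstrated in a few lines. The sketch below applies Formula-2 to traditionally sampled pairs (S1 and S2) and to chain-link-sampled pairs (S3) drawn from the same short series of outcomes; the outcome values themselves are hypothetical.

```python
# Hypothetical series of outcomes labelled a, b, c, d, e, f, g, h from the
# same non-composite primary random variable "X".
series = [10.2, 9.8, 10.5, 10.1, 9.9, 10.3, 10.0, 10.4]

def formula_2_from_pairs(pairs):
    """Pooled variance for duplicates (Formula-2): sum(d^2) / (2k)."""
    k = len(pairs)
    return sum((a - b) ** 2 for a, b in pairs) / (2 * k)

# Traditional sampling: S1 takes a_b, c_d, e_f, g_h; S2 takes b_c, d_e, f_g.
s1 = list(zip(series[0::2], series[1::2]))
s2 = list(zip(series[1::2], series[2::2]))
pooled = (sum((a - b) ** 2 for a, b in s1) +
          sum((a - b) ** 2 for a, b in s2)) / (2 * (len(s1) + len(s2)))

# Chain-link-sampling: S3 takes every consecutive pair a_b, b_c, c_d, ...
s3 = list(zip(series, series[1:]))
chain_link = formula_2_from_pairs(s3)

print(round(pooled, 4), round(chain_link, 4))  # the two variances are equal
```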

(5) "Chain-link-sampling" is considered to be absolutely essential for this computerized system.
The time and cost of obtaining the required number of degrees of freedom for the standard deviations from some of the "duplicates" PAF-forms is quite high having to use stratified sampling according to measurement level and having to obtain the various duplicates at "random" measurement levels, since the concentrations of analyte in the regular material samples are unknown before analysis. "Chain-link-sampling" cuts this time and cost in half. In practice, any number of subsample replicates can be run on any material sample homogenate by labelling their respective flasks as: a, b, c, d, e, f, g, and so on. A rule is made to subtract "b" from "a", "c" from "b", "d" from "c", "e" from "d", "f' from "e", "g" from "f', and so on. Six pairs of "differences between duplicates" are obtained if "chain-link-sampling" is used, whereas a maximum of only three is available by using the regular sampling. Over and above this stated advantage, additional PAF-forms would otherwise have to be created for triplicates, quadruplicates, quintuplicates, and so on when running this many subsample replicates. This would be an enormous task in itself and would make the computerized system so much more confusing and awkward and irksome to use. "Chain-link-sampling" can only be used with the SAM-DUP, RB-DUP, RS-DUP, COUNT-DUP (on transformed data), CAL-DUP and STAN-DUP
forms. One big precaution: ANOVA cannot be done using "chain-link-sampled"
duplicates.

General Equation for the Overall Variance of a Single or Average Determination:

The general equation for the overall variance of a single or average unbiased measurement of the concentration of a single ingredient in a single sample homogenate, in PPM² at M-level (each term is to be referenced in serial order from top to bottom), is given by:
(FE)² * (f)² * 1/E(u)² * [ c²/E(m)² * { Var(all chemical processing stages) + ( Var(IRV + IBV) of a single instrument reading * (BE)² ) / Nr } / Nd

+ c²/E(m)² * { Var(rb) / Nrb }

+ c²/E(m)² * { Var(measured VSAM in the material sample homogenate) }

+ 1/E(m)² * { ( Var(m) / Nm ) * E(BMAC)² } ]

+ 1/E(u)² * { Var(u-bar) * E(UMAC)² }

For the single or average measurement obtained for a particular material sample at a particular measurement level in a single analytical run of a particular analytical chemistry method (equal sample weights or volumes), the measurement being calculated in PPM at M-level as:
( FE * f * c ) / u-bar * { X or X-bar } = ( FE * f * c ) / u-bar * { BE / (m or m-bar) * [ (Y or Y-bar) - (rb or rb-bar) ] }

(X or X-bar) is the concentration obtained from the calibration graph in PPM (µg/ml) at Q2-level.
(Y or Y-bar) is the instrument reading in AU, XAU, or AREA for the sample at Q1-level.

"Var" is the variance operator.
"E" is the expectation operator.
"u and u-bar" are the percent recovery (decimal equivalent) at the particular measurement level.
"c" is the "c" factor for the standard calculations formula (must be a statistical constant).
"f" is the "f" factor for the ratio of the standard sample weight or volume to the non-standard sample weight or volume.
"m and m-bar" are the single/average slope of the calibration line (regression line).
"rb and rb-bar" are the single/average reading of one or more reagent blanks.
"Nrb" is the number of (averaged) reagent blanks being run.
"Nm" is the number of (averaged) slopes (one or two) for a block of samples in the run.
"Nr" is the number of (averaged) instrument readings on the sample extract for each single or replicate determination.

"Nd" is the number of (averaged) replicate determinations done on the particular sample homogenate.

"FE" is the front-end overall superimposed dilution/concentration factor.
"BE" is the back-end overall superimposed dilution/concentration factor.

"VSAM" is the measured (as opposed to actual) residual variation of the concentration of the ingredient (analyte) in the single material sample homogenate.

`BMAC" is the possibly biased measurement at M-level in PPM (gg/g or gg/ml) of the actual concentration of the ingredient (analyte) in the single material sample homogenate as determined by the particular analytical chemistry method.

"UMAC" is the unbiased (having been unbiased--the verb) measurement at M-level (gg/g or g/ml) of the actual concentration of the ingredient (analyte) in the single material sample homogenate and closest practicable approximation to the actual concentration.
Special Note to Patent Office: This file is to be included with Description-file1.pdf, Description-file3.pdf and Description-file4.pdf as previously submitted for this application. This particular file (Description-file2-corrected.pdf) contains a number of corrections and is being submitted to replace the file (Description-file2.pdf) as previously submitted.

Claims (16)

1. A documented analytical chemistry method (or other documented analytical method) being a stochastic process, if said stochastic process is fully characterized, then the required confidence intervals for each chemical (or other) measurement can be obtained from the said characterization of said stochastic process itself, at the particular measurement level for each measurement, according to the characterizing process described in the invention (rather than said confidence intervals always having to be obtained on demand from a plurality of measurements produced by said stochastic process for the purpose of computing the said confidence intervals, which said plurality of measurements are costly and time-consuming to obtain), this said characterizing process being the most fundamental process of the invention claimed by the inventor.
2. The second most fundamental process of the invention claimed by the inventor is that the routine measurements and their respective confidence intervals so obtained from said stochastic process, recited in Claim 1, at the particular measurement level for each measurement, can all be unbiased using control sample data, sometimes using the same control sample data that is used to obtain the confidence intervals, but only according to the unbiasing process described in the invention so that the integrity of the laboratory operations is maintained.
3. The characterizing process, recited in Claim 1, is known about but apparently no one else has attempted to utilize it to develop the appropriate systems and software to obtain confidence intervals for chemical (and other) measurements, as recited in Claim 1, due to various problems and complexities which only the inventor has described, analyzed and overcome according to the described invention, said systems and the first stages of software development hereby being claimed by the inventor.
4. The unbiasing process, recited in Claim 2, is known about but is usually not done by laboratory operators since it could compromise the integrity of laboratory operations and make the comparing of the performance of the same analytical method in different laboratories impossible if using only the unbiased measurements from the said analytical method as it is being done in each of the said different laboratories to do the comparison, this problem being overcome by the controlled computerization of the system according to the described invention, the systems and first stages of software development to do said controlled computerization hereby being claimed by the inventor.
5. That control samples are run (analyzed) for a limited period of time in the course of routinely running (analyzing) regular samples over several analytical runs and their measurements thus recorded on specially designed computer forms such as computer spreadsheet forms (or similarly programmed computer forms), which are called Parameter Acquisition Forms or PAF's, from which said recorded measurements on said PAF's, the required statistical estimates of certain population parameters at specific measurement levels are automatically and cumulatively computed over several analytical runs according to the described invention, automatically testing the said measurement data statistically for outliers and/or systematic error between analytical runs as said measurement data accumulates, is claimed as part of the invention.
6. That the adjusting and correcting of parameters (standard deviations and percent recoveries at specific measurement levels as required to obtain specific confidence intervals for measurements at said measurement levels and to respectively unbias said measurements and their confidence intervals) are done by various interpolation and extrapolation subroutines according to the described invention, in conjunction with an algorithm according to the described invention, both said subroutines and said algorithm being contained in the overall programming code of a special computer program called a Derived Parameter Supplying Program (DPSP) developed by the inventor according to the described invention, that, in accordance with the described invention, is adapted to and set up to be specific to a particular analytical method as it is being done in a particular laboratory, is claimed as part of the invention.
7. Any quality control procedure requires control samples to be routinely run for a period of time, over several analytical runs, to determine the proper estimates of certain population parameters before any said quality control procedure can begin, but that the parts of the invention recited in Claim 3 and Claim 4 determines those parameters, recited in Claim 6, at three distinct measurement levels, low, medium, and high, on the appropriate PAF's, recited in Claim 5, so that, in conjunction with the DPSP, recited in Claim 6 (which is specifically adapted to a particular analytical method as it is being done in a particular laboratory), the said parameters are adjusted and corrected so that the recited characterizing process in Claim 1 and the recited unbiasing process in Claim 2 can both be fully accomplished for all measurement levels throughout the whole measurement spectrum of said analytical method according to the described invention, is claimed as part of the invention.
8. The described algorithm, recited in Claim 6, which said algorithm is programmed into the DPSP, recited in Claim 6, is the equivalent of pseudocode programming language, which said pseudocode programming language is convertible to any other computer programming language, and is therefore claimed as part of the invention.
9. The specially designed computer forms, called Parameter Acquisition Forms or PAF's, recited in Claim 5, are all specially designed by the inventor and are claimed as part of the invention, including the specified constraints in using them and the specific titles given to them.
10. The use of a little known statistical formula, called Formula-2 (variance form), that is derived and explained in the description of the invention, the legitimate use of which (said formula) can be proved using the random variable generating capability of the computer spreadsheet, to acquire standard deviations at particular or stratified measurement levels on some of the PAF's, recited in Claim 5, and which (said formula) lends itself very effectively to the programming of the said PAF's (whereas in contrast, the regular pooled variance formula equivalent would be very difficult and cumbersome to program), is claimed as part of the invention.
11. The use of a little known statistical formula, called Formula-3 (variance form), that is derived and explained in the description of the invention, the legitimate use of which (said formula) can be proved using the random variable generating capability of the computer spreadsheet, to determine the control limits for the control charts and pop-up histograms on some of the PAF's, recited in Claim 5, that use Formula-2, recited in Claim 10, to acquire standard deviations at particular or stratified measurement levels on the said PAF's, is claimed as part of the invention.
12. Chain-link-sampling, the legitimate use of which (said chain-link-sampling) can be proved using the random variable generating capability of the computer spreadsheet, as derived and adapted by the inventor and explained in the description of the invention, that is used to obtain serially consecutive duplicate measurements to be entered into some of the PAF's, recited in Claim 5, that use Formula-2, recited in Claim 10, to acquire the required standard deviations at particular or stratified measurement levels on the said PAF's, cutting the time and cost of regular statistical sampling methods in half in order to acquire the same number of degrees of freedom for the said required standard deviations at particular or stratified measurement levels, is claimed as part of the invention.
13. The fact that only single or duplicate control samples, as recited in Claim 5, or only one or more (as desired) sets of duplicate subsample extracts from the same singularly prepared homogenate of one or more (as desired) particular regular samples having already been submitted for analysis (it being emphasized, that one of the said set of said duplicate subsample extracts has already been singularly chosen to be included in the analytical run, the other one of the said set of said duplicate subsample extracts therefore amounting to only one extra subsample determination having to be included in the analytical run) need to be occasionally run as time and circumstances permit alongside all of the other regular samples being processed per analytical run for a limited time period at particular or stratified measurement levels, greatly simplifying the operational aspects of the computerized system and also making it possible to benefit from using the chain-link-sampling feature, recited in Claim 12, to obtain a plurality of serially consecutive said sets of said duplicate subsample extracts, is claimed as part of the invention.
14. That a special problem in scientific research is solved by obtaining the required parameters, recited in Claim 6, to construct confidence intervals that enable scientists to compare their data with that of other scientists, according to the recited characterizing process in Claims 1 and 3 and the recited unbiasing process in Claims 2 and 4, obviating the need for said scientists having to compute T-distribution confidence intervals, over and over again, exclusively from their data, as each new set of data is accumulated, in order to present their data to other scientists, these said T-distribution confidence intervals being useless and deceiving as a descriptive statistic anyway, is claimed as part of the invention.
15. That the entire computerized system, as described by the inventor, could be developed by any of several software development companies and made available for purchase to corporate customers throughout the world, is claimed as part of the invention.
16. That the entire computerized system, as described by the inventor, could be adapted and operated from a particular internet website making it available to fee-paying subscribers all over the world, is claimed as part of the invention.
CA2694828A 2010-03-02 2010-03-02 Prototype of a computerized system to automatically acquire, statistically test and supply the parameters needed (standard deviation and percent recovery--obtained by running control samples) to compute 95% confidence intervals for chemical (and other) measurements contained in an organizations's main database (or locally) and ultimately to unbias those Withdrawn CA2694828A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CA2694828A CA2694828A1 (en) 2010-03-02 2010-03-02 Prototype of a computerized system to automatically acquire, statistically test and supply the parameters needed (standard deviation and percent recovery--obtained by running control samples) to compute 95% confidence intervals for chemical (and other) measurements contained in an organizations's main database (or locally) and ultimately to unbias those

Publications (1)

Publication Number Publication Date
CA2694828A1 true CA2694828A1 (en) 2011-09-02

Family

ID=44515249

Family Applications (1)

Application Number Title Priority Date Filing Date
CA2694828A Withdrawn CA2694828A1 (en) 2010-03-02 2010-03-02 Prototype of a computerized system to automatically acquire, statistically test and supply the parameters needed (standard deviation and percent recovery--obtained by running control samples) to compute 95% confidence intervals for chemical (and other) measurements contained in an organizations's main database (or locally) and ultimately to unbias those

Country Status (1)

Country Link
CA (1) CA2694828A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116879513A (en) * 2023-09-07 2023-10-13 中碳实测(北京)科技有限公司 Verification method, device, equipment and storage medium of gas analysis system
CN116879513B (en) * 2023-09-07 2023-11-14 中碳实测(北京)科技有限公司 Verification method, device, equipment and storage medium of gas analysis system


Legal Events

Date Code Title Description
AZWI Withdrawn application

Effective date: 20131213