GB2395101A

GB2395101A - Determining structure factor from X-ray crystallography data

Info

Publication number: GB2395101A
Application number: GB0404593A
Authority: GB
Inventors: Zelijko Dzakula
Original assignee: Accelrys Inc
Current assignee: Dassault Systemes Biovia Corp
Priority date: 2000-05-08
Filing date: 2001-05-08
Publication date: 2004-05-12
Anticipated expiration: 2021-05-08
Also published as: GB0404593D0; GB2395101B

Abstract

A method reduces the structure factor phase ambiguity corresponding to a selected reciprocal lattice vector. The method includes generating an original phase probability distribution corresponding to a selected structure factor phase of the selected reciprocal lattice vector. The original phase probability distribution includes a first structure factor phase ambiguity. The method further includes combining the original phase probability distribution with a plurality of phase probability distributions of a plurality of structure factor phases of other reciprocal lattice vectors using a phase equation or inequality. The method further includes producing a resultant phase probability distribution for the selected structure factor phase of the selected reciprocal lattice vector. The resultant phase probability distribution includes a second structure factor phase ambiguity which is smaller than the first structure factor phase ambiguity. In addition, a method uses linear prediction analysis to define a first structure factor component for a first reflection from x-ray crystallography data. The method includes expressing the first structure factor component as a first linear equation in which the first structure factor component is equal to a sum of a first plurality of terms.

Description

STRUCTURE FACTOR DETERMINATIONS

Background of the Invention

Field of the Invention

The invention relates to x-ray crystallography.

Description of the Related Art

In x-ray diffraction crystallography, a crystalline form of the molecule under study is exposed to a beam of x-rays, and the intensity of diffracted radiation is measured at a variety of angles from the angle of incidence. The beam of x-rays is diffracted into a plurality of diffraction "reflections," each reflection representing a reciprocal lattice vector. From the diffraction intensities of the reflections, the magnitudes of a series of numbers, known as "structure factors," are determined. The structure factors are in general complex numbers, having a magnitude and a phase in the complex plane, and are defined by the electron distribution within the unit cell of the crystal. The magnitudes of the complex numbers are relatively easy to determine experimentally from the measured diffraction intensities of the various reflections. However, a map of electron density and/or atomic positions within the unit cell of the crystal cannot be generated without also determining the phases of the structure factors. Thus, the central problem in x-ray diffraction crystallography is the determination of phases for structure factors whose amplitudes are already known.

In attempts to determine the structure of large biomolecules such as proteins, one of the most frequently used approaches to this problem is based on isomorphous replacement. In single isomorphous replacement (SIR) analysis, one or more heavy atoms are attached to the protein, creating a heavy atom derivative or isomorph of the protein. An analysis of the difference between the x-ray diffraction intensities from the native protein and from its heavy atom derivative can limit the phase of at least some structure factors to two plausible possibilities. For each structure factor, this SIR analysis results in a phase probability distribution curve which is typically substantially bimodal, with peaks positioned at the two most probable phases for that structure factor.

To remove the ambiguity of which probability peak corresponds to the correct phase for each structure factor, a plurality of heavy atom derivatives can be used to generate a set of phase probability distribution curves for each structure factor. In this multiple isomorphous replacement (MIR) analysis, the probability distribution curves for a selected structure factor are mathematically combined such that the resulting phase value is consistent across all of the heavy atom derivatives for the selected structure factor. In essence, the resulting phase value common to the set of phase probability distribution curves corresponds to the correct phase of the structure factor. An alternative analysis, multiple anomalous diffraction (MAD), has mathematical formalisms which are similar to those of MIR analysis. Aspects of these two procedures are described in Section 8.4, pages 255-267, of An Introduction to X-Ray Crystallography by Michael M. Woolfson, Cambridge University Press (1970, 1997). The complete content of the Woolfson textbook is hereby incorporated by reference in its entirety.
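The MIR idea of keeping only the phase that is consistent across derivatives can be sketched numerically. This sketch assumes the derivatives give independent phase information, so their distributions are multiplied point by point (with Hendrickson-Lattman exponential distributions this is equivalent to adding the coefficients). The von Mises peak shape, its width `kappa`, the peak positions, and the function names are all hypothetical.

```python
import math

N = 360  # phase grid with 1-degree spacing


def bimodal(peaks_deg, kappa=20.0):
    """Hypothetical phase distribution: von Mises-shaped peaks, normalized to sum 1."""
    p = [sum(math.exp(kappa * (math.cos(math.radians(phi - mu)) - 1.0)) for mu in peaks_deg)
         for phi in range(N)]
    total = sum(p)
    return [v / total for v in p]


def combine(*dists):
    """Multiply independent phase distributions point by point and renormalize."""
    prod = [1.0] * N
    for d in dists:
        prod = [x * y for x, y in zip(prod, d)]
    total = sum(prod)
    return [v / total for v in prod]


# Two hypothetical derivatives: each is ambiguous on its own, but they share one mode.
deriv1 = bimodal([30, 170])
deriv2 = bimodal([30, 250])
combined = combine(deriv1, deriv2)
best = max(range(N), key=lambda i: combined[i])
print(best)
```

The 170 degree and 250 degree modes each appear in only one derivative, so the product suppresses them and leaves the common mode at 30 degrees.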

The heavy atom derivative method is commonly used when the structure of the protein or other molecule(s) in the unit cell is wholly unknown. However, the preparation of heavy atom derivatives is slow and tedious, and it is not always possible to create enough heavy atom isomorphs to reduce the phase ambiguity sufficiently.

The structure factors used to calculate atomic coordinates from measured x-ray diffraction intensities are oscillatory functions of the indices of the reciprocal lattice vectors with an overall decaying envelope. One expression for these structure factors has the following form:

Equation 1: $$F_{hkl} = \sum_j q_j T_j f_j \left[\cos 2\pi(h x_j + k y_j + l z_j) + i \sin 2\pi(h x_j + k y_j + l z_j)\right],$$

where $F_{hkl}$ is the structure factor for the reciprocal lattice vector with indices $h,k,l$; $x_j, y_j, z_j$ are the fractional coordinates of site $j$; $q_j$ are the occupancy populations of each site; $T_j$ are the temperature factors, which correspond to thermal motions; and $f_j$ are the atomic scattering factors. While the populations $q_j$ are constants, the temperature factors $T_j$ and atomic scattering factors $f_j$ decrease as the indices $h,k,l$ increase.
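Equation 1 can be evaluated directly for a given reflection. The sketch below assumes that each site's $T_j$ and $f_j$ have already been evaluated for the reflection of interest (so they are plain numbers), and it uses a hypothetical two-atom cell; the function name `structure_factor` is illustrative.

```python
import cmath
import math


def structure_factor(hkl, atoms):
    """Evaluate Equation 1 for one reflection.

    `atoms` is a list of (q, T, f, (x, y, z)) tuples: occupancy, temperature
    factor, scattering factor and fractional coordinates. T and f are scalars
    already evaluated for this reflection (a simplifying assumption).
    """
    h, k, l = hkl
    F = 0j
    for q, T, f, (x, y, z) in atoms:
        angle = 2.0 * math.pi * (h * x + k * y + l * z)
        F += q * T * f * complex(math.cos(angle), math.sin(angle))
    return F


# Hypothetical cell: identical atoms at x = 0 and x = 0.5.
atoms = [(1.0, 1.0, 1.0, (0.0, 0.0, 0.0)),
         (1.0, 1.0, 1.0, (0.5, 0.0, 0.0))]
F100 = structure_factor((1, 0, 0), atoms)  # the two contributions cancel
F200 = structure_factor((2, 0, 0), atoms)  # the two contributions add in phase
print(abs(F100), abs(F200), math.degrees(cmath.phase(F200)))
```

The magnitude `abs(F)` is the quantity recoverable from measured intensities; the phase `cmath.phase(F)` is the quantity the rest of this specification is concerned with.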

Working from the magnitudes and phases of the structure factors, the electron density and/or atomic positions within the unit cell of the crystal can be determined. Structural determinations using x-ray diffraction data are described in An Introduction to X-Ray Crystallography by Michael M. Woolfson, Cambridge University Press (1970, 1997), which is hereby incorporated by reference in its entirety.

In principle, all of the x-ray diffraction reflections are capable of being known or measured (i.e., cognizable). However, due to various aspects of the systems used to experimentally measure the reflection intensities, the set of measured intensities may be incomplete, or may contain errors.

First, some x-ray diffraction measurement systems do not provide a measurement of the (0, 0, 0) reflection, which can contain useful information regarding the contents of the crystal. Second, the range of reflections accessible to the x-ray measurement system can be limited, preventing the measurement of reflections corresponding to larger reciprocal lattice vectors. These larger reciprocal lattice vectors can contain high-resolution information (i.e., corresponding to shorter distances in direct space) regarding the crystal structure. Third, various other reflections may be partially or wholly occluded by various portions of the x-ray diffraction measurement system. Fourth, there may be other experimental factors, such as a low signal-to-noise ratio, which reduce the confidence of a particular measurement by the x-ray measurement system.

Summary of the Invention

According to one aspect of the present invention, a method reduces the structure factor phase ambiguity corresponding to a selected reciprocal lattice vector. The method comprises generating an original phase probability distribution corresponding to a selected structure factor phase of the selected reciprocal lattice vector. The original phase probability distribution comprises a first structure factor phase ambiguity. The method further comprises combining the original phase probability distribution with a plurality of phase probability distributions of a plurality of structure factor phases of other reciprocal lattice vectors using a phase equation or inequality. The phase equation or inequality defines a mathematical relationship between the selected structure factor phase of the selected reciprocal lattice vector and the plurality of structure factor phases of other reciprocal lattice vectors. The method further comprises producing a resultant phase probability distribution for the selected structure factor phase of the selected reciprocal lattice vector. The resultant phase probability distribution comprises a second structure factor phase ambiguity which is smaller than the first structure factor phase ambiguity.

According to another aspect of the present invention, a method defines a structure factor phase for a reflection derived from x-ray crystallography data. The method comprises generating a first probability distribution for the structure factor phase of the reflection. The method further comprises generating two or more additional probability distributions for the structure factor phases of other reflections. The method further comprises calculating a composite probability distribution for the structure factor phase of the reflection. The composite probability distribution is derived from the first probability distribution of the reflection and the two or more additional probability distributions of the other reflections.

According to another aspect of the present invention, the methods described herein are implemented on a computer readable medium having instructions stored thereon which cause a general purpose computer system to perform the methods described herein. According to another aspect of the present invention, a computer-implemented x-ray crystallography analysis system is programmed to perform the methods described herein.

According to another aspect of the present invention, a computer-implemented x-ray crystallography analysis system comprises a means for retrieving a first phase probability distribution corresponding to a selected structure factor phase of a selected reciprocal lattice vector. The system further comprises a means for retrieving a plurality of second phase probability distributions corresponding to other structure factor phases of other reciprocal lattice vectors. The system further comprises a means for combining the first phase probability distribution and the plurality of second phase probability distributions so as to produce a resultant phase probability distribution for the selected structure factor phase of the selected reciprocal lattice vector.

According to another aspect of the present invention, a method refines x-ray diffraction data. The method comprises combining structure factor phase probability distributions for different reciprocal lattice vectors so that the structure factor phase probability distribution for at least one of the reciprocal lattice vectors is more heavily weighted toward a phase value.

According to one aspect of the present invention, a method uses linear prediction analysis to define a first structure factor component for a first reflection from x-ray crystallography data.

The x-ray crystallography data comprises a set of cognizable reflections. The method comprises expressing the first structure factor component as a first linear equation in which the first structure factor component is equal to a sum of a first plurality of terms. Each term comprises a product of (1) a structure factor component for a cognizable reflection from the x-ray crystallography data, wherein the cognizable reflection has a separation in reciprocal space from the first reflection, and (2) a linear prediction coefficient corresponding to the separation between the cognizable reflection and the first reflection. The method further comprises calculating values for the linear prediction coefficients. The method further comprises substituting the values for the linear prediction coefficients into the first linear equation, thereby defining the first structure factor component for the first reflection.
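The linear prediction equation can be illustrated with a one-dimensional sketch. It rests on several assumptions that are not taken from this specification: point atoms with no thermal or scattering-factor decay (so a recurrence whose order equals the number of atoms holds exactly), forward prediction from lower indices, and an ordinary least-squares fit of the coefficients through the normal equations. The procedures of Figures 12-15 are not reproduced, and every function name is hypothetical.

```python
import cmath
import math


def true_F(h, positions, weights):
    """1-D structure factor of point atoms (no thermal or scattering-factor decay)."""
    return sum(w * cmath.exp(2j * math.pi * h * x) for x, w in zip(positions, weights))


def solve(A, b):
    """Solve a small complex linear system by Gaussian elimination with pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def fit_lp_coefficients(F, order):
    """Least-squares fit of F[h] = sum over m = 1..order of a_m * F[h - m].

    Returns [a_1, ..., a_order]; a_m is the coefficient for separation m.
    """
    rows = [[F[h - m] for m in range(1, order + 1)] for h in range(order, len(F))]
    rhs = [F[h] for h in range(order, len(F))]
    # Normal equations (A^H A) a = A^H b.
    AhA = [[sum(r[i].conjugate() * r[j] for r in rows) for j in range(order)]
           for i in range(order)]
    Ahb = [sum(r[i].conjugate() * y for r, y in zip(rows, rhs)) for i in range(order)]
    return solve(AhA, Ahb)


def predict_next(F, a):
    """Extrapolate the reflection one step beyond the end of the known sequence."""
    n = len(F)
    return sum(a[m - 1] * F[n - m] for m in range(1, len(a) + 1))


# Hypothetical 1-D cell with three point atoms; h = 0..15 are treated as measured.
positions, weights = [0.10, 0.27, 0.55], [1.0, 0.8, 1.3]
known = [true_F(h, positions, weights) for h in range(16)]
a = fit_lp_coefficients(known, order=len(positions))
estimate = predict_next(known, a)
print(abs(estimate - true_F(16, positions, weights)))
```

In this idealized case the prediction error is at the level of floating-point rounding. With decaying $T_j f_j$ and measurement noise the recurrence is only approximate, so a higher order and some form of regularization would likely be needed.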

According to another aspect of the present invention, a method refines x-ray diffraction data. The method comprises deriving a value of a first structure factor from a linear combination of other structure factors.

According to another aspect of the present invention, a computer readable medium has instructions stored thereon which cause a general purpose computer to perform a method of using linear prediction analysis to define a first structure factor component for a first reflection from x-ray crystallography data. The x-ray crystallography data comprises a set of cognizable reflections.

The method comprises expressing the first structure factor component as a first linear equation in which the first structure factor component is equal to a sum of a first plurality of terms. Each term comprises a product of (1) a structure factor component for a cognizable reflection from the x-ray crystallography data, wherein the cognizable reflection has a separation in reciprocal space from the first reflection, and (2) a linear prediction coefficient corresponding to the separation between the cognizable reflection and the first reflection. The method further comprises calculating values for the linear prediction coefficients. The method further comprises substituting the values for the linear prediction coefficients into the first linear equation, thereby defining the first structure factor component for the first reflection.

According to another aspect of the present invention, a computer-implemented x-ray crystallography analysis system comprises a structure factor component generator for generating a first structure factor component for a first reflection from x-ray crystallography data using linear prediction analysis. The x-ray crystallography data comprises a set of cognizable reflections. The first structure factor component is expressed as a first linear equation in which the first structure factor component is equal to a sum of a first plurality of terms. Each term comprises a product of (1) a structure factor component for a cognizable reflection from the x-ray crystallography data, wherein the cognizable reflection has a separation in reciprocal space from the first reflection, and (2) a linear prediction coefficient corresponding to the separation between the cognizable reflection and the first reflection. The system further comprises a calculating module for calculating values for the linear prediction coefficients. The system further comprises a resultant structure factor component definer for defining the first structure factor component for the first reflection by substituting the values for the linear prediction coefficients into the first linear equation.

According to another aspect of the present invention, a computer-implemented x-ray crystallography analysis system comprises a means for generating a first structure factor component for a first reflection from x-ray crystallography data using linear prediction analysis.

The x-ray crystallography data comprises a set of cognizable reflections. The first structure factor component is expressed as a first linear equation in which the first structure factor component is equal to a sum of a first plurality of terms. Each term comprises a product of (1) a structure factor component for a cognizable reflection from the x-ray crystallography data, wherein the cognizable reflection has a separation in reciprocal space from the first reflection, and (2) a linear prediction coefficient corresponding to the separation between the cognizable reflection and the first reflection. The system further comprises a means for calculating values for the linear prediction coefficients. The system further comprises a means for defining the first structure factor component for the first reflection by substituting the values for the linear prediction coefficients into the first linear equation.

Brief Description of the Drawings

Figure 1 is a flowchart of one embodiment of a method of reducing structure factor phase ambiguity corresponding to a selected reciprocal lattice vector.

Figure 2 schematically illustrates an example of a substantially bimodal phase probability distribution $P(\varphi_k)$ for the phase $\varphi_k$ corresponding to a reciprocal lattice vector k.

Figures 3A-3C schematically illustrate phase probability distributions $P(\varphi_k)$, $P(\varphi_{-h})$, and $P(\varphi_{k-h})$ for reciprocal lattice vectors k, -h, and k - h, respectively.


Figure 3D schematically illustrates the resultant phase probability distribution $P(\varphi_k)$ for the structure factor phase corresponding to reciprocal lattice vector k, based on the three phase probability distributions shown in Figures 3A-3C.

Figures 4A-4C schematically illustrate phase probability distributions $P(\varphi_k)$, $P(\varphi_{-h})$, and $P(\varphi_{k-h})$ for reciprocal lattice vectors k, -h, and k - h, respectively.

Figure 4D schematically illustrates the resultant phase probability distribution $P(\varphi_k)$ for the structure factor phase corresponding to reciprocal lattice vector k, based on the three phase probability distributions shown in Figures 4A-4C.

Figure 5 is a flowchart of one embodiment of a method of defining a structure factor phase for a reflection derived from x-ray crystallography data.

Figures 6A-6D schematically illustrate an example of an embodiment of the present invention as applied to certain reflections of experimental data.

Figures 7A-7D schematically illustrate an example of an embodiment of the present invention as applied to certain reflections of experimental data.

Figure 8 schematically illustrates a "true" value of the phase obtained from density modification techniques corresponding to the reciprocal lattice vector k.

Figures 9A-9D schematically illustrate an example of an embodiment of the present invention as applied to certain reflections of experimental data.

Figure 9E schematically illustrates a "true" value of the phase obtained from density modification techniques corresponding to the reciprocal lattice vector k.

Figure 10A schematically illustrates an artificial one-dimensional electron distribution composed of ten randomly positioned atoms.

Figure 10B schematically illustrates the correlation between the "calculated" structure factor phases produced by one embodiment of the present invention and the "true" structure factor phases computed from the electron distribution of Figure 10A.

Figure 10C schematically illustrates the electron distribution calculated from the set of structure factor phases from one embodiment of the present invention.

Figure 10D schematically illustrates the electron distribution calculated from the structure factors with random phases.

Figure 11 is a flowchart of one embodiment of a method of using linear prediction analysis to define a first structure factor component for a first reflection from x-ray crystallography data.

Figure 12 is a flowchart of one embodiment of calculating values for the linear prediction coefficients.

Figure 13 is a flowchart of another embodiment of calculating values for the linear prediction coefficients.

Figure 14 is a flowchart of another embodiment of calculating values for the linear prediction coefficients.

Figure 15 is a flowchart of another embodiment of calculating values for the linear prediction coefficients.

Figure 16A schematically illustrates an electron distribution of a hypothetical one-dimensional system of ten atoms.

Figure 16B schematically illustrates the agreement between the true values for the structure factor components corresponding to the electron distribution of Figure 16A and the corresponding linear prediction estimates from an embodiment of the present invention.

Figure 17A schematically illustrates another electron distribution of a hypothetical one-dimensional system of ten atoms.

Figure 17B schematically illustrates the agreement between the true values for the structure factor components corresponding to the electron distribution of Figure 17A and the corresponding linear prediction estimates from an embodiment of the present invention.

Figure 18A schematically illustrates another electron distribution of a hypothetical one-dimensional system of ten atoms.

Figure 18B schematically illustrates the agreement between the true values for the structure factor components corresponding to the electron distribution of Figure 18A and the corresponding linear prediction estimates from an embodiment of the present invention.

Figure 19A schematically illustrates another electron distribution of a hypothetical one-dimensional system of thirty atoms.

Figure 19B schematically illustrates the agreement between the true values for the structure factor components corresponding to the electron distribution of Figure 19A and the corresponding linear prediction estimates from an embodiment of the present invention.

Figure 20A schematically illustrates another electron distribution of a hypothetical one-dimensional system of thirty atoms.

Figure 20B schematically illustrates the agreement between the true values for the structure factor components corresponding to the electron distribution of Figure 20A and the corresponding linear prediction estimates from an embodiment of the present invention.

Figure 21A schematically illustrates another electron distribution of a one-dimensional projection of a hypothetical three-dimensional system of 500 atoms.


Figure 21B schematically illustrates the agreement between the true values for the structure factor components corresponding to the electron distribution of Figure 21A and the corresponding linear prediction estimates from an embodiment of the present invention.

Figure 22A schematically illustrates another electron distribution of a one-dimensional projection of a hypothetical three-dimensional system of 500 atoms.

Figure 22B schematically illustrates the agreement between the true values for the structure factor components corresponding to the electron distribution of Figure 22A and the corresponding linear prediction estimates from an embodiment of the present invention.

Detailed Description of the Preferred Embodiment

In describing embodiments of the invention, the terminology used is not intended to be interpreted in any limited or restrictive manner simply because it is being utilized in conjunction with a detailed description of certain specific embodiments of the invention. Furthermore, embodiments of the invention may include several novel features, no single one of which is solely responsible for its desirable attributes or is essential to practicing the inventions herein described.

In many embodiments, the present invention is useful in computer-implemented x-ray crystallography analysis processes. In these processes, x-ray crystallography data is analyzed using software code running on general purpose computers, which can take a wide variety of forms, including, but not limited to, network servers, workstations, personal computers, mainframe computers, and the like. The code which configures the computer to perform these analyses is typically provided to the user on a computer readable medium, such as a CD-ROM. The code may also be downloaded by a user from a network server which is part of a local or wide-area network, such as the Internet.

The general purpose computer running the software will typically include one or more input devices such as a mouse and/or keyboard, a display, and computer readable memory media such as random access memory integrated circuits and a hard disk drive. It will be appreciated that one or more portions, or all, of the code may be remote from the user and, for example, resident on a network resource such as a LAN server, Internet server, network storage device, etc. In typical embodiments, the software receives as input a variety of information, such as the x-ray crystallographic data and any user-determined parameters for the analysis.

Figure 1 is a flowchart of one embodiment of a method 50 of reducing structure factor phase ambiguity corresponding to a selected reciprocal lattice vector. The method 50 comprises generating an original phase probability distribution in an operational block 60. The original phase probability distribution corresponds to a selected structure factor phase of the selected reciprocal lattice vector, and comprises a first structure factor phase ambiguity. The method 50 further comprises combining the original phase probability distribution with a plurality of phase probability distributions of a plurality of structure factor phases of other reciprocal lattice vectors using a phase equation or inequality in an operational block 70. The phase equation or inequality defines a mathematical relationship between the selected structure factor phase of the selected reciprocal lattice vector and the plurality of structure factor phases of other reciprocal lattice vectors. The method 50 further comprises producing a resultant phase probability distribution for the selected structure factor phase of the selected reciprocal lattice vector in an operational block 80. The resultant phase probability distribution comprises a second structure factor phase ambiguity which is smaller than the first structure factor phase ambiguity.

In the operational block 60, an original phase probability distribution is generated which corresponds to a selected structure factor phase of the selected reciprocal lattice vector. In certain embodiments, the original phase probability distribution is generated using single isomorphous replacement (SIR) analysis. Other examples of analyses which can generate the original phase probability distribution in other embodiments include, but are not limited to, single anomalous dispersion (SAD), multiple isomorphous replacement (MIR), and multiple anomalous dispersion (MAD).

As is known to those of skill in the art, the usual result of SIR analysis is a set of Hendrickson-Lattman coefficients $a_k, b_k, c_k, d_k$ for each reciprocal lattice vector k. These coefficients define the original phase probability distribution $P(\varphi_k, a_k, b_k, c_k, d_k)$ for each corresponding structure factor according to the following standard formula:

Equation 1: $$P(\varphi_k, a_k, b_k, c_k, d_k) = \exp\left[a_k \cos\varphi_k + b_k \sin\varphi_k + c_k \cos(2\varphi_k) + d_k \sin(2\varphi_k)\right],$$

where $\varphi_k$ corresponds to the structure factor phase of a reciprocal lattice vector k, and $a_k, b_k, c_k, d_k$ correspond to the Hendrickson-Lattman coefficients for the reciprocal lattice vector k. The normalization factor of Equation 1 has been omitted for simplicity.
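A minimal sketch of the Hendrickson-Lattman formula evaluated on a discrete phase grid, where dividing by the grid sum supplies the omitted normalization factor. The function name and coefficient values are hypothetical. The example also shows why SIR-type coefficients give bimodal curves: with $a_k = b_k = 0$, $c_k = R\cos 2\mu$ and $d_k = R\sin 2\mu$, the exponent reduces to $R\cos(2\varphi_k - 2\mu)$, which has two equal maxima at $\mu$ and $\mu + 180$ degrees.

```python
import math


def hl_distribution(a, b, c, d, n=360):
    """P(phi) from Hendrickson-Lattman coefficients on an n-point phase grid.

    Dividing by the grid sum stands in for the omitted normalization factor.
    """
    phis = [2.0 * math.pi * i / n for i in range(n)]
    raw = [math.exp(a * math.cos(t) + b * math.sin(t)
                    + c * math.cos(2.0 * t) + d * math.sin(2.0 * t)) for t in phis]
    total = sum(raw)
    return [v / total for v in raw]


# Hypothetical coefficients with a = b = 0: c*cos(2t) + d*sin(2t) = R*cos(2t - 2*mu),
# so the distribution has two equal modes, at mu and mu + 180 degrees.
R, mu = 3.0, math.radians(30.0)
p = hl_distribution(0.0, 0.0, R * math.cos(2.0 * mu), R * math.sin(2.0 * mu))
modes = sorted(range(360), key=lambda i: p[i], reverse=True)[:2]
print(sorted(modes))
```

Nonzero $a_k$ and $b_k$ break the symmetry between the two modes. This is how the unequal or non-antipodal modes seen in practice arise.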

As described above, the phase probability distributions generated from SIR analysis are generally bimodal (i.e., the distribution has two prominent probability modes). In such a bimodal phase probability distribution, the phase has a significant likelihood of lying in either mode of the distribution. An example of a substantially bimodal phase probability distribution $P(\varphi_k)$ is illustrated in Figure 2 for the phase $\varphi_k$ corresponding to a reciprocal lattice vector k. The phase probability distribution $P(\varphi_k)$ in Figure 2 has one mode centered at approximately 30 degrees and a second, approximately equal mode centered at approximately 170 degrees. The phase $\varphi_k$ therefore has an approximately equal probability of being either approximately 30 degrees or approximately 170 degrees. The structure factor phase ambiguity of a phase probability distribution can be defined in terms of the relative weight of each mode of the bimodal distribution.

As illustrated in Figure 2, the two modes of the phase probability distribution $P(\varphi_k)$ have approximately equal weights, so the phase is as likely to have a value in one mode as in the other. Therefore, the phase probability distribution $P(\varphi_k)$ has a relatively high structure factor phase ambiguity. The ambiguity of a phase probability distribution can be quantified by calculating a centroid, which represents the ensemble average value of the phase, and a "figure of merit" (FOM), which is a measure of the reliability of the centroid. A FOM value of zero represents complete ambiguity, and a FOM value of one represents total certainty (i.e., a sharp, single-peak phase probability distribution). The phase probability distribution schematically illustrated in Figure 2 has a centroid of 129 degrees and a FOM value of 0.19.
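A sketch of the centroid and FOM calculation follows. It assumes the usual figure-of-merit definition, in which both quantities come from the probability-weighted mean of $e^{i\varphi}$: the centroid is its argument and the FOM is its magnitude. The test distributions are hypothetical and are not intended to reproduce the Figure 2 values.

```python
import cmath
import math


def centroid_and_fom(p):
    """Centroid phase (degrees) and figure of merit of a phase distribution.

    p[i] is the probability at phase 360*i/len(p) degrees. The FOM is the
    magnitude of the probability-weighted mean of exp(i*phi): 0 for complete
    ambiguity, approaching 1 for a single sharp peak.
    """
    n = len(p)
    total = sum(p)
    m = sum(w * cmath.exp(2j * math.pi * i / n) for i, w in enumerate(p)) / total
    return math.degrees(cmath.phase(m)) % 360.0, abs(m)


def peaks(centres_deg, kappa, n=360):
    """Hypothetical distribution built from von Mises-shaped peaks."""
    return [sum(math.exp(kappa * (math.cos(math.radians(360.0 * i / n - c)) - 1.0))
                for c in centres_deg) for i in range(n)]


sharp = centroid_and_fom(peaks([30.0], kappa=200.0))         # nearly certain
split = centroid_and_fom(peaks([30.0, 210.0], kappa=20.0))   # fully ambiguous
print(sharp, split)
```

Two equal modes 180 degrees apart give a FOM of essentially zero, and their centroid is then meaningless. Two equal modes closer together, as in Figure 2, give a small but nonzero FOM, with the centroid lying between them.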

In the crystallographic analysis of large molecules such as proteins, there are thousands of reciprocal lattice vectors or reflections to be examined, and thus thousands of ambiguous phase determinations defined by phase probability distributions, such as the phase probability distribution $P(\varphi_k)$ illustrated in Figure 2, each comprising a structure factor phase ambiguity. As described above, MIR analysis can reduce the structure factor phase ambiguities from heavy atom derivatives by analyzing x-ray crystallography data obtained for multiple heavy atom derivatives of the molecule under study. However, the preparation of these additional heavy atom derivatives is slow and tedious, and it is not always possible to create enough heavy atom isomorphs to reduce the structure factor phase ambiguity sufficiently.

The preparation of these additional heavy atom derivatives can be avoided by certain embodiments of the present invention. In such embodiments, the original phase probability distribution $P(\varphi_k)$ is combined with a plurality of phase probability distributions of a plurality of structure factor phases of other reciprocal lattice vectors using a phase equation or inequality in the operational block 70 of Figure 1. The phase equation or inequality defines a mathematical relationship between the selected structure factor phase of the selected reciprocal lattice vector and the plurality of structure factor phases of other reciprocal lattice vectors.

Various mathematical relationships exist between the phases and/or the amplitudes of different structure factors. Such relationships have been used in various direct methods for solving crystal structures to find the most probable structure factor phases which are consistent with the measured reflections. To date, these direct methods have found application only in solving structures of relatively small molecules, where the crystal structure includes fewer than about 150 non-hydrogen atoms in the asymmetric unit. Several such methods are described in Sections 8.6, 8.7, and 8.8 of the Woolfson reference described above. Embodiments of the present invention differ from the direct methods by using experimentally determined phase probability distributions as inputs (e.g., from MIR, MAD, SIR, or SAD analyses), whereas the direct methods utilize only structure factor amplitudes as inputs.

In certain embodiments of the present invention, these mathematical relationships may be used to reduce the structure factor phase ambiguity present in the x-ray crystallography data for large molecules, such as proteins having hundreds or thousands of non-hydrogen atoms per unit cell. In certain embodiments, the phase equation or inequality can define a mathematical relationship known as the phase addition relationship:

Equation 2: $$\varphi_k + \varphi_{-h} \approx \varphi_{k-h} \pmod{360^\circ},$$

where $\varphi_k$ is the structure factor phase for the reciprocal lattice vector k, $\varphi_{-h}$ is the structure factor phase for the reciprocal lattice vector -h, and $\varphi_{k-h}$ is the structure factor phase for the reciprocal lattice vector k - h. The phase addition relationship is based on two axioms: (1) the electron density is non-negative; and (2) the atoms are identical and discrete, with random positions in the unit cell. Certain other embodiments can utilize other phase equations or inequalities which define other mathematical relationships in accordance with the present invention. An example of another phase equation or inequality is described more fully below.
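If Friedel's law is assumed to hold (no anomalous scattering), then $F_{-h} = F_h^{*}$ and therefore $\varphi_{-h} = -\varphi_h$. Under that assumption the phase addition relationship is equivalent to the triplet form commonly quoted in direct-methods texts:

$$\varphi_k + \varphi_{-h} \approx \varphi_{k-h} \iff \varphi_k - \varphi_h - \varphi_{k-h} \approx 0 \iff \varphi_{-h} + \varphi_k + \varphi_{h-k} \approx 0 \pmod{2\pi}.$$

The last step uses $\varphi_{h-k} = -\varphi_{k-h}$, which follows from the same assumption. Data that rely on anomalous signal (SAD, MAD) do not strictly obey Friedel's law, so the equivalence is only approximate for them.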

As applied to bimodal phase probability distributions, if three bimodal phase probability distributions for reciprocal lattice vectors k, -h, and k - h have been generated, the most probable phase for reciprocal lattice vector k is the one which adds to a likely correct phase from the phase probability distribution for reciprocal lattice vector -h to produce a likely correct phase from the phase probability distribution for reciprocal lattice vector k - h. Figures 3A-3D schematically illustrate the combination of an original phase probability distribution $P(\varphi_k)$ with the phase addition relationship between a selected structure factor phase of a selected reciprocal lattice vector k and a set of structure factor phases of other reciprocal lattice vectors. Figures 3A-3C schematically illustrate three bimodal phase probability distributions for reciprocal lattice vectors k, -h, and k - h. The phase probability distributions of Figures 3A-3C have been generated synthetically to provide well-resolved mode peaks which can be easily resolved by visual analysis for illustration purposes. Such synthetically generated functions can imitate the ambiguity found in x-ray crystallography data.

In Figure 3A, the phase probability distribution $P(\varphi_k)$ for reciprocal lattice vector k has two mode peaks: a peak 12 centered at 30 degrees, and an approximately equal peak 14 centered at 170 degrees. In Figure 3B, the phase probability distribution $P(\varphi_{-h})$ for reciprocal lattice vector -h has two mode peaks: a peak 16 centered at 60 degrees, and a peak 18 centered at 330 degrees. In Figure 3C, the phase probability distribution $P(\varphi_{k-h})$ for reciprocal lattice vector k - h also has two mode peaks: a peak 20 centered at 90 degrees, and a peak 22 centered at 170 degrees.

The phase addition relationship implies that the true phase of reciprocal lattice vector k should add to the true phase of reciprocal lattice vector -h to produce the true phase of reciprocal lattice vector k - h. Examination of the peaks in Figures 3A-3C shows that the phase of peak 12 for reciprocal lattice vector k (30 degrees) plus the phase of peak 16 for reciprocal lattice vector -h (60 degrees) produces the phase of peak 20 for reciprocal lattice vector k - h (90 degrees). Thus, consistency between the phases of these reciprocal lattice vectors selects peak 12, at about 30 degrees, as the correct phase for reciprocal lattice vector k.
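The peak-matching argument above can be checked mechanically. The following Python sketch is illustrative only: it reduces each distribution to a list of mode positions in degrees (a simplification, since real distributions have widths), and the helper names `circular_distance` and `consistent_modes` as well as the 10-degree tolerance are hypothetical. It keeps each mode of k for which $\varphi_k + \varphi_{-h}$ lands on some mode of k - h, modulo 360 degrees.

```python
from itertools import product


def circular_distance(a_deg, b_deg):
    """Smallest angular separation between two phases, in degrees (0..180)."""
    d = (a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def consistent_modes(modes_k, modes_mh, modes_kmh, tol_deg=10.0):
    """Return the modes of k for which phi_k + phi_{-h} matches a mode of k - h.

    Applies the phase addition relationship phi_k + phi_{-h} ~ phi_{k-h}
    (modulo 360 degrees) to lists of peak positions. tol_deg is a
    hypothetical matching tolerance.
    """
    selected = []
    for phi_k in modes_k:
        best = min(
            circular_distance(phi_k + phi_mh, phi_kmh)
            for phi_mh, phi_kmh in product(modes_mh, modes_kmh)
        )
        if best <= tol_deg:
            selected.append(phi_k)
    return selected


# Peak positions read from Figures 3A-3C (k, -h, k - h).
print(consistent_modes([30, 170], [60, 330], [90, 170]))
```

For the 170-degree mode of k, the closest combination (170 + 330 = 500, i.e. 140 degrees) still misses the 170-degree mode of k - h by 30 degrees, so only the 30-degree mode survives.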

In certain embodiments, the combination of the original phase probability distribution $P(\varphi_k)$ with the phase equation defining the phase addition relationship in the operational block 70 of Figure 1 is performed in a more mathematically robust and accurate manner by combining the phase addition relationship with the Hendrickson-Lattman formula as follows:

Equation 3: $P(\varphi_k) = P(\varphi_k; a_k, b_k, c_k, d_k) \int_0^{2\pi} d\varphi_{-h}\, P(\varphi_{-h}; a_{-h}, b_{-h}, c_{-h}, d_{-h})\, P(\varphi_k + \varphi_{-h}; a_{k-h}, b_{k-h}, c_{k-h}, d_{k-h})$

where $P(\varphi_k)$ on the left-hand side is a resultant phase probability distribution for the selected structure factor phase of the selected reciprocal lattice vector k, and $a$, $b$, $c$, $d$ are the Hendrickson-Lattman coefficients of the respective original phase probability distributions. Equation 3 statistically combines the phase addition relationship with the original phase probability distribution for reciprocal lattice vector k to produce a resultant probability distribution $P(\varphi_k)$ for the structure factor phase corresponding to reciprocal lattice vector k. As described below, in other embodiments the resultant phase probability distribution can be a composite probability distribution expressed in alternative forms.
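One way to evaluate Equation 3 numerically is to tabulate each distribution on a uniform phase grid and replace the integral with a sum. The sketch below is a minimal illustration under stated assumptions, not the patent's implementation: it assumes a 1-degree grid and NumPy, includes an `hl_distribution` helper that assumes the usual four-coefficient Hendrickson-Lattman exponential form, and adds a hypothetical `two_peaks` helper (two equal von Mises peaks) only to build synthetic bimodal inputs resembling Figures 3A-3C.

```python
import numpy as np

N_GRID = 360                      # 1-degree grid (assumption)
DPHI = 2.0 * np.pi / N_GRID
PHI = np.arange(N_GRID) * DPHI    # phases 0..359 degrees, in radians


def normalize(p):
    """Scale a tabulated distribution so that it integrates to one."""
    return p / (p.sum() * DPHI)


def hl_distribution(a, b, c, d):
    """exp(a cos + b sin + c cos2 + d sin2) on the grid, normalized."""
    logp = (a * np.cos(PHI) + b * np.sin(PHI)
            + c * np.cos(2 * PHI) + d * np.sin(2 * PHI))
    return normalize(np.exp(logp - logp.max()))  # subtract max to avoid overflow


def two_peaks(mu1_deg, mu2_deg, kappa=60.0):
    """Hypothetical synthetic bimodal distribution: two equal von Mises peaks."""
    p = (np.exp(kappa * (np.cos(PHI - np.radians(mu1_deg)) - 1.0))
         + np.exp(kappa * (np.cos(PHI - np.radians(mu2_deg)) - 1.0)))
    return normalize(p)


def combine_triplet(p_k, p_mh, p_kmh):
    """Discrete Equation 3: P(phi_k) times the integral over phi_{-h}."""
    integral = np.empty(N_GRID)
    for i in range(N_GRID):
        # np.roll(p_kmh, -i)[j] == p_kmh[(i + j) % N_GRID],
        # i.e. P_{k-h} evaluated at phi_k + phi_{-h}.
        integral[i] = np.sum(p_mh * np.roll(p_kmh, -i)) * DPHI
    return normalize(p_k * integral)


p_res = combine_triplet(two_peaks(30, 170), two_peaks(60, 330), two_peaks(90, 170))
print(int(np.argmax(p_res)))  # grid index equals degrees on this grid
```

The inner loop is a circular cross-correlation; for finer grids it could be computed with FFTs instead, at the cost of slightly less transparent code.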

In certain embodiments, in which the original phase probability distributions are of the form shown in Equation 1, producing a resultant phase probability distribution $P(\varphi_k)$ for the selected structure factor phase of the selected reciprocal lattice vector k in the operational block 80 comprises evaluating the integral of Equation 3 analytically. Such an analysis can yield an infinite series involving hypergeometric Bessel functions. In other embodiments, the resultant phase probability distribution $P(\varphi_k)$ is produced using numerical integration, in which case the form of Equation 3 may be conveniently transformed into the standard form of Equation 1. In such embodiments, the resultant phase probability distribution $P(\varphi_k)$ for the selected structure factor phase of the selected reciprocal lattice vector k can be expressed in terms of a revised set of Hendrickson-Lattman coefficients.

Figure 3D schematically illustrates the resultant phase probability distribution $P(\varphi_k)$ for the structure factor phase corresponding to reciprocal lattice vector k, based on the three phase probability distributions shown in Figures 3A-3C. The resultant phase probability distribution $P(\varphi_k)$ is substantially unimodal (i.e., the distribution has only one prominent probability mode).

As compared to the original phase probability distribution for the reciprocal lattice vector k, the resultant phase probability distribution $P(\varphi_k)$ has a peak 22 centered at 30 degrees, as does the original phase probability distribution $P(\varphi_k)$, but has only an almost completely suppressed small peak 24 at approximately 170 degrees, which corresponds to the second peak 14 of the original phase probability distribution. In addition, the peak 22 of the resultant phase probability distribution is narrowed as compared to the corresponding peak 12 of the original phase probability distribution.

The resultant phase probability distribution is weighted more heavily toward a correct phase than is the original phase probability distribution. Because the resultant phase probability distribution has a larger fraction of its weight distributed among a smaller range of phases, its structure factor phase ambiguity is smaller than that of the original phase probability distribution. The original phase probability distribution, as illustrated in Figure 3A, has its centroid at 100 degrees (far away from the true value of 30 degrees) and a FOM value of 0.23. However, the resultant phase probability distribution, as illustrated in Figure 3D, has its centroid at 28 degrees and a FOM value of 0.92.

Therefore, the resultant phase probability distribution has a smaller ambiguity than does the original phase probability distribution.
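The centroid and FOM figures quoted above can be computed from any tabulated distribution. The sketch below assumes the conventional definitions (the centroid phase is the argument, and the figure of merit the modulus, of the probability-weighted mean of $e^{i\varphi}$) and a uniform grid; the function name `centroid_and_fom` and the test peaks are hypothetical.

```python
import numpy as np


def centroid_and_fom(phi_rad, p):
    """Centroid phase (degrees, 0..360) and figure of merit of a distribution.

    Assumes the conventional definitions: centroid = arg and FOM = modulus of
    the probability-weighted mean of exp(i*phi). The grid is assumed uniform,
    so plain weights can stand in for the integral.
    """
    w = np.asarray(p, dtype=float)
    w = w / w.sum()
    m = np.sum(w * np.exp(1j * np.asarray(phi_rad)))
    return float(np.degrees(np.angle(m)) % 360.0), float(np.abs(m))


phi = np.radians(np.arange(360))


def peak(mu_deg, kappa=200.0):
    """Hypothetical narrow von Mises peak used only as test input."""
    return np.exp(kappa * (np.cos(phi - np.radians(mu_deg)) - 1.0))


bimodal = peak(30) + peak(170)   # two equal modes, as in Figure 3A
unimodal = peak(30)              # a single mode, roughly as in Figure 3D
print(centroid_and_fom(phi, bimodal))
print(centroid_and_fom(phi, unimodal))
```

For two equal modes at 30 and 170 degrees the weighted mean points midway, at 100 degrees, with modulus near cos(70°) ≈ 0.34. This is consistent with the Figure 3A centroid of 100 degrees sitting between the two modes rather than on the true phase, and it shows why a bimodal distribution has a low FOM.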


For embodiments in which the phase probability distributions $P(\varphi_k)$, $P(\varphi_{-h})$, and $P(\varphi_{k-h})$ consist of wider peaks, as schematically illustrated in Figures 4A-4C respectively, the resultant phase probability distribution $P(\varphi_k)$ is still bimodal, as schematically illustrated in Figure 4D. However, as compared to the original phase probability distribution $P(\varphi_k)$ of Figure 4A, the resultant phase probability distribution of Figure 4D emphasizes the correct peak mode over the incorrect peak mode, thereby reducing the structure factor phase ambiguity corresponding to the reciprocal lattice vector k.

Despite the wider peaks of the phase probability distributions of Figures 4A-4C, the resultant phase probability distribution of Figure 4D is weighted more heavily toward a correct phase than is the original phase probability distribution of Figure 4A. The original phase probability distribution, as illustrated in Figure 4A, has its centroid at 100 degrees (far away from the true value of 30 degrees) and a FOM value of 0.28. However, the resultant phase probability distribution, as illustrated in Figure 4D, has its centroid at 89 degrees (approximately 11 degrees closer to the true value of 30 degrees). For essentially complete suppression of the incorrect peak mode of a bimodal original phase probability distribution, the widths of the peaks in the original phase probability distributions should be less than approximately $\left|\varphi_{k-h} - (\varphi'_{k} + \varphi'_{-h})\right|$, where $\varphi'_{k}$ and $\varphi'_{-h}$ represent the positions of the incorrect phase peak modes in the original phase probability distributions $P(\varphi_k)$ and $P(\varphi_{-h})$ for the reciprocal lattice vectors k and -h, respectively, and $\varphi_{k-h}$ can be the position of either the correct or the incorrect phase mode for the reciprocal lattice vector k - h. Although this condition may not always be met, as schematically illustrated by the original phase probability distributions of Figures 4A-4C, a typical x-ray crystallography data set contains enormous numbers of redundant reciprocal lattice vector triplets.

In certain embodiments, these reciprocal lattice vector triplets can be combined using a phase equation or inequality to reduce the structure factor phase ambiguity corresponding to a single reciprocal lattice vector. Typically, where the reciprocal lattice vectors are related according to their Miller indices, the structure factors are also related. In such embodiments, the cumulative analysis of multiple reciprocal lattice vector triplets as outlined above can substantially minimize the structure factor phase ambiguity even when the original phase probability distributions are extremely wide. Using multiple redundant reciprocal lattice vector triplets can produce a resultant phase probability distribution which is analogous to that produced by analyzing multiple heavy atom isomorphs. Thus, the structure factor phase ambiguity can be reduced for all reciprocal lattice vectors by scanning the entire x-ray crystallography data set for reciprocal lattice vector triplets k, -h, and k - h. In certain embodiments, the procedure can be iterated until a self-consistent, converged solution is found. Furthermore, in embodiments in which multiple heavy atom derivatives are available, using the above procedures improves the efficiency and accuracy of the analysis, because the accuracy of the resultant phase probability distributions produced in the initial SIR analysis can be improved.
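Scanning a data set for triplets k, -h, and k - h, as described above, reduces to a set-membership test on Miller indices. The sketch below is a minimal illustration under stated assumptions: the function name `find_triplets` is hypothetical, reflections are plain integer tuples, and Friedel mates are assumed to be listed explicitly in `measured`. How the per-triplet results are then combined (for example, multiplying the Equation 3 integrals, which would assume independent triplets) is left out.

```python
def find_triplets(k, measured):
    """List (-h, k - h) pairs of measured reflections forming a triplet with k.

    Hypothetical helper. Reflections are Miller-index tuples; Friedel mates
    are assumed to be present explicitly in `measured`. Triplets involving
    the zero vector are skipped.
    """
    available = set(measured)
    zero = tuple(0 for _ in k)
    triplets = []
    for minus_h in available:
        # k - h is obtained as k + (-h).
        k_minus_h = tuple(ki + mi for ki, mi in zip(k, minus_h))
        if minus_h != zero and k_minus_h != zero and k_minus_h in available:
            triplets.append((minus_h, k_minus_h))
    return sorted(triplets)


# Reflections used in the 3APP examples of Figures 6 and 7.
measured = [(9, 3, 0), (-7, -1, 0), (2, 2, 0), (-5, -1, 0), (4, 2, 0), (1, 0, 0)]
print(find_triplets((9, 3, 0), measured))
```

A set gives constant-time lookups, so one pass over the data per target reflection is enough; repeating the pass over every k is the "scan of the entire data set" the text describes.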

Figure 5 is a flowchart of one embodiment of a method 200 of defining a structure factor phase for a reflection derived from x-ray crystallography data. The method 200 comprises generating a first probability distribution for the structure factor phase of the reflection in an operational block 210. The method 200 further comprises generating two or more additional probability distributions for the structure factor phases of other reflections in an operational block 220. The method 200 further comprises identifying a relationship between the structure factor phase for the reflection and the structure factor phases of the other reflections in an operational block 230. The method 200 further comprises calculating a composite probability distribution for the structure factor phase of the reflection in an operational block 240. The composite probability distribution is derived from the first probability distribution for the structure factor phase of the reflection and the two or more additional probability distributions for the structure factor phases of the other reflections.

In certain embodiments, generating the first probability distribution for the structure factor phase of the reflection in the operational block 210 is performed as described above. Similarly, generating two or more additional probability distributions for the structure factor phases of other reflections in the operational block 220 is performed as described above.

In certain embodiments, identifying the relationship between the structure factor phase for the reflection and the structure factor phases of the other reflections in the operational block 230 is performed by identifying a phase equation or inequality as described above. For example, the relationship can be identified to be the phase addition relationship expressed by Equation 2.

Alternatively, in other embodiments, the relationship between structure factor phases can be expressed by the so-called tangent formula:

Equation 4: $\tan(\varphi_h) = \dfrac{\sum_k |E_k E_{h-k}| \sin(\varphi_k + \varphi_{h-k})}{\sum_k |E_k E_{h-k}| \cos(\varphi_k + \varphi_{h-k})}$

where $E_k$ represents the structure factor $F_k$ in which the scattering factor has been set to one.

Equation 4 is based on the assumption that the triplet product $E_{-h} E_k E_{h-k}$ has vanishing phase, and that $\sum_k |E_{-h} E_k E_{h-k}| \sin(\varphi_{-h} + \varphi_k + \varphi_{h-k}) = 0$.
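Equation 4 gives only $\tan(\varphi_h)$, which cannot by itself distinguish $\varphi_h$ from $\varphi_h + 180°$. A common way to evaluate it is to keep the numerator and denominator separate and pass them to a two-argument arctangent. The Python sketch below does this; the function name and the tuple layout of `terms` are hypothetical choices for illustration.

```python
import math


def tangent_formula_phase(terms):
    """Estimate phi_h in radians from the sums in Equation 4.

    `terms` holds one (|E_k|, |E_{h-k}|, phi_k, phi_{h-k}) tuple per k, with
    phases in radians (hypothetical layout). math.atan2 uses the signs of the
    sine and cosine sums separately, so the quadrant of phi_h is recovered.
    """
    num = sum(ek * ehk * math.sin(pk + phk) for ek, ehk, pk, phk in terms)
    den = sum(ek * ehk * math.cos(pk + phk) for ek, ehk, pk, phk in terms)
    return math.atan2(num, den)


# Two equally weighted contributions whose phase sums are 60 and 80 degrees.
terms = [(1.0, 1.0, math.radians(40), math.radians(20)),
         (1.0, 1.0, math.radians(50), math.radians(30))]
print(round(math.degrees(tangent_formula_phase(terms)), 6))
```

With equal weights the estimate bisects the two contributions; with unequal $|E_k E_{h-k}|$ products it is pulled toward the stronger terms.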

In certain embodiments, calculating the composite probability distribution for the structure factor phase of the reflection in the operational block 240 is performed by combining the original phase probability distribution with a phase equation or inequality and producing a resultant phase probability distribution as described above. For example, the phase addition relationship of Equation 2 can be combined with the original phase probability distribution, thereby producing Equation 3 for the resultant phase probability distribution, which can then be solved. Alternatively, in other embodiments in which the relationship between structure factor phases is provided by the tangent formula of Equation 4, the composite probability distribution can be expressed in the following form:

Equation 5: $P(\varphi_h) = P_h(\varphi_h; a_h, b_h, c_h, d_h) \int \prod_k d\varphi_k\, d\varphi_{h-k}\, P_k(\varphi_k)\, P_{h-k}(\varphi_{h-k})\; \delta\!\left(\tan(\varphi_h) - \dfrac{\sum_k |E_k E_{h-k}| \sin(\varphi_k + \varphi_{h-k})}{\sum_k |E_k E_{h-k}| \cos(\varphi_k + \varphi_{h-k})}\right)$

where $P(\varphi_h)$ is the composite probability distribution and $\delta(x)$ is the delta function. In certain embodiments, the delta function can be replaced by a Gaussian function to account for experimental errors, errors in the model, and missing reflections.

In certain embodiments, the composite probability distribution is calculated in the operational block 240 by minimizing a penalty function based on the tangent formula and the probability distributions for the structure factor phases. The penalty function of certain embodiments has the following form:

Equation 6: $E = K_1 \sum_h \left[\sum_k |E_k E_{h-k}| \sin(\varphi_k + \varphi_{h-k}) - \tan(\varphi_h) \sum_k |E_k E_{h-k}| \cos(\varphi_k + \varphi_{h-k})\right]^2 - K_2 \sum_h \left[a_h \cos(\varphi_h) + b_h \sin(\varphi_h) + c_h \cos(2\varphi_h) + d_h \sin(2\varphi_h)\right]$

In certain embodiments, Monte Carlo techniques can be utilized to start from an initial guess for a set of structure factor phases. The Monte Carlo techniques are related to those used in simulated annealing procedures, as described by Glykos and Kokkinidis in Acta Cryst., Vol. D56, page 169 (2000), which is incorporated by reference herein in its entirety. In other embodiments, other optimization techniques can be used.
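A Monte Carlo search of the kind mentioned above can be sketched as a generic Metropolis simulated-annealing loop over a list of phases that minimizes any penalty callable, such as an implementation of Equation 6. This is a minimal sketch, not the cited Glykos and Kokkinidis procedure: the function `anneal_phases`, the geometric cooling schedule, the step size, and the toy penalty used in the usage lines are all hypothetical choices.

```python
import math
import random


def anneal_phases(penalty, n_phases, n_steps=5000, t_start=1.0, t_end=1e-3,
                  step_rad=0.3, seed=0):
    """Metropolis Monte Carlo with geometric cooling over phases in radians.

    `penalty` maps a list of phases to a float (for example, an implementation
    of Equation 6). Returns the lowest-penalty phase set seen and its penalty.
    All schedule parameters are hypothetical defaults.
    """
    rng = random.Random(seed)
    phases = [rng.uniform(-math.pi, math.pi) for _ in range(n_phases)]  # initial guess
    energy = penalty(phases)
    best, best_energy = list(phases), energy
    for i in range(n_steps):
        temperature = t_start * (t_end / t_start) ** (i / max(n_steps - 1, 1))
        trial = list(phases)
        j = rng.randrange(n_phases)
        # Perturb one phase and wrap it back into [-pi, pi).
        trial[j] = (trial[j] + rng.gauss(0.0, step_rad) + math.pi) % (2 * math.pi) - math.pi
        trial_energy = penalty(trial)
        delta = trial_energy - energy
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            phases, energy = trial, trial_energy
            if energy < best_energy:
                best, best_energy = list(phases), energy
    return best, best_energy


# Hypothetical toy penalty whose minimum lies at known target phases.
targets = [0.5, -1.0, 2.0]


def toy_penalty(p):
    return sum(1.0 - math.cos(x - t) for x, t in zip(p, targets))


best, e = anneal_phases(toy_penalty, 3)
print(best, e)
```

Tracking the best state separately from the current one means an uphill move accepted late in the run cannot lose a good solution already found.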


Figures 6A-6D and 7A-7D schematically illustrate an example of an embodiment of the present invention as applied to experimental data from the Protein Data Bank, entry code 3APP, corresponding to x-ray diffraction data from penicillopepsin, as published by Sielecki and James in J. Mol. Biol., volume 163, page 299 (1983), which is incorporated by reference herein in its entirety. Figures 6A-6C schematically illustrate the phase probability distributions for the k = (9, 3, 0), -h = (-7, -1, 0), and k - h = (2, 2, 0) reciprocal lattice vectors, respectively. The original phase probability distribution for the reciprocal lattice vector k in Figure 6A is bimodal, with a first peak mode centered at approximately 50 degrees and a second peak mode centered at approximately 210 degrees with an intensity approximately equal to that of the first peak. The probability distributions for the structure factor phases for the reciprocal lattice vectors -h and k - h in Figures 6B and 6C, respectively, are substantially unimodal. As can be seen in the resultant phase probability distribution for the reciprocal lattice vector k in Figure 6D, the intensity of the second peak mode has nearly disappeared, and the first peak has been sharpened somewhat.

For the purposes of comparison, density modification techniques can be used as an alternative method for refining the phase probability distribution. Density modification techniques have several sub-categories, based on assumptions such as non-crystallographic symmetry, solvent flattening, non-negativity of electron distributions, etc. A description of density modification techniques is provided by "Principles of Protein X-Ray Crystallography" by Jan Drenth, Chapter 8, pages 183-198, Springer-Verlag, New York, 1999, which is incorporated in its entirety by reference herein. The original phase probability distribution, illustrated in Figure 6A, has a centroid at 129 degrees (far away from the value of 56 degrees obtained from the density modification technique) and a FOM value of 0.19. However, the resultant phase probability distribution, illustrated in Figure 6D, has a centroid at 76 degrees (closer to the density modification value of 56 degrees) and a FOM value of 0.80. Therefore, the resultant phase probability distribution for the reciprocal lattice vector k has a structure factor phase ambiguity which is smaller than that of the original phase probability distribution for the reciprocal lattice vector k. In addition, the centroid of the resultant phase probability distribution for k = (9, 3, 0) is in better agreement with the phase obtained from the density modification technique, which is schematically illustrated in Figure 8.

Similarly, Figures 7A-7C schematically illustrate the phase probability distributions for the k = (9, 3, 0), -h = (-5, -1, 0), and k - h = (4, 2, 0) reciprocal lattice vectors, respectively.

However, the phase probability distribution for the reciprocal lattice vector -h in Figure 7B is substantially bimodal, while the phase probability distribution for k - h in Figure 7C is substantially unimodal but broad. As can be seen in the resultant phase probability distribution for the reciprocal lattice vector k in Figure 7D, the second peak mode persists, but its intensity has been reduced relative to that of the first peak, and the first peak has been sharpened somewhat.

The original phase probability distribution, illustrated in Figure 7A, has a centroid at 129 degrees (far away from the value of 56 degrees obtained from the density modification technique) and a FOM value of 0.19. However, the resultant phase probability distribution, illustrated in Figure 7D, has a centroid at 98 degrees (closer to the density modification value of 56 degrees) and a FOM value of 0.43. Therefore, the resultant phase probability distribution for the reciprocal lattice vector k has a structure factor phase ambiguity which is smaller than that of the original phase probability distribution for the reciprocal lattice vector k. Again, the centroid of the resultant phase probability distribution for k = (9, 3, 0) is in better agreement with the phase obtained from the density modification technique, which is schematically illustrated in Figure 8.

Figures 9A-9C schematically illustrate the phase probability distributions for the k = (6, 4, 0), -h = (-4, -2, 0), and k - h = (2, 2, 0) reciprocal lattice vectors, respectively. The original phase probability distribution for the reciprocal lattice vector k in Figure 9A is bimodal, with a first peak mode centered at approximately 150 degrees and a second peak mode centered at approximately 315 degrees with an intensity approximately equal to that of the first peak. The probability distributions for the structure factor phases for the reciprocal lattice vectors -h and k - h in Figures 9B and 9C, respectively, are substantially unimodal, but broad. As can be seen in the resultant phase probability distribution for the reciprocal lattice vector k in Figure 9D, the second peak mode has been eliminated and the first peak has been sharpened somewhat.

The original phase probability distribution, illustrated in Figure 9A, has a centroid at 220 degrees (far away from the value of 148 degrees obtained from the density modification technique) and a FOM value of 0.074. However, the resultant phase probability distribution, illustrated in Figure 9D, has a centroid at 136 degrees (closer to the density modification value of 148 degrees) and a FOM value of 0.88. Therefore, the resultant phase probability distribution for the reciprocal lattice vector k has a structure factor phase ambiguity which is smaller than that of the original phase probability distribution for the reciprocal lattice vector k. The centroid of the resultant phase probability distribution for k = (6, 4, 0) is in better agreement with the phase obtained from the density modification technique, as schematically illustrated in Figure 9E.


As a further example of an embodiment of the present invention, an artificial one-dimensional electron distribution composed of 10 randomly positioned atoms, as schematically illustrated in Figure 10A, was used to compute the corresponding structure factors, and then to back-compute the electron distribution from the structure factors. All scattering factors, temperature factors, and occupancies were set equal to one. The structure factors were also used in conjunction with the tangent formula of Equation 4 for comparison. Figure 10B schematically illustrates the correlation between the "calculated" structure factor phases produced by the tangent formula used by an embodiment of the present invention and the "true" structure factor phases computed from the electron distribution. As can be seen from Figure 10B, the embodiment of the present invention yielded structure factor phases whose correlation with the true phases was nearly one.

The subset of low-order structure factor phases from the embodiment of the present invention was then used to calculate the electron distribution, as schematically illustrated in Figure 10C. In calculating the electron distribution of Figure 10C, negative values for electron densities were excluded, as required by a physical constraint. Since the electron distribution of Figure 10C was obtained from the truncated set of structure factors actually used in the Monte Carlo optimization, it has a reduced resolution as compared to Figure 10A. A comparison of the original electron distribution of Figure 10A and the resultant electron distribution of Figure 10C reveals some correlation. This correlation is highlighted by comparing the original electron distribution of Figure 10A with the calculated electron distribution of Figure 10D, which schematically illustrates the electron distribution calculated from the structure factors with phases set to random numbers between -180 degrees and 180 degrees. Figure 10D was also calculated by excluding negative values for electron densities. The loss of correlation with the original electron distribution of Figure 10A when the phases resulting from the embodiment of the present invention are ignored provides further support for the validity of the structure factor phases produced by embodiments of the present invention.
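The one-dimensional test above can be reproduced in outline. The NumPy sketch below is illustrative only and makes several assumptions not in the text: atoms are point scatterers with unit scattering factors, a synthesis with indices up to 30 stands in for the atom distribution of Figure 10A, the true phases stand in for the phases produced by the method, and negative densities are clipped to zero. It compares a truncated synthesis (h ≤ 15) with correct phases against the average over syntheses with random phases. The function names are hypothetical.

```python
import numpy as np


def structure_factors(x_atoms, h_max):
    """F_h = sum_j exp(2*pi*i*h*x_j), h = 0..h_max (point atoms, unit scattering)."""
    h = np.arange(h_max + 1)
    return np.exp(2j * np.pi * np.outer(h, x_atoms)).sum(axis=1)


def synthesize(F, phases, x, clip_negative=True):
    """rho(x) = |F_0| + 2 sum_{h>=1} |F_h| cos(phase_h - 2*pi*h*x), optionally clipped at 0."""
    h = np.arange(1, len(F))
    amp = np.abs(F)
    waves = amp[1:, None] * np.cos(phases[1:, None] - 2 * np.pi * np.outer(h, x))
    rho = amp[0] + 2.0 * waves.sum(axis=0)
    return np.clip(rho, 0.0, None) if clip_negative else rho


rng = np.random.default_rng(0)
atoms = rng.random(10)                              # 10 random positions in the cell
x = np.linspace(0.0, 1.0, 1000, endpoint=False)
F = structure_factors(atoms, 30)
reference = synthesize(F, np.angle(F), x, clip_negative=False)  # stand-in for Figure 10A
low = F[:16]                                        # truncated low-order subset, h <= 15
with_phases = synthesize(low, np.angle(low), x)     # analogue of Figure 10C
random_maps = [synthesize(low, rng.uniform(-np.pi, np.pi, low.size), x)
               for _ in range(20)]                  # analogues of Figure 10D

corr_true = np.corrcoef(reference, with_phases)[0, 1]
corr_random = np.mean([np.corrcoef(reference, m)[0, 1] for m in random_maps])
print(corr_true, corr_random)
```

Averaging the random-phase correlation over several draws makes the comparison less sensitive to any single lucky phase set.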

As described above, an analysis of x-ray diffraction reflections from a crystal results in an indexed set of complex numbers, called structure factors, from which characteristics of the atomic configuration within the crystal can be derived. In three dimensions, the structure factors $F_{hkl}$ are indexed by a triplet of integer indices h, k, l, which correspond to the three axes a*, b*, and c* of reciprocal space. Higher indices correspond to structure factors which provide information with better spatial resolution of the atomic configuration within the crystal.

The nature of the experimental process limits the maximum values of the h, k, and l indices for which structure factors can be accurately derived. In embodiments of the invention, resolution is improved despite experimental limitations by using experimentally determined structure factors to derive approximate values for the structure factors that cannot be or were not experimentally determined. In advantageous embodiments of the invention, the value of an unknown structure factor is derived from a linear combination of other structure factors having experimentally determined values. As described in further detail below, the coefficients of the linear formula used to derive unknown structure factor values are themselves derived from the experimentally determined structure factor values.

Figure 11 is a flowchart of one embodiment of a method 100 of using linear prediction analysis to define a first structure factor component for a first reflection from x-ray crystallography data. The x-ray crystallography data comprises a set of cognizable reflections. The method 100 comprises expressing the first structure factor component in an operational block 110 as a first linear equation in which the first structure factor component is equal to a sum of a first plurality of terms. Each term comprises a product of (1) a structure factor component for a cognizable reflection from the x-ray crystallography data, wherein the cognizable reflection has a separation in reciprocal space from the first reflection, and (2) a linear prediction coefficient corresponding to the separation between the cognizable reflection and the first reflection. The method 100 further comprises calculating values for the linear prediction coefficients in an operational block 120. The method 100 further comprises substituting the values for the linear prediction coefficients into the first linear equation in an operational block 130, thereby defining the first structure factor component for the first reflection.

In the operational block 110, the first structure factor component is expressed by a first linear equation as equal to a sum of a first plurality of terms. In certain embodiments, the first structure factor component is the real or the imaginary part of the corresponding structure factor. Alternatively, in still other embodiments, the first structure factor component is the magnitude or the phase of the corresponding structure factor.

In certain embodiments, the first structure factor component $F_{hkl}$ is expressed as the first linear equation in the following form:

Equation 2: $F_{hkl} = \sum_{s=1}^{N_{coef}} a_s F_{(h - s\Delta h)(k - s\Delta k)(l - s\Delta l)}$

where $N_{coef}$ is the number of terms in the sum, and $s\Delta h$, $s\Delta k$, $s\Delta l$ represent the separation along the axes a*, b*, and c* in reciprocal space between the first reflection and the cognizable reflection. To produce accurate values for structure factors that were not experimentally determined, the value of $N_{coef}$ is generally at least as large as the number of scatterers in the unit cell, which for protein x-ray crystallography is typically several hundred to several thousand.


In the form of Equation 2 for the first structure factor component $F_{hkl}$, each term comprises the product of two elements. One element is a structure factor component $F_{(h - s\Delta h)(k - s\Delta k)(l - s\Delta l)}$ for a cognizable reflection from the x-ray crystallography data which is separated in reciprocal space from the first reflection corresponding to the $F_{hkl}$ structure factor component. In certain embodiments, the structure factor components of the sum correspond to adjacent reflections in reciprocal space (for example, $\Delta h = 1$, $\Delta k = 0$, $\Delta l = 0$). Reverse linear prediction corresponds to negative values for one or more of $\Delta h$, $\Delta k$, or $\Delta l$.
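The bookkeeping in Equation 2, including reverse prediction and the check for missing reflections, can be sketched in a few lines. The helper names `lp_term_indices` and `missing_terms` are hypothetical, and the sketch only enumerates Miller indices; it does not compute coefficients.

```python
def lp_term_indices(hkl, step, n_coef):
    """Miller indices of the n_coef reflections on the right-hand side of Equation 2.

    hkl: (h, k, l) of the reflection being predicted.
    step: (dh, dk, dl); negative entries give reverse linear prediction.
    Term s (s = 1..n_coef) uses reflection (h - s*dh, k - s*dk, l - s*dl).
    """
    h, k, l = hkl
    dh, dk, dl = step
    return [(h - s * dh, k - s * dk, l - s * dl) for s in range(1, n_coef + 1)]


def missing_terms(hkl, step, n_coef, measured):
    """Terms whose reflections are not among the measured (cognizable) ones."""
    available = set(measured)
    return [idx for idx in lp_term_indices(hkl, step, n_coef) if idx not in available]


print(lp_term_indices((10, 3, 2), (1, 0, 0), 3))    # forward, along a*
print(lp_term_indices((10, 3, 2), (-1, 0, 0), 2))   # reverse prediction
```

Listing the missing terms first is one way to decide which subset of separations is usable when some reflections are absent or unreliable.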

The other element is a linear prediction coefficient $a_s$ corresponding to the separation between the cognizable reflection and the first reflection. As is described below, these linear prediction coefficients $a_s$ are initially unknown, but can be solved for using various methods. In the form of Equation 2, the first structure factor component is expressed as a linear equation comprising a linear combination of other structure factor components with indices which are less than the indices for the first structure factor.

As an example of a first linear equation in accordance with embodiments of the present invention, the $F_{Nkl}$ reflection (where k and l are constants) can be expressed as a linear combination of the structure factor components $F_{(N-s)kl}$:

Equation 3: $F_{Nkl} = a_1 F_{(N-1)kl} + a_2 F_{(N-2)kl} + a_3 F_{(N-3)kl} + \cdots + a_{N_{coef}} F_{(N-N_{coef})kl}$

In this example, the structure factor components $F_{(N-s)kl}$ are selected along a direction parallel to the a* axis in reciprocal space (i.e., $\Delta h = 1$ and $\Delta k = \Delta l = 0$), and s represents the number of steps along this direction. In certain embodiments, the structure factor components $F_{(N-s)kl}$ are known, but the linear prediction coefficients $a_s$ are not known. While in principle structure factor components for cognizable reflections with all combinations of $\Delta h$, $\Delta k$, $\Delta l$ can be used in the first linear equation, in certain embodiments only a subset will be useful due to missing or erroneous experimental data corresponding to certain reflections.

As a simple one-dimensional example, structure factors $F_1$ through $F_{10}$ may be known, and it may be desired to predict the value of $F_{11}$. A series of linear equations may be formed as follows:

Equation 4:
$F_5 = a_1 F_4 + a_2 F_3 + a_3 F_2$
$F_6 = a_1 F_5 + a_2 F_4 + a_3 F_3$
$F_7 = a_1 F_6 + a_2 F_5 + a_3 F_4$
$F_8 = a_1 F_7 + a_2 F_6 + a_3 F_5$
$F_9 = a_1 F_8 + a_2 F_7 + a_3 F_6$
$F_{10} = a_1 F_9 + a_2 F_8 + a_3 F_7$

As $F_2$ through $F_{10}$ are measured, known values, the three linear prediction coefficients $a_1$, $a_2$, and $a_3$ may be selected so as to force these six equations to be true with a minimum total error. Once these linear prediction coefficients $a_s$ have been selected, a value for the unknown $F_{11}$ is predicted with the formula:

Equation 5: $F_{11} = a_1 F_{10} + a_2 F_9 + a_3 F_8$

Several techniques for determining linear prediction coefficients are described in further detail below with reference to Figures 2, 3, 4, and 5.
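The six-equation example above can be solved by ordinary least squares, which is one standard way to choose coefficients "with a minimum total error". The NumPy sketch below assumes that choice. The synthetic data (three point atoms with unit scattering factors at hypothetical positions) and the helper names `fit_lp_coefficients` and `predict_next` are for illustration only.

```python
import numpy as np


def fit_lp_coefficients(F, order):
    """Least-squares linear prediction coefficients a_1..a_order.

    F holds consecutive known structure factors. One equation
    F[n] = a_1 F[n-1] + ... + a_order F[n-order] is formed for every n that has
    enough predecessors, and the over-determined system is solved by least squares.
    """
    F = np.asarray(F, dtype=complex)
    A = np.array([F[n - order:n][::-1] for n in range(order, len(F))])
    b = F[order:]
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coeffs


def predict_next(F, coeffs):
    """Apply the linear formula to the most recent values, as in Equation 5."""
    recent = np.asarray(F, dtype=complex)[::-1][:len(coeffs)]  # F_N, F_{N-1}, ...
    return complex(np.dot(coeffs, recent))


# Hypothetical data: three point atoms (unit scattering factors) in 1D.
x_atoms = np.array([0.12, 0.47, 0.81])
n = np.arange(1, 12)                                     # h = 1 .. 11
F_all = np.exp(2j * np.pi * np.outer(n, x_atoms)).sum(axis=1)
known = F_all[1:10]                                      # F_2 .. F_10, as in Equation 4
a = fit_lp_coefficients(known, order=3)                  # six equations, F_5 .. F_10
F11 = predict_next(known, a)
print(F11, F_all[10])
```

For N point atoms, $F_h$ is a sum of N geometric sequences $e^{2\pi i h x_j}$. It therefore satisfies an exact linear recurrence of order N, so here three coefficients reproduce $F_{11}$ essentially exactly. This is consistent with the earlier guidance that $N_{coef}$ be at least the number of scatterers.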

In certain embodiments which use Equation 2 to express the first structure factor component, the separation between the first reflection and each cognizable reflection has the same number of steps along each of the reciprocal space axes a*, b*, and c*, by virtue of using the single index s for all three components of the separation. In other embodiments, two or three indices are used in place of the single index of Equation 2, so that the first linear equation can include cognizable reflections which have different numbers of steps along the three reciprocal space axes. Persons skilled in the art are able to express the first structure factor component as a first linear equation in accordance with these embodiments of the present invention.

It will be appreciated by those in the art that a variety of mathematical techniques for selecting a set of linear prediction coefficients $a_s$ from already measured structure factor values have been developed and may be used in embodiments of the invention. In general, the techniques involve selecting a set of linear prediction coefficients that predicts, with the least total error, a set of the known structure factor values from other known structure factor values using a series of linear equations of the form of Equation 2. This set of linear prediction coefficients is then used in the linear formula of Equation 2 to predict the value of an unknown structure factor component $F_{hkl}$ from other known structure factor values. Such techniques have been applied in communication signal processing and analysis applications, but have never been utilized in the analysis of x-ray diffraction data.

In the operational block 120, values for the linear prediction coefficients are calculated.

Figure 12 is a flowchart of one embodiment of the calculation corresponding to operational block 120. In the embodiment illustrated in Figure 12, calculating values for the linear prediction coefficients comprises expressing a plurality of second structure factor components for a plurality of second reflections from the set of cognizable reflections in an operational block 121 as a plurality of second linear equations. In the plurality of second linear equations, each second structure factor component is equal to a sum of a second plurality of terms. Each term comprises a product of (1) a structure factor component for a cognizable reflection from the x-ray crystallography data, wherein the cognizable reflection has a separation in reciprocal space from the second reflection, and (2) the linear prediction coefficient corresponding to the separation between the cognizable reflection and the second reflection. In certain embodiments, each of the second linear equations is similar in form to the first linear equation, using the linear prediction coefficients $a_s$ corresponding to the separation between the cognizable reflection and the second reflection. Calculating values for the linear prediction coefficients further comprises solving the plurality of second linear equations for a set of values for the linear prediction coefficients in an operational block 122.

Continuing the example described above, the plurality of second structure factor components can have the following form:

Equation 6:
$$
\begin{aligned}
F_{(N-1)kl} &= a_1 F_{(N-2)kl} + a_2 F_{(N-3)kl} + a_3 F_{(N-4)kl} + \cdots + a_{N_{\mathrm{coeff}}} F_{[N-(N_{\mathrm{coeff}}+1)]kl} \\
F_{(N-2)kl} &= a_1 F_{(N-3)kl} + a_2 F_{(N-4)kl} + a_3 F_{(N-5)kl} + \cdots + a_{N_{\mathrm{coeff}}} F_{[N-(N_{\mathrm{coeff}}+2)]kl} \\
F_{(N-3)kl} &= a_1 F_{(N-4)kl} + a_2 F_{(N-5)kl} + a_3 F_{(N-6)kl} + \cdots + a_{N_{\mathrm{coeff}}} F_{[N-(N_{\mathrm{coeff}}+3)]kl}
\end{aligned}
$$

The second reflections are from the set of cognizable reflections, so each second structure factor component is capable of being measured or known. In embodiments in which the second structure factor components are known, and are expressed as linear combinations of other known structure factor components, the only unknown parameters are the linear prediction coefficients $a_j$.

In the operational block 122, the plurality of second linear equations is solved for a set of values corresponding to the set of coefficients. In embodiments in which there are $N_{\mathrm{coeff}}$ unknown linear prediction coefficients $a_j$, solving for a set of values utilizes at least $N_{\mathrm{coeff}}$ independent second linear equations. Persons skilled in the art are able to solve the plurality of second linear equations for the set of values for the linear prediction coefficients $a_j$ in accordance with embodiments of the present invention.
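As a minimal sketch of operational blocks 121 and 122 (not the patent's own implementation), the second linear equations can be stacked into an overdetermined system and solved in the least-squares sense, matching the "least total error" criterion described earlier. The sketch assumes the known components form a 1-D NumPy array ordered by increasing h along one line with k and l fixed; the function name `fit_lp_coefficients` and the synthetic test series are hypothetical.

```python
import numpy as np


def fit_lp_coefficients(F, n_coeff):
    """Least-squares estimate of a_1..a_n_coeff such that F[i] ~ sum_j a_j F[i-j].

    F is a 1-D array of known structure factor components along one line in
    reciprocal space (k, l fixed), ordered by increasing h. Each known F[i]
    with i >= n_coeff supplies one "second linear equation".
    """
    F = np.asarray(F, dtype=complex)
    n_eq = len(F) - n_coeff
    if n_eq < n_coeff:
        raise ValueError("need at least n_coeff independent equations")
    # Row for target F[i] holds F[i-1], F[i-2], ..., F[i-n_coeff]
    A = np.array([[F[i - j] for j in range(1, n_coeff + 1)]
                  for i in range(n_coeff, len(F))])
    b = F[n_coeff:]
    a, *_ = np.linalg.lstsq(A, b, rcond=None)
    return a


# Synthetic series that obeys an exact two-term recursion
h = np.arange(30)
z1, z2 = 0.9 * np.exp(0.7j), 0.8 * np.exp(-1.3j)
F = 2.0 * z1**h + z2**h
print(fit_lp_coefficients(F, 2))  # close to [z1 + z2, -z1 * z2]
```

With noisy measured data the same call returns the coefficients that minimize the summed squared prediction error over all available equations.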

In the operational block 130, the set of values for the linear prediction coefficients $a_j$ is substituted into the first linear equation, thereby defining the first structure factor component for the first reflection. In this way, the first structure factor component $F_h$ is expressed solely in terms of known parameters.
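Once the coefficients are known, substituting them into the first linear equation yields the next component, and repeating the step extends the series further. The sketch below assumes recursive forward prediction, in which each predicted value is fed back into later predictions; the function name `lp_extrapolate` and the test series are illustrative only.

```python
import numpy as np


def lp_extrapolate(F_known, a, n_new):
    """Predict n_new components beyond the last known one with
    F_h = sum_j a_j F_(h-j); each prediction is fed into the next step."""
    F = [complex(v) for v in F_known]
    if len(F) < len(a):
        raise ValueError("need at least len(a) known components")
    for _ in range(n_new):
        # F[-j] is F_(h-j) relative to the component being predicted
        F.append(sum(a_j * F[-j] for j, a_j in enumerate(a, start=1)))
    return np.array(F[len(F_known):])


h = np.arange(25)
z1, z2 = 0.9 * np.exp(0.7j), 0.8 * np.exp(-1.3j)
F_true = 2.0 * z1**h + z2**h
a = [z1 + z2, -z1 * z2]          # coefficients of the exact recursion
F_pred = lp_extrapolate(F_true[:20], a, 5)
print(np.max(np.abs(F_pred - F_true[20:])))  # round-off level for this noise-free series
```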

By using the structure factor components for reflections related to the reflection of the structure factor component $F_h$ to be defined, embodiments of the present invention utilize linear prediction to increase the number of observables used in the optimization of the molecular geometry. A relatively small extension (e.g., 20%) along all lines in reciprocal space leads to a large increase in the number of reflections, because the number of reflections within the volume of reciprocal space bounded by a reciprocal lattice vector increases with the cube of the indices h, k, l of that vector. For example, extending the maximum reciprocal lattice vector from $\sqrt{h^2+k^2+l^2} = 20$ to 24 increases the number of reflections available for use in the optimization of the molecular geometry by $(24^3 - 20^3)/20^3 \approx 73\%$.

Embodiments of the present invention can also extrapolate measured data to higher resolution. Reflections for reciprocal lattice vectors with larger indices h, k, l correspond to longer vectors in reciprocal space, which imply shorter distances in direct space. In this way, a significant improvement in resolution can be achieved. For example, when the length of the unit cell of a hypothetical one-dimensional crystal is a = 50 Å, the corresponding reciprocal unit cell edge is a* = 0.02 Å⁻¹, and the resolution for h = 20 is d = 1/(h a*) = 2.5 Å. A 20% increase of h (i.e., from 20 to 24) improves the resolution to d ≈ 2.08 Å.

Embodiments of the present invention can also be used to complement incomplete or erroneous x-ray crystallography data sets. Embodiments of the present invention can provide a method to detect and replace "outlier" reflections, i.e., measured reflections which, for one reason or another, are aberrant or erroneous. In this way, hidden experimental errors can be identified and eliminated or corrected.
Such utility is particularly important with regard to multiple isomorphous replacement (MIR) analysis and multiple anomalous diffraction (MAD) analysis.
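The two numerical examples above (gain in reflection count and gain in one-dimensional resolution) can be checked with a few lines of arithmetic. The helper names are hypothetical; the only assumptions are the ones stated in the text, namely that the reflection count scales with the cube of the maximum index and that d = 1/(h a*).

```python
def reflection_gain(r_old, r_new):
    """Fractional increase in the number of reflections when the maximum
    reciprocal lattice vector length grows from r_old to r_new; the count
    inside a sphere of radius r scales as r**3."""
    return (r_new**3 - r_old**3) / r_old**3


def resolution_1d(a_cell, h):
    """Resolution d = 1 / (h * a_star) for index h of a 1-D lattice with
    cell edge a_cell (a_star = 1 / a_cell); d has the units of a_cell."""
    a_star = 1.0 / a_cell
    return 1.0 / (h * a_star)


print(f"{reflection_gain(20, 24):.1%}")  # 72.8%
print(f"{resolution_1d(50.0, 20):.2f} Å -> {resolution_1d(50.0, 24):.2f} Å")  # 2.50 Å -> 2.08 Å
```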

With regard to missing reflections, embodiments of the present invention can be used to interpolate the missing reflections and to improve data completeness within each resolution shell. This utility can be important when resolution shells contain too few data for cross-validation. Resolution shells are concentric spherical shells in reciprocal space, designed so that each shell contains an approximately equal number of reflections. Shells with smaller diameters correspond to lower resolution, while shells with larger diameters correspond to higher resolution. The division of reciprocal space into resolution shells is equivalent to division of the resolution axis into subintervals. Embodiments of the present invention can also be used to evaluate the zeroth-order reflection $|F_{000}|$ and enable subsequent absolute scaling of the set of measured reflections based on the known total number of electrons in the unit cell.
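If reflections are assumed to be spread uniformly in reciprocal space, shells holding approximately equal numbers of reflections are simply shells of equal volume, so their outer radii grow with the cube root of the shell index. The sketch below illustrates that assumption only; the function name `equal_volume_shell_edges` is hypothetical and real data-processing programs may choose shell boundaries differently.

```python
def equal_volume_shell_edges(d_min, n_shells):
    """Outer radii |s| (in 1/Å if d_min is in Å) of n_shells concentric
    reciprocal-space shells of equal volume reaching out to |s| = 1/d_min.
    With reflections spread uniformly in reciprocal space, equal volume
    means roughly equal reflection counts per shell."""
    s_max = 1.0 / d_min
    return [s_max * ((i + 1) / n_shells) ** (1.0 / 3.0) for i in range(n_shells)]


edges = equal_volume_shell_edges(2.0, 4)
# Resolution (Å) at the outer edge of each shell, from low to high resolution
print([round(1.0 / s, 2) for s in edges])  # [3.17, 2.52, 2.2, 2.0]
```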

Figure 13 is a flowchart of one embodiment of the calculation corresponding to operational block 120. In the embodiment illustrated in Figure 13, calculating values for the linear prediction coefficients comprises expressing a first subset of the cognizable structure factor components as vector elements of a first vector in an operational block 221. Calculating values for the linear prediction coefficients further comprises expressing a second subset of the cognizable structure factor components as vector elements of a second vector in an operational block 222. Calculating values for the linear prediction coefficients further comprises expressing the first vector in a matrix equation as being equal to the product of a matrix and the second vector in an operational block 223. The matrix comprises the linear prediction coefficients, with each linear prediction coefficient corresponding to a separation in reciprocal space between the cognizable reflection corresponding to one cognizable structure factor component from the first vector and the cognizable reflection corresponding to one cognizable structure factor component from the second vector. Calculating values for the linear prediction coefficients further comprises solving the matrix equation for values of the linear prediction coefficients in an operational block 224.

In certain embodiments, a first subset of the cognizable structure factor components is expressed as vector elements of a first vector in the operational block 221, and a second subset of the cognizable structure factor components is expressed as vector elements of a second vector in the operational block 222. For example, where k and l are constants such as in the example described above, the first vector can have the following form:

Equation 7:
$$|F_{nkl}\rangle = |F_{(N-1)kl}, F_{(N-2)kl}, F_{(N-3)kl}, \ldots\rangle$$

and the second vector can have the following form:

Equation 8:
$$|F_{mkl}\rangle = |F_{(N-2)kl}, F_{(N-3)kl}, F_{(N-4)kl}, \ldots\rangle$$

In certain embodiments, the first vector $|F_{nkl}\rangle$ is expressed in the operational block 223 in a matrix equation in which it is equal to the product of a matrix $M_{nm}$ and the second vector $|F_{mkl}\rangle$. Continuing the example from above, the matrix equation can have the following form:

Equation 9:
$$
|F_{nkl}\rangle = M_{nm}\,|F_{mkl}\rangle:
\qquad
\begin{pmatrix} F_{(N-1)kl} \\ F_{(N-2)kl} \\ F_{(N-3)kl} \\ \vdots \end{pmatrix}
=
\begin{pmatrix}
a_1 & a_2 & a_3 & \cdots & & \\
0 & a_1 & a_2 & a_3 & \cdots & \\
0 & 0 & a_1 & a_2 & a_3 & \cdots \\
0 & 0 & 0 & a_1 & a_2 & \cdots
\end{pmatrix}
\begin{pmatrix} F_{(N-2)kl} \\ F_{(N-3)kl} \\ F_{(N-4)kl} \\ \vdots \end{pmatrix}
$$

In the operational block 224, the matrix equation is solved for values of the linear prediction coefficients. Persons skilled in the art are able to solve the matrix equation and substitute the resulting values into the linear equation in accordance with embodiments of the present invention to define the first structure factor component $F_h$. Persons skilled in the art are also able to recognize the equivalence of the two embodiments of the example described above.
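The banded matrix of Equation 9 places $a_1, a_2, \ldots$ on successive diagonals, so each row reproduces one line of Equation 6; this is why the Figure 12 and Figure 13 embodiments are equivalent. The sketch below builds that matrix for assumed coefficients and checks the identity on a synthetic series ordered by decreasing h, as in Equations 7 and 8. The function name `banded_lp_matrix` and the test data are hypothetical.

```python
import numpy as np


def banded_lp_matrix(a, n_rows):
    """Matrix M_nm of Equation 9: row r carries a_1..a_N starting at column r,
    so row r reproduces one line of Equation 6."""
    a = np.asarray(a, dtype=complex)
    M = np.zeros((n_rows, n_rows + len(a) - 1), dtype=complex)
    for r in range(n_rows):
        M[r, r:r + len(a)] = a
    return M


h = np.arange(12)
z1, z2 = 0.9 * np.exp(0.7j), 0.8 * np.exp(-1.3j)
F = 2.0 * z1**h + z2**h          # F[h] for h = 0..11 (k, l fixed), so N = 12
a = np.array([z1 + z2, -z1 * z2])
n_rows = 5
M = banded_lp_matrix(a, n_rows)
F_rev = F[::-1]                  # F_(N-1), F_(N-2), ... (decreasing h)
F_n = F_rev[:n_rows]             # |F_nkl> of Equation 7
F_m = F_rev[1:1 + M.shape[1]]    # |F_mkl> of Equation 8
print(np.allclose(M @ F_m, F_n))  # True
```

Treating the $a_j$ as unknowns, the rows of this system are exactly the least-squares equations of the Figure 12 embodiment.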

Figure 14 is a flowchart of one embodiment of the calculation corresponding to operational block 120. In the embodiment illustrated in Figure 14, calculating values for the linear prediction coefficients comprises expressing a first subset of the cognizable structure factor components as matrix elements of a first matrix in an operational block 321. Calculating values for the linear prediction coefficients further comprises expressing a second subset of the cognizable structure factor components as vector elements of a first vector in an operational block 322. Calculating values for the linear prediction coefficients further comprises generating a second matrix representing a generalized inverse of the first matrix in an operational block 323. Calculating values for the linear prediction coefficients further comprises expressing the linear prediction coefficients as vector elements of a second vector in an operational block 324. Calculating values for the linear prediction coefficients further comprises equating the second vector to the product of the second matrix and the first vector in an operational block 325, thereby generating values for the linear prediction coefficients.

In certain embodiments, a first subset of the cognizable structure factor components is expressed in the operational block 321 as matrix elements of a first matrix $M_{nm}$; in the operational block 322, a second subset of the cognizable structure factor components is expressed as vector elements of a first vector $|F_{nkl}\rangle$; and in the operational block 324, the linear prediction coefficients $a_j$ are expressed as vector elements of a second vector $|a_j\rangle$. For example, where k and l are constants such as in the example described above, the first matrix $M_{nm}$ can have the following form:

Equation 10:
$$M_{nm} = \left\| F_{(n+N_{\mathrm{coeff}}-m)kl} \right\|, \qquad n = 1, \ldots, (N - N_{\mathrm{coeff}}), \quad m = 1, \ldots, N_{\mathrm{coeff}}$$

the first vector can have the following form:

Equation 11:
$$|F_{nkl}\rangle = |F_{(N_{\mathrm{coeff}}+1)kl}, \ldots, F_{Nkl}\rangle$$

and the second vector can have the following form:

Equation 12:
$$|a_j\rangle = |a_1, a_2, \ldots, a_{N_{\mathrm{coeff}}}\rangle$$

where N is typically on the order of tens of thousands.

In certain embodiments, in the operational block 323, the second matrix $(M_{nm})^{-1}$ represents a generalized inverse of the first matrix $M_{nm}$. The values of the linear prediction coefficients $a_j$ are then generated in the operational block 325 by equating the second vector $|a_j\rangle$ to the product of the second matrix $(M_{nm})^{-1}$ and the first vector $|F_{nkl}\rangle$:

Equation 13:
$$|a_j\rangle = (M_{nm})^{-1}\,|F_{nkl}\rangle$$

By substituting the values of the coefficients into the linear equation, the first structure factor component for the first reflection can be defined.
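A minimal sketch of the Figure 14 route follows, reading Equation 10 as $M_{nm} = F_{(n+N_{\mathrm{coeff}}-m)kl}$ with 1-based indices and using the Moore-Penrose pseudoinverse (`np.linalg.pinv`, computed via SVD) as the generalized inverse. Other generalized inverses are possible; the choice here, the function name `lp_coefficients_pinv`, and the test series are assumptions for illustration.

```python
import numpy as np


def lp_coefficients_pinv(F, n_coeff):
    """Equations 10-13 with F[0] = F_1, ..., F[N-1] = F_N (k, l fixed)."""
    F = np.asarray(F, dtype=complex)
    N = len(F)
    # Equation 10: M_nm = F_(n + N_coeff - m), n = 1..N-N_coeff, m = 1..N_coeff (1-based)
    M = np.array([[F[n + n_coeff - m - 1] for m in range(1, n_coeff + 1)]
                  for n in range(1, N - n_coeff + 1)])
    F_n = F[n_coeff:]                 # Equation 11: (F_(N_coeff+1), ..., F_N)
    return np.linalg.pinv(M) @ F_n    # Equation 13: |a_j> = (M_nm)^-1 |F_nkl>


h = np.arange(1, 31)                  # h = 1..30
z1, z2 = 0.9 * np.exp(0.7j), 0.8 * np.exp(-1.3j)
F = 2.0 * z1**h + z2**h
print(lp_coefficients_pinv(F, 2))     # close to [z1 + z2, -z1 * z2]
```

For a full-column-rank $M_{nm}$ the pseudoinverse solution coincides with the ordinary least-squares solution, so this route and the Figure 12 route give the same coefficients.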

Figure 15 is a flowchart of one embodiment of the calculation corresponding to operational block 120. In the embodiment illustrated in Figure 15, calculating values for the linear prediction coefficients comprises defining a matrix having matrix elements in an operational block 421. Each matrix element comprises an autocorrelation function between selected structure factor components.

Calculating values for the linear prediction coefficients further comprises expressing the linear prediction coefficients as vector elements of a first vector in an operational block 422. Calculating values for the linear prediction coefficients further comprises solving a matrix equation for values for the linear prediction coefficients in an operational block 423. The matrix equation expresses the product of the matrix and the first vector as equal to a second vector with constant vector elements.

In certain embodiments, the autocorrelation functions of the matrix in the operational block 421 have the following form:

Equation 14:
$$\phi_j = \frac{1}{N-j}\sum_{i=1}^{N-j} F_{(h-i\Delta h)(k-i\Delta k)(l-i\Delta l)}\; F_{(h-(i+j)\Delta h)(k-(i+j)\Delta k)(l-(i+j)\Delta l)}$$

Autocorrelation functions of this form represent autocorrelations between structure factor components along a selected line in reciprocal space, with $(\Delta h, \Delta k, \Delta l)$ the step along that line.

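In certain embodiments, the matrix in the operational block 421 has the following form:

Equation 15:
$$M = \begin{pmatrix}
\phi_0 & \phi_1 & \phi_2 & \cdots & \phi_{N_{\mathrm{coeff}}} \\
\phi_1 & \phi_0 & \phi_1 & \cdots & \phi_{N_{\mathrm{coeff}}-1} \\
\phi_2 & \phi_1 & \phi_0 & \cdots & \phi_{N_{\mathrm{coeff}}-2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\phi_{N_{\mathrm{coeff}}} & \phi_{N_{\mathrm{coeff}}-1} & \phi_{N_{\mathrm{coeff}}-2} & \cdots & \phi_0
\end{pmatrix}$$

Such a matrix is a symmetric Toeplitz matrix (i.e., a matrix whose elements are constant along diagonals).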

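In certain embodiments, the linear prediction coefficients $a_j$ are expressed in the operational block 422 as vector elements of the first vector $|1, a_1, a_2, \ldots, a_{N_{\mathrm{coeff}}}\rangle$, and in the operational block 423, a matrix equation of the following form is solved for values of the linear prediction coefficients:

Equation 16:
$$\begin{pmatrix}
\phi_0 & \phi_1 & \phi_2 & \cdots & \phi_{N_{\mathrm{coeff}}} \\
\phi_1 & \phi_0 & \phi_1 & \cdots & \phi_{N_{\mathrm{coeff}}-1} \\
\phi_2 & \phi_1 & \phi_0 & \cdots & \phi_{N_{\mathrm{coeff}}-2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\phi_{N_{\mathrm{coeff}}} & \phi_{N_{\mathrm{coeff}}-1} & \phi_{N_{\mathrm{coeff}}-2} & \cdots & \phi_0
\end{pmatrix}
\begin{pmatrix} 1 \\ a_1 \\ a_2 \\ \vdots \\ a_{N_{\mathrm{coeff}}} \end{pmatrix}
=
\begin{pmatrix} a_0 \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$

In Equation 16, $a_0$ is a dummy value, as described in "Numerical Recipes in C: The Art of Scientific Computing," by W.H. Press, B.P. Flannery, S.A. Teukolsky, and W.T. Vetterling, Cambridge University Press, Cambridge, 1989, pages 452-464, which is incorporated herein by reference in its entirety.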
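A minimal sketch of the Figure 15 route, under stated assumptions: the input is a real-valued structure factor component series (e.g., real parts along one line), the autocorrelation uses the $1/(N-j)$ normalization, and the symmetric Toeplitz system is solved with a dense solver for clarity (a Levinson recursion would exploit the Toeplitz structure). In this sketch the predictor is $y_h = \sum_j a_j y_{h-j}$, so the augmented vector that produces the Equation 16 pattern is $(1, -a_1, \ldots, -a_N)$; other sign conventions simply flip the $a_j$. All function names and the test series are hypothetical.

```python
import numpy as np


def autocorrelation(y, max_lag):
    """phi_j = (1/(N-j)) * sum_i y_i * y_(i+j), j = 0..max_lag, for a real series."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    return np.array([np.dot(y[:N - j], y[j:]) / (N - j) for j in range(max_lag + 1)])


def toeplitz_from(phi):
    """Symmetric Toeplitz matrix T[i, k] = phi[|i - k|] (constant along diagonals)."""
    idx = np.arange(len(phi))
    return phi[np.abs(idx[:, None] - idx[None, :])]


def yule_walker_lp(y, n_coeff):
    """Solve sum_k phi_|j-k| a_k = phi_j (j = 1..n_coeff) for the predictor
    y_h ~ sum_k a_k y_(h-k); also return the 'dummy' first entry d."""
    phi = autocorrelation(y, n_coeff)
    a = np.linalg.solve(toeplitz_from(phi[:-1]), phi[1:])
    d = phi[0] - phi[1:] @ a
    return a, d


rng = np.random.default_rng(0)
h = np.arange(120)
y = np.cos(0.7 * h) * 0.98**h + 0.05 * rng.standard_normal(h.size)
a, d = yule_walker_lp(y, 4)
# Augmented system in the pattern of Equation 16: T @ (1, -a) = (d, 0, ..., 0)
v = toeplitz_from(autocorrelation(y, 4)) @ np.concatenate(([1.0], -a))
print(np.round(v, 10))
```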

As with all recursive (i.e., infinite-impulse-response) digital filters, solving the matrix equation described above is vulnerable to instabilities and divergences. In certain embodiments, solving the matrix equation of Equation 16 comprises limiting instabilities and divergences by calculating complex roots of a characteristic polynomial equation in a complex plane and forcing all complex roots into a unit circle in the complex plane. Stability is increased by calculating the complex roots of the following characteristic polynomial equation:

Equation 17:
$$z^{N_{\mathrm{coeff}}} - \sum_{j=1}^{N_{\mathrm{coeff}}} a_j\, z^{N_{\mathrm{coeff}}-j} = 0$$

and forcing all the solutions into the unit circle in the complex plane z. This result is achieved by moving the roots of the characteristic polynomial that lie outside the unit circle onto the unit circle, or more preferably by reflecting them into the unit circle (i.e., by replacing each such root z with $1/z^*$). The linear prediction analysis of embodiments of the present invention extrapolates from the known structure factor components using the characterization of the known structure factor components in terms of the poles in the complex plane, which differs from techniques such as the maximum entropy method.
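The root-reflection step can be sketched with NumPy's `np.roots` and `np.poly`, assuming the characteristic polynomial of Equation 17 with the predictor convention $F_h = \sum_j a_j F_{h-j}$. Each root outside the unit circle is replaced by $1/z^*$, which keeps its angle (frequency) and inverts its magnitude. The function name `stabilize_lp` and the test roots are hypothetical.

```python
import numpy as np


def stabilize_lp(a):
    """Reflect roots of z^N - sum_j a_j z^(N-j) = 0 (Equation 17) that lie
    outside the unit circle to 1/conj(z), then rebuild the coefficients."""
    poly = np.concatenate(([1.0 + 0j], -np.asarray(a, dtype=complex)))
    roots = np.roots(poly)
    outside = np.abs(roots) > 1.0
    roots[outside] = 1.0 / np.conj(roots[outside])
    return -np.poly(roots)[1:]    # np.poly returns monic coefficients, leading 1


# One divergent pole (|z| = 1.25) and one stable pole
z1, z2 = 1.25 * np.exp(0.3j), 0.5
b = stabilize_lp([z1 + z2, -z1 * z2])
print(np.abs(np.roots(np.concatenate(([1.0], -b)))))  # all magnitudes <= 1
```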

An example of this embodiment is provided by Figures 16A and 16B. Figure 16A schematically illustrates an electron distribution of a hypothetical one-dimensional system of ten atoms along a line segment of unit length. For simplicity, all atoms are assigned unit scattering factors, and the temperature factors $T_j$ have been set to facilitate visual inspection. The electron distribution schematically illustrated in Figure 16A is then used in an embodiment of the present invention to compute a set of 66 structure factor components corresponding to Miller indices h = 1, ..., 66. Structure factor components h = 42, ..., 66 were estimated by means of linear prediction, using the 40 data points h = 1, ..., 40 and 20 poles. Figure 16B schematically illustrates the agreement between the true values for the structure factor components and the corresponding linear prediction estimates from this embodiment; the correlation coefficient is approximately 0.83. Similarly, in another example embodiment of the present invention, Figures 17A and 17B schematically illustrate another hypothetical one-dimensional electron distribution and the agreement between the true structure factor components h = 42, ..., 66 and the same structure factor components estimated using linear prediction from structure factor components h = 1, ..., 40 and 20 poles. The resulting agreement has a correlation coefficient of approximately 0.97.
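A toy experiment in the spirit of Figures 16A and 16B can be sketched as follows. It is not a reproduction of those figures: the atom positions are random, the damping form $\exp(-T_j h^2)$ and the value $T_j = 10^{-4}$ are assumptions, the coefficients come from a least-squares fit rather than any specific procedure in the text, and h = 41 is predicted along with h = 42, ..., 66. The printed correlation therefore depends on these choices; all function names are hypothetical.

```python
import numpy as np


def structure_factors_1d(x, T, h):
    """F_h = sum_j exp(-T_j h^2) exp(2 pi i h x_j): unit scattering factors,
    atoms at fractional positions x_j; the exp(-T_j h^2) damping is assumed."""
    h = np.asarray(h, dtype=float)[:, None]
    x, T = np.asarray(x)[None, :], np.asarray(T)[None, :]
    return np.sum(np.exp(-T * h**2) * np.exp(2j * np.pi * h * x), axis=1)


def fit_lp(F, n_coeff):
    """Least-squares LP coefficients: F[i] ~ sum_j a_j F[i-j] over the known data."""
    A = np.array([[F[i - j] for j in range(1, n_coeff + 1)]
                  for i in range(n_coeff, len(F))])
    a, *_ = np.linalg.lstsq(A, F[n_coeff:], rcond=None)
    return a


def extrapolate(F, a, n_new):
    """Recursive forward prediction of n_new further components."""
    F = list(F)
    for _ in range(n_new):
        F.append(sum(a_j * F[-j] for j, a_j in enumerate(a, start=1)))
    return np.array(F[-n_new:])


rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 10)                          # ten atoms on a unit segment
T = np.full(10, 1e-4)                                  # hypothetical temperature factors
F_true = structure_factors_1d(x, T, np.arange(1, 67))  # h = 1..66
a = fit_lp(F_true[:40], 20)                            # 40 known points, 20 coefficients
F_pred = extrapolate(F_true[:40], a, 26)               # h = 41..66
r = np.corrcoef(F_true[40:].real, F_pred.real)[0, 1]
print(f"correlation of real parts, h = 41..66: {r:.2f}")
```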

Figure 18A schematically illustrates another example embodiment of a hypothetical one-dimensional electron distribution with ten atoms. In this embodiment, structure factor components h = 27, ..., 30 were estimated using 25 data points (h = 1, ..., 25) and 20 poles. The resulting agreement between true and estimated structure factor components, schematically illustrated in Figure 18B, has a correlation coefficient of approximately 0.98.

Figure 19A schematically illustrates another example embodiment of a hypothetical one-dimensional electron distribution with thirty atoms. In this embodiment, structure factor components h = 92, ..., 100 were estimated using 90 data points (h = 1, ..., 90) and 30 poles. The resulting agreement between true and estimated structure factor components, schematically illustrated in Figure 19B, has a correlation coefficient of approximately 0.78.

Figure 20A schematically illustrates another example embodiment of a hypothetical one-dimensional electron distribution with thirty atoms. In this embodiment, structure factor components h = 92, ..., 100 were estimated using 90 data points (h = 1, ..., 90) and 35 poles. The resulting agreement between true and estimated structure factor components, schematically illustrated in Figure 20B, has a correlation coefficient of approximately 0.78.

Figure 21A schematically illustrates an example embodiment of a one-dimensional projection of a hypothetical three-dimensional electron distribution with 500 atoms created in a cube with unit edges. For simplicity, all atoms are assigned unit scattering factors, and the temperature factors $T_j$ have been set to facilitate visual inspection. In this embodiment, structure factor components (h, k, l) = (17, 1, 2) and (18, 1, 2) were estimated using 15 data points (h = 1, ..., 15; k = 1; l = 2) and 5 poles. The resulting agreement between true and estimated structure factor components is schematically illustrated in Figure 21B.

Figure 22A schematically illustrates another example embodiment of a one-dimensional projection of a hypothetical three-dimensional electron distribution with 500 atoms created in a cube with unit edges. For simplicity, all atoms are assigned unit scattering factors and B-scaling factors equal to 0.01. In this embodiment, structure factor components (h, k, l) = (18, 0, 0) and (19, 0, 0) were estimated using 16 data points (h = 1, ..., 16; k = 0; l = 0) and 4 poles. The resulting agreement between true and estimated structure factor components is schematically illustrated in Figure 22B.

This invention may be embodied in other specific forms without departing from the essential characteristics as described herein. The embodiments described above are to be considered in all respects as illustrative only and not restrictive in any manner. The scope of the invention is indicated by the following claims rather than by the foregoing description. Any and all changes which come within the meaning and range of equivalency of the claims are to be considered within their scope.


Aspects and features of the invention are set out in the following numbered clauses:

1. A method of reducing structure factor phase ambiguity corresponding to a selected reciprocal lattice vector, the method comprising: generating an original phase probability distribution corresponding to a selected structure factor phase of the selected reciprocal lattice vector, the original phase probability distribution comprising a first structure factor phase ambiguity; combining the original phase probability distribution with a plurality of phase probability distributions of a plurality of structure factor phases of other reciprocal lattice vectors using a phase equation or inequality, the phase equation or inequality defining a mathematical relationship between the selected structure factor phase of the selected reciprocal lattice vector and the plurality of structure factor phases of other reciprocal lattice vectors; and producing a resultant phase probability distribution for the selected structure factor phase of the selected reciprocal lattice vector, the resultant phase probability distribution comprising a second structure factor phase ambiguity which is smaller than the first structure factor phase ambiguity.

2. The method of clause 1, wherein the original phase probability distribution is substantially bimodal.

3. The method of clause 1, wherein the resultant phase probability distribution is substantially unimodal.

4. The method of clause 1, wherein the resultant phase probability distribution is weighted more strongly to a correct phase than is the original phase probability distribution.

5. The method of clause 1, wherein the original phase probability distribution is generated by single isomorphous replacement, single anomalous dispersion, multiple isomorphous replacement, or multiple anomalous dispersion.

6. The method of clause 1, wherein the phase equation or inequality is the phase addition equation.

7. A method of defining a structure factor phase for a reflection derived from x-ray crystallography data, the method comprising: generating a first probability distribution for the structure factor phase of the reflection; generating two or more additional probability distributions for the structure factor phases of other reflections; identifying a relationship between the structure factor phase for the reflection and the structure factor phases of the other reflections; and calculating a composite probability distribution for the structure factor phase of the reflection, whereby the composite probability distribution is derived from the first probability distribution for the structure factor phase of the reflection and the two or more additional probability distributions for the structure factor phases of the other reflections.

8. The method of clause 7, wherein the first probability distribution is defined by a set of Hendrickson-Lattman coefficients.

9. The method of clause 8, wherein the set of Hendrickson-Lattman coefficients is generated by single isomorphous replacement, single anomalous dispersion, multiple isomorphous replacement, or multiple anomalous dispersion.

10. The method of clause 7, wherein the first probability distribution is substantially bimodal.

11. The method of clause 7, wherein the composite probability distribution is substantially unimodal.

12. The method of clause 7, wherein the relationship between the structure factor phase for the reflection and the structure factor phases for the other reflections is additive.

13. The method of clause 12, wherein the relationship is given by the phase addition equation.

14. A computer readable medium having instructions stored thereon which cause a general purpose computer to perform a method of reducing structure factor phase ambiguity corresponding to a selected reciprocal lattice vector, the method comprising: generating an original phase probability distribution corresponding to a selected structure factor phase of the selected reciprocal lattice vector, the original phase probability distribution comprising a first structure factor phase ambiguity; combining the original phase probability distribution with a phase equation or inequality, the phase equation or inequality defining a mathematical relationship between the selected structure factor phase of the selected reciprocal lattice vector and a set of structure factor phases of other reciprocal lattice vectors; and producing a resultant phase probability distribution for the selected structure factor phase of the selected reciprocal lattice vector, the resultant phase probability distribution comprising a second structure factor phase ambiguity which is smaller than the first structure factor phase ambiguity.

15. A computer-implemented x-ray crystallography analysis system comprising: an original phase probability distribution generator for generating an original phase probability distribution corresponding to a selected structure factor phase of the selected reciprocal lattice vector, the original phase probability distribution comprising a first structure factor phase ambiguity; a combination module for combining the original phase probability distribution with a phase equation or inequality, the phase equation or inequality defining a mathematical relationship between the selected structure factor phase of the selected reciprocal lattice vector and a set of structure factor phases of other reciprocal lattice vectors; and a resultant phase probability distribution producer for producing a resultant phase probability distribution for the selected structure factor phase of the selected reciprocal lattice vector, the resultant phase probability distribution comprising a second structure factor phase ambiguity which is smaller than the first structure factor phase ambiguity.

16. A computer-implemented x-ray crystallography analysis system comprising: a means for retrieving a first phase probability distribution corresponding to a selected structure factor phase of a selected reciprocal lattice vector; a means for retrieving a plurality of second phase probability distributions corresponding to other structure factor phases of other reciprocal lattice vectors; and a means for combining the first phase probability distribution and the plurality of second phase probability distributions so as to produce a resultant phase probability distribution for the selected structure factor phase of the selected reciprocal lattice vector.

17. A method of refining x-ray diffraction data, the method comprising combining structure factor phase probability distributions for different reciprocal lattice vectors so that the structure factor phase probability distribution for at least one of the reciprocal lattice vectors is more heavily weighted toward a phase value.

18. A method of using linear prediction analysis to define a first structure factor component for a first reflection from x-ray crystallography data, the x-ray crystallography data comprising a set of cognizable reflections, the method comprising: expressing the first structure factor component as a first linear equation in which the first structure factor component is equal to a sum of a first plurality of terms, each term comprising a product of (1) a structure factor component for a cognizable reflection from the x-ray crystallography data, wherein the cognizable reflection has a separation in reciprocal space from the first reflection, and (2) a linear prediction coefficient corresponding to the separation between the cognizable reflection and the first reflection; calculating values for the linear prediction coefficients; and substituting the values for the linear prediction coefficients into the first linear equation, thereby defining the first structure factor component for the first reflection.


19. The method of clause 18, wherein the first structure factor component is real.

20. The method of clause 18, wherein the first structure factor component is imaginary.

21. The method of clause 18, wherein the first structure factor component is a magnitude.

22. The method of clause 18, wherein the first structure factor component is a phase.

23. The method of clause 18, wherein calculating values for the linear prediction coefficients comprises: expressing a plurality of second structure factor components for a plurality of second reflections from the set of cognizable reflections as a plurality of second linear equations in which each second structure factor component is equal to a sum of a second plurality of terms, each term comprising a product of (1) a structure factor component for a cognizable reflection from the x-ray crystallography data, wherein the cognizable reflection has a separation in reciprocal space from the second reflection, and (2) the linear prediction coefficient corresponding to the separation between the cognizable reflection and the second reflection; and solving the plurality of second linear equations for the values for the linear prediction coefficients.

24. The method of clause 18, wherein calculating values for the linear prediction coefficients comprises: expressing a first subset of the cognizable structure factor components as vector elements of a first vector; expressing a second subset of the cognizable structure factor components as vector elements of a second vector; expressing the first vector in a matrix equation as being equal to the product of a matrix and the second vector, wherein the matrix comprises matrix elements comprising the linear prediction coefficients, such that each matrix element comprises the linear prediction coefficient corresponding to a separation in reciprocal space between a corresponding cognizable reflection from the second vector and a corresponding cognizable reflection from the first vector; and solving the matrix equation for values of the linear prediction coefficients.

25. The method of clause 18, wherein calculating values for the linear prediction coefficients comprises: expressing a first subset of the cognizable structure factor components as matrix elements of a first matrix; expressing a second subset of the cognizable structure factor components as vector elements of a first vector; generating a second matrix representing a generalized inverse of the first matrix; expressing the linear prediction coefficients as vector elements of a second vector; and equating the second vector to the product of the second matrix and the first vector, thereby generating the values for the linear prediction coefficients.

26. The method of clause 18, wherein calculating values for the linear prediction coefficients comprises: defining a matrix having matrix elements, each matrix element comprising an autocorrelation function between selected structure factor components; expressing the linear prediction coefficients as vector elements of a first vector; and solving a matrix equation for values for the linear prediction coefficients, the matrix equation expressing the product of the matrix and the first vector as equal to a second vector with constant vector elements.

27. The method of clause 26, wherein the matrix elements are constant along diagonals of the matrix.

28. The method of clause 26, wherein solving the matrix equation comprises limiting instabilities and divergences by calculating complex roots of a characteristic polynomial equation in a complex plane and forcing all complex roots into a unit circle in the complex plane.

29. A method of refining x-ray diffraction data comprising deriving a value of a first structure factor from a linear combination of other structure factors.

30. The method of clause 29, wherein said other structure factors comprise a series of structure factors which are adjacent to said first structure factor in reciprocal space.

31. A computer readable medium having instructions stored thereon which cause a general purpose computer to perform a method of using linear prediction analysis to define a first structure factor component for a first reflection from x-ray crystallography data, the x-ray crystallography data comprising a set of cognizable reflections, the method comprising: expressing the first structure factor component as a first linear equation in which the first structure factor component is equal to a sum of a first plurality of terms, each term comprising a product of (1) a structure factor component for a cognizable reflection from the x-ray crystallography data, wherein the cognizable reflection has a separation in reciprocal space from the first reflection, and (2) a linear prediction coefficient corresponding to the separation between the cognizable reflection and the first reflection; calculating values for the linear prediction coefficients; and substituting the values for the linear prediction coefficients into the first linear equation, thereby defining the first structure factor component for the first reflection.


32. A computer-implemented x-ray crystallography analysis system comprising: a structure factor component generator for generating a first structure factor component for a first reflection from x-ray crystallography data using linear prediction analysis, the x-ray crystallography data comprising a set of cognizable reflections, the first structure factor component expressed as a first linear equation in which the first structure factor component is equal to a sum of a first plurality of terms, each term comprising a product of (1) a structure factor component for a cognizable reflection from the x-ray crystallography data, wherein the cognizable reflection has a separation in reciprocal space from the first reflection, and (2) a linear prediction coefficient corresponding to the separation between the cognizable reflection and the first reflection; a calculating module for calculating values for the linear prediction coefficients; and a resultant structure factor component definer for defining the first structure factor component for the first reflection by substituting the values for the linear prediction coefficients into the first linear equation.

33. A computer-implemented x-ray crystallography analysis system comprising: a means for generating a first structure factor component for a first reflection from x-ray crystallography data using linear prediction analysis, the x-ray crystallography data comprising a set of cognizable reflections, the first structure factor component expressed as a first linear equation in which the first structure factor component is equal to a sum of a first plurality of terms, each term comprising a product of (1) a structure factor component for a cognizable reflection from the x-ray crystallography data, wherein the cognizable reflection has a separation in reciprocal space from the first reflection, and (2) a linear prediction coefficient corresponding to the separation between the cognizable reflection and the first reflection; a means for calculating values for the linear prediction coefficients; and a means for defining the first structure factor component for the first reflection by substituting the values for the linear prediction coefficients into the first linear equation.


Claims


1. A method of using linear prediction analysis to define a first structure factor component for a first reflection from x-ray crystallography data, the x-ray crystallography data comprising a set of cognizable reflections, the method comprising: expressing the first structure factor component as a first linear equation in which the first structure factor component is equal to a sum of a first plurality of terms, each term comprising a product of (1) a structure factor component for a cognizable reflection from the x-ray crystallography data, wherein the cognizable reflection has a separation in reciprocal space from the first reflection, and (2) a linear prediction coefficient corresponding to the separation between the cognizable reflection and the first reflection; calculating values for the linear prediction coefficients; and substituting the values for the linear prediction coefficients into the first linear equation, thereby defining the first structure factor component for the first reflection.
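Written out under simplifying assumptions (reflections on one reciprocal-lattice row indexed by an integer n, and only the M reflections preceding the first reflection contributing), the first linear equation of claim 1 takes the form below. The symbols F_n, a_j, M and the index set K are notation introduced here for illustration, not symbols used in the claim.

```latex
F_n = \sum_{j=1}^{M} a_j \, F_{n-j},
\qquad
F_m = \sum_{j=1}^{M} a_j \, F_{m-j} \quad \text{for each cognizable } m \in \mathcal{K}
```

Here a_j is the linear prediction coefficient for a separation of j steps. The right-hand family of equations, written for cognizable reflections m whose M predecessors are also cognizable, is what the coefficients are calculated from; substituting the resulting a_j into the left-hand equation then defines F_n.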

2. The method of Claim 1, wherein the first structure factor component is real.

3. The method of Claim 1, wherein the first structure factor component is imaginary.

4. The method of Claim 1, wherein the first structure factor component is a magnitude.

5. The method of Claim 1, wherein the first structure factor component is a phase.

6. The method of Claim 1, wherein calculating values for the linear prediction coefficients comprises: expressing a plurality of second structure factor components for a plurality of second reflections from the set of cognizable reflections as a plurality of second linear equations in which each second structure factor component is equal to a sum of a second plurality of terms, each term comprising a product of (1) a structure factor component for a cognizable reflection from the x-ray crystallography data, wherein the cognizable reflection has a separation in reciprocal space from the second reflection, and (2) the linear prediction coefficient corresponding to the separation between the cognizable reflection and the second reflection; and solving the plurality of second linear equations for the values for the linear prediction coefficients.
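The smallest case of claim 6 can be sketched directly: M second reflections give M second linear equations in M unknown coefficients, which a square solve handles. This is a minimal sketch assuming one ordered row of reflections and taking the last M cognizable reflections as the second reflections; that choice and the function name are illustrative assumptions.

```python
import numpy as np


def coefficients_from_square_system(known: np.ndarray, order: int) -> np.ndarray:
    """Solve exactly `order` second linear equations for the coefficients.

    Equation for each second reflection n:  F_n = sum_j a_j * F_{n-j},
    j = 1..order. Assumes the reflections are one ordered reciprocal row.
    """
    m = order
    if len(known) < 2 * m:
        raise ValueError("need at least 2*order cognizable reflections")
    # Hypothetical choice: the last `order` reflections are the second reflections.
    targets = range(len(known) - m, len(known))
    # Row for target n holds F_{n-1}, ..., F_{n-m}; the unknowns are a_1..a_m.
    A = np.array([[known[n - j] for j in range(1, m + 1)] for n in targets])
    b = np.array([known[n] for n in targets])
    return np.linalg.solve(A, b)
```

For a sampled cosine cos(0.3 n), the identity cos(0.3 n) = 2 cos(0.3) cos(0.3 (n-1)) - cos(0.3 (n-2)) means an order of 2 recovers a_1 = 2 cos(0.3) and a_2 = -1 exactly. With noisy data a square system is fragile, which is one motivation for the overdetermined forms of claims 8 and 9.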

7. The method of Claim 1, wherein calculating values for the linear prediction coefficients comprises: expressing a first subset of the cognizable structure factor components as vector elements of a first vector; expressing a second subset of the cognizable structure factor components as vector elements of a second vector; expressing the first vector in a matrix equation as being equal to the product of a matrix and the second vector, wherein the matrix comprises matrix elements comprising the linear prediction coefficients, such that each matrix element comprises the linear prediction coefficient corresponding to a separation in reciprocal space between a corresponding cognizable reflection from the second vector and a corresponding cognizable reflection from the first vector; and solving the matrix equation for values of the linear prediction coefficients.

8. The method of Claim 1, wherein calculating values for the linear prediction coefficients comprises: expressing a first subset of the cognizable structure factor components as matrix elements of a first matrix; expressing a second subset of the cognizable structure factor components as vector elements of a first vector; generating a second matrix representing a generalized inverse of the first matrix; expressing the linear prediction coefficients as vector elements of a second vector; and equating the second vector to the product of the second matrix and the first vector, thereby generating the values for the linear prediction coefficients.
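In claim 8, using every cognizable reflection that has M predecessors usually gives more equations than unknowns, and a generalized inverse resolves the overdetermined system. A minimal sketch, assuming one ordered row of reflections and using the Moore-Penrose pseudoinverse (`numpy.linalg.pinv`) as the generalized inverse; the claim itself does not say which generalized inverse is intended, and the function name is hypothetical.

```python
import numpy as np


def coefficients_by_generalized_inverse(known: np.ndarray, order: int) -> np.ndarray:
    """Sketch of a = pinv(X) @ y.

    X (the first matrix) holds F_{n-1}, ..., F_{n-order}, one row per
    equation; y (the first vector) holds the matching F_n. The returned
    array is the second vector of linear prediction coefficients.
    """
    m = order
    X = np.array([known[n - m:n][::-1] for n in range(m, len(known))])
    y = known[m:]
    # The pseudoinverse gives the least-squares (minimum-norm) solution.
    return np.linalg.pinv(X) @ y
```

For data made of M damped complex exponentials with poles z_1, ..., z_M, the coefficients are those of the monic polynomial with those roots; for two poles, a_1 = z_1 + z_2 and a_2 = -z_1 z_2.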

9. The method of Claim 1, wherein calculating values for the linear prediction coefficients comprises: defining a matrix having matrix elements, each matrix element comprising an autocorrelation function between selected structure factor components; expressing the linear prediction coefficients as vector elements of a first vector; and solving a matrix equation for values for the linear prediction coefficients, the matrix equation expressing the product of the matrix and the first vector as equal to a second vector with constant vector elements.

10. The method of Claim 9, wherein the matrix elements are constant along diagonals of the matrix.
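Claims 9 and 10 resemble what signal processing calls the autocorrelation (Yule-Walker) method: the matrix elements are autocorrelation values r(|i - j|), so the matrix is constant along its diagonals (a Toeplitz matrix), and the Levinson-Durbin recursion can exploit that structure. The sketch below assumes real-valued components on a single row, a biased autocorrelation estimate, and a right-hand side of r(1), ..., r(M); these are standard textbook choices, not details stated in the claims, and the function names are hypothetical.

```python
import numpy as np


def autocorrelation(x: np.ndarray, max_lag: int) -> np.ndarray:
    """Biased estimate r(k) = (1/N) * sum_{n>=k} x[n] * x[n-k], k = 0..max_lag."""
    n = len(x)
    return np.array([np.dot(x[k:], x[:n - k]) / n for k in range(max_lag + 1)])


def levinson_durbin(r: np.ndarray, order: int) -> np.ndarray:
    """Solve sum_j a_j r(|i-j|) = r(i), i = 1..order, for real data.

    Each pass raises the model order by one and reuses the previous
    solution, which works only because the matrix is constant along its
    diagonals.
    """
    a = np.zeros(order)
    err = r[0]  # prediction-error power of the order-0 model
    for k in range(order):
        # Reflection coefficient for the step from order k to order k + 1.
        kappa = (r[k + 1] - np.dot(a[:k], r[k:0:-1])) / err
        previous = a[:k].copy()
        a[:k] = previous - kappa * previous[::-1]
        a[k] = kappa
        err *= 1.0 - kappa**2
    return a
```

The recursion needs on the order of M^2 operations, compared with M^3 for a general dense solve of the same M-by-M system; explicitly building the Toeplitz matrix and checking that it reproduces r(1), ..., r(M) is a convenient sanity test.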

11. The method of Claim 9, wherein solving the matrix equation comprises limiting instabilities and divergences by calculating complex roots of a characteristic polynomial equation in a complex plane and forcing all complex roots into a unit circle in the complex plane.
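The stabilization step of claim 11 can be sketched as follows. For the prediction equation F_n = sum_j a_j F_{n-j}, the characteristic polynomial is z^M - a_1 z^(M-1) - ... - a_M, and a root outside the unit circle corresponds to a component that grows without bound when the prediction is extrapolated. The claim only says the roots are forced into the unit circle; reflecting each offending root z to 1/conj(z), which keeps its angle, is an assumption made here, and the function name is hypothetical.

```python
import numpy as np


def stabilize_coefficients(a: np.ndarray) -> np.ndarray:
    """Force the roots of the characteristic polynomial into the unit circle.

    Polynomial: z^M - a_1 z^(M-1) - ... - a_M. Each root with |z| > 1 is
    replaced by 1/conj(z) (assumed choice), then the coefficients are
    rebuilt from the modified roots.
    """
    poly = np.concatenate(([1.0], -np.asarray(a)))
    roots = np.roots(poly)
    outside = np.abs(roots) > 1.0
    roots[outside] = 1.0 / np.conj(roots[outside])
    new_poly = np.poly(roots)  # monic: [1, -a_1', ..., -a_M']
    # Drop negligible imaginary parts that rounding may leave for real input.
    return np.real_if_close(-new_poly[1:])
```

For example, a = [2.5, -1.0] gives z^2 - 2.5 z + 1 = (z - 2)(z - 0.5); the root at 2 is reflected to 0.5, and the rebuilt coefficients are [1.0, -0.25].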

12. A method of refining x-ray diffraction data comprising deriving a value of a first structure factor from a linear combination of other structure factors.


13. The method of Claim 12, wherein said other structure factors comprise a series of structure factors which are adjacent to said first structure factor in reciprocal space.
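Claims 12 and 13 describe re-deriving one structure factor from a linear combination of its neighbours in reciprocal space. One way to sketch this, assuming one ordered row of reflections, is to fit prediction coefficients separately on the reflections before and after the suspect one, predict it from each side, and average the two estimates. The forward-backward averaging and the helper names are assumptions for illustration; the claims do not specify how the combination is formed.

```python
import numpy as np


def fit_lp(x: np.ndarray, order: int) -> np.ndarray:
    """Least-squares coefficients for x_n = sum_j a_j x_{n-j}."""
    X = np.array([x[n - order:n][::-1] for n in range(order, len(x))])
    a, *_ = np.linalg.lstsq(X, x[order:], rcond=None)
    return a


def refine_from_neighbours(x: np.ndarray, i: int, order: int) -> complex:
    """Re-derive x[i] from adjacent reflections on both sides (hypothetical).

    Forward: coefficients fitted on x[:i], applied to x[i-1], ..., x[i-order].
    Backward: coefficients fitted on the reversed x[i+1:], applied to
    x[i+1], ..., x[i+order]. The two estimates are averaged.
    """
    before, after = x[:i], x[i + 1:]
    if len(before) < 2 * order or len(after) < 2 * order:
        raise ValueError("need at least 2*order reflections on each side")
    fwd = np.dot(fit_lp(before, order), before[-order:][::-1])
    rev = after[::-1]  # reversed, so predicting "forward" lands on x[i]
    bwd = np.dot(fit_lp(rev, order), rev[-order:][::-1])
    return complex(0.5 * (fwd + bwd))
```

Because the suspect value is excluded from both fits, a single corrupted reflection does not influence the coefficients used to replace it.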

14. A computer readable medium having instructions stored thereon which cause a general purpose computer to perform a method of using linear prediction analysis to define a first structure factor component for a first reflection from x-ray crystallography data, the x-ray crystallography data comprising a set of cognizable reflections, the method comprising: expressing the first structure factor component as a first linear equation in which the first structure factor component is equal to a sum of a first plurality of terms, each term comprising a product of (1) a structure factor component for a cognizable reflection from the x-ray crystallography data, wherein the cognizable reflection has a separation in reciprocal space from the first reflection, and (2) a linear prediction coefficient corresponding to the separation between the cognizable reflection and the first reflection; calculating values for the linear prediction coefficients; and substituting the values for the linear prediction coefficients into the first linear equation, thereby defining the first structure factor component for the first reflection.

15. A computer-implemented x-ray crystallography analysis system comprising: a structure factor component generator for generating a first structure factor component for a first reflection from x-ray crystallography data using linear prediction analysis, the x-ray crystallography data comprising a set of cognizable reflections, the first structure factor component expressed as a first linear equation in which the first structure factor component is equal to a sum of a first plurality of terms, each term comprising a product of (1) a structure factor component for a cognizable reflection from the x-ray crystallography data, wherein the cognizable reflection has a separation in reciprocal space from the first reflection, and (2) a linear prediction coefficient corresponding to the separation between the cognizable reflection and the first reflection; a calculating module for calculating values for the linear prediction coefficients; and a resultant structure factor component definer for defining the first structure factor component for the first reflection by substituting the values for the linear prediction coefficients into the first linear equation.

16. A computer-implemented x-ray crystallography analysis system comprising: a means for generating a first structure factor component for a first reflection from x-ray crystallography data using linear prediction analysis, the x-ray crystallography data comprising a set of cognizable reflections, the first structure factor component expressed as a first linear equation in which the first structure factor component is equal to a sum of a first plurality of terms, each term comprising a product of (1) a structure factor component for a cognizable reflection from the x-ray crystallography data, wherein the cognizable reflection has a separation in reciprocal space from the first reflection, and (2) a linear prediction coefficient corresponding to the separation between the cognizable reflection and the first reflection; a means for calculating values for the linear prediction coefficients; and a means for defining the first structure factor component for the first reflection by substituting the values for the linear prediction coefficients into the first linear equation.

17. A method of using linear prediction analysis substantially as hereinbefore described with reference to the accompanying drawings.

18. A computer readable medium having instructions stored thereon which cause a general purpose computer to perform the method of claim 17.

19. A computer-implemented x-ray crystallography analysis system programmed to perform the method of claim 17.