EP1277050A2 - Method for determining the three-dimensional shape of a macromolecule - Google Patents

Method for determining the three-dimensional shape of a macromolecule

Info

Publication number
EP1277050A2
EP1277050A2 EP00937870A EP00937870A EP1277050A2 EP 1277050 A2 EP1277050 A2 EP 1277050A2 EP 00937870 A EP00937870 A EP 00937870A EP 00937870 A EP00937870 A EP 00937870A EP 1277050 A2 EP1277050 A2 EP 1277050A2
Authority
EP
European Patent Office
Prior art keywords
molecule
protein
mass
distance
fragments
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP00937870A
Other languages
German (de)
English (en)
Inventor
Bradford W. Gibson
Irwin D. Kuntz
Ning Tang
Gavin Dollinger
Connie M. Oshiro
Judith C. Hempel
Eric Taylor
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of California
Original Assignee
University of California
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of California
Publication of EP1277050A2
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
    • G01N33/6803 General methods of protein analysis not limited to specific proteins or families of proteins
    • G01N33/6848 Methods of protein analysis involving mass spectrometry
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/20 Protein or domain folding
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
    • G01N33/6803 General methods of protein analysis not limited to specific proteins or families of proteins
    • G01N33/6818 Sequencing of polypeptides
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
    • G01N33/6803 General methods of protein analysis not limited to specific proteins or families of proteins
    • G01N33/6842 Proteomic analysis of subsets of protein mixtures with reduced complexity, e.g. membrane proteins, phosphoproteins, organelle proteins
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
    • G01N33/6803 General methods of protein analysis not limited to specific proteins or families of proteins
    • G01N33/6848 Methods of protein analysis involving mass spectrometry
    • G01N33/6851 Methods of protein analysis involving laser desorption ionisation mass spectrometry
    • H ELECTRICITY
    • H01 ELECTRIC ELEMENTS
    • H01J ELECTRIC DISCHARGE TUBES OR DISCHARGE LAMPS
    • H01J49/00 Particle spectrometers or separator tubes
    • H01J49/02 Details
    • H01J49/04 Arrangements for introducing or extracting samples to be analysed, e.g. vacuum locks; Arrangements for external adjustment of electron- or ion-optical components
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment

Definitions

  • Papac et al. "Epitope mapping of the gastrin-releasing peptide/anti-bombesin monoclonal antibody complex by proteolysis followed by matrix-assisted laser desorption ionization mass spectrometry." Protein Sci. 1994 Sep;3(9):1485-92; Cohen et al. "Probing the solution structure of the DNA-binding protein Max by a combination of proteolysis and mass spectrometry." Protein Sci. 1995 Jun;4(6):1088-99; Gomes et al. "Proteolytic mapping of human replication protein A: evidence for multiple structural domains and a conformational change upon interaction with single-stranded DNA." Biochemistry.
  • the present invention provides a fast and efficient method for determining the three-dimensional structure or conformation of a protein or other macromolecule.
  • the steps of the method of the invention include: 1) generating physical distance constraints, e.g., forming intramolecular chemical crosslinks of known length between residues of a protein; 2) enriching the number of the molecules that have intramolecular chemical crosslinks in the reaction pool, e.g., by size separation to remove proteins with intermolecular bonds; 3) exposing the enriched reaction pool to one or more protease that proteolyzes the protein at specific or non-specific sites to produce peptide fragments; 4) identifying the peptide fragments to determine linkage sites with a certain spatial relationship in the protein; and 5) interpreting the data produced to determine spatial geometry and protein structure based on the deduced spatial relationship of the linkage sites.
  • the information is preferably analyzed with aid of a computer system, which can be used to generate and/or analyze distance constraints and spatial geometry between domains and/or folds within a protein.
  • the obtained data is optionally compared to proteins of known structure, and structural modeling using techniques such as threading can be employed to aid in the determination of protein folding.
  • the combined use of these techniques provides a surprisingly accurate 3-dimensional chemical structure much more quickly and efficiently than the conventional methods currently used in the field.
  • the chemical reagent used to form intramolecular crosslinks in a protein preferably will react with at least one predicted residue in the protein, e.g., at least one end of the chemical crosslinking reagent will bind to a predicted site on the protein, such as the ε-amino group of a lysine within the protein.
  • the chemical reagent used for crosslinking the protein will react with two predicted functional sites on a protein, e.g., the crosslinking reagent will crosslink any two lysine residues in the protein.
  • An aspect of the invention is a method of analyzing molecules such as proteins in a manner which results in obtaining information regarding the three-dimensional (tertiary) structure of the protein.
  • Some proteins cannot be crystallized and so cannot be analyzed by X-ray crystallography.
  • Membrane proteins are examples of proteins that are difficult or impossible to crystallize. Many proteins are not soluble enough to use NMR.
  • the current invention is applicable to essentially all proteins.
  • a system for determining the structural details of a molecule, including a mass spectrometer and a computational system that accepts mass information from the mass spectrometer and outputs structural details of the molecule by processing that information.
  • the system can provide structural details of polypeptides, nucleic acids and other macromolecules.
  • the molecule has at least one distance constraint placed on it, in the case of a polypeptide, often a crosslinker such as BS3 (Bis[sulfosuccinimidyl] suberate).
  • the number of constraints imposed on the polypeptide can be less than about 20% of the number of amino acid residues.
  • the system also carries out constrained threading and homology modeling in order to output a three-dimensional structure of the macromolecule.
  • just the computational system for carrying out these same procedures and outputting structural details of the molecule is provided.
  • the computational system accepts information from another source, such as a mass spectrometer, in order to do this.
  • a computer-implemented method for scoring candidates of a molecule including the steps of accepting mass information, generating or storing expected fragments of the molecule, matching the mass information to the expected fragments, and scoring the candidates.
  • the system can provide structural details of polypeptides, nucleic acids and other macromolecules.
  • the molecule has at least one distance constraint placed on it, in the case of a polypeptide, often a crosslinker such as BS3 (Bis[sulfosuccinimidyl] suberate).
  • the number of constraints imposed on the polypeptide can be less than about 20% of the number of amino acid residues.
  • a computer-program product is provided for carrying out this scoring procedure.
  • amino acids of the protein are crosslinked using a detectably labeled crosslinking agent.
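  • The scoring procedure outlined above (accept mass information, hold a list of expected fragments, match, score) can be sketched as follows. This is only an illustrative sketch, not the scoring function of the invention: the function names, the 0.5 Da tolerance, and the "fraction of observed peaks explained" score are all assumptions introduced here.

```python
from bisect import bisect_left


def match_masses(observed, expected, tol=0.5):
    """Map each observed mass to the labels of expected fragments within tol (Da).

    observed: list of measured masses.
    expected: list of (mass, label) tuples for one candidate.
    tol: hypothetical mass tolerance, not a value taken from the source.
    """
    expected_sorted = sorted(expected, key=lambda e: e[0])
    masses = [m for m, _ in expected_sorted]
    hits = {}
    for obs in observed:
        # Binary search for the first expected mass inside the tolerance window.
        i = bisect_left(masses, obs - tol)
        labels = []
        while i < len(masses) and masses[i] <= obs + tol:
            labels.append(expected_sorted[i][1])
            i += 1
        if labels:
            hits[obs] = labels
    return hits


def score_candidate(observed, expected, tol=0.5):
    """Assumed score: fraction of observed masses explained by the candidate."""
    if not observed:
        return 0.0
    return len(match_masses(observed, expected, tol)) / len(observed)


if __name__ == "__main__":
    expected = [(1000.0, "A"), (1500.2, "B"), (2000.0, "C")]
    observed = [1000.3, 1500.0, 1750.0]
    print(match_masses(observed, expected))
    print(score_candidate(observed, expected))
```

  • Sorting the expected fragments once and searching by bisection keeps matching fast when the fragment list is long; any other scoring rule could be substituted for the simple fraction used here.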
  • Another advantage of the invention is that the protein to be studied does not need to be as pure as for NMR or X-ray crystallography.
  • Yet another advantage is that less protein is needed for analysis than for analysis using NMR or X-ray crystallography.
  • Figure 1 is a flow chart illustrating the integral steps of the methods of the invention.
  • Figure 2 is a high-level flowchart of the computational processes that are used in the invention.
  • Figure 3 is a flowchart illustrating how the present invention is used as part of a larger genomic or proteomic investigation.
  • Figure 4 is a schematic depiction of the formation of physical distance constraints in a protein using chemical crosslinking.
  • Figure 5 illustrates the potential outcomes of the crosslinking reaction.
  • Figure 6 is a flowchart illustrating a computational process for generating distance constraint information.
  • Figure 7 is a schematic illustration of a binned list of calculated protein fragments as used in the flow-chart of Figure 6.
  • Figure 8 is a schematic illustration of a user report of the results from the computational process of Figure 6.
  • Figure 9 is a schematic illustration of the mass spectrometer and computational system apparatus of the current invention.
  • Figure 10 is a line graph showing the difference between monomer and dimer crosslinked molecules in the elution of a size selection chromatography procedure.
  • Figure 11 is a schematic depiction of the proteolysis of the crosslinked protein.
  • Figure 12 is an illustration of mass spectrometric analysis of the peptide fragments present following proteolysis of the crosslinked protein.
  • Figure 13 is a flowchart illustrating the computational threading process for generating and ranking structures.
  • Figure 14A is a schematic illustration of a protein structure with a gap that must be accounted for with homology modeling.
  • Figure 14B is a schematic illustration of a protein structure with extra amino acid residues that must be accounted for with homology modeling.
  • Figure 15 is a schematic diagram of the steps for integrating information in order to model the three-dimensional protein structure.
  • Figure 16 is an HPLC chromatogram of a tryptic digest of BS3 crosslinked FGF.
  • Figure 17 is a MALDI-TOF spectrum of an HPLC fraction from the tryptic digest of BS3 crosslinked FGF-2.
  • Figure 20 (part a) shows a threading alignment of interleukin-1β (IL-1β) and FGF-2 (FGF2) used for homology modeling. Insertions are indicated by dashes. The bars above and below the alignment show the beta strand positions in interleukin-1β (above) and FGF-2 (below) as defined in the PDB structure files. The sequence alignment identity is 12.7%.
  • Figure 18 (part b) also shows DALI structural alignment of IL-1β and FGF-2. The structural root-mean-square deviation (RMSD) of the DALI alignment is 2.1 Å over 101 residues.
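  • For reference, the RMSD quoted for a structural alignment is conventionally computed over the N aligned residue pairs after superposition. The symbols below (N, x_i, y_i) are notation introduced here for illustration and do not come from the source:

```latex
\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left\lVert \mathbf{x}_i - \mathbf{y}_i \right\rVert^{2}}
```

  • Here x_i and y_i are the coordinates of the i-th pair of aligned atoms (typically Cα atoms) in the two superposed structures; for the alignment described above, N would be 101.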
  • Figure 21 illustrates the structural alignment of the FGF-2 homology model to FGF-2 (4FGF).
  • Figures 22 A and 22B illustrate a computer system suitable for implementing embodiments of the present invention.
  • crosslinker refers to any reagent that chemically links amino acids in a protein that are in sufficient proximity to allow reaction between reactive sites on two or possibly more amino acids.
  • the crosslinker has the ability to react with reactive functional groups on a protein that are within a maximum distance for that particular crosslinker, wherein the reactive groups (X and/or Y) are designed to react in a specific or general manner with various functional groups present on the amino acid side chains.
  • a bifunctional crosslinker can be homobifunctional (X-X or Y-Y) where the reactive groups are the same, or heterobifunctional where the reactive groups are different (X-Y).
  • Examples of amine-specific homobifunctional linkers include BS3 (Bis[sulfosuccinimidyl] suberate) and sulfo-DSP (Dithiobis[succinimidylpropionate]).
  • the functional groups X and/or Y may be any functional site on an amino acid that will chemically react with a crosslinker, e.g., an ε-amine or methylene.
  • An example of an amine-specific and methylene-specific heterobifunctional macromolecule is SAND (Sulfosuccinimidyl 2-[m-azido-o-nitrobenzamido]ethyl-1,3'-dithiopropionate), which has an arylazide, a photoactivatable group specific for insertion into C-H bonds, as the second, orthogonal reactive site.
  • trifunctional crosslinker refers to any macromolecule that, in addition to containing two amino acid-specific reactive groups, also contains a third group, for example, an affinity group that allows for ease of purification of the linked peptides.
  • Sulfo-SBED Sulfosuccinimidyl [2-o-(biotinamido)-2-(p-azidobenzamido)-hexamido] ethyl 1,3'- dithiopropionate
  • biotin group which can be affinity selected using avidin.
  • on-line chromatography-mass spectrometry refers to a method by which a chromatography effluent flows into a mass spectrometer. The effluent may directly flow into the spectrometer, or alternatively may flow through other detection means prior to entering the mass spectrometer.
  • off-line chromatography refers to a method by which chromatography is performed, fractions are separated, and subsequently analyzed.
  • low resolution refers to resolution of structures above about 5 Å.
  • Moderate resolution refers to structures between about 2-5 Å.
  • the invention herein can provide resolution of structures between about 2-5 Å, and more usually about 3-5 Å.
  • the present invention is based on the finding that the integrated technique of determining physical distance constraints and analysis of the constraint information can reliably yield sufficient amino acid proximity information to allow the determination of the structural aspects of a macromolecule to a level of resolution between about 3 Å and about 5 Å, and more particularly between about 3.5 Å and 4.5 Å.
  • the technique is described herein in terms of determining spatial geometry of a protein. This technique may be used to determine structural aspects of other macromolecules as well, e.g., structural relationships of RNA, DNA and/or the relationship of interactions of these molecules with proteins (e.g., regulatory binding) and the methods of the invention are not meant to be limited to determining protein structure. Accordingly, although the following disclosure is directed to using the methods of the present invention to determine the tertiary structure of a protein, it is understood that the same general concepts are applicable to identifying structures of a wide range of different types of macromolecules.
  • Figure 1 illustrates the steps in one embodiment of the method of the present invention.
  • the first step in the method of the invention involves identification of spatial constraints of a protein using chemical or physical means.
  • One embodiment of the invention utilizes chemical crosslinking agents to determine limits on the spatial relationship of residues in a protein. Since only residues having functional groups compatible with the crosslinker and having proximity to allow chemical reaction will actually crosslink, identification of crosslinked residues can be used to determine the geometric constraints on the conformation of the protein. Multiple crosslinkers with different spatial constraints and/or functional group specificities may be used in the determination of a protein structure.
  • the second step in the method of the invention is enrichment of the crosslinked reaction pool for intramolecularly crosslinked proteins.
  • the reaction pool is enriched for proteins with intrapeptide crosslinks, and preferably the molecules with interpeptide bonds are removed completely, e.g., by a size separation technique.
  • the third step in the method of the invention is proteolysis of the enriched reaction pool.
  • the crosslinked peptides are subjected to proteolysis with a proteolytic enzyme that cleaves at a known site, e.g., trypsin.
  • the crosslinked fragments will remain connected following proteolysis, and since the number of peptide fragments can be predicted for the protein before it is crosslinked, determination of the sizes of fragments produced after proteolysis of the crosslinked protein will allow identification of the residues that react with a certain-sized chemical crosslinking reagent.
  • the fourth step in the method of the invention is the analysis of the peptide fragments produced by proteolysis. In one embodiment of the invention, mass spectrometry (MS) techniques are used to identify the crosslinked fragments.
  • TOF time-of-flight
  • MS/MS tandem mass spectrometry
  • the final step of the methods of the invention involves protein modeling, and particularly modeling using spatial geometry software.
  • the high sensitivity and mass range of more modern mass spectrometry methods, used in conjunction with protein modeling techniques, e.g., homology modeling, allow domain-mapping and the construction of moderate-resolution structures, i.e., structures between about 3 Å and about 5 Å. Integration and interpretation of this data can determine the structural conformation of the protein, and thus is indicative of the tertiary structure of the protein.
  • the structural questions that can be addressed by intramolecular crosslinking are not restricted to fold recognition. In the limit of few constraints, domain-domain placement can be done with ⁇ 3 constraints per domain pair. Rossi et al.
  • the method of the invention has several advantages that give the method significant utility, especially in light of limitations in other techniques for determining protein structure available in the art.
  • This method generates the first reliable modest resolution (3 to 5 Å) structure that can, in principle, be used as a starting point to refine X-ray crystallography and NMR data, saving considerable time and effort.
  • the method of the invention is relatively fast to employ, and so is particularly useful in analyzing large numbers of peptides quickly.
  • the experimental protocol is fully automatable and is thus amenable to a high-throughput approach.
  • the present invention is particularly suited to analyzing the results of genomic and proteomic studies. Intramolecular crosslinking is enhanced under conditions of very low protein concentrations, so only a small amount of protein is required.
  • Protein purity is less critical for the methods of the present invention than for other techniques, such as NMR or X-ray crystallography, as only peaks consistent with crosslinked peptides, based on molecular weight and sequence information, are of interest.
  • the invention is applicable to obtaining tertiary structure in a relatively short period of time (ranging on average from one day to at the most several weeks) with a protein.
  • the methods described herein can be used with arbitrary protein mixtures, such as, in one specific example, protein samples of only moderate purity (e.g., from greater than about 60% to greater than about 80% purity), as would be expected from a typical in vitro His-tagged protein expression system followed by simple one- or two-step purification.
  • Step 203 involves the assignment of peptide fragment sequences to observed mass spectrometry peaks from the proteolyzed protein to generate distance constraint information by identifying protein fragments containing cross-linked residues.
  • Step 205 involves the generation of a ranked list of candidate secondary structures by a threading approach.
  • Step 207 is a re-ranking of those candidate structures based on their compatibility with physical constraint criteria such as, but not limited to, (i) hydrophobic interactions between residues or (ii) the distance constraint information of step 203.
  • Step 209 is application of homology modeling to the top candidate or candidates determined in step 207 to obtain a further refinement of the structure by positionally matching residues of the protein in question with residues of the top candidate.
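  • The re-ranking of step 207 against the distance constraint information of step 203 can be sketched as below. This is a minimal illustration under stated assumptions: the candidate representation (a name, a threading score where higher is better, and a dictionary of residue coordinates), the function names, and the "fewest violated constraints first" ordering are hypothetical and are not taken from the source.

```python
import math


def violations(coords, constraints):
    """Count constraints (i, j, max_dist) that a candidate model violates.

    coords: {residue_index: (x, y, z)} for one candidate structure.
    constraints: list of (i, j, max_dist) upper bounds from crosslinking.
    """
    count = 0
    for i, j, max_dist in constraints:
        if math.dist(coords[i], coords[j]) > max_dist:
            count += 1
    return count


def rerank(candidates, constraints):
    """Order candidates by fewest violations, then by higher threading score.

    candidates: list of (name, threading_score, coords) tuples.
    """
    return sorted(
        candidates,
        key=lambda c: (violations(c[2], constraints), -c[1]),
    )


if __name__ == "__main__":
    constraints = [(1, 5, 10.0)]
    compact = {1: (0.0, 0.0, 0.0), 5: (6.0, 0.0, 0.0)}
    extended = {1: (0.0, 0.0, 0.0), 5: (20.0, 0.0, 0.0)}
    ranked = rerank(
        [("extended", 0.9, extended), ("compact", 0.5, compact)],
        constraints,
    )
    print([name for name, _, _ in ranked])
```

  • In this sketch a fold that satisfies the crosslink constraints is promoted above a fold with a better threading score that violates them; a weighted combination of the two criteria would be an equally plausible design.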
  • Figure 3 illustrates how the present invention can be used as part of a larger genomic or proteomic investigation for identifying, designing and/or analyzing proteins, particularly enzymes, or macromolecules that bind with such proteins.
  • Three-dimensional protein structures may be generated in various manners. The two paths on the right side of Figure 3 represent conventional techniques for analyzing proteins and generating protein structures from genomic data.
  • researchers typically begin by identifying or designing a gene/polynucleotide sequence. See 301.
  • a virtual protein would then be generated from the primary nucleotide/amino acid sequence. See 303.
  • Various well-known processes can then be used to predict the 3-D structure of the resulting protein. See 305.
  • Such processes may be done entirely in silico starting only with the primary sequence of the protein, i.e., without using supplemental experimental data. At this point in time, entirely in silico techniques work well for predicting a protein's secondary structure, but are inadequate for predicting higher resolution features beyond the secondary structure. In any event, the predicted three-dimensional structure of the protein is then sometimes used to perform virtual experiments, such as virtual docking with ligands of interest. See 307. Such docking is only as useful as the protein structure is accurate. Similar docking experiments can be done with structures derived from the other two flow-chart paths.
  • the present invention allows for the use of very limited empirical information in the form of cross-linking residues to obtain a very good prediction (within 2 to 5 Å RMS) of the actual 3-D structure of a protein. It has been found that a surprisingly small number of cross-links, typically about 10% of the number of amino acid residues, is adequate for purposes of the invention. See 315. This constraint information is then used to determine general structural features of the protein, 317, which is used to validate or improve 3-D structures that were determined entirely in silico or via NMR or X-ray crystallographic experiments.
  • PHYSICAL DISTANCE CONSTRAINT DETERMINATION. Numerous techniques for determining physical distance constraints between residues in a protein may be employed, including fluorescence resonant energy transfer and spin-labeling techniques. In a preferred embodiment, distance constraints are determined by crosslinking the protein and then using mass spectrometry to identify linked fragments.
  • Figure 4 is a schematic illustration of such a crosslinked protein.
  • the crosslinker region can be a simple alkyl chain, and the length of the crosslinker can be varied, e.g., by varying the ethylene group.
  • the crosslinker region may be short or long, and may define a more exact proximity (e.g., binding of a reagent with a rigid crosslinker region) or define an outer boundary for binding proximity (e.g., binding of a reagent with a flexible crosslinker region).
  • the crosslinker can also be chemically modified to change other properties, e.g., a hydroxyl group can be added to make the crosslinker more hydrophilic or an aromatic group can be added to make the crosslinker more rigid.
  • Many different linkers can be used in the methods of the invention, including bifunctional and trifunctional chemical crosslinkers. For any crosslinker, at least one, and preferably both, of the possible reactive sites are known.
  • the reactive groups can be considered orthogonal or non-orthogonal relative to their reactivity.
  • a more diverse set of amino acid functionalities can be targeted by a library of crosslinking agents with spacer arms of differing lengths and flexibilities. More rigid or shorter spacers narrow the range of possible distances between crosslinked residues, thereby providing more discrimination in fold recognition.
  • experiments performed with a library of crosslinkers can be used to improve the overall precision of the constraints. By providing more distance constraints for conformation analysis, the number and precision of the experimentally-derived constraints define the types of structural questions that can be answered.
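  • One way to picture how a library of crosslinkers sharpens the constraints: when the same residue pair is linked by spacers of different lengths, the shortest spacer sets the tightest upper bound on the distance between the two residues. The sketch below assumes a simple bound of spacer length plus a fixed side-chain reach on each side; the function name, the linker names, the spacer lengths and the 6.5 Å reach are hypothetical illustration values, not figures from the source.

```python
def tightest_bounds(observations, spacer_lengths, side_chain_reach=6.5):
    """Return the tightest distance upper bound (in Å) per residue pair.

    observations: list of ((res_i, res_j), linker_name) crosslinks observed.
    spacer_lengths: {linker_name: spacer length in Å} (hypothetical values).
    side_chain_reach: assumed allowance per linked side chain, in Å.
    """
    bounds = {}
    for (i, j), linker in observations:
        pair = (min(i, j), max(i, j))  # treat (i, j) and (j, i) as the same pair
        bound = spacer_lengths[linker] + 2 * side_chain_reach
        bounds[pair] = min(bound, bounds.get(pair, float("inf")))
    return bounds


if __name__ == "__main__":
    spacers = {"short": 6.0, "long": 11.0}
    observed = [((10, 3), "long"), ((3, 10), "short"), ((4, 20), "long")]
    print(tightest_bounds(observed, spacers))
```

  • A rigid spacer would justify a lower bound as well as an upper bound; this sketch tracks only upper bounds, which corresponds to the flexible-spacer case.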
  • the crosslinking reagents used may be chosen using various factors known to those of skill in the protein and peptide chemistry arts, including predicted structural motifs in the protein, e.g., motifs that may be predicted from the primary sequence of the protein. If certain structural aspects of a protein are to be identified, e.g., screening of multiple proteins to identify specific domains and/or folds, then the crosslinking reagents may be selected based on their efficacy in identifying certain domains and/or folds.
  • crosslinkers with varying lengths, rigidity, specificity and the like can be employed, as will be apparent to one skilled in the art upon reading the present disclosure.
  • a series of homobifunctional reagents of variable lengths and/or specificity can be created to provide crosslinkers with appropriate lengths and/or chemical compositions suitable for study of a specific protein.
  • the study of a particular protein can be undertaken using a series of homobifunctional reagents of variable lengths with amine specificity.
  • Crosslinkers homologous to the crosslinker BS3 (Bis[sulfosuccinimidyl] suberate), which has 6 methylenes, can be produced, e.g., crosslinkers with lengths of 2 and 4 methylenes, to provide a series of amine-specific crosslinkers with varying lengths. Combining data obtained using the various crosslinkers can provide a more detailed analysis of the spatial constraints of a protein.
  • Exemplary crosslinking reagents for use in the methods of the invention are EDC (1-Ethyl-3-[3-dimethylaminopropyl]-carbodiimide hydrochloride); DSP (Dithiobis[succinimidylpropionate]), also known as Lomant's reagent; BS3 (Bis[sulfosuccinimidyl] suberate); and DSS (Disuccinimidyl suberate). DSP and DSS are both homobifunctional, amine-reactive agents, differing only in that the disulfide bond in DSP allows it to be cleaved, whereas DSS is non-cleavable.
  • BS3 is a water soluble analogue of DSS that is membrane impermeable.
  • EDC is versatile, being water soluble and capable of converting carboxyl groups (either Asp or Glu residues present in the target protein or carboxylic groups on the crosslinker) to their active esters and allowing for nucleophilic attack by amine-containing molecules (protein or crosslinker) to form stable amide crosslinks.
  • the selected crosslinker is added to the protein solution and allowed to react under conditions effective to allow crosslinking.
  • the conditions e.g., buffer, relative concentrations of protein and crosslinker, pH, temperature, time, and the like, are selected to be suitable for forming a covalent bond with its target functional amino acid groups, as can be predicted by one skilled in the art.
  • a homobifunctional crosslinker or homotrifunctional crosslinker, X-z-Y
  • both groups would be allowed to react and some percentage of the crosslinkers would form crosslinks between two spatially distinct amino acids on the same protein (intramolecular crosslink) or between two separated protein molecules (intermolecular crosslink).
  • a second set of conditions would be subsequently employed for the orthogonal group to react with its target sites, e.g., light in the case of a photoactivatable group such as an arylazide, or a change in pH in the case of a sulfhydryl-selective group.
  • BS3 and DST were found to react well under the following generic reaction conditions with FGF-2: 25°C, 2 hours, 5 ⁇ M protein with a 20:1 molar ratio of crosslinker to protein in 100 mM Hepes buffer, pH 7.5.
  • Crosslinking reactions with HIV- 1 integrase which can be unstable at certain temperatures, was accomplished using a reduction in the reaction temperature with an increase in the overall reaction time (0°C, at 40 hours).
  • a Lys-Cys heterobifunctional crosslinker such as sulfo-EMCS (N-[ε-Maleimidocaproyloxy]sulfosuccinimide ester) or sulfo-GMBS (N-[γ-Maleimidobutyryloxy]sulfosuccinimide ester) reacts with Lys through its NHS-ester group and with Cys through its maleimide group.
  • the maleimide group is most selective for sulfhydryl groups when the pH is between 6.5 and 7.5; above this pH, the reaction with primary amines becomes more significant.
  • This reaction can be carried out in one step at pH 7.0 for the NHS-ester and maleimide groups to react at the same time; or it can be separated in two steps, one at pH 6.5 for maleimide group and then a second step at pH 7.5 for NHS-ester.
  • the resulting products will contain a mixture of proteins containing the following outcomes: 1) a crosslinker covalently attached to the protein at only one end (a dead-end crosslinker case, little useful information regarding distances), 2) a crosslink involving two spatially distinct sites attached to a single protein (protein monomer with two covalently linked sites, the desired outcome), and/or 3) a crosslinker joining two separate protein molecules (inter-protein crosslinking, generally not desired unless protein-protein interactions are being investigated). See Figure 5.
  • Figure 6 describes one suitable computational process for generating distance constraint information. See 601. The process begins at 603, with the computational system generating many of the expected fragments, given the particular cross-linker(s) and protease(s) used on the protein.
  • the system usually requires at least the following inputs: a primary sequence, identification of a protease, and identification of a cross-linking agent. For example, if the protein was treated with trypsin, which cleaves on the C-terminal side of lysine and arginine, then all the potential fragments generated from the primary sequence by cleavage at these sites are considered. In addition, some of the protein fragments will have the cross-linking agent attached to them, so these modified fragments may be listed as well. For instance, if the cross-linker BS3 is used (which bonds to lysine), then some additional potential fragments having lysine residues and bound BS3 may be listed with the mass of the cross-linker added.
  • the list of expected fragments does not include many or any fragments that contain two or more peptide backbones linked by one or more cross-linking agent. Such species may be accounted for later in the process.
  • the bound cross-linking agent will have one free (unbound) terminus.
  • various sub-species may be present depending upon the chemical state of the agent's terminus. For example, the same protein fragment may be listed with the following molecular weight variations: the fragment with the entire linker attached (linker plus leaving group) and the fragment with a hydrolyzed linker arm attached (the usual case).
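The fragment-listing step described above can be sketched as follows. This is a minimal illustration, not the patent's software: the function names are hypothetical, the residue masses are standard monoisotopic values, and the 156.07864 Da addition for a BS3 linker bound at one end with the other end hydrolyzed is an assumed value that does not come from the text. The sketch ignores missed cleavages (including those caused by modified lysines) and omits the "linker plus leaving group" variant.

```python
# Standard monoisotopic residue masses in Da (not taken from the source text).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

# Assumed mass addition for a BS3 linker attached at one end only.
LINKER_DEAD_END = {"hydrolyzed": 156.07864}


def tryptic_fragments(sequence):
    """Cleave after every K or R; return (1-based start position, peptide)."""
    fragments, start = [], 0
    for pos, residue in enumerate(sequence):
        if residue in "KR":
            fragments.append((start + 1, sequence[start:pos + 1]))
            start = pos + 1
    if start < len(sequence):
        fragments.append((start + 1, sequence[start:]))
    return fragments


def peptide_mass(peptide):
    """Neutral monoisotopic mass of a linear peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER


def expected_species(sequence):
    """List unmodified fragments plus lysine-containing fragments carrying
    a one-ended linker, as (peptide, start, state, mass) tuples."""
    species = []
    for start, peptide in tryptic_fragments(sequence):
        base = peptide_mass(peptide)
        species.append((peptide, start, "unmodified", base))
        if "K" in peptide:
            for state, delta in LINKER_DEAD_END.items():
                species.append((peptide, start, state, base + delta))
    return species
```

Fragment-to-fragment cross-links are deliberately left out of this list, matching the text's statement that such species are accounted for later in the process.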
  • Figure 7, 701 is a schematic illustration of a binned list of calculated protein fragments, including several individual bins. See 702 for example. The individual mass species within a bin are represented as a linked list. See 703. The mass spectrometry data obtained from the actual proteolyzed protein can then be matched against the binned collection of expected species.
  • the actual mass spectrometry data may be analyzed.
  • the system considers each MS peak generated from an analysis of the proteolyzed protein. In process 601, this is represented as operations 605 and 606, where the system sets a variable N equal to the number of peaks to be considered (605) and iterates over those various peaks (606). Iterative loop operation 606 initially sets an index value "i" equal to 1. It then determines whether the current value of i is greater than the value of N. If not, it performs various operations to identify the chemical structure of the species that created the peak.
  • a control spectrum or spectra are subtracted from the MS data before the process of Figure 6 is carried out. Because only the protein fragments with linked residues (even if the linker is attached to nothing at the other end) are generally of interest, it is helpful to subtract the MS data corresponding to the residues without cross-linker in this manner.
  • the list of mass species can be partially built up using an already existing library of peptides, thus simplifying the task of generating the list.
  • the observed mass of that peak will be truncated and matched to its corresponding bin. See 607.
  • the system will then traverse the list of mass species in that bin, and calculate a parts per million (PPM) error for each. See 609.
  • the program will then output all the fragments that fall within a chosen allowable PPM range of the calculated mass species. See 611. Note that one input to the system may be a user-adjustable PPM error window.
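The binning and PPM-matching steps (607, 609, 611) might look like the sketch below. It is an illustration under assumptions: the function names are hypothetical, a dictionary of Python lists stands in for the linked lists of Figure 7, and the check of the two neighboring bins is an addition not described in the text, included so that a peak lying just across an integer boundary from its calculated mass is not missed. The PPM error is taken as 10^6 x (observed - calculated) / calculated.

```python
from collections import defaultdict


def build_bins(species):
    """Group (label, mass) pairs into bins keyed by the truncated mass."""
    bins = defaultdict(list)
    for label, mass in species:
        bins[int(mass)].append((label, mass))
    return bins


def ppm_error(observed, calculated):
    """Signed mass error in parts per million."""
    return (observed - calculated) / calculated * 1e6


def match_peak(observed, bins, ppm_window=5.0):
    """Return every calculated species within ppm_window of the peak."""
    key = int(observed)  # truncate the observed mass to find its bin
    hits = []
    for neighbor in (key - 1, key, key + 1):
        for label, mass in bins.get(neighbor, []):
            err = ppm_error(observed, mass)
            if abs(err) <= ppm_window:
                hits.append((label, mass, err))
    return hits
```

The `ppm_window` argument plays the role of the user-adjustable PPM error window mentioned above; its default of 5 is only a placeholder.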
  • the process has not yet accounted for mass species that correspond to fragment- to-fragment cross-links.
  • the program does not store all these combinations, but instead searches the list of individual protein fragments and determines if two fragments, linked together, match an observed MS peak. See 613.
  • This process is as follows. The process will search for combinations for each MS peak in an iterative process much like steps 605 and 606. Since it can be assumed that there is a linker in the combination, the process will take the weight of the MS peak and subtract the linker weight. It will then go to the lowest occupied molecular weight bin.
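A minimal sketch of the pair search follows, assuming neutral (deconvoluted) peak masses. The function name and data layout are hypothetical, and the 138.06808 Da addition for a BS3 linker reacted at both ends is an assumed value, not one given in the text. The walk starts at the lightest fragment, mirroring the "lowest occupied bin" starting point, but uses a sorted list with binary search instead of bins.

```python
from bisect import bisect_left, bisect_right

# Assumed mass added by a BS3 linker with both ends reacted.
BS3_CROSSLINK = 138.06808


def find_linked_pairs(peak_mass, fragments, linker_mass=BS3_CROSSLINK,
                      ppm_window=5.0):
    """fragments: list of (label, mass). Return (label_a, label_b, ppm)
    for every pair whose summed mass plus the linker matches the peak."""
    ordered = sorted(fragments, key=lambda f: f[1])
    masses = [m for _, m in ordered]
    tolerance = peak_mass * ppm_window / 1e6
    remainder = peak_mass - linker_mass  # mass left for the two peptides
    pairs = []
    for i, (label_a, mass_a) in enumerate(ordered):
        target = remainder - mass_a
        if target + tolerance < mass_a:
            break  # any partner would be lighter than a: already examined
        # Only look at partners at or after position i to avoid duplicates.
        lo = bisect_left(masses, target - tolerance, i)
        hi = bisect_right(masses, target + tolerance, i)
        for label_b, mass_b in ordered[lo:hi]:
            total = mass_a + mass_b + linker_mass
            pairs.append((label_a, label_b, (peak_mass - total) / total * 1e6))
    return pairs
```

Because no pair masses are stored, memory use stays proportional to the number of individual fragments, which is the design choice the text describes.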
  • the system again determines whether the current value of i exceeds the value of N. Assuming that there are more peaks to consider, the next peak (i) is selected and process control returns to operations 607, 609, 611, 613, and 615, which are performed as described above, but with reference to the new peak (i).
  • a typical readable output format lists the mass of the MS peak, the protein fragment or combination of fragments it corresponds to, the number of times the peak was observed over the series of MS scans, the PPM error, and the positions of cross-link attachment.
  • Mass redundancies, that is, MS peaks that are found to correspond to more than one mass species, have been found to be fairly rare, particularly if an allowable PPM error of about 5 is chosen. These can be resolved after the computer program has output the final results.
  • the results are used as distance constraint information to re-rank the candidate structures (see operation 207 of Figure 2 and Figure 13).
  • the software has been written to accommodate other cross-linkers, in addition to BS3, and it should be understood that the software of this invention can work with other proteolytic and cross-linking reagents. It can be extended to handle embodiments where more than one cross-linker or protease has been used, and where multiple digests, each with different cross-linkers and proteases, have been carried out.
  • An overview of the mass spectrometer and computational system is illustrated in Figure 9 (901).
  • Cross-linked molecule fragments (in this embodiment, protein fragments) are the input to the mass spectrometer 903.
  • the mass spectrometer outputs m/z (mass-to-charge ratio) for each fragment, which is fed into the computational system 905, along with the primary sequence of the molecule.
  • the computational system then outputs 3-D structural details of the molecule.
  • reaction conditions of the crosslinking reaction(s) can be chosen to provide reaction products having a degree of single site and/or intermolecular crosslinking sufficiently low that enrichment following the crosslinking reactions is unnecessary.
  • an initial mass spectrometric analysis of the protein products is optionally carried out to determine the overall reaction stoichiometry.
  • the shift in mass from the unmodified protein (M) to the modified protein (M1) will give the average number of crosslinker modifications, since the expected mass of the crosslinker modification is known.
  • Both the absolute and relative concentrations of the crosslinker and protein are important parameters in the experimental design. Ideally, one would like the average total number of covalent modifications of the protein made by the crosslinker to be fewer than one per protein (i.e., number of crosslinker modifications/protein < 1) to avoid significant perturbation of the protein tertiary structure that could generate false distance constraints.
  • the crosslinking reaction can produce more single-site dead end modifications to the protein than the desired two-site intra-protein crosslinks. However, it is likely that simple single-site modifications have considerably smaller perturbation on the overall structure than a two-site crosslink.
  • Lys-Lys specific crosslinker e.g., BS3
  • the mass spectrometer has a resolving power capable of resolving the mass difference between these two reaction possibilities, e.g., ⁇ 0J%>.
  • the mass of a singly labeled site (Lys-labeled) with the second end hydrolyzed by water would be 20,156. This is 18 Da higher than the mass expected if a two-site reaction (Lys to Lys) has occurred within the protein (M = 20,138).
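The arithmetic behind these figures can be sketched as below. The unmodified mass of 20,000 Da is a hypothetical value chosen only because it reproduces the 20,138 and 20,156 figures quoted above; the nominal shifts of 138 Da (both linker ends reacted) and 156 Da (one end reacted, one hydrolyzed) are likewise assumptions, and the function names are illustrative.

```python
# Nominal mass additions in Da (assumed, chosen to match the quoted figures).
CROSSLINK_SHIFT = 138  # both ends of the linker reacted (Lys to Lys)
DEAD_END_SHIFT = 156   # one end reacted, the other hydrolyzed by water


def modified_mass(unmodified, crosslinks=0, dead_ends=0):
    """Expected mass of a protein carrying the given modifications."""
    return unmodified + crosslinks * CROSSLINK_SHIFT + dead_ends * DEAD_END_SHIFT


def average_modifications(unmodified, observed_average, shift):
    """Average number of linker modifications implied by a mass shift."""
    return (observed_average - unmodified) / shift


def required_resolution_percent(mass_a, mass_b):
    """Relative mass difference, in percent, that must be resolved."""
    return abs(mass_a - mass_b) / max(mass_a, mass_b) * 100


two_site = modified_mass(20000, crosslinks=1)  # 20138
dead_end = modified_mass(20000, dead_ends=1)   # 20156
```

With these numbers, distinguishing the two outcomes means resolving an 18 Da difference at roughly 20 kDa, i.e. a relative difference a little under 0.1%.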
  • the crosslinking agent can potentially form covalent bonds with amino acid residues of two (or more) different proteins
  • size-exclusion chromatography or other separation techniques can be employed (either under denaturing or non-denaturing conditions) to isolate intracrosslinked proteins from proteins having inter-protein crosslinks.
  • the crosslinked dimers can be removed using BioRad BioSelect™ columns. Under non-denaturing conditions (100 mM NH4HCO3, pH 7.0) using BioRad BioSelect 125-5 columns (300 x 7.8 mm each), two peaks are generally observed: an early eluting peak containing protein dimers and a later eluting peak containing monomers (Figure 10).
  • the fraction containing the protein monomers can be further separated under denaturing conditions (8M urea, 100 mM citrate buffer, pH 5) using a TosoHaas G2000 column into two sub-components; an early eluting peak containing primarily protein monomer with dead-end or singly-labeled linkers (no actual crosslinks, just surface labeling) and a later eluting peak containing primarily monomers with actual intramolecular crosslinked amino acids.
  • separating the dimer from monomer can be achieved by SDS PAGE. Then individual protein gel bands can be excised and the protein can be electro-eluted.
  • FRAGMENTATION AND SIZE SEPARATION. Following crosslinking, the protein of interest is fragmented into peptides by digestion and the peptide products are subsequently separated, e.g., by reverse-phase chromatography (see Figure 11).
  • Proteolytic enzymes for fragmentation in the method of the invention possess the activity used to cleave the crosslinked protein into smaller, more manageable pieces. This may be any enzyme or chemical activity known in the art which is capable of repeatedly and accurately cleaving a protein at particular cleavage sites during digestion. Suitable activities are widely known and a suitable activity may be selected using conventional practices.
  • Representative examples of such enzyme or chemical activities include: the enzyme trypsin, which hydrolyzes peptide bonds on the carboxyl side of lysine and arginine; the enzyme chymotrypsin, which hydrolyzes peptide bonds on the carboxyl side of aromatic residues (phenylalanine, tyrosine, and tryptophan); cyanogen bromide (CNBr), which chemically cleaves proteins at methionine residues; endoproteinase Glu-C, which hydrolyzes with high specificity peptide bonds at the carboxylic side of Glu (in ammonium bicarbonate, pH 7.8, or ammonium acetate buffer, pH 4.0) or of Glu and Asp (in phosphate buffer, pH 7.8); and endoproteinase Asp-N, which hydrolyzes peptide bonds at the amino side of Asp and cysteic acid.
  • other proteases can also be used in order to obtain manageable peptides, such as: thermolysin, which hydrolyzes peptide bonds involving the amino group of hydrophobic amino acids with bulky side chains like Leu, Ile, Met, Phe, Trp and Val; and pepsin, which cleaves proteins preferentially at peptide bonds involving the carboxylic groups of aromatic amino acids and other hydrophobic amino acids (Phe and Leu).
  • the enzyme trypsin is often preferred for cleaving proteins into smaller, more manageable pieces because of its low cost and its highly reproducible and accurate cleavage at the arginine and lysine residues occurring in the amino acid sequence of protein molecules.
  • Typical reaction conditions used to generate the final peptide mixtures from the labeled protein using trypsin are 50 mM NH4HCO3, pH 9, a 20:1 weight ratio of trypsin to protein, and a 2 hour incubation at 37°C.
  • a combination of the proteases and chemical reagents can also be applied to the crosslinked proteins to generate a peptide mixture.
  • in-gel digest of protein by proteases can be used and the resulting peptides can be extracted from the gel slice.
  • the resulting peptide mixture will contain unlabeled and labeled peptides, where the labeled fraction is further divided into intermolecular, intramolecular or singly labeled crosslinks.
  • the fragments can be fractionated using any methodology known to one skilled in the art.
  • the peptides are fractionated using a chromatographic column.
  • the chromatographic column includes a chromatographic medium which, in cooperation with a suitable solvent system, is capable of chromatographically fractionating peptide digests following the digestion reaction.
  • the chromatographic column includes an inlet port for receiving the peptide digests and an exit port for discharging an effluent comprising the chromatographically fractionated peptide digests.
  • the chromatographic column is a reverse-phase HPLC analytical column comprising a fractionating medium capable of fractionating the peptide digests when the digests are eluted through the column using reverse-phase HPLC techniques.
  • the chromatographic medium is hydrophobic because the peptide digests themselves tend to be hydrophobic in nature.
  • An exemplary HPLC analytical column suitable for use in the practice of the present invention is commercially available as the Vydac™ C-18 HPLC column from the Separations Group, Inc., of Hesperia, California.
  • the peptide fragments are then identified in order to assign crosslinks to specific peptide fragments within the protein structure. This may be done using various techniques, including Edman sequencing, chromatography, mass spectrometry, or a combination of these methods. Grant et al. Methods Enzymol. 1997 289:395-419.
  • One method of the identification of the crosslinked peptides will involve either online chromatography-mass spectrometry or off-line chromatography followed by mass spectrometry.
  • the chromatography component consists of reversed-phase separation using C4, C8, C18 or similar separation schemes.
  • a gradient elution profile starting from 100% aqueous to 70-100% organic (e.g., acetonitrile or methanol) is employed and peptides are either collected in fractions off-line or eluted directly into the source of an appropriately configured mass spectrometer.
  • TFA trifluoroacetic acid
  • formic acid can be used instead of TFA.
  • an Eldex MicroPro HPLC can be used, and preferably is fitted with a Michrom MAGIC MS reverse-phase column (0.2 x 50 mm) operating at 1 µL/min.
  • an LC Packing Fusica II reverse-phase column (0J x 150 mm, 5mL/min) with a higher loading capacity can be used, depending on the amount of material one has and degree of peptide separation desired.
  • the peptides will be detected at 210 nm with an ABI 785A UV detector fitted with a LC Packings capillary Z-cell and either collected into Eppendorf tubes or directly onto plates for subsequent MS analysis.
  • MS instruments suitable for the detection of the crosslinked peptides include, but are not limited to: 1) matrix-assisted laser desorption ionization (MALDI) time-of-flight (TOF) instruments, where individual HPLC fractions are first separated off-line; 2) an electrospray ionization (ESI) orthogonal-TOF mass spectrometer with on-line HPLC; and/or 3) an ESI ion-trap instrument, also with on-line HPLC detection. Still other methods will be obvious to one skilled in the art upon reading the present disclosure.
  • MALDI matrix-assisted laser desorption ionization
  • TOF time-of-flight
  • There are several important considerations in this mass determination, including the overall mass accuracy, dynamic range of detection, and mass range.
  • a mass accuracy of better than 100 ppm is desired such that one is able to limit the possible interpretations as to the crosslinked peptide identity.
  • mass accuracies of up to or better than 10 ppm can be achieved on many MS instruments with proper internal calibration. This is highly desirable, as one can more readily assign peptide (and peptide crosslinks) based on this higher level of mass accuracy.
  • a tandem MS experiment can be carried out on selected peptide ions to provide additional fragmentation data ("sequence tags") which is in turn used to confirm peptide identity and/or assign the precise amino acid positions involved in the crosslink.
  • TOF mass spectrometry separates ions according to their mass-to-charge (m/z) ratio by measuring the time it takes generated ions to travel to a detector.
  • TOF mass spectrometers are advantageous in the present invention because they are relatively simple, inexpensive instruments with virtually unlimited mass-to-charge ratio range.
  • TOF mass spectrometers have potentially higher sensitivity than scanning instruments because they can record all the ions generated from each ionization event.
  • TOF mass spectrometers are particularly useful for measuring the mass-to-charge ratio of large organic molecules where conventional magnetic field mass spectrometers lack sensitivity.
  • Exemplary TOF mass spectrometers that may be used in the present invention are shown in U.S. Pat. Nos. 5,045,694, 5,160,840, and 5,627,369 specifically incorporated by reference herein.
  • the performance of a mass spectrometer is only partially defined by the mass resolution. Other important attributes are mass accuracy, sensitivity, signal-to-noise ratio, and dynamic range. The relative importance of the various factors defining overall performance depends primarily on the type of sample, but generally several parameters must be specified and simultaneously optimized to obtain satisfactory performance for a particular application. These parameters may be varied for optimal resolution in the method of the invention, which would be obvious to one skilled in the art upon reading the present disclosure.
  • MALDI MASS SPECTROMETRY. Matrix-assisted laser desorption ionization is particularly advantageous in biological applications, and thus for use in the methods of the invention, since it facilitates desorption and ionization of large biomolecules in excess of 100,000 Da molecular mass while keeping them intact.
  • the MALDI mass spectrometry technique is used.
  • the ions generally have a substantial average velocity after leaving the surface, which is the same to a large extent for ions of all masses, and a large spread around the average velocity.
  • the average velocity leads to a non-linear relationship between the flight time and root of the mass.
  • the spread leads to a low mass resolution when measuring the signals of individual ion masses; however, there are methods which improve mass resolution.
  • the relationship for conversion of flight time into mass is called "mass scale" here for the sake of simplicity.
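As a worked illustration of why a common initial velocity makes the mass scale non-linear in the root of the mass, consider a textbook idealization that is not taken from the text: an ion of mass m and charge ze is accelerated through a potential U over a negligible distance and then drifts over a field-free length L. The symbols L, U, and v_0 are assumptions introduced here, and v_0 is taken to lie along the flight axis.

```latex
% Ion starting at rest: flight time is proportional to sqrt(m/z).
\[
  z e U = \tfrac{1}{2} m v^{2}
  \quad\Longrightarrow\quad
  t = \frac{L}{v} = L \sqrt{\frac{m}{2 z e U}}
\]
% Ion with initial velocity v_0 (the same for all masses):
\[
  z e U + \tfrac{1}{2} m v_{0}^{2} = \tfrac{1}{2} m v^{2}
  \quad\Longrightarrow\quad
  t = \frac{L}{\sqrt{\dfrac{2 z e U}{m} + v_{0}^{2}}}
\]
```

In the second case the flight time is no longer proportional to the square root of m, and the deviation grows with mass because the fixed term v_0 squared becomes comparable to 2zeU/m for heavy ions.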
  • Ion reflectors can be used to compensate for the effects of the initial kinetic energy distribution.
  • An ion reflector is positioned at the end of the free-flight region.
  • An ion reflector consists of one or more homogeneous, retarding, electrostatic fields. As the ions penetrate the reflector, with respect to the electrostatic fields, they are decelerated until the velocity component in the direction of the field becomes zero. Then, the ions reverse direction and are accelerated back through the reflector. The ions exit the reflector with energies identical to their incoming energy but with velocities in the opposite direction. Ions with larger energies penetrate the reflector more deeply and consequently will remain in the ion reflector for a longer time.
  • For higher mass accuracy (20-50 ppm) and on-line HPLC/MS analysis, so-called "electrospray ionization" (ESI) mass spectrometry is used in the methods of the invention.
  • In electrospray ionization, an electric potential is applied to a liquid containing the analyte(s), usually via a conductive capillary needle.
  • An analyte in solution is sprayed from a conducting needle with an inner diameter of approximately 75-100 µm, at a high voltage, e.g., 3000 V, towards a conducting aperture plate at a potential between ground and about 300 V leading to the input of the mass spectrometer.
  • a high voltage of the same magnitude but opposite polarity may be applied to the entrance aperture of the mass spectrometer. Ions are produced in the high electric field, and are then analyzed in a mass spectrometer.
  • ESI can convert analytes in solution, at ambient temperature and pressure, directly into gas-phase ions without excessive fragmentation.
  • ESI mass spectrometry is suitable for the analysis of nonvolatile compounds that are either polar or ionic.
  • An advantage of ESI over other soft-ionization techniques such as fast atom bombardment or thermospray is the formation of multiply charged species, making ESI well suited for the analysis of high molecular weight (up to 1,000,000 Da) biomolecules and polymers. See Fenn et al., "Electrospray Ionization-Principle and Practice," Mass Spectrom. Rev., vol. 9, pp. 37-70 (1990). For general background on the mechanisms of electrospray, see P. Kebarle et al., Anal. Chem. 65: 972A-986A (1993).
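The practical consequence of multiple charging can be sketched numerically. The sketch assumes positive-mode ions formed by adding z protons to a neutral molecule; the function names and the example m/z window are hypothetical and not taken from the text.

```python
PROTON = 1.007276  # mass of a proton in Da


def mz(neutral_mass, charge):
    """m/z of an ion formed by adding `charge` protons to a neutral species."""
    return (neutral_mass + charge * PROTON) / charge


def charge_states_in_range(neutral_mass, mz_min, mz_max, max_charge=100):
    """Charge states whose m/z falls inside the analyzer's m/z window."""
    return [
        (z, mz(neutral_mass, z))
        for z in range(1, max_charge + 1)
        if mz_min <= mz(neutral_mass, z) <= mz_max
    ]
```

For example, `charge_states_in_range(20000.0, 500, 2000)` lists the charge states at which a 20 kDa protein would appear within a 500-2000 m/z window, which is why a molecule far heavier than the upper m/z limit can still be observed.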
  • ESI-TOF is preferably carried out using a Mariner ESI-TOF mass spectrometer coupled to an Applied Biosystems 140B syringe pump HPLC system fitted with a capillary HPLC column (Fusica 200-300 µm I.D. by 10-15 cm; C18 or C4 Packings).
  • a gradient solvent consisting of 0.1% formic acid in H2O (solvent A) and
  • STRUCTURAL MODELING. The computational strategy used for structural modeling utilizes all experimental distance constraints between target amino acid pairs from the experimental peptide fragment data for the determination of fold-family, domain-domain geometries, and/or tertiary structures using a variety of computational approaches. In the limit of many constraints, structures could be generated directly using distance crosslinks. The same techniques can also be used to orient domains within a full-length structure, to determine the interactions between monomers within an oligomer, or to define a receptor-ligand complex. The combination of such analysis will generate a structural model of the tertiary structure of the protein. See Figure 12. Such analysis is preferably performed with the aid of spatial geometry software.
  • Structural modeling can be extended to the study of uncrosslinked, modified, or crosslinked nucleic acid sequences, peptide or peptoid sequences with unusual amino acids, oligosaccharides, or any other polymer of defined sequence.
  • the software can incorporate various different chemical or photochemical crosslinkers with known chemical end products, including data from: monovalent (affinity labeled) reagents, homobivalent crosslinkers, heterobivalent crosslinkers, and crosslinkers with a valency greater than 2.
  • a model's compatibility with constraints is a function of the constraint errors associated with the model and the number of constraints defined by the model, e.g., those constraints linking residues in regions defined by the alignment, x-ray crystallography, or NMR.
  • Assessing a model based on its physical properties can involve: calculating the distribution of hydrophobic/hydrophilic amino acids; mapping its hydrogen-bond network; locating disulfide bridges; functional mapping of mutagenesis data; assessing the complementarity of the model's secondary structure and the secondary structures predicted for the sequence; ensuring that critical electrostatic interactions are preserved; identifying sites of van der Waals clashes; evaluating the sequence-structure-sequence similarity, or any combination of the above.
  • Fold-family determination could therefore optionally include the generation of hypothetical structural models by threading the sequence of interest through a library of representative protein structures followed by the evaluation of models via the application of distance constraints obtained from the crosslinking data set. If a model is found with a low constraint violation, this model is considered to be a good candidate for further homology modeling studies.
  • the first step in the analysis is the generation of a set of structural models for a sequence of interest.
  • Structural models can be generated by threading a set of known protein structures and calculating de novo structures using either distance geometry or ab initio methods such as constrained energy minimization or molecular dynamics.
  • Structural models can also be generated by using secondary structure prediction methods, motifs in the sequence, homology modeling, or a combination of these and other techniques as apparent to one skilled in the art upon reading this disclosure.
  • Distance geometry programs are of particular use in the methods of the present invention.
  • Distance geometry is a general method for converting a set of (NxN)-N distance bounds into a set of 3xN Cartesian coordinates consistent with these bounds.
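One ingredient commonly found in distance geometry programs is bound smoothing: before coordinates are generated, sparse upper bounds (for example, those supplied by crosslinks) are propagated through the triangle inequality so that every pair of points has a usable upper bound. The sketch below illustrates only that step, with a hypothetical function name; it is not presented as the procedure of any particular program.

```python
def smooth_upper_bounds(upper):
    """Tighten a symmetric matrix of upper distance bounds in place using
    upper[i][j] <= upper[i][k] + upper[k][j] for every intermediate k."""
    n = len(upper)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = upper[i][k] + upper[k][j]
                if via_k < upper[i][j]:
                    upper[i][j] = via_k
    return upper


INF = float("inf")
# Three points: 0-1 and 1-2 are bounded, 0-2 initially is not.
bounds = smooth_upper_bounds([
    [0.0, 3.8, INF],
    [3.8, 0.0, 3.8],
    [INF, 3.8, 0.0],
])
```

After smoothing, the previously unbounded 0-2 pair inherits the bound 3.8 + 3.8 from the path through point 1. Lower bounds and the embedding into 3xN Cartesian coordinates are outside the scope of this sketch.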
  • One such program, DGEOM, is a distance geometry program for molecular model-building and conformational analysis available from Chiron Corporation of Emeryville, California. Havel, et al. J Theor Biol. 104:359-81 (1983); Havel et al. J Theor Biol.
  • any of the many methods of model generation can be applied at this step in the over-all methodology.
  • the alignment methods described here are merely exemplary, and other methods may be used to deduce structures that are consistent with distance constraints.
  • Two strategies that are particularly useful in the methods of the present invention are constrained threading and constrained sequence/structure alignment.
  • Other possible methods include dynamic programming and clique detection.
  • the first step in the constrained threading procedure is to generate a set of structural models by threading a sequence through a database of sequence-unique protein folds.
  • Various software programs are available in the art to generate such structural models.
  • the specific program we used to generate these models for FGF-2 (FGF2-BOVIN) is the public-domain software 123D. Alexandrov et al. "Fast Protein Fold Recognition via Sequence to Structure Alignment and Contact Capacity Potentials.” Protein Science Bulletin. (1996).
  • This program involves entering the sequence of the protein, determining the alignment mode and allowing the software algorithm to generate the model. In global alignments all positions are considered. In free shift alignments gaps at the beginning or at the end are not scored. Local alignments are maximal common substring alignments.
  • the program will provide a given number of top scoring alignments.
  • a version of this program can be accessed on-line at the http site cartan.gmd.de.
  • Structural models considered by 123D to be the most complementary to the protein sequence, e.g., FGF-2 sequence, are then passed to the next step in our methodology, the model evaluation step.
  • the top 20 threading models can be further examined for their compatibility with the experimentally-derived constraints using the equation:
  • E is the total constraint error, i is the number of distance constraints, d_0 is the pairwise distance separation, and d_j is the pairwise distance defined by the structure for the two residues in constraint j.
  • d_j is the distance observed in the candidate threading model. If d_j is less than or equal to the distance d_0 defined by the length of the linker arm, then there is no constraint error contributed by that constraint j. If d_j is greater than d_0, then the constraint error is defined by the difference between these distances.
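The equation itself is not reproduced in this text. Read literally, the definitions above are consistent with E = sum over j = 1..i of max(0, d_j - d_0), and the sketch below assumes exactly that form; it should be treated as a reading of the definitions, not as the original formula. The function names, the use of alpha-carbon coordinates, and the data layout are hypothetical.

```python
import math


def constraint_error(ca_coords, constraints, d0):
    """Total constraint error E for one model.

    ca_coords:   dict mapping residue number -> (x, y, z) in Angstroms
    constraints: list of (residue_a, residue_b) crosslinked pairs
    d0:          maximum separation allowed by the linker arm
    """
    total = 0.0
    for res_a, res_b in constraints:
        d_j = math.dist(ca_coords[res_a], ca_coords[res_b])
        total += max(0.0, d_j - d0)  # no penalty when d_j <= d0
    return total


def rank_models(models, constraints, d0):
    """Return (E, model name) pairs sorted from lowest to highest error."""
    return sorted(
        (constraint_error(coords, constraints, d0), name)
        for name, coords in models.items()
    )
```

A model that satisfies every crosslink scores zero, so the ranking favors threading models whose crosslinked residue pairs all lie within reach of the linker.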
  • Information pertaining to each selected protein includes its primary sequence, as well as its secondary structure and the 3-D position of each residue.
  • the primary sequence of the protein being analyzed is then threaded through each selected protein structure. See 1303.
  • the backbone of the protein under consideration is laid on top of a backbone for the currently selected protein.
  • the selected protein is scored. See 1305. If the public domain software 123D is being used, for example, it creates a score based on (1) sequence identity between the two proteins, (2) alignment of secondary structures between the two proteins, and (3) a contact capacity potential of the protein in its threaded format.
  • the second scoring criterion involves approximating secondary structures of the protein based on the primary sequence.
  • the third scoring criterion is based on how closely the local environment (neighboring amino acids) of an amino acid residue matches its empirically determined preferred environment. Other software programs and other scoring criteria (e.g., hydrophobicity, potential of mean force) can be used. In a typical embodiment, the top twenty candidate structures are then used in the next step of the computational process.
  • the top candidates have their residues converted into 3-D coordinates by a computer program such as DGEOM, available from Chiron Corporation of Emeryville, California.
  • the distance constraint information is applied to each candidate structure according to the formula listed above. See 1309.
  • the candidate structures are then re-ranked according to their fit to the formula. See 1311.
  • constrained sequence/structure alignment employs the constraints to build a set of structural models, and the model evaluation stage consists of applying a pairwise hydrophobic contact potential to each model, and rank-ordering models based on this potential function.
  • alignments to the fold are defined by systematically matching residues of the target protein linked by a restraint to residues of the fold for which the interatomic distance of the alpha carbons is less than the extended crosslinker plus side chain atoms (23.85 Angstroms in the case of the BS3 linkers).
  • the protein sequence can then be mapped onto the fold working back from the first matched residue to the first residue of the sequence, or to the first of the fold, forward from the first matched residue and back from the second in a symmetrical fashion, and forward from the second matched residue.
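The first part of this procedure, finding fold positions that could host a crosslinked residue pair, can be sketched as follows. The 23.85 Angstrom cutoff is the BS3 figure quoted above; the function names, the data layout, and the choice to consider only one orientation of each pairing are assumptions made for illustration. Mapping the rest of the sequence outward from each seed is not shown.

```python
import math
from itertools import combinations

BS3_CA_CUTOFF = 23.85  # Angstroms, alpha carbon to alpha carbon


def compatible_fold_pairs(fold_ca, cutoff=BS3_CA_CUTOFF):
    """Pairs of fold positions whose alpha carbons are closer than cutoff.

    fold_ca: dict mapping fold position -> (x, y, z) in Angstroms
    """
    return [
        (p, q)
        for p, q in combinations(sorted(fold_ca), 2)
        if math.dist(fold_ca[p], fold_ca[q]) < cutoff
    ]


def seed_alignments(restraints, fold_ca, cutoff=BS3_CA_CUTOFF):
    """Pair every restraint (a, b) of the target with every compatible
    fold pair (p, q); each result seeds one alignment (a -> p, b -> q)."""
    pairs = compatible_fold_pairs(fold_ca, cutoff)
    return [((a, b), (p, q)) for a, b in restraints for p, q in pairs]
```

Each seed fixes two residue-to-position matches, from which the sequence can be laid onto the fold in the directions described above.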
  • Figure 14A illustrates the protein being analyzed 1401, with a "gap" in its sequence 1403, as compared to the Brookhaven fold, 1405. Homology modeling software brings together the residues surrounding this gap in a manner that maintains the lowest energy configuration of the protein.
  • the software can generate a loop that also maintains the lowest energy conformation.
  • Homology modeling software also will change the orientation of the residues and subgroups so as to minimize the energy of the structure. Examples of homology modeling software that are used with the present invention are Sybyl, from Tripos, Inc. of St. Louis, MO, and Midas, from the Computer Graphics Laboratory of the University of California, San Francisco, California.
  • the model most complementary to the experimental constraints will be selected as a starting point for the construction of a homology model.
  • the threading alignment can be used to match amino acids in the sequence to positions in the structure. Other alignment protocols could be used as well.
  • the model can then be constructed using standard homology modeling techniques. Additionally, distance constraint violations within the model may assist in further refinement of the model. Refinement of the model could be done using distance geometry, energy minimization, and/or molecular dynamics.
  • the methods of the invention as described below were found to produce a moderate-resolution (2-5 Å) structure using far fewer physical constraint distances than had been predicted in the art, generally about 10% of the number of amino acid residues in the protein. This unexpected and surprising result allows the methods of the invention to produce better resolution structures than would have been otherwise predicted. In addition, reasonable structures may be produced in a shorter amount of time than was predicted.
  • the number of pairwise distance constraints required to construct the three-dimensional structure of a protein of interest was predicted prior to performing the intermolecular crosslinking technique. Seven different constraint types were applied to the calculation of the structures of 5 proteins using distance geometry: BPTI, alpha bungarotoxin, parvalbumin alpha, cyclophilin A, and FGF-2. For each protein, an ensemble of 10 structures consistent with the constraints was generated.
  • the structures generated using exact interresidue crystallographic distances were of higher quality than those calculated from inexact distances.
  • the best quality structures, as measured by RMSD from the crystal structure, were those calculated using polar-polar amino acid crystallographic distances, secondary structure-derived constraints, and disulfide bond information.
  • the structure of BPTI in particular was readily calculable with an
  • the structures calculated from inexact constraints also ranged in quality depending on the number of constraints. If, for each amino acid, all other amino acids could be classified as in contact (≤ 10 Å away) or not in contact (> 10 Å away), the resulting DG-generated structures are on average less than 2 Å RMSD from the crystallographic structure. This result is consistent with those of Havel et al., 1979.
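The contact / no-contact classification described above can be illustrated with a short sketch. The text does not say which atoms define the inter-residue distance, so the use of one alpha-carbon coordinate per residue is an assumption, and the function name `contact_map` is hypothetical.

```python
import math

CONTACT_CUTOFF = 10.0  # Angstroms, the threshold quoted in the text


def contact_map(ca_coords, cutoff=CONTACT_CUTOFF):
    """Return an n x n boolean matrix: True where two residues are in contact.

    ca_coords is a list of (x, y, z) tuples, one per residue (assumed here
    to be alpha-carbon positions).
    """
    n = len(ca_coords)
    return [
        [math.dist(ca_coords[i], ca_coords[j]) <= cutoff for j in range(n)]
        for i in range(n)
    ]


# Three residues on a line: the first two are 3.8 A apart, the third is far away.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (20.0, 0.0, 0.0)]
cmap = contact_map(coords)
```

Each `True` entry would become an upper-bound constraint and each `False` entry a lower-bound constraint in a distance geometry calculation.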
  • FGF-2 model studies: the three-dimensional protein structure of FGF-2 was determined using the BS3 crosslinking reagent on FGF-2, followed by RPLC separation and MS analysis (both MALDI and ESI).
  • the half life of hydrolysis for BS3 is 4-5 hours at pH 7.0.
  • NHS-ester hydrolysis competes with the reaction with primary amines, and therefore the reaction products contain a mixture of a) one end of BS3 covalently linked to the protein while the other end is hydrolyzed (a dead-end crosslinker), resulting in a mass addition of 156.08 Da, and b) two lysines crosslinked with BS3, resulting in a mass addition of 138.08 Da.
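As a small worked example of the two mass additions, the sketch below lists which combinations of BS3 crosslinks and dead-end modifications could account for an observed mass shift. The two mass constants come from the text; the tolerance, the search limit, and the function name `explain_mass_shift` are illustrative assumptions.

```python
BS3_CROSSLINK = 138.08  # Da, two lysines bridged by BS3 (from the text)
BS3_DEAD_END = 156.08   # Da, one end attached, other end hydrolyzed (from the text)


def explain_mass_shift(shift, tol=0.05, max_each=5):
    """List (n_crosslinks, n_dead_ends) pairs whose summed mass matches shift.

    tol (Da) and max_each are hypothetical search settings.
    """
    matches = []
    for n_xl in range(max_each + 1):
        for n_de in range(max_each + 1):
            if n_xl == 0 and n_de == 0:
                continue  # an unmodified species explains no shift
            predicted = n_xl * BS3_CROSSLINK + n_de * BS3_DEAD_END
            if abs(predicted - shift) <= tol:
                matches.append((n_xl, n_de))
    return matches


# One crosslink plus one dead-end modification: 138.08 + 156.08 Da.
one_of_each = explain_mass_shift(294.16)
```

The 18.00 Da gap between the two additions is what lets a dead-end modification be distinguished from a true crosslink at the mass accuracy of the instruments described.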
  • under the reaction conditions described herein, the ratio of crosslinked to modified peptides was approximately 1:1.
  • FGF-2 protein was obtained from an expression system and dialyzed overnight at 4°C into a reaction buffer containing 100 mM Hepes pH 7.5, 1 M NaCl and 1 mM EDTA. DTT (10 mM final concentration) was added to the freshly prepared crosslinker and this solution was added to aliquots of the protein-containing reaction buffer.
  • the crosslinkers used were the homobifunctional crosslinkers bis[sulfosuccinimidyl] suberate (BS3) and disulfosuccinimidyl tartrate (sulfo-DST) (Pierce, Rockford, IL), with a 20-fold molar excess of crosslinker (100 µM) to FGF-2 protein (5 µM).
  • the reaction was carried out at room temperature for 1-24 hours and quenched with 1 M Tris-HCl pH 8.0 to a final concentration of 10 mM.
  • the crosslinked FGF-2 was denatured by 8M urea and cysteine residues were protected by adding 50 mM IAM.
  • the modified FGF-2 was concentrated with Centriprep 10 filtration at 4°C prior to size-exclusion purification and proteolytic digestion.
  • Size-exclusion chromatography was employed to separate monomeric and dimeric forms of FGF-2 after the crosslinking reaction.
  • the chemical crosslinking reaction can theoretically result in both intramolecular crosslinking (two crosslinked amino acids on one protein) and intermolecular crosslinking (two protein molecules crosslinked to each other).
  • size exclusion chromatography was performed under denaturing conditions using a Gilson HPLC system equipped with a TosoHaas G2000 (2.0 x 60 cm). The column was equilibrated with 100 mM citrate buffer (pH 5.0), 8M urea and 1 mM DTT at a flow rate of 1 ml/min.
  • A pre-stained standard protein mixture was purchased from Biorad, containing myosin (209 kDa), β-galactosidase (125 kDa), BSA (70 kDa), carbonic anhydrase (42.8 kDa), soybean trypsin inhibitor (32.6 kDa), lysozyme (17.6 kDa) and aprotinin (7.5 kDa) as molecular weight standards.
  • the chromatogram of the modified FGF-2 tryptic digest (Figure 16) was significantly different from that of the unmodified FGF-2, suggesting the presence of modified peptides.
  • the labeled peaks were identified later by mass spectrometry to be crosslinked peptides.
  • the crosslinked peptides all came out in the later part of the gradient because the BS3 crosslinker arm is hydrophobic.
  • the identification of the crosslinked peptides involved either on-line LC/MS or off-line reversed phase capillary HPLC, in which case fractions were collected.
  • the mass of the crosslinked FGF-2 mixture was measured on a Voyager DE-STR MALDI-TOF instrument from Perseptive Biosystems, of Foster City, California. The instrument used a nitrogen laser (337 nm), delayed extraction optics and an acceleration voltage of 20 kV. In all cases, peptide fractions were mixed with 33 mM α-cyano-4-hydroxycinnamic acid in acetonitrile/methanol (1/1; v/v) and air-dried on a gold-plated MALDI target.
  • PSD: post-source decay
  • the complete PSD spectrum was produced by stitching the individual focused segments together. Mass calibration in PSD mode was performed using the fragment ions from a standard peptide, ACTH 18-39. A broad peak was observed with an average mass shift (compared to the unmodified FGF-2) of around 250 Da.
  • FIG. 17 shows a MALDI-TOF spectrum of one of the fractions from the tryptic digest. Each spectrum was calibrated with a close approximate external standard. A mass list was generated for each spectrum and the mass assignments were done using the in-house software ASAP, as described above in the computational features of the invention. Briefly, this program can identify crosslinked protein fragments based on the predicted fragmentation of a protein with a specific enzyme.
  • Ion m/z 2465.31 was assigned as an inter-peptide crosslink.
  • PSD of the selected parent ion m/z 2465.31 (Figure 19b) showed the immonium ions for P/R, K, H, R, F, and Y in the low-molecular-weight region. "α" is used to represent peptide chain 23L-33R and "β" to represent peptide 45E-52K.
  • the most abundant fragment ion was m/z 696.4, matching both y6α and y6β.
  • the ion m/z 1974.1 matched fragment b4 ⁇ .
  • Ninety percent of the fragments in the PSD spectrum were consistent with the assignment, thus confirming that peptides 23-33 and 45-52 were crosslinked at K26-K46.
  • ESI-TOF: electrospray ionization time-of-flight
  • the peptides were separated by RP-HPLC and eluted directly into the source of the mass spectrometer.
  • the ESI-TOF mass spectra were acquired using a Mariner electrospray ionization time-of-flight mass spectrometer coupled to an Applied Biosystems 140B solvent delivery system with an Applied Biosystems 759A absorbance detector.
  • Solvent A contained 0.1% formic acid in H2O.
  • Solvent B contained 0.05% formic acid in 5/2 ethanol/propanol. The gradient varied from 10% to 60% B in 70 minutes.
  • a "constrained threading" approach was used for fold recognition.
  • the first step was to submit the bovine FGF-2 sequence (FGF2_BOVIN) to the threading program 123D for fold prediction.
  • the 123D program returned the top scoring 20 sequence-structure alignments found upon threading a database of 635 sequence-unique proteins. Hobohm et al., 1997.
  • the 20 best-scoring sequence-structure pairs found by the 123D threading algorithm for the FGF-2 sequence are listed in Table 2:
  • Each pair defines a structural model for the FGF-2 sequence.
  • Three β-trefoil proteins are in the top 20 sequence-structure pairs, ranked at positions 1 (FGF-2: 4FGF), 5 (IL-1β), and 12 (hisactophilin: 1HCE).
  • the FGF-2 structure 4FGF shares greater than 98% identity with the recombinant sequence, which in part explains why it was ranked #1 by the threading algorithm.
  • the threading algorithm would mis-predict the fold family of FGF-2 to be that of D-UTPase, a β-clip protein.
  • E_t is the total constraint error, i is the number of distance constraints, d_0 is the pairwise distance separation, and d_j is the pairwise distance defined by the structure for the two residues in constraint j.
  • d_j is the distance observed in the candidate threading model. If d_j is less than or equal to the distance d_0 defined by the length of the linker arm, then there is no constraint error contributed by that constraint j. If d_j is greater than d_0, then the constraint error is defined by the difference between these distances.
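The equation that these definitions refer to is not reproduced in this text. Writing $d_j$ for the distance observed in the candidate model for constraint $j$, $d_0$ for the limit set by the linker arm, and $i$ for the number of constraints, the description is consistent with a one-sided penalty of the following form. This is a reconstruction from the stated definitions, not a quotation of the original formula:

```latex
E_t \;=\; \sum_{j=1}^{i} \max\bigl(0,\; d_j - d_0\bigr)
```

A constraint satisfied by the model ($d_j \le d_0$) contributes nothing, and a violated constraint contributes the amount by which the observed distance exceeds the limit.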
  • a distance of 23.85 Å is the theoretical maximum through-space distance which can be spanned by two lysines crosslinked by BS3.
  • Constraints in some cases could not be defined due to unresolved regions in the crystallographic structure or a gap in the sequence alignment. Only sequence-structure models which had >50% of the pairwise constraints were evaluated, to avoid considering models with artificially low constraint errors. The top 20 threading models were ranked in order of increasing constraint error (Table 3). Table 3: Top 20 Models Re-Ranked by Constraint Error
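The re-ranking step can be sketched as follows. This is a minimal illustration under stated assumptions, not the program used in the study: the constraint error is taken to be the sum of the amounts by which each observed distance exceeds 23.85 Å, undefined constraints are represented by `None`, and the names `constraint_error` and `rerank_models` are hypothetical.

```python
BS3_MAX_DISTANCE = 23.85  # Angstroms, maximum span quoted in the text


def constraint_error(distances, d_max=BS3_MAX_DISTANCE):
    """Sum the violations of the upper bound; None marks an undefined constraint."""
    return sum(max(0.0, d - d_max) for d in distances if d is not None)


def rerank_models(models, min_fraction=0.5):
    """Rank model names by increasing constraint error.

    models maps a model name to its list of per-constraint distances.
    Models with no more than min_fraction of their constraints defined
    are skipped, mirroring the >50% rule described in the text.
    """
    scored = []
    for name, distances in models.items():
        defined = [d for d in distances if d is not None]
        if distances and len(defined) / len(distances) > min_fraction:
            scored.append((constraint_error(distances), name))
    return [name for _, name in sorted(scored)]


# Hypothetical distances (Angstroms) for three constraints in four models.
models = {
    "A": [10.0, 30.85, None],   # one violation of 7.0, one undefined
    "B": [20.0, 20.0, 20.0],    # all constraints satisfied
    "C": [None, None, 40.0],    # only 1 of 3 defined: excluded
    "D": [25.85, 24.85, 23.0],  # violations of 2.0 and 1.0
}
ranking = rerank_models(models)
```

Excluding sparsely constrained models matters because a model with most constraints undefined would otherwise score an artificially low error.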
  • the structure ranked #3 is a member of the lipocalin fold family, which shares many characteristics with the β-trefoil family.
  • the lipocalin family is characterized by a closed or open beta barrel with a meander motif.
  • the β-trefoil fold family similarly contains a closed beta barrel with a meander motif and a hairpin triplet.
  • the structure of fatty acid binding protein is an open 10-stranded beta barrel with a beta-hairpin insertion and is alignable to FGF-2 with an RMSD of 3.6 Å over 47 residues. Holm et al. "Protein Structure Comparison by Alignment of Distance Matrices." J Mol Biol. 1993 233:123-38.
  • the other member of the lipocalin fold family, retinol binding protein (1HBQ), is ranked at position 8 and contains an 8-stranded closed beta barrel.
  • a mass spectrum analysis program was developed to assist in the interpretation of our experimental data.
  • the program requires input in the form of: a SwissProt sequence, a mass/charge list, the crosslinker mass, the maximum allowed mass error, a proteolytic enzyme, a mass type, a maximum charge state, and a minimum peak abundance.
  • a virtual proteolytic library of peptides is constructed based on the known protein sequence and proteolytic specificity. Each peptide in the library is indexed by either its monoisotopic or average mass. Amino acid modifications, intrapeptide labeling, and/or intrapeptide crosslinking are represented in the virtual library. For each unassigned mass, the program searches the virtual library for representatives with masses within the user-defined error threshold.
  • the program combinatorially searches the library for crosslinkable peptide pairs with an additive mass within the error threshold of the experimental mass. For each mass, ASAP lists the possible assignment(s) and the mass error for each assignment relative to the theoretical mass.
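The library construction and pairwise search can be sketched in a few lines. This is not the ASAP program, only a simplified illustration under several assumptions: trypsin cleaves after K or R except before P, at most one missed cleavage is allowed, only lysine side chains react (so a crosslinkable peptide needs a lysine that is not at its C-terminus), observed values are neutral monoisotopic masses, and the residue masses are standard monoisotopic values rather than values taken from the text. All function names are hypothetical.

```python
from itertools import combinations

# Standard monoisotopic residue masses (Da); not taken from the source text.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
BS3_CROSSLINK = 138.08  # Da, mass addition for a BS3 crosslink (from the text)


def digest(seq, missed=1):
    """Return (start, end, peptide) tuples (1-based) for a tryptic digest."""
    sites = [0] + [i + 1 for i in range(len(seq) - 1)
                   if seq[i] in "KR" and seq[i + 1] != "P"] + [len(seq)]
    peptides = []
    for a in range(len(sites) - 1):
        for b in range(a + 1, min(a + 2 + missed, len(sites))):
            peptides.append((sites[a] + 1, sites[b], seq[sites[a]:sites[b]]))
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of a linear peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def assign_crosslinks(observed, peptides, tol=0.5):
    """Find non-overlapping peptide pairs whose crosslinked mass fits observed."""
    hits = []
    for p, q in combinations(peptides, 2):
        if not (p[1] < q[0] or q[1] < p[0]):
            continue  # overlapping peptides cannot both come from one molecule
        if "K" not in p[2][:-1] or "K" not in q[2][:-1]:
            continue  # each chain needs a lysine that trypsin did not cut at
        theoretical = peptide_mass(p[2]) + peptide_mass(q[2]) + BS3_CROSSLINK
        if abs(theoretical - observed) <= tol:
            hits.append((p, q, theoretical - observed))
    return hits


# Toy sequence, not FGF-2.
library = digest("AGKLRTEKFWR")
target = peptide_mass("AGKLR") + peptide_mass("TEKFWR") + BS3_CROSSLINK
hits = assign_crosslinks(target, library)
```

A fuller implementation would also cover dead-end modifications, intrapeptide crosslinks, average masses, and charge states, as listed in the program inputs above.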
  • the distance constraint information derived from the lysine-lysine crosslinks was selective for structures similar to that of FGF-2 present in a set of top-scoring threading models. Specifically, the structure of IL-1β was the most compatible with the experimentally derived distance constraints (ranked second to FGF-2) and shares the same fold as FGF-2 (β-trefoil). The threading alignment of FGF-2 to the IL-1β structure was then used as a starting point in the construction of a 4.8 Å homology model of FGF-2.
  • Figure 20 shows a threading alignment of IL-1β and FGF-2 used for homology modeling.
  • the threading alignment defined 119 amino acids in the homology model.
  • the total backbone RMSD of a model built based on this alignment is 8.16 Å. If the poorly aligned N-terminal region is removed from the alignment, the RMSD improves to 4.16 Å over 98 amino acids.
  • Figure 21 illustrates the match between structure after homology modeling and the actual protein structure. This RMSD is equivalent to that expected for, on average, a 1 amino acid frameshift in the sequence alignment.
  • the model captures the salient features of the FGF-2 structure even though FGF-2 and IL-1β share less than 20% sequence identity.
  • the beta strands at the core of the FGF-2 structure are positioned correctly.
  • the sequence alignment and modeling errors occur mainly in the loop regions, regions that are generally difficult to model accurately for sequences sharing limited homology.
  • Hilbert et al. "Structural relationships of homologous proteins as a fundamental principle in homology modeling." Proteins. 1993 17(2):138-51.
  • the N-terminal 20 amino acids are also poorly aligned by the threading algorithm.
  • the "correct" alignment, as defined by a DALI structural alignment of the IL-1β and 4FGF structures (2.1 Å RMSD over 101 residues), is substantially different in the N-terminal region. Holm et al., 199 Gaps left in the structure due to insertions of IL-1β relative to FGF-2 were closed with 100 steps of energy minimization using Tripos Sybyl 6.4. The root-mean-square deviation (RMSD) of the model to the crystal structure backbone was calculated by aligning equivalent residues in the model to those in the crystal structure. The lowest RMSD we could expect for our homology model corresponds to this structural alignment.
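For reference, the RMSD over a set of equivalent atoms can be computed as in the sketch below. It assumes the two coordinate sets have already been superimposed and are listed in matching order; the superposition step itself is omitted, and the function name `rmsd` is hypothetical.

```python
import math


def rmsd(model, crystal):
    """Root-mean-square deviation between two equally long coordinate lists.

    Each list holds (x, y, z) tuples for equivalent atoms, already
    superimposed in a common frame (an assumption of this sketch).
    """
    if not model or len(model) != len(crystal):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(model, crystal))
    return math.sqrt(squared / len(model))


# Two atoms displaced by 3 A and 4 A respectively.
value = rmsd([(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
             [(3.0, 0.0, 0.0), (0.0, 4.0, 0.0)])
```

Because the sum runs only over equivalent residues, the value depends on which residues the alignment pairs up, which is why the alignment quality in the N-terminal region affects the reported numbers.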
  • HIV-1 integrase is a 288 amino acid protein containing 3 structural domains: a zinc-finger N-terminal domain, the catalytic core, and a non-specific DNA binding C-terminal domain.
  • Although the core domain has been solved by X-ray crystallography (Dyda et al., 1994; Lodi et al., 1995; Cai et al., 1997; Goldgur et al., 1998), the full-length structure of HIV integrase has not been determined.
  • Intramolecular crosslinking with BS3 was applied to the full-length HIV-1 integrase protein.
  • the protocol was the same as that used for FGF-2, e.g., crosslinking followed by size exclusion chromatography, proteolysis, and LC-MS.
  • the purpose of this experiment was not to determine the fold family of integrase, but rather to map the domain-domain interactions within the full-length structure. Theoretically, fewer than 9 inter-domain crosslinks are required
  • One crosslinking reaction generated 5 inter-domain crosslinks.
  • the crosslinked lysines were K34-K264, K42-K159, K42-K186, K42-K236, and K186-K236.
  • Two crosslinks were N-terminal domain/core domain crosslinks, two were N-terminal domain/C-terminal domain crosslinks, and one was a core domain/C-terminal domain crosslink.
  • Each crosslink defined the upper limit on the distance between the two lysines involved in the linkage. Using the distance information derived from the 5 crosslinks, the structures of the 3 domains, and constraints bridging the gaps between domains, we were able to calculate a unique arrangement for the three integrase domains using distance geometry.
  • CNase: CMP-NeuAc synthetase
  • CNase catalyzes the reaction of CTP and sialic acid (or NeuAc) to form the nucleotide-sugar donor substrate, CMP-NeuAc, which in turn adds sialic acid onto terminal galactose residues in the lipooligosaccharides of infectious bacteria.
  • the addition of sialic acid is an important virulence mechanism in bacteria, and the CNase enzymes are potentially attractive targets for drug development.
  • the CNase molecule was also examined using BS3 as a crosslinker.
  • Further analysis of the crosslinked protein identified six crosslinked peptides in a single BS3 experiment (Table 4).
  • Using these limited Lys-Lys distance constraints in conjunction with threading methods, we were not able to identify a unique fold family in the database, although β-barrel proteins scored consistently high. Additional distance constraints using other homo- and heterobifunctional reagents are then used to identify not only the fold family of CNase, but also a full tertiary structure in the 3-5 Å error range.
  • Figures 22A and 22B illustrate a computer system 2200 suitable for implementing embodiments of the present invention.
  • Figure 22A shows one possible physical form of the computer system.
  • the computer system may have many physical forms ranging from an integrated circuit, a printed circuit board and a small handheld device up to a huge super computer.
  • Computer system 2200 includes a monitor 2202, a display 2204, a housing 2206, a disk drive 2208, a keyboard 2210 and a mouse 2212.
  • Disk 2214 is a computer-readable medium used to transfer data to and from computer system 2200.
  • Figure 22B is an example of a block diagram for computer system 2200. Attached to system bus 2220 are a wide variety of subsystems. Processor(s) 2222 (also referred to as central processing units, or CPUs) are coupled to storage devices including memory 2224.
  • Memory 2224 includes random access memory (RAM) and read-only memory (ROM). As is well known in the art, ROM acts to transfer data and instructions uni-directionally to the CPU and RAM is used typically to transfer data and instructions in a bi-directional manner.
  • a fixed disk 2226 is also coupled bi-directionally to CPU 2222; it provides additional data storage capacity and may also include any of the computer-readable media described below.
  • Fixed disk 2226 may be used to store programs, data and the like and is typically a secondary storage medium (such as a hard disk) that is slower than primary storage. It will be appreciated that the information retained within fixed disk 2226, may, in appropriate cases, be incorporated in standard fashion as virtual memory in memory 2224.
  • Removable disk 2214 may take the form of any of the computer-readable media described below.
  • CPU 2222 is also coupled to a variety of input/output devices such as display 2204, keyboard 2210, mouse 2212 and speakers 2230.
  • an input/output device may be any of: video displays, track balls, mice, keyboards, microphones, touch-sensitive displays, transducer card readers, magnetic or paper tape readers, tablets, styluses, voice or handwriting recognizers, biometrics readers, or other computers.
  • CPU 2222 optionally may be coupled to another computer or telecommunications network using network interface 2240. With such a network interface, it is contemplated that the CPU might receive information from the network, or might output information to the network in the course of performing the above-described method steps.
  • method embodiments of the present invention may execute solely upon CPU 2222 or may execute over a network such as the Internet in conjunction with a remote CPU that shares a portion of the processing.
  • embodiments of the present invention further relate to computer storage products with a computer-readable medium that have computer code thereon for performing various computer-implemented operations such as inputting assay data, rendering that data in color graded representations in a graphical user interface, and acting on user inputs to affect display parameters of the data.
  • the media and computer code may be those specially designed and constructed for the purposes of the present invention, or they may be of the kind well known and available to those having skill in the computer software arts.
  • Examples of computer-readable media include, but are not limited to: magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and holographic devices; magneto-optical media such as floptical disks; and hardware devices that are specially configured to store and execute program code, such as application-specific integrated circuits (ASICs), programmable logic devices (PLDs), ROM and RAM devices, and signal transmission media for delivering computer-readable instructions, such as local area networks, wide area networks, and the Internet.
  • Examples of computer code include machine code, such as produced by a compiler, and files containing higher level code that are executed by a computer using an interpreter.
  • the invention also pertains to carrier waves and transport media on which the data and instructions of this invention may be transmitted.

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Physics & Mathematics (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Hematology (AREA)
  • Urology & Nephrology (AREA)
  • Biomedical Technology (AREA)
  • Immunology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biotechnology (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Medicinal Chemistry (AREA)
  • Pathology (AREA)
  • General Physics & Mathematics (AREA)
  • Microbiology (AREA)
  • Biochemistry (AREA)
  • Food Science & Technology (AREA)
  • Cell Biology (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Theoretical Computer Science (AREA)
  • Optics & Photonics (AREA)
  • Investigating Or Analysing Biological Materials (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention relates to a rapid and efficient method for determining the three-dimensional conformation of a protein. The method comprises the following steps: 1) establishing physical distance constraints, for example by forming intramolecular chemical crosslinks of known size between residues of a protein; 2) enriching the number of molecules having intramolecular chemical crosslinks in the reaction pool, for example by size-based separation to remove proteins having intermolecular linkages; 3) exposing the enriched reaction pool to a protease that cleaves the protein at specific sites to produce peptide fragments; 4) measuring the size of the peptide fragments to determine the linkage sites that have a given spatial relationship in the protein; and 5) interpreting the resulting data to determine the spatial geometry and structure of the protein based on the spatial relationship deduced from the linkage sites. This information is preferably analyzed using a computer system, which can be used to generate and/or analyze distance constraints between amino acids.
EP00937870A 1999-05-26 2000-05-26 Method of determining the three-dimensional shape of a macromolecule Withdrawn EP1277050A2 (fr)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US13589199P 1999-05-26 1999-05-26
US135891P 1999-05-26
PCT/US2000/014667 WO2000072004A2 (fr) 1999-05-26 2000-05-26 Method of determining the three-dimensional shape of a macromolecule

Publications (1)

Publication Number Publication Date
EP1277050A2 true EP1277050A2 (fr) 2003-01-22

Family

ID=22470229

Family Applications (1)

Application Number Title Priority Date Filing Date
EP00937870A Withdrawn EP1277050A2 (fr) 1999-05-26 2000-05-26 Method of determining the three-dimensional shape of a macromolecule

Country Status (4)

Country Link
EP (1) EP1277050A2 (fr)
JP (1) JP2003528288A (fr)
AU (1) AU5298900A (fr)
WO (1) WO2000072004A2 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107449483A (zh) * 2017-08-04 2017-12-08 赵德省 一种物质余量的提醒系统及方法

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
ITRM20010088A1 (it) * 2001-02-21 2002-08-21 Idi Irccs Peptide in grado di inibire l'attivita' del fattore di crescita derivato dalle piastrine (pdgf-bb) e del fattore di crescita derivato dai fi
AU2003231525A1 (en) * 2002-04-26 2003-11-10 Ajinomoto Co., Inc. Method of analyzing protein structure, protein structure analyzer, program and recording medium
JP3743717B2 (ja) * 2002-06-25 2006-02-08 株式会社日立製作所 質量分析データの解析方法および質量分析データの解析装置および質量分析データの解析プログラムならびにソリューション提供システム
JP2006282929A (ja) * 2005-04-04 2006-10-19 Taiyo Nippon Sanso Corp 分子構造予測方法
JP4976384B2 (ja) * 2005-05-04 2012-07-18 アイトゲネーシッシュ テヒニッシュ ホッホシュレ チューリッヒ 質量分析法
US7943387B2 (en) * 2006-12-18 2011-05-17 Covalx Ag Direct mass spectrometric analysis of aggregates of therapeutic proteins
US8012695B2 (en) 2007-02-14 2011-09-06 Dana-Farber Cancer Institute, Inc. Methods and compositions relating to promoter regulation by MUC1 and KLF proteins
WO2008113758A1 (fr) * 2007-03-16 2008-09-25 Covalx Ag Analyse par spectrométrie de masse directe de médicaments candidats ciblant des complexes protéiques
PL2352508T3 (pl) 2008-10-17 2014-09-30 Dana Farber Cancer Inst Inc Peptydy domeny cytoplazmatycznej MUC-1 jako inhibitory nowotworu
JP2011152094A (ja) * 2010-01-28 2011-08-11 Sony Corp 耐熱化チロシン依存性酸化還元酵素の設計方法及び耐熱化チロシン依存性酸化還元酵素
US9170221B2 (en) 2010-10-01 2015-10-27 Elizabeth M. HEIDER NMR crystallography methods for three-dimensional structure determination
US9194953B2 (en) * 2010-10-21 2015-11-24 Sony Corporation 3D time-of-light camera and method
WO2015072982A1 (fr) * 2013-11-13 2015-05-21 Heider Elizabeth M Procédés de cristallographie rmn pour la détermination d'une structure tridimensionnelle
GB201511508D0 (en) * 2015-07-01 2015-08-12 Ge Healthcare Bio Sciences Ab Method for determining a size of biomolecules
CN108469495B (zh) * 2018-05-08 2020-02-07 中国海洋大学 一种利用液相色谱串联质谱检测鱼类小清蛋白的方法
CN111755065B (zh) * 2020-06-15 2024-05-17 重庆邮电大学 一种基于虚拟网络映射和云并行计算的蛋白质构象预测加速方法

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO0072004A2 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107449483A (zh) * 2017-08-04 2017-12-08 赵德省 一种物质余量的提醒系统及方法
CN107449483B (zh) * 2017-08-04 2019-12-06 赵德省 一种物质余量的提醒系统及方法

Also Published As

Publication number Publication date
WO2000072004A2 (fr) 2000-11-30
WO2000072004A3 (fr) 2002-11-28
WO2000072004A9 (fr) 2002-08-29
AU5298900A (en) 2000-12-12
JP2003528288A (ja) 2003-09-24

Similar Documents

Publication Publication Date Title
Piersimoni et al. Cross-linking mass spectrometry for investigating protein conformations and protein–protein interactions─ a method for all seasons
EP1277050A2 (fr) Method of determining the three-dimensional shape of a macromolecule
Liebler Introduction to proteomics: tools for the new biology
Háda et al. Recent advancements, challenges, and practical considerations in the mass spectrometry-based analytics of protein biotherapeutics: A viewpoint from the biosimilar industry
JP4672615B2 (ja) 迅速かつ定量的なプロテオーム解析および関連した方法
Artigues et al. Protein structural analysis via mass spectrometry-based proteomics
de Koning et al. Computer‐assisted mass spectrometric analysis of naturally occurring and artificially introduced cross‐links in proteins and protein complexes
Merkley et al. Cross-linking and mass spectrometry methodologies to facilitate structural biology: finding a path through the maze
Meri et al. Proteomics: posttranslational modifications, immune responses and current analytical tools
Piotrowski et al. Structural investigation of proteins and protein complexes by chemical cross-linking/mass spectrometry
Loo et al. Application of mass spectrometry for target identification and characterization
Washburn Utilisation of proteomics datasets generated via multidimensional protein identification technology (MudPIT)
US7167819B1 (en) Method of determining the three-dimensional shape of a macromolecule
Tang et al. CLPM: a cross-linked peptide mapping algorithm for mass spectrometric analysis
Rojsajjakul et al. Multi-state unfolding of the alpha subunit of tryptophan synthase, a TIM barrel protein: insights into the secondary structure of the stable equilibrium intermediates by hydrogen exchange mass spectrometry
Akashi Investigation of molecular interaction within biological macromolecular complexes by mass spectrometry
Trnka et al. Role of integrative structural biology in understanding transcriptional initiation
Collins et al. Robust enrichment of phosphorylated species in complex mixtures by sequential protein and peptide metal-affinity chromatography and analysis by tandem mass spectrometry
Olson et al. Production of reliable MALDI spectra with quality threshold clustering of replicates
Chakravarti et al. Three dimensional structures of proteins and protein complexes from chemical cross-linking and mass spectrometry: a biochemical and computational overview
Renzone et al. Mass spectrometry-based approaches for structural studies on protein complexes at low-resolution
US20050233406A1 (en) Methods for high resolution identification of solvent accessible amide hydrogens in polypeptides and for characterization of polypeptide structure
Meyers et al. Protein identification and profiling with mass spectrometry
Piotrowski et al. Structural Investigation of Proteins and Protein Complexes by Chemical
Holt et al. Energetics of Biological Macromolecules, Part E

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20010104

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

RIN1 Information on inventor provided before grant (corrected)

Inventor name: GIBSON, BRADFORD, W.

Inventor name: OSHIRO, CONNIE, M.

Inventor name: TAYLOR, ERIC

Inventor name: KUNTZ, IRWIN, D.

Inventor name: TANG, NING

Inventor name: HEMPEL, JUDITH, C.

Inventor name: YOUNG, MALIN

Inventor name: DOLLINGER, GAVIN

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN

18W Application withdrawn

Effective date: 20070614