WO2005000226A2 - Mixed-bed multi-dimensional chromatography systems and associated methods of making and using them - Google Patents

Publication number
WO2005000226A2
Authority
WO
WIPO (PCT)
Prior art keywords
reverse phase
bed
chromatography system
rpc
sequence
Application number
PCT/US2004/017647
Other languages
English (en)
Other versions
WO2005000226A3 (fr
Inventor
Jing Wei
Martin Latterich
Original Assignee
Diversa Corporation
Application filed by Diversa Corporation
Publication of WO2005000226A2
Publication of WO2005000226A3

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N30/00 Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
    • G01N30/02 Column chromatography
    • G01N30/26 Conditioning of the fluid carrier; Flow patterns
    • G01N30/38 Flow patterns
    • G01N30/46 Flow patterns using more than one column
    • G01N30/461 Flow patterns using more than one column with serial coupling of separation columns
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K1/00 General methods for the preparation of peptides, i.e. processes for the organic chemical preparation of peptides or proteins of any length
    • C07K1/13 Labelling of peptides
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K1/00 General methods for the preparation of peptides, i.e. processes for the organic chemical preparation of peptides or proteins of any length
    • C07K1/14 Extraction; Separation; Purification
    • C07K1/16 Extraction; Separation; Purification by chromatography
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K1/00 General methods for the preparation of peptides, i.e. processes for the organic chemical preparation of peptides or proteins of any length
    • C07K1/14 Extraction; Separation; Purification
    • C07K1/16 Extraction; Separation; Purification by chromatography
    • C07K1/18 Ion-exchange chromatography
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K1/00 General methods for the preparation of peptides, i.e. processes for the organic chemical preparation of peptides or proteins of any length
    • C07K1/14 Extraction; Separation; Purification
    • C07K1/36 Extraction; Separation; Purification by a combination of two or more processes of different types

Definitions

  • TECHNICAL FIELD: This invention relates to proteomics and mass spectrometry technology.
  • the invention provides novel systems and methods for determining polypeptide profiles and protein expression variations, as with proteome analyses.
  • the present invention provides systems and methods for simultaneously identifying and quantifying individual proteins in complex protein mixtures by selective differential labeling of amino acid residues followed by chromatographic and mass spectrographic analysis.
  • the invention also provides computer program products and computer implemented methods for practicing the systems and methods of the invention.
  • Biochemical pathways and metabolic networks can also be analyzed by globally and quantitatively measuring protein expression in various cell types and biological states (see, e.g., Ideker (2001) Science 292:929-934).
  • State-of-the-art techniques such as liquid-chromatography-electrospray-ionization tandem mass spectrometry have, in conjunction with database-searching computer algorithms, revolutionized the analysis of biochemical species from complex biological mixtures.
  • ICATs: isotope-coded affinity tags
  • tandem mass spectrometry: The method labels multiple cysteinyl residues and uses stable isotope dilution techniques. For example, Gygi (1999) Nat. Biotechnol. 10:994-999, compared protein expression in a yeast using ethanol or galactose as a carbon source.
  • Parent proteins of methylated peptides are identified by correlative database searching of fragment ion spectra using computer-program-assisted paradigms or automated de novo sequencing that compares all tandem mass spectra of d0- and d3-methylated peptide ion pairs. In Goodlett (2000) supra, ratios of proteins in two different mixtures were calculated for d0- to d3-methylated peptide pairs.
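The d0/d3 pairing described above can be sketched in a few lines. This is a minimal illustration, not the method of Goodlett (2000): it assumes each labeled site carries one CH3 or CD3 group, that peaks arrive as (m/z, intensity) tuples, and that a fixed m/z tolerance is adequate. The function name `pair_ratios`, the tolerance, and the example peak list are hypothetical.

```python
# Monoisotopic mass difference between deuterium and hydrogen, in Da.
D_MINUS_H = 2.014102 - 1.007825
# One CD3 group in place of CH3 shifts the peptide mass by three such differences.
D3_SHIFT = 3 * D_MINUS_H


def pair_ratios(peaks, n_sites=1, charge=1, tol=0.01):
    """Find d0/d3 ion pairs in a peak list and return their intensity ratios.

    peaks   -- list of (m/z, intensity) tuples (hypothetical input format)
    n_sites -- assumed number of methylated sites per peptide
    charge  -- assumed charge state of the ions
    tol     -- assumed m/z tolerance for matching a pair
    """
    delta = n_sites * D3_SHIFT / charge
    pairs = []
    for mz_light, i_light in peaks:
        for mz_heavy, i_heavy in peaks:
            if i_heavy > 0 and abs((mz_heavy - mz_light) - delta) <= tol:
                pairs.append((mz_light, mz_heavy, i_light / i_heavy))
    return pairs


# Illustrative peak list: one d0/d3 pair plus an unrelated peak.
peaks = [(500.000, 2000.0), (503.019, 1000.0), (640.300, 50.0)]
pairs = pair_ratios(peaks)
```

Here the single matched pair has a d0-to-d3 intensity ratio of 2, which would be read as a twofold difference in the parent protein between the two mixtures.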
  • Screening markers include, for example, luciferase, beta-galactosidase, and green fluorescent protein. Screening can also be done by observing a cell holistically including but not limited to utilizing methods pertaining to genomics, RNA profiling, proteomics, metabolomics, and lipidomics as well as observing such aspects of growth as colony size, halo formation, etc. Additionally, screening for production of a desired compound, such as a therapeutic drug or "designer chemical" can be accomplished by observing binding of cell products to a receptor or ligand, such as on a solid support or on a column. Such screening can additionally be accomplished by binding to antibodies, as in an ELISA. In some instances the screening process can be automated so as to allow screening of suitable numbers of colonies or cells.
  • FACS fluorescence activated cell sorting
  • Selection is a form of screening in which identification and physical separation are achieved simultaneously, for example, by expression of a selectable marker, which, in some genetic circumstances, allows cells expressing the marker to survive while other cells die (or vice versa).
  • Selectable markers can include, for example, drug, toxin resistance, or nutrient synthesis genes. Selection is also done by such techniques as growth on a toxic substrate to select for hosts having the ability to detoxify a substrate, growth on a new nutrient source to select for hosts having the ability to utilize that nutrient source, competitive growth in culture based on ability to utilize a nutrient source, etc.
  • uncloned but differentially expressed proteins can be screened by differential display (Appleyard et al. Mol. Gen. Genet. 247:338-342 (1995)). Hopwood (Phil Trans R. Soc. Lond B 324:549-562) provides a review of screens for antibiotic production.
  • Omura Microbio. Rev. 50:259-279 (1986) and Nisbet (Ann Rev. Med. Chem.
  • Tagged substrates can also be used.
  • Lipases and esterases can be screened using different lengths of fatty acids linked to umbelliferyl. The action of lipases or esterases removes this tag from the fatty acid, resulting in a quenching or enhancement of umbelliferyl fluorescence. These enzymes can be screened in microtiter plates by a robotic device.
  • Genomics: Genomics can refer to various investigative techniques that are broad in scope, but the term often refers to measuring gene expression for multitudes of genes simultaneously. For a review see Lockhart, D.J. and Winzeler, E.A. 2000. Genomics, gene expression and DNA arrays. Nature, 405 (6788): 827-36. Biological Chips, General considerations: In some systems, an oligonucleotide probe is tethered, i.e., by covalent attachment, to a solid support, and arrays of oligonucleotide probes immobilized on solid supports have been used to detect specific nucleic acid sequences in a target nucleic acid.
  • bioinformatics involves studying an organism's genome to determine the sequence and placement of its genes and their relationship to other sequences and genes within the genome or to genes in other organisms. Another use of bioinformatics involves studying genes differentially or commonly expressed in different tissues or cell lines (e.g. normal and cancerous tissue). Such information is of significant interest in biomedical and pharmaceutical research, for instance to assist in the evaluation of drug efficacy and resistance.
  • the sequence tag method involves generation of a large number (e.g., thousands) of Expressed Sequence Tags ("ESTs") from cDNA libraries (each produced from a different tissue or sample). ESTs are partial transcript sequences that may cover different parts of the cDNA(s) of a gene, depending on cloning and sequencing strategy.
  • Each EST includes about 50 to 300 nucleotides. If it is assumed that the number of tags is proportional to the abundance of transcripts in the tissue or cell type used to make the cDNA library, then any variation in the relative frequency of those tags, stored in computer databases, can be used to detect the differential abundance and potentially the expression of the corresponding genes.
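The tag-counting assumption above can be made concrete with a short sketch. It assumes each EST has already been assigned to a gene and that each library is simply a list of gene labels; the function names and the example gene labels are hypothetical and are not taken from any database system named in this document.

```python
from collections import Counter


def relative_frequencies(tags):
    """Return the fraction of tags in one library assigned to each gene."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}


def abundance_ratio(tags_a, tags_b, gene):
    """Ratio of a gene's relative tag frequency in library A to that in library B."""
    freq_a = relative_frequencies(tags_a).get(gene, 0.0)
    freq_b = relative_frequencies(tags_b).get(gene, 0.0)
    return freq_a / freq_b if freq_b > 0 else float("inf")


# Two illustrative libraries of ten tags each (gene labels are placeholders).
library_a = ["geneX"] * 6 + ["geneY"] * 4
library_b = ["geneX"] * 1 + ["geneY"] * 9
ratio_x = abundance_ratio(library_a, library_b, "geneX")
```

With these counts, geneX accounts for 60% of tags in library A and 10% in library B, so its relative frequency differs roughly sixfold between the two libraries. Real libraries would need far more tags before such a ratio is meaningful.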
  • To make genomic and EST information manipulation easy to perform and understand, sophisticated computer database systems have been developed. In one database system, developed by Incyte Pharmaceuticals, Inc. of Palo Alto, CA, genomic sequence data and the abundance levels of mRNA species represented in a given sample are electronically recorded and annotated with information available from public sequence databases such as GenBank. Examples of such databases include GenBank (NCBI) and TIGR.
  • the resulting information is stored in a relational database that may be employed to determine relationships between sequences and genes within and among genomes and establish a cDNA profile for a given tissue and to evaluate changes in gene expression caused by disease progression, pharmacological treatment, aging, etc.
  • In a relational database developed by Incyte Pharmaceuticals, Inc. of Palo Alto, Calif., abundance levels of mRNA species represented in a given sample are electronically recorded and annotated with information available from public sequence databases such as GenBank.
  • the resulting information is stored in a relational database that may be employed to establish a cDNA profile for a given tissue and to evaluate changes in gene expression caused by disease progression, pharmacological treatment, aging, etc. Genetic information for a number of organisms has been catalogued in computer databases.
  • Bioinformatics includes the development of methods to search databases quickly, to analyze nucleic acid sequence information, and to predict protein sequence and structure from DNA sequence data.
  • High throughput genomics refers to application of genomic or genetic data or analysis techniques that use microarrays or other genomic technologies to rapidly identify large numbers of genes or proteins, or distinguish their structure, expression or function from normal or abnormal cells or tissues.
  • an observer can be a person viewing a slide with a microscope or an observer who views digital images.
  • an observer can be a computer-based image analysis system, which automatically observes, analyses and quantitates biological arrayed samples with or without user interaction.
  • the present invention provides for the use of arrays of oligonucleotide probes immobilized in microfabricated patterns on silica chips for analyzing molecular interactions of biological interest.
  • the invention provides several strategies employing immobilized arrays of probes for comparing a reference sequence of known sequence with a target sequence showing substantial similarity with the reference sequence, but differing in the presence of, e.g., mutations.
  • the invention provides a tiling strategy employing an array of immobilized oligonucleotide probes comprising at least two sets of probes.
  • a first probe set comprises a plurality of probes, each probe comprising a segment of at least three nucleotides exactly complementary to a subsequence of the reference sequence, the segment including at least one interrogation position complementary to a corresponding nucleotide in the reference sequence.
  • a second probe set comprises a corresponding probe for each probe in the first probe set, the corresponding probe in the second probe set being identical to a sequence comprising the corresponding probe from the first probe set or a subsequence of at least three nucleotides thereof that includes the at least one interrogation position, except that the at least one interrogation position is occupied by a different nucleotide in each of the two corresponding probes from the first and second probe sets.
  • the probes in the first probe set have at least two interrogation positions corresponding to two contiguous nucleotides in the reference sequence. One interrogation position corresponds to one of the contiguous nucleotides, and the other interrogation position to the other.
  • the invention provides a tiling strategy employing an array comprising four probe sets.
  • a first probe set comprises a plurality of probes, each probe comprising a segment of at least three nucleotides exactly complementary to a subsequence of the reference sequence, the segment including at least one interrogation position complementary to a corresponding nucleotide in the reference sequence.
  • Second, third and fourth probe sets each comprise a corresponding probe for each probe in the first probe set.
  • the probes in the second, third and fourth probe sets are identical to a sequence comprising the corresponding probe from the first probe set or a subsequence of at least three nucleotides thereof that includes the at least one interrogation position, except that the at least one interrogation position is occupied by a different nucleotide in each of the four corresponding probes from the four probe sets.
  • the first probe set can have at least 100 interrogation positions corresponding to 100 contiguous nucleotides in the reference sequence.
  • the first probe set can have an interrogation position corresponding to every nucleotide in the reference sequence.
  • the segment of complementarity within the probe set is usually about 9 to 21 nucleotides.
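The four-probe-set tiling described above can be illustrated with a short sketch. It assumes a DNA reference, an odd probe length with the interrogation position at the center, and probes written aligned base-for-base against the reference (that is, in 3'-to-5' orientation) so that indices line up. The function name `tile_reference` and the example sequence are hypothetical.

```python
BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def tile_reference(reference, probe_len=15):
    """Build four probes per tiled position of a reference sequence.

    Returns {position: {center_base: probe}}. The probe whose center base is
    the complement of reference[position] is the perfectly matched probe
    (first probe set); the other three belong to the second, third and
    fourth probe sets.
    """
    if probe_len % 2 == 0:
        raise ValueError("probe_len must be odd so the interrogation position is central")
    half = probe_len // 2
    tiling = {}
    for pos in range(half, len(reference) - half):
        window = reference[pos - half : pos + half + 1]
        matched = [COMPLEMENT[base] for base in window]
        probes = {}
        for base in BASES:
            probe = matched.copy()
            probe[half] = base  # vary only the interrogation position
            probes[base] = "".join(probe)
        tiling[pos] = probes
    return tiling


# Short illustrative reference and probe length.
reference = "ACGTACGTA"
tiling = tile_reference(reference, probe_len=5)
perfect_match = tiling[2][COMPLEMENT[reference[2]]]
```

Positions closer to either end than half a probe length are not tiled in this sketch; a real design would need to decide how to handle the ends of the reference.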
  • the invention provides immobilized arrays of probes tiled for multiple reference sequences. One such array comprises at least one pair of first and second probe groups, each group comprising first and second sets of probes as defined in the first aspect.
  • Each probe in the first probe set from the first group is exactly complementary to a subsequence of a first reference sequence
  • each probe in the first probe set from the second group is exactly complementary to a subsequence of a second reference sequence.
  • the first group of probes are tiled with respect to a first reference sequence and the second group of probes with respect to a second reference sequence.
  • Each group of probes can also include third and fourth sets of probes as defined in the second aspect.
  • the second reference sequence is a mutated form of the first reference sequence.
  • the invention provides arrays for block tiling.
  • Block tiling is a species of the general tiling strategies described above.
  • the usual unit of a block tiling array is a group of probes comprising a wildtype probe, a first set of three mutant probes and a second set of three mutant probes.
  • the wildtype probe comprises a segment of at least three nucleotides exactly complementary to a subsequence of a reference sequence.
  • the segment has at least first and second interrogation positions corresponding to first and second nucleotides in the reference sequence.
  • the probes in the first set of three mutant probes are each identical to a sequence comprising the wildtype probe or a subsequence of at least three nucleotides thereof including the first and second interrogation positions, except in the first interrogation position, which is occupied by a different nucleotide in each of the three mutant probes and the wildtype probe.
  • the probes in the second set of three mutant probes are each identical to a sequence comprising the wildtype probes or a subsequence of at least three nucleotides thereof including the first and second interrogation positions, except in the second interrogation position, which is occupied by a different nucleotide in each of the three mutant probes and the wildtype probe.
  • the invention provides methods of comparing a target sequence with a reference sequence using arrays of immobilized pooled probes.
  • the arrays employed in these methods represent a further species of the general tiling arrays noted above.
  • variants of a reference sequence differing from the reference sequence in at least one nucleotide are identified and each is assigned a designation.
  • An array of pooled probes is provided, with each pool occupying a separate cell of the array.
  • Each pool comprises a probe comprising a segment exactly complementary to each variant sequence assigned a particular designation.
  • the array is then contacted with a target sequence comprising a variant of the reference sequence.
  • the relative hybridization intensities of the pools in the array to the target sequence are determined.
  • each variant is assigned a designation having at least one digit and at least one value for the digit.
  • each pool comprises a probe comprising a segment exactly complementary to each variant sequence assigned a particular value in a particular digit.
  • n x (m-1) pooled probes are used to assign each variant a designation.
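One way to read the n x (m-1) count is that each variant receives an n-digit designation in which each digit takes one of m values, and a pool exists for every (digit, value) combination except value 0, which is inferred from the absence of signal. That reading is an assumption, as are the function names and the example variants in the following sketch.

```python
from itertools import product


def build_pools(variants, n_digits, n_values):
    """Assign each variant an n-digit designation and build the pooled cells.

    Returns (designations, pools), where pools maps (digit, value) to the set
    of variants whose designation has that value in that digit. Value 0 gets
    no pool, giving n_digits * (n_values - 1) pools in total.
    """
    if len(variants) > n_values ** n_digits:
        raise ValueError("not enough designations for the number of variants")
    codes = product(range(n_values), repeat=n_digits)
    designations = dict(zip(variants, codes))
    pools = {(d, v): set() for d in range(n_digits) for v in range(1, n_values)}
    for variant, code in designations.items():
        for digit, value in enumerate(code):
            if value != 0:
                pools[(digit, value)].add(variant)
    return designations, pools


def decode(hybridizing_pools, n_digits):
    """Recover a designation from the (digit, value) pools that gave signal."""
    code = [0] * n_digits
    for digit, value in hybridizing_pools:
        code[digit] = value
    return tuple(code)


# Eight illustrative variants, three binary digits, hence three pools.
variants = [f"variant{i}" for i in range(8)]
designations, pools = build_pools(variants, n_digits=3, n_values=2)
lit = [key for key, members in pools.items() if "variant5" in members]
```

In this example three pools distinguish eight variants; decoding the pools that contain `variant5` returns its designation `(1, 0, 1)`.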
  • the invention provides a pooled probe for trellis tiling, a further species of the general tiling strategy.
  • a pooled trellis probe comprises a segment exactly complementary to a subsequence of a reference sequence except at a first interrogation position occupied by a pooled nucleotide N, a second interrogation position occupied by a pooled nucleotide selected from the group of three consisting of (1) M or K, (2) R or Y and (3) S or W, and a third interrogation position occupied by a second pooled nucleotide selected from the group.
  • the pooled nucleotide occupying the second interrogation position comprises a nucleotide complementary to a corresponding nucleotide from the reference sequence when the second pooled probe and reference sequence are maximally aligned
  • the pooled nucleotide occupying the third interrogation position comprises a nucleotide complementary to a corresponding nucleotide from the reference sequence when the third pooled probe and the reference sequence are maximally aligned.
  • Standard IUPAC nomenclature is used for describing pooled nucleotides.
  • an array comprises at least first, second and third cells, respectively occupied by first, second and third pooled probes, each according to the generic description above.
  • the segment of complementarity, location of interrogation positions, and selection of pooled nucleotide at each interrogation position may or may not differ between the three pooled probes subject to the following constraint.
  • One of the three interrogation positions in each of the three pooled probes must align with the same corresponding nucleotide in the reference sequence. This interrogation position must be occupied by a N in one of the pooled probes, and a different pooled nucleotide in each of the other two pooled probes.
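The pooled nucleotides named above follow the standard IUPAC ambiguity codes, and each of the three pairs (M or K, R or Y, S or W) splits the four bases into two complementary halves. The sketch below lists those codes and expands a pooled probe into the individual sequences it represents; the function name `expand` and the example probe are hypothetical.

```python
from itertools import product

# Standard IUPAC ambiguity codes used for the pooled nucleotides.
IUPAC = {
    "N": set("ACGT"),
    "M": set("AC"),
    "K": set("GT"),
    "R": set("AG"),
    "Y": set("CT"),
    "S": set("CG"),
    "W": set("AT"),
}

# The three pairs from which the second and third pooled nucleotides are chosen.
PAIRS = [("M", "K"), ("R", "Y"), ("S", "W")]


def expand(pooled_probe):
    """List every individual sequence represented by a pooled probe."""
    choices = [sorted(IUPAC.get(symbol, {symbol})) for symbol in pooled_probe]
    return ["".join(combo) for combo in product(*choices)]


# Illustrative pooled probe: one N position and one R position.
members = expand("ANGR")
```

The example probe expands to eight sequences (four choices at N times two at R), which is why a single pooled cell can stand in for several conventional probes.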
  • the invention provides arrays for bridge tiling.
  • Bridge tiling is a species of the general tiling strategies noted above, in which probes from the first probe set contain more than one segment of complementarity.
  • a nucleotide in a reference sequence is usually determined from a comparison of four probes.
  • a first probe comprises at least first and second segments, each of at least three nucleotides and each exactly complementary to first and second subsequences of a reference sequence.
  • the segments include at least one interrogation position corresponding to a nucleotide in the reference sequence.
  • first and second subsequences are noncontiguous in the reference sequence, or
  • the arrays of the invention can further comprise second, third and fourth probes, which are identical to a sequence comprising the first probe or a subsequence thereof comprising at least three nucleotides from each of the first and second segments, except in the at least one interrogation position, which differs in each of the probes.
  • the first and second subsequences are separated by one or two nucleotides in the reference sequence.
  • the invention provides arrays of probes for multiplex tiling.
  • Multiplex tiling is a strategy, in which the identity of two nucleotides in a target sequence is determined from a comparison of the hybridization intensities of four probes, each having two interrogation positions.
  • Each of the probes comprising a segment of at least 7 nucleotides that is exactly complementary to a subsequence from a reference sequence, except that the segment may or may not be exactly complementary at two interrogation positions.
  • the nucleotides occupying the interrogation positions are selected by the following rules: (1) the first interrogation position is occupied by a different nucleotide in each of the four probes, (2) the second interrogation position is occupied by a different nucleotide in each of the four probes, (3) in first and second probes, the segment is exactly complementary to the subsequence, except at no more than one of the interrogation positions, (4) in third and fourth probes, the segment is exactly complementary to the subsequence, except at both of the interrogation positions.
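The four selection rules above can be checked mechanically. The sketch below represents each probe only by the nucleotides at its two interrogation positions and takes as input the nucleotides that would be exactly complementary at those positions; this representation, the function name `check_multiplex`, and the example probes are assumptions made for illustration.

```python
def check_multiplex(probes, match1, match2):
    """Check four probes against the multiplex tiling rules.

    probes -- list of four (base_at_position_1, base_at_position_2) tuples,
              ordered first, second, third, fourth probe
    match1 -- base that is exactly complementary at interrogation position 1
    match2 -- base that is exactly complementary at interrogation position 2
    """
    if len(probes) != 4:
        return False
    # Rules 1 and 2: each interrogation position holds four different bases.
    rule1 = len({p[0] for p in probes}) == 4
    rule2 = len({p[1] for p in probes}) == 4
    mismatches = [(p[0] != match1) + (p[1] != match2) for p in probes]
    # Rule 3: first and second probes mismatch at no more than one position.
    rule3 = all(m <= 1 for m in mismatches[:2])
    # Rule 4: third and fourth probes mismatch at both positions.
    rule4 = all(m == 2 for m in mismatches[2:])
    return rule1 and rule2 and rule3 and rule4


# Illustrative probe sets, with A and C as the complementary bases.
valid = check_multiplex([("A", "G"), ("T", "C"), ("C", "T"), ("G", "A")], "A", "C")
invalid = check_multiplex([("A", "C"), ("A", "G"), ("C", "T"), ("G", "A")], "A", "C")
```

The second set fails because two probes share the same nucleotide at the first interrogation position, which violates rule (1).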
  • the invention provides arrays of immobilized probes including helper mutations.
  • Helper mutations are useful for, e.g., preventing self-annealing of probes having inverted repeats.
  • the identity of a nucleotide in a target sequence is usually determined from a comparison of four probes.
  • a first probe comprises a segment of at least 7 nucleotides exactly complementary to a subsequence of a reference sequence except at one or two positions, the segment including an interrogation position not at the one or two positions. The one or two positions are occupied by helper mutations.
  • third and fourth mutant probes are each identical to a sequence comprising the wildtype probe or a subsequence thereof including the interrogation position and the one or two positions, except in the interrogation position, which is occupied by a different nucleotide in each of the four probes.
  • the invention provides arrays of probes comprising at least two probe sets, but lacking a probe set comprising probes that are perfectly matched to a reference sequence. Such arrays are usually employed in methods in which both reference and target sequence are hybridized to the array.
  • the first probe set comprising a plurality of probes, each probe comprising a segment exactly complementary to a subsequence of at least 3 nucleotides of a reference sequence except at an interrogation position.
  • the second probe set comprises a corresponding probe for each probe in the first probe set, the corresponding probe in the second probe set being identical to a sequence comprising the corresponding probe from the first probe set or a subsequence of at least three nucleotides thereof that includes the interrogation position, except that the interrogation position is occupied by a different nucleotide in each of the two corresponding probes and the complement to the reference sequence.
  • the invention provides methods of comparing a target sequence with a reference sequence comprising a predetermined sequence of nucleotides using any of the arrays described above. The methods comprise hybridizing the target nucleic acid to an array and determining which probes, relative to one another, in the array bind specifically to the target nucleic acid.
  • the relative specific binding of the probes indicates whether the target sequence is the same or different from the reference sequence.
  • the target sequence has a substituted nucleotide relative to the reference sequence in at least one undetermined position, and the relative specific binding of the probes indicates the location of the position and the nucleotide occupying the position in the target sequence.
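Reading a substitution from relative binding can be sketched as follows. The example assumes that, for each tiled position, four intensities are available, keyed by the target nucleotide each probe would detect, and that a call is accepted only when the strongest signal exceeds the runner-up by a fixed ratio. The function names, the `min_ratio` threshold, and the intensity values are hypothetical.

```python
def call_target(reference, intensities, min_ratio=1.5):
    """Call the target base at each tiled position from probe intensities.

    intensities -- {position: {target_base: intensity}} with four bases each
    Returns a list of (position, reference_base, called_base); the call is
    "N" when the top signal is not clearly above the second-highest one.
    """
    calls = []
    for pos, by_base in sorted(intensities.items()):
        ranked = sorted(by_base.items(), key=lambda item: item[1], reverse=True)
        (best, top), (_, second) = ranked[0], ranked[1]
        clear = top > 0 and (second == 0 or top / second >= min_ratio)
        calls.append((pos, reference[pos], best if clear else "N"))
    return calls


def substitutions(calls):
    """Keep only positions where a confident call differs from the reference."""
    return [(pos, ref, call) for pos, ref, call in calls if call not in (ref, "N")]


# Illustrative data: position 1 matches the reference, position 2 does not.
reference = "ACGT"
intensities = {
    1: {"A": 10.0, "C": 200.0, "G": 12.0, "T": 8.0},
    2: {"A": 150.0, "C": 9.0, "G": 11.0, "T": 10.0},
}
calls = call_target(reference, intensities)
changed = substitutions(calls)
```

Here the comparison reports both the location of the substitution (position 2) and the nucleotide occupying it in the target (A in place of G).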
  • a second target nucleic acid is also hybridized to the array. The relative specific binding of the probes then indicates both whether the target sequence is the same or different from the reference sequence, and whether the second target sequence is the same or different from the reference sequence.
  • the relative specific binding of probes in the first group indicates whether the target sequence is the same or different from the first reference sequence.
  • the relative specific binding of probes in the second group indicates whether the target sequence is the same or different from the second reference sequence.
  • Such methods are particularly useful for analyzing heterologous alleles of a gene. Some methods entail hybridizing both a reference sequence and a target sequence to any of the arrays of probes described above. Comparison of the relative specific binding of the probes to the reference and target sequences indicates whether the target sequence is the same or different from the reference sequence.
  • the invention provides arrays of immobilized probes in which the probes are designed to tile a reference sequence from a human immunodeficiency virus.
  • Reference sequences from either the reverse transcriptase gene or protease gene of HIV are of particular interest.
  • Some chips further comprise arrays of probes tiling a reference sequence from a 16S RNA or DNA encoding the 16S RNA from a pathogenic microorganism.
  • the invention further provides methods of using such arrays in analyzing an HIV target sequence. The methods are particularly useful where the target sequence has a substituted nucleotide relative to the reference sequence in at least one position, the substitution conferring resistance to a drug used in treating a patient infected with HIV.
  • the methods reveal the existence of the substituted nucleotide.
  • the methods are also particularly useful for analyzing a mixture of undetermined proportions of first and second target sequences from different HIV variants.
  • the relative specific binding of probes indicates the proportions of the first and second target sequences.
  • the invention provides arrays of probes tiled based on reference sequence from a CFTR gene.
  • An exemplary array comprises at least a group of probes comprising a wildtype probe, and five sets of three mutant probes.
  • the wildtype probe is exactly complementary to a subsequence of a reference sequence from a cystic fibrosis gene, the segment having at least five interrogation positions corresponding to five contiguous nucleotides in the reference sequence.
  • the probes in the first set of three mutant probes are each identical to the wildtype probe, except in a first of the five interrogation positions, which is occupied by a different nucleotide in each of the three mutant probes and the wildtype probe.
  • the probes in the second set of three mutant probes are each identical to the wildtype probe, except in a second of the five interrogation positions, which is occupied by a different nucleotide in each of the three mutant probes and the wildtype probe.
  • the probes in the third set of three mutant probes are each identical to the wildtype probe, except in a third of the five interrogation positions, which is occupied by a different nucleotide in each of the three mutant probes and the wildtype probe.
  • the probes in the fourth set of three mutant probes are each identical to the wildtype probe, except in a fourth of the five interrogation positions, which is occupied by a different nucleotide in each of the three mutant probes and the wildtype probe.
  • the probes in the fifth set of three mutant probes are each identical to the wildtype probe, except in a fifth of the five interrogation positions, which is occupied by a different nucleotide in each of the three mutant probes and the wildtype probe.
  • a chip can comprise two such groups of probes.
  • the first group comprises a wildtype probe exactly complementary to a first reference sequence
  • the second group comprises a wildtype probe exactly complementary to a second reference sequence that is a mutated form of the first reference sequence.
  • the invention further provides methods of using the arrays of the invention for analyzing target sequences from a CFTR gene.
  • the methods are capable of simultaneously analyzing first and second target sequences representing heterozygous alleles of a CFTR gene.
  • the invention provides arrays of probes tiling a reference sequence from a p53 gene, an hMLH1 gene and/or an MSH2 gene.
  • the invention further provides methods of using the arrays described above to analyze these genes. The methods are useful, e.g., for diagnosing patients susceptible to developing cancer.
  • the invention provides arrays of probes tiling a reference sequence from a mitochondrial genome.
  • the reference sequence may comprise part or all of the D-loop region, or all, or substantially all, of the mitochondrial genome.
  • the invention further provides methods of using the arrays described above to analyze target sequences from a mitochondrial genome. The methods are useful for identifying mutations associated with disease, and for forensic, epidemiological and evolutionary studies.
  • the invention provides a method for identifying proteins by differential labeling of peptides, the method comprising the following steps: (a) providing a sample comprising a polypeptide; (b) providing a plurality of labeling reagents which differ in molecular mass but do not differ in chromatographic retention properties and do not differ in ionization and detection properties in mass spectrographic analysis, wherein the differences in molecular mass are distinguishable by mass spectrographic analysis; (c) fragmenting the polypeptide into peptide fragments by enzymatic digestion or by non-enzymatic fragmentation; (d) contacting the labeling reagents of step (b) with the peptide fragments of step (c), thereby labeling the peptides with the differential labeling reagents; (e) separating the peptides by chromatography to generate an eluate; (f) feeding the eluate of step (e) into a mass spectrometer and quantifying the amount of each peptide and generating the sequence of each
  • the sample of step (a) comprises a cell or a cell extract.
  • the method can further comprise providing two or more samples comprising a polypeptide.
  • One or more of the samples can be derived from a wild type cell and one sample can be derived from an abnormal or a modified cell.
  • the abnormal cell can be a cancer cell.
  • the modified cell can be a cell that is mutagenized and/or treated with a chemical, a physiological factor, or the presence of another organism (including, e.g., a eukaryotic organism, prokaryotic organism, virus, vector, prion, or part thereof), and/or exposed to an environmental factor or change or physical force (including, e.g., sound, light, heat, sonication, and radiation).
  • the modification can be genetic change (including, for example, a change in DNA or RNA sequence or content) or otherwise.
  • the method further comprises purifying or fractionating the polypeptide before the fragmenting of step (c).
  • the method can further comprise purifying or fractionating the polypeptide before the labeling of step (d).
  • the method can further comprise purifying or fractionating the labeled peptide before the chromatography of step (e).
  • the purifying or fractionating comprises a method selected from the group consisting of size exclusion chromatography, HPLC, reverse phase HPLC and affinity purification.
  • the method further comprises contacting the polypeptide with a labeling reagent of step (b) before the fragmenting of step (c).
  • the labeling reagent of step (b) comprises the general formulae selected from the group consisting of: Z A OH and Z B OH, to esterify peptide C-terminals and/or Glu and Asp side chains; Z A NH 2 and Z B NH 2 , to form an amide bond with peptide C-terminals and/or Glu and Asp side chains; and Z A CO 2 H and Z B CO 2 H.
  • Z A and Z B , independently of one another, comprise the general formula R-Z 1 -A 1 -Z 2 -A 2 -Z 3 -A 3 -Z 4 -A 4 -; Z 1 , Z 2 , Z 3 , and Z 4 , independently of one another, are selected from the group consisting of nothing, O, OC(O), OC(S), OC(O)O, OC(O)NR, OC(S)NR, OSiRR 1 , S, SC(O), SC(S), SS, S(O), S(O 2 ), NR, NRR 1+ , C(O), C(O)O, C(S), C(S)O, C(O)S, C(O)NR, C(S)NR, SiRR 1 , (Si(RR 1 )O)n, SnRR 1 , Sn(RR 1 )O, BR(OR 1 ), BRR 1 , B
  • the alkyl group (see definition below) is selected from the group consisting of an alkenyl, an alkynyl and an aryl group.
  • One or more C-C bonds from (CRR')n can be replaced with a double or a triple bond; thus, in alternative aspects, an R or an R 1 group is deleted.
  • the (CRR')n can be selected from the group consisting of an o-arylene, an m-arylene and a p-arylene, wherein each group has none or up to 6 substituents.
  • the (CRR')n can be selected from the group consisting of a carbocyclic, a bicyclic and a tricyclic fragment, wherein the fragment has up to 8 atoms in the cycle with or without a heteroatom selected from the group consisting of an O atom, a N atom and an S atom.
  • two or more labeling reagents have the same structure but a different isotope composition.
  • Z A has the same structure as Z B .
  • Z A has a different isotope composition than Z B .
  • the isotopes are boron-10 and boron-11; carbon-12 and carbon-13; nitrogen-14 and nitrogen-15; or sulfur-32 and sulfur-34.
  • x is greater than y.
  • x and y are between 1 and about 11, between 1 and about 21, between 1 and about 31, between 1 and about 41, or between 1 and about 51.
  • the labeling reagent of step (b) can comprise the general formulae selected from the group consisting of: Z A OH and Z B OH to esterify peptide C-terminals; Z A NH 2 / Z B NH 2 to form an amide bond with peptide C-terminals; and, Z A CO 2 H / Z B CO 2 H to form an amide bond with peptide N-terminals; wherein Z A and Z B have the general formula R-Z 1 -A 1 -Z 2 -A 2 -Z 3 -A 3 -Z 4 -A 4 - ; Z 1 , Z 2 , Z 3 , and Z 4 , independently of one another, are selected from the group consisting of nothing, O, OC(O), OC(S), OC(O)O, OC(O)NR, OC(S)NR, OSiRR 1 , S, SC(O), SC(S), SS, S(O), S(O 2 ), NR, NRR 1+ ,
  • a single C-C bond in a (CRR')n group is replaced with a double or a triple bond; thus, the R and R 1 can be absent.
  • the (CRR')n can comprise a moiety selected from the group consisting of an o-arylene, an m-arylene and a p-arylene, wherein the group has none or up to 6 substituents.
  • the group can comprise a carbocyclic, a bicyclic, or a tricyclic fragment with up to 8 atoms in the cycle, with or without a heteroatom selected from the group consisting of an O atom, an N atom and an S atom.
  • R, R 1 independently from other R and R 1 in Z 1 - Z 4 and independently from other R and R 1 in A 1 - A 4 , are selected from the group consisting of a hydrogen atom, a halogen and an alkyl group.
  • the alkyl group (see definition below) can be an alkenyl, an alkynyl or an aryl group.
  • the "n" in Z 1 - Z 4 is independent of n in A 1 - A 4 and is an integer selected from the group consisting of about 51; about 41; about 31; about 21, about 11 and about 6.
  • Z A has the same structure as Z B but Z A further comprises x number of -CH 2 - fragment(s) in one or more A 1 - A 4 fragments, wherein x is an integer. In one aspect, Z A has the same structure as Z B but Z A further comprises x number of -CF 2 - fragment(s) in one or more A 1 - A 4 fragments, wherein x is an integer. In one aspect, Z A comprises x number of protons and Z B comprises y number of halogens in the place of protons, wherein x and y are integers.
  • Z A contains x number of protons and Z B contains y number of halogens, and there are x - y number of protons remaining in one or more A 1 - A 4 fragments, wherein x and y are integers.
  • Z A further comprises x number of -O- fragment(s) in one or more A 1 - A 4 fragments, wherein x is an integer.
  • Z A further comprises x number of -S- fragment(s) in one or more A 1 - A 4 fragments, wherein x is an integer.
  • Z A further comprises x number of -O- fragment(s) and Z B further comprises y number of -S- fragment(s) in the place of -O- fragment(s), wherein x and y are integers.
  • Z A further comprises x - y number of -O- fragment(s) in one or more A 1 - A 4 fragments, wherein x and y are integers.
  • x and y are integers selected from the group consisting of between 1 and about 51; between 1 and about 41; between 1 and about 31; between 1 and about 21; between 1 and about 11; and between 1 and about 6, wherein x is greater than y.
  • n, m and y are integers selected from the group consisting of about 51; about 41; about 31; about 21, about 11; about 6 and between about 5 and 51.
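The structural variants described above (extra -CH 2 -, -CF 2 -, -O- or -S- fragments, or halogens or -S- in place of protons or -O-) each translate into a predictable mass difference between Z A and Z B. The following is a minimal sketch of that arithmetic, not part of the disclosure. The function names are hypothetical, fluorine is assumed as the example halogen, and the masses are standard monoisotopic values rounded to five decimals.

```python
# Monoisotopic masses in daltons (standard reference values, rounded).
# Fluorine stands in for "a halogen" purely as an illustrative assumption.
UNIT_MASS = {
    "H": 1.00783,
    "F": 18.99840,
    "O": 15.99491,
    "S": 31.97207,
    "CH2": 12.00000 + 2 * 1.00783,
    "CF2": 12.00000 + 2 * 18.99840,
}


def extra_fragment_shift(fragment: str, x: int) -> float:
    """Mass difference (Da) when Z A carries x more `fragment` units than Z B."""
    if x < 1:
        raise ValueError("x must be a positive integer")
    return x * UNIT_MASS[fragment]


def substitution_shift(old: str, new: str, y: int) -> float:
    """Mass difference (Da) when y `old` units are replaced by `new` units."""
    if y < 1:
        raise ValueError("y must be a positive integer")
    return y * (UNIT_MASS[new] - UNIT_MASS[old])


if __name__ == "__main__":
    print(round(extra_fragment_shift("CH2", 2), 5))    # two extra -CH2- units
    print(round(substitution_shift("O", "S", 1), 5))   # one -O- replaced by -S-
    print(round(substitution_shift("H", "F", 3), 5))   # three protons replaced by F
```

Note that purely structural variants such as these may also change chromatographic retention, which is one reason isotope-only variants of the same structure are attractive.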
  • two or more labeling reagents have the same structure but a different isotope composition.
  • An exemplary labeling reagent pair is N,N-dimethyl-iodoacetamide and N,N-d6-dimethyl-iodoacetamide, having the structures:
  • the methyl group can be replaced by any lower alkyl group (e.g., ethyl, butyl and the like).
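For an isotope-only reagent pair such as the d0/d6 pair above, the mass difference follows directly from the number of substituted atoms. The sketch below is illustrative only: the function names are hypothetical, the isotope masses are standard reference values rounded to six decimals, and one label per peptide is assumed.

```python
# Standard isotope masses in daltons (rounded reference values).
ISOTOPE_MASS = {
    "1H": 1.007825, "2H": 2.014102,
    "10B": 10.012937, "11B": 11.009305,
    "12C": 12.000000, "13C": 13.003355,
    "14N": 14.003074, "15N": 15.000109,
    "32S": 31.972071, "34S": 33.967867,
}


def isotope_shift(light: str, heavy: str, count: int) -> float:
    """Mass difference (Da) from replacing `count` light atoms with heavy ones."""
    return count * (ISOTOPE_MASS[heavy] - ISOTOPE_MASS[light])


def mz_spacing(mass_shift: float, charge: int) -> float:
    """Spacing of the light/heavy peak pair on the m/z axis for a given charge."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return mass_shift / charge


if __name__ == "__main__":
    d6 = isotope_shift("1H", "2H", 6)      # six deuteriums in place of protons
    print(round(d6, 4))                    # about 6.04 Da per label
    print(round(mz_spacing(d6, 2), 4))     # spacing for a doubly charged peptide
```

A peptide carrying several labels would show a shift that is the corresponding multiple of the single-label value.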
  • the separating of step (e) comprises a liquid chromatography system, such as a multidimensional liquid chromatography system (e.g., a system of the invention) or a capillary chromatography system.
  • the mass spectrometer comprises a tandem mass spectrometry device or an ion trap mass spectrometer (LCQ or LTQ) or a combination thereof.
  • the ion trap mass spectrometer comprises a Finnigan LCQ Deca XP MAX™, a Finnigan MDLC LTQ™ or a Finnigan LTQ FT™ (Thermo Electron Corporation, San Jose, CA), or Agilent's LC/MSD Trap (Agilent Technologies, Palo Alto, CA), or an equivalent mass spectrometer.
  • the Agilent LC/MSD Trap is an 1100 series LC/MSD TRAP™, or, the LC/MSD Trap SL™, or, the LC/MSD Trap XCT™ (Agilent Technologies, Palo Alto, CA), or equivalent device.
  • the method further comprises quantifying the amount of each polypeptide or each peptide.
  • the invention provides a method for defining the expressed proteins associated with a given cellular state, the method comprising the following steps: (a) providing a sample comprising a cell in the desired cellular state; (b) providing a plurality of labeling reagents which differ in molecular mass but do not differ in chromatographic retention properties and do not differ in ionization and detection properties in mass spectrographic analysis, wherein the differences in molecular mass are distinguishable by mass spectrographic analysis; (c) fragmenting polypeptides derived from the cell into peptide fragments by enzymatic digestion or by non-enzymatic fragmentation; (d) contacting the labeling reagents of step (b) with the peptide fragments of step (c), thereby labeling the peptides with the differential labeling reagents; (e) separating the peptides by chromatography to generate an eluate; (f) feeding the a
  • the invention provides a method for quantifying changes in protein expression between at least two cellular states, the method comprising the following steps: (a) providing at least two samples comprising cells in a desired cellular state; (b) providing a plurality of labeling reagents which differ in molecular mass but do not differ in chromatographic retention properties and do not differ in ionization and detection properties in mass spectrographic analysis, wherein the differences in molecular mass are distinguishable by mass spectrographic analysis; (c) fragmenting polypeptides derived from the cells into peptide fragments by enzymatic digestion or by non-enzymatic fragmentation; (d) contacting the labeling reagents of step (b) with the peptide fragments of step (c), thereby labeling the peptides with the differential labeling reagents, wherein the labels used in one sample are different from the labels used in other samples; (e) separating the peptides by chromatography to generate an eluate; (f) feeding the eluate of step (
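Because the differentially labeled peptides co-elute and ionize alike, a change in expression can be read from the intensity ratio of each light/heavy peak pair. The sketch below illustrates one simple way to pair peaks and compute that ratio. It is an assumption-laden illustration, not the disclosed quantification procedure: the function name, the tolerance, the restriction to singly charged ions and the peak values are all hypothetical.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def pair_and_ratio(peaks: List[Peak], shift: float,
                   tol: float = 0.01) -> List[Tuple[float, float, float]]:
    """Pair singly charged peaks separated by `shift` and return
    (light m/z, heavy m/z, heavy/light intensity ratio) for each pair."""
    results = []
    ordered = sorted(peaks)
    for mz_light, int_light in ordered:
        if int_light <= 0:
            continue  # cannot form a ratio against an empty peak
        for mz_heavy, int_heavy in ordered:
            if abs((mz_heavy - mz_light) - shift) <= tol:
                results.append((mz_light, mz_heavy, int_heavy / int_light))
    return results


if __name__ == "__main__":
    # Hypothetical peak list; 6.0377 Da is the d0/d6 shift for one label.
    spectrum = [(800.40, 1000.0), (806.44, 2000.0), (950.50, 500.0)]
    for light, heavy, ratio in pair_and_ratio(spectrum, shift=6.0377):
        print(light, heavy, ratio)
```

In this toy spectrum the peptide would be reported as twice as abundant in the sample that received the heavy label.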
  • the invention provides a method for identifying proteins by differential labeling of peptides, the method comprising the following steps: (a) providing a sample comprising a polypeptide; (b) providing a plurality of labeling reagents which differ in molecular mass but do not differ in chromatographic retention properties and do not differ in ionization and detection properties in mass spectrographic analysis, wherein the differences in molecular mass are distinguishable by mass spectrographic analysis; (c) fragmenting the polypeptide into peptide fragments by enzymatic digestion or by non-enzymatic fragmentation; (d) contacting the labeling reagents of step (b) with the peptide fragments of step (c), thereby labeling the peptides with the differential labeling reagents; (e) separating the peptides by multidimensional liquid chromatography to generate an eluate; (f) feeding the eluate of step (e) into a tandem mass spectrometer or an ion trap mass spectrometer or a
  • the ion trap mass spectrometer comprises a Finnigan LCQ Deca XP MAX™, a Finnigan MDLC LTQ™ or a Finnigan LTQ FT™.
  • the invention provides a chimeric labeling reagent comprising (a) a first domain comprising a biotin; and (b) a second domain comprising a reactive group capable of covalently binding to an amino acid, wherein the chimeric labeling reagent comprises at least one isotope.
  • the isotope(s) can be in the first domain or the second domain.
  • the isotope(s) can be in the biotin.
  • the isotope can be a deuterium isotope, a boron-10 or boron-11 isotope, a carbon-12 or a carbon-13 isotope, a nitrogen-14 or a nitrogen-15 isotope, or, a sulfur-32 or a sulfur-34 isotope.
  • the chimeric labeling reagent can comprise two or more isotopes.
  • the reactive group of the chimeric labeling reagent capable of covalently binding to an amino acid can be a succinimide group, an isothiocyanate group or an isocyanate group.
  • the reactive group capable of covalently binding to an amino acid can bind to a lysine or a cysteine.
  • the chimeric labeling reagent can further comprise a linker moiety linking the biotin group and the reactive group.
  • the linker moiety can comprise at least one isotope.
  • the linker is a cleavable moiety that can be cleaved by, e.g., enzymatic digest or by reduction.
  • the invention provides a method of comparing relative protein concentrations in a sample comprising (a) providing a plurality of differential small molecule tags, wherein the small molecule tags are structurally identical but differ in their isotope composition, and the small molecules comprise reactive groups that covalently bind to cysteine or lysine residues or both; (b) providing at least two samples comprising polypeptides; (c) attaching covalently the differential small molecule tags to amino acids of the polypeptides; (d) determining the protein concentrations of each sample in a tandem mass spectrometer or an ion trap mass spectrometer or a combination thereof; and, (e) comparing relative protein concentrations of each sample.
  • the sample comprises a complete or a fractionated cellular sample.
  • the ion trap mass spectrometer comprises a Finnigan LCQ Deca XP MAX™, a Finnigan MDLC LTQ™ or a Finnigan LTQ
  • the differential small molecule tags comprise a chimeric labeling reagent comprising (a) a first domain comprising a biotin; and, (b) a second domain comprising a reactive group capable of covalently binding to an amino acid, wherein the chimeric labeling reagent comprises at least one isotope.
  • the isotope can be a deuterium isotope, a boron-10 or boron-11 isotope, a carbon-12 or a carbon-13 isotope, a nitrogen-14 or a nitrogen-15 isotope, or, a sulfur-32 or a sulfur-34 isotope.
  • the chimeric labeling reagent can comprise two or more isotopes.
  • the reactive group capable of covalently binding to an amino acid can be selected from the group consisting of a succinimide group, an isothiocyanate group and an isocyanate group.
  • the invention provides a method of comparing relative protein concentrations in a sample comprising (a) providing a plurality of differential small molecule tags, wherein the differential small molecule tags comprise a chimeric labeling reagent comprising (i) a first domain comprising a biotin; and, (ii) a second domain comprising a reactive group capable of covalently binding to an amino acid, wherein the chimeric labeling reagent comprises at least one isotope; (b) providing at least two samples comprising polypeptides; (c) attaching covalently the differential small molecule tags to amino acids of the polypeptides; (d) isolating the tagged polypeptides on a biotin-binding column by binding tagged polypeptides to the column, washing non-bound materials off the column, and eluting
  • the ion trap mass spectrometer comprises a Finnigan LCQ Deca XP MAX™, a Finnigan MDLC LTQ™ or a Finnigan LTQ FT™.
  • the invention provides chromatography systems comprising a first reverse phase column (RPC) (a first dimension), an ion exchange column (e.g., a cation (CX) or anion exchange column) (a second dimension), and a second reverse phase column (RPC) (a third dimension), wherein the first reverse phase column (RPC), the ion exchange column (e.g., a cation (CX) or anion exchange column) and the second reverse phase column (RPC) are connected in series; the first reverse phase column (RPC) has a free distal end and a proximal end connected to the ion exchange column (e.g., a cation (CX) or anion exchange column), or, the first reverse phase column (RPC) is configured such that either the distal end or the proximal
  • the second reverse phase column (RPC), or the first reverse phase column (RPC), or both are connected to an analytical device at the distal end such that an eluate can be fed into the analytical device.
  • the analytical device can comprise a mass spectrometer.
  • the mass spectrometer can further comprise a nano-spray apparatus.
  • the mass spectrometer comprises a tandem mass spectrometer or an ion trap mass spectrometer or a combination thereof.
  • the ion exchange column (e.g., a cation (CX) or anion exchange column) and the second reverse phase column (RPC) are enclosed in one housing and the first reverse phase column (RPC) is enclosed in a second housing.
  • the three dimensions, or columns, are all in different housings, or, the columns are arranged such that they can be easily, and individually, replaced.
  • the ion trap mass spectrometer comprises a Finnigan LCQ Deca XP MAX™, a Finnigan MDLC LTQ™ or a Finnigan LTQ FT™.
  • a flow valve, e.g., a low volume flow valve (e.g., a microvalve), and/or an inline microfilter assembly connects the various columns (e.g., the various housings).
  • each dimension, or column, is in a different housing and one, two or all of the housings are connected to each other by a flow valve, e.g., an inline microfilter assembly, and the like.
  • a flow valve separates the first housing and the second housing.
  • a flow valve, e.g., a low volume flow valve, and/or an inline microfilter assembly connects the first reverse phase column (RPC) to the ion exchange column (e.g., a cation (CX) or anion exchange column) and the second reverse phase column (RPC).
  • the first reverse phase column (RPC), the ion exchange column and the second reverse phase column (RPC) are enclosed in one housing.
  • inputs and/or outputs to or from any or all of the columns are fitted with valves.
  • the flow valve is a one-way, a two-way, a three-way (a "T-valve") or a four-way valve.
  • the housing(s) comprise fused silica capillaries.
  • a valve is fitted on the distal end of either reverse phase column (the end not connected to the ion exchange), or both distal ends of the reverse phase columns (this alternative aspect can be in addition to having a valve between the first reverse phase column and the ion exchange/ second reverse phase column assembly).
  • the flow valve is a one-way, a two-way, a three-way (a "T-valve") or a four-way valve.
  • this valve or these valves are flow valves, e.g., low volume flow valves.
  • the valve connection assembly can further comprise an inline microfilter assembly.
  • the system of the invention is fully automated.
  • the system can comprise a sample injector fully integrated with the automated system.
  • the system is integrated to a computer, which can be programmed to run samples on the system, including equilibrating columns, washing, step elution of samples, and the like.
  • an automated system of the invention is used for high throughput proteome profiling with on-line sample collection.
  • the first, second or both reverse phase columns are packed with a reverse phase resin or equivalent.
  • the first, second or both reverse phase resins can comprise a C18 reverse phase resin or equivalent.
  • the ion exchange column can comprise a strong cation exchange (SCX) resin or equivalent.
  • the strong cation exchange (SCX) resin can comprise a polysulfoethyl A strong cation exchange resin.
  • the first reverse phase column (RPC), the second reverse phase column (RPC), or both are connected to an HPLC on a distal end.
  • the first reverse phase column has about 10%, 20%, 30%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200%, 210%, 220%, 230%, 240%, 250%, 260%, 270%, 280%, 290%, 300%, 325%, 350%, 375%, 400%, 425%, 450%, 475%, 500%, 525%, 550%, 575%, 600%, 625%, 650%, 675%, 700%, 725%, 750%, 775%, 800%, 825%, 850%, 875%, 900%, 925%, 950%, 975%, 1000%, or more, greater capacity than the second reverse phase column (RPC) (which, in one aspect, is the third dimension in an exemplary 3-D LC-MS/MS or 3D LC LCQ MS/MS)
  • the first reverse phase column has about 10%, 20%, 30%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200%, 210%, 220%, 230%, 240%, 250%, 260%, 270%, 280%, 290%, 300%, 325%, 350%, 375%, 400%, 425%, 450%, 475%, 500%, 525%, 550%, 575%, 600%, 625%, 650%, 675%, 700%, 725%, 750%, 775%, 800%, 825%, 850%, 875%, 900%, 925%, 950%, 975%, 1000%, or more, resin than the second reverse phase column (RPC), e.g.
  • the loading capacity is proportional to the column dimension.
  • the loading capacity is approximately 100 µg protein digest per 10 cm × 180 µm C18 column, up to milligram-sized samples.
  • the chromatography systems can further comprise a computer system operatively linked to the chromatography system, thereby making the chromatography system an automated operation.
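The proportionality between loading capacity and column dimension can be sketched numerically from the single reference point given above. The example below assumes that "column dimension" means packed bed volume (length times cross-sectional area), which the text does not state explicitly. The function name and constants are hypothetical labels for the quoted reference values.

```python
# Reference point quoted in the text: about 100 ug of protein digest
# on a 10 cm x 180 um C18 column.
REF_LENGTH_CM = 10.0
REF_INNER_DIAMETER_UM = 180.0
REF_CAPACITY_UG = 100.0


def loading_capacity_ug(length_cm: float, inner_diameter_um: float) -> float:
    """Estimate loading capacity, assuming it scales with packed bed volume.

    Volume scales linearly with length and with the square of the inner
    diameter; this scaling rule is an assumption, not a disclosed formula.
    """
    if length_cm <= 0 or inner_diameter_um <= 0:
        raise ValueError("column dimensions must be positive")
    volume_ratio = (length_cm / REF_LENGTH_CM) * \
                   (inner_diameter_um / REF_INNER_DIAMETER_UM) ** 2
    return REF_CAPACITY_UG * volume_ratio


if __name__ == "__main__":
    print(loading_capacity_ug(10, 180))   # the reference column itself
    print(loading_capacity_ug(20, 180))   # twice the length
    print(loading_capacity_ug(10, 360))   # twice the inner diameter
```

Under this assumption, reaching a milligram-sized load calls for roughly ten times the reference bed volume.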
  • the chromatography systems can further comprise a computer system operatively linked to the mass spectrometer for quantifying the amount of each peptide by use of data from the mass spectrometer.
  • the chromatography systems can further comprise a computer system operatively linked to the mass spectrometer for generating the sequence of each peptide by use of data from the mass spectrometer.
  • the invention provides mixed bed multi-dimensional liquid chromatographs comprising a first resin bed (a first dimension), a second resin bed (a second dimension) and a third resin bed (a third dimension) connected in series, wherein the first resin bed comprises a reverse phase resin, the second resin bed comprises an ion exchange (e.g., a cation or anion exchange) resin bed and the third resin bed comprises a reverse phase resin, and the reverse phase resin of the first bed has a free distal end and a proximal end connected to the ion exchange bed, or, the reverse phase resin of the first bed is configured such that the distal end and/or the proximal end are connected to the ion exchange column such that a sample can be loaded into and eluted out of first reverse phase column (RPC) to the ion exchange column from the same end (which can be either the distal end or the proximal end), and the reverse phase resin of the third bed has a free distal end and a proximal end connected to the ion
  • the reverse phase resin of the first bed has a greater capacity than the reverse phase resin of the third bed, or, the reverse phase resin of the third bed has a greater capacity than the reverse phase resin of the first bed.
  • the reverse phase resin of the first bed, the reverse phase resin of the third bed, or both can be connected to an analytical device such that an eluate can be fed into the analytical device.
  • the loading capacity is proportional to the column dimension. For example, in one aspect, the loading capacity is approximately 100 µg protein digest per 10 cm × 180 µm C18 column, or equivalent, up to milligram-sized samples.
  • the analytical device comprises a mass spectrometer.
  • the mass spectrometer can further comprise a nano-spray apparatus.
  • the mass spectrometer can comprise a tandem mass spectrometer or an ion trap mass spectrometer (LCQ or LTQ) or a combination thereof.
  • the ion trap mass spectrometer comprises a Finnigan LCQ Deca XP MAX™, a Finnigan MDLC LTQ™ or a Finnigan LTQ FT™.
  • each resin bed is enclosed in a separate housing (for, in some aspects, easy, independent replacement of any individual resin bed).
  • the second resin bed and a third resin bed are enclosed in one housing and the first resin bed is enclosed in a second housing.
  • a flow valve e.g., a low volume flow valve and/or an inline microfilter assembly, connects each housing to each other and/or to any inputs or outputs.
  • a flow valve connects the first housing and the second housing.
  • the inline microfilter assembly can further comprise a valve, e.g., a one way or two way valve.
  • a flow valve, e.g., a low volume flow valve, or a directional control flow valve, e.g., a one-way or two-way flow valve
  • an inline microfilter assembly connects the first bed to the second and third resin beds.
  • the first reverse phase resin bed, the ion exchange resin bed and the second reverse phase resin bed are enclosed in one housing.
  • the mixed bed multi-dimensional liquid chromatographs of the invention are fully automated.
  • the chromatographs can comprise a sample injector fully integrated with the automated system.
  • the chromatographs of the invention are integrated to a computer, which can be programmed to run samples, including equilibrating columns, washing, step elution of samples, and the like.
  • chromatographs of the invention are used for high throughput proteome profiling with on-line sample collection. See Figure 22 for an exemplary automated chromatograph system of the invention.
  • the reverse phase resin of the first bed, the reverse phase resin of the third bed or both reverse phase resin beds are packed with a Cx reverse phase resin or equivalent, wherein X is an integer between five and thirty.
  • the Cx reverse phase resin or equivalent comprises a C18 reverse phase resin or equivalent.
  • the ion exchange bed is packed with a strong cation exchange (SCX) resin or equivalent.
  • the strong cation exchange resin (SCX) can comprise a polysulfoethyl A strong cation exchange resin.
  • the reverse phase resin of the first bed, or the reverse phase resin of the third bed, or both are connected to an HPLC.
  • the first reverse phase resin bed has about 10%, 20%, 30%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200%, 210%, 220%, 230%, 240%, 250%, 260%, 270%, 280%, 290%, 300%, 325%, 350%, 375%, 400%, 425%, 450%, 475%, 500%, 525%, 550%, 575%, 600%, 625%, 650%, 675%, 700%, 725%, 750%, 775%, 800%, 825%, 850%, 875%, 900%, 925%, 950%, 975%, 1000%, or more, greater capacity than the second reverse phase resin bed.
  • the mixed bed multi-dimensional liquid chromatographs further comprise a computer system operatively linked to the chromatography system, thereby making the chromatography system an automated operation.
  • the mixed bed multi-dimensional liquid chromatographs further comprise a computer system operatively linked to the mass spectrometer for quantifying the amount of each peptide by use of data from the mass spectrometer.
  • the mixed bed multi-dimensional liquid chromatographs further comprise a computer system operatively linked to the mass spectrometer for generating the sequence of each peptide by use of data from the mass spectrometer.
  • the invention provides methods for separating proteins comprising the following steps: (a) providing a sample comprising a polypeptide; (b) fragmenting the polypeptide into peptide fragments; and (c) separating the peptides by chromatography to generate an eluate using a chromatography system of the invention or a mixed bed multi-dimensional liquid chromatograph of the invention.
  • the peptide fragments are loaded into the reverse phase resin of the first bed of the mixed bed multi-dimensional liquid chromatograph or the first reverse phase column (RPC) of the chromatography system.
  • the peptide fragments are eluted through the distal end of the reverse phase resin of the first bed and/or the reverse phase resin of the third bed of the mixed bed multi-dimensional liquid chromatograph, or the peptide fragments are eluted through the distal end of the first or the second RP column of the chromatography system. In one aspect, the peptide fragments are eluted through the same end from which they were loaded.
  • the peptide fragments can be generated by enzymatic digestion or by non-enzymatic fragmentation. The enzymatic digestion can be by trypsin, endoproteinase or a combination thereof.
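To make the enzymatic fragmentation step concrete, the sketch below performs an in silico tryptic digest. It uses the commonly cited rule that trypsin cleaves on the C-terminal side of lysine (K) or arginine (R), except when the next residue is proline (P). The function name and the example sequence are hypothetical, and real digests also produce missed cleavages that this sketch ignores.

```python
from typing import List


def trypsin_digest(sequence: str) -> List[str]:
    """Split a protein sequence into fully cleaved tryptic peptides.

    Cleaves after K or R unless the following residue is P.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Trailing residues after the last cleavage site form the final peptide.
        peptides.append(sequence[start:])
    return peptides


if __name__ == "__main__":
    # Hypothetical sequence: the R at position 3 precedes P, so it is not cleaved.
    print(trypsin_digest("MKRPAKLR"))
```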
  • the peptide fragments are loaded into the reverse phase resin of the first bed of the mixed bed multi-dimensional liquid chromatograph or the first reverse phase column (RPC) of the chromatography system without desalting or removing the detergent, or both.
  • the peptide fragments can be solubilized in a detergent or a denaturing agent before loading into the reverse phase resin of the first bed of the mixed bed multi-dimensional liquid chromatograph or the first reverse phase column (RPC) of the chromatography system, and, in one aspect, loaded without having to remove the detergent.
  • the system is used to analyze membrane proteins, or other hydrophobic proteins or compounds (e.g., organic compounds, e.g., steroids, fats, lipopolysaccharides) by loading samples without removing detergents.
  • the detergent or denaturing agent is SDS or urea.
  • the multi-dimensional chromatographs of the invention are detergent tolerant, and thus are excellent for membrane proteins or any protein or compound needing detergent to be solubilized.
  • the peptide fragments are loaded into reverse phase resin of the first bed of the mixed bed multi-dimensional liquid chromatograph or the first reverse phase column (RPC) of the chromatography system using a pressure bomb.
  • the method can further comprise feeding the eluate into a mass spectrometer and quantifying the amount of each peptide.
  • the method can further comprise feeding the eluate into a mass spectrometer and generating the sequence of each peptide by use of the mass spectrometer.
  • the method can further comprise inputting the sequence into a computer program product to compare the inputted sequence to a database of polypeptide sequences to identify the polypeptide from which a sequenced peptide originated.
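The database comparison step can be illustrated with a deliberately simple lookup that reports every database entry containing the sequenced peptide. This is a toy sketch under stated assumptions: the function name, protein names and sequences are hypothetical, and practical search programs typically score spectra against candidate peptides rather than matching exact strings.

```python
from typing import Dict, List


def find_source_proteins(peptide: str, database: Dict[str, str]) -> List[str]:
    """Return the names of all database proteins that contain `peptide`."""
    if not peptide:
        return []
    return sorted(name for name, seq in database.items() if peptide in seq)


if __name__ == "__main__":
    # Hypothetical two-entry database for illustration only.
    db = {
        "protein_A": "MKTAYIAKQR",
        "protein_B": "GSHMLEAKQRPL",
    }
    print(find_source_proteins("TAYIAK", db))  # unique to protein_A
    print(find_source_proteins("AKQR", db))    # shared by both entries
```

A peptide shared by several entries, as in the second call, cannot by itself identify a single source polypeptide.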
  • the separating of step (c) comprises (i) loading a labeled peptide mixture into the first reverse phase column (RPC) of the chromatography system or the reverse phase resin of the first bed of the mixed bed multi-dimensional liquid chromatograph, wherein the first RPC or first reverse phase resin bed absorbs a plurality of peptides; (ii) eluting a fraction of the first RPC-absorbed or first resin bed-absorbed plurality of peptides to the ion exchange column (e.g., a cation (CX) or anion exchange column) of the chromatography system or the ion exchange (CX) resin bed of the mixed bed multidimensional liquid chromatograph, using a reverse phase gradient; (iii) eluting a fraction of the ion exchange column-absorbed or CX resin bed-absorbed plurality of peptides onto the second reverse phase column (RPC) of the chromatography system or the reverse phase resin of the third bed of the mixed bed multi-dimensional
  • the plurality of peptides eluted in step (iv) can be eluted through the distal end of the second reverse phase column (RPC) of the chromatography system or the distal end of the reverse phase resin of the third bed of the mixed bed multi-dimensional liquid chromatograph.
  • the plurality of peptides eluted in step (iv) is eluted back through the proximal end of the second RPC of the chromatography system or the reverse phase resin of the third bed of the mixed bed multi-dimensional liquid chromatograph, through the ion exchange column (e.g., a cation (CX) or anion exchange column) of the chromatography system or CX resin bed of the mixed bed multi-dimensional liquid chromatograph, and back through the proximal end of the first RPC of the chromatography system or the reverse phase resin of the first bed of the mixed bed multi-dimensional liquid chromatograph, and the eluate passes through the distal end of the first RPC or the first reverse phase resin bed.
  • in step (iv), the fraction of the second RPC-absorbed or third resin bed-absorbed plurality of peptides is eluted using the same reverse phase gradient used to elute the first RPC-absorbed or first resin bed-absorbed fraction of peptides in step (ii).
  • the method further comprises: after step (iii) is completed and before the step (iv) eluting a fraction of the second RPC-absorbed or second reverse phase resin bed-absorbed plurality of peptides is begun, washing the column free of the salts and buffers used to elute a fraction of the ion exchange column-absorbed or CX resin bed-absorbed plurality of peptides.
  • a discrete fraction of the first RPC-absorbed or first resin bed-absorbed plurality of peptides is eluted to the ion exchange column (e.g., a cation (CX) or anion exchange column) of the chromatography system or the ion exchange (CX) resin bed of the mixed bed multi-dimensional liquid chromatograph using a reverse phase gradient.
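One way to read the separation steps above is as a nested cycle: each reverse phase gradient window moves a discrete fraction from the first reverse phase bed to the ion exchange bed, and each salt step then moves a sub-fraction to the second reverse phase bed, which is washed and eluted with the same gradient window. The sketch below writes that reading out as an ordered step list. It is an interpretation rather than the disclosed control program: the function name, the step wording and the example %B windows are hypothetical.

```python
from typing import List, Sequence, Tuple


def plan_run(rp_windows: Sequence[Tuple[int, int]],
             salt_steps_mM: Sequence[int]) -> List[str]:
    """List the elution steps for a three-dimensional RP / IEX / RP run."""
    steps = []
    for low, high in rp_windows:
        # Step (ii): discrete fraction from the first RP bed to the IEX bed.
        steps.append(f"RP1 -> IEX: reverse phase gradient {low}-{high} %B")
        for salt in salt_steps_mM:
            # Step (iii): salt step from the IEX bed to the second RP bed.
            steps.append(f"IEX -> RP2: salt step {salt} mM")
            # Wash away salts and buffers before the final elution.
            steps.append("wash: remove salts and buffers")
            # Step (iv): same reverse phase gradient window as step (ii).
            steps.append(f"RP2 -> MS: reverse phase gradient {low}-{high} %B")
    return steps


if __name__ == "__main__":
    # Hypothetical windows and salt steps chosen only to keep the output short.
    for line in plan_run([(0, 10), (10, 20)], [25, 50]):
        print(line)
```

The step count grows as windows × (1 + 3 × salt steps), which is why automation of the run matters for high throughput use.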
  • the reverse phase gradient comprises (X n -X n+1 %B) over 120 minutes with a flow rate of 250 nl/min.
  • the salt gradient steps comprise 12 salt gradient steps comprising 25 mM, 50 mM, 75 mM, 100 mM, 125 mM, 150 mM, 175 mM, 200 mM, 225 mM, 250 mM, and 2 M ammonium acetate, or equivalent.
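The gradient parameters quoted above fix the solvent volume and instrument time per cycle, which is simple arithmetic worth making explicit. The sketch below assumes one 120 minute reverse phase gradient per salt step and ignores loading, washing and re-equilibration time. The function names are hypothetical, and the salt list holds only the concentrations actually enumerated in the text.

```python
GRADIENT_MINUTES = 120        # duration of one reverse phase gradient
FLOW_NL_PER_MIN = 250         # flow rate during the gradient

# Ammonium acetate concentrations as enumerated in the text, in mM (2 M = 2000 mM).
SALT_STEPS_MM = [25, 50, 75, 100, 125, 150, 175, 200, 225, 250, 2000]


def gradient_volume_ul(minutes: float = GRADIENT_MINUTES,
                       flow_nl_per_min: float = FLOW_NL_PER_MIN) -> float:
    """Mobile phase volume delivered during one gradient, in microliters."""
    return minutes * flow_nl_per_min / 1000.0


def total_gradient_hours(n_salt_steps: int, n_rp_windows: int = 1,
                         minutes: float = GRADIENT_MINUTES) -> float:
    """Gradient time for a run, assuming one gradient per salt step per window."""
    return n_salt_steps * n_rp_windows * minutes / 60.0


if __name__ == "__main__":
    print(gradient_volume_ul())                       # microliters per gradient
    print(total_gradient_hours(len(SALT_STEPS_MM)))   # hours for one RP window
```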
  • the method further comprises labeling the peptide fragments before loading them into the chromatography system or the mixed bed multi-dimensional liquid chromatograph.
  • the sample can be derived from a cell, a seed or a spore.
  • the cell can be a prokaryotic cell or a eukaryotic cell.
  • the cell, seed or spore can be derived from a bacterium, a yeast, an insect, a plant, a fungus, a protozoan or a mammal.
  • the mammalian cell can be a human cell or a mouse cell.
  • the bacterial cell or spore can be a Bacillus anthracis.
  • the invention provides methods for separating and detecting proteins by differential labeling of peptides.
  • the method comprises the following steps: (a) providing at least two samples comprising a polypeptide; (b) providing at least two sets of labeling reagents (e.g., at least one pair of labeling reagents), wherein each set of labeling reagent differs in molecular mass from the other sets (e.g., wherein each member of a pair differs in molecular mass from the second member of a pair) and the differences in molecular mass are distinguishable by mass spectrographic analysis; (c) fragmenting the polypeptides into peptide fragments by enzymatic digestion or by non-enzymatic fragmentation; (d) contacting the labeling reagents of step (b) with the peptide fragments of step (c), wherein each sample is labeled with a different labeling reagent, thereby differentially labeling the peptides; (e) separating the labeled peptides by chromatography to generate an eluate using a chromatography system of the invention or
  • the method further comprises a step (f) comprising feeding the eluate of step (e) into a mass spectrometer and quantifying the amount of each peptide and generating the sequence of each peptide by use of the mass spectrometer.
  • the method further comprises providing two or more samples from different sources.
  • one sample is derived from a wild type cell and one sample is derived from an abnormal or a modified cell.
  • the abnormal cell can be a cancer cell.
  • the peptide fragments can be labeled with a reagent comprising a general formula selected from the group consisting of: Z A OH for labeling at least a first sample and Z B OH for labeling at least a second sample, to esterify peptide C-terminals and/or Glu and Asp side chains; Z A NH 2 for labeling at least a first sample and Z B NH 2 for labeling at least a second sample, to form an amide bond with peptide C-terminals and/or Glu and Asp side chains; and Z A CO 2 H for labeling at least a first sample and Z B CO 2 H for labeling at least a second sample to form an amide bond with peptide N-terminals and/or Lys and Arg side chains; wherein Z A and Z B independently of one another comprise the general formula R-Z 1 -A 1 -Z 2 -A 2 -Z 3 -A 3 -Z 4 -A 4 - , Z 1 , Z 2 , Z 3 , and Z 4 independently of
  • the alkyl group is selected from the group consisting of an alkenyl, an alkynyl and an aryl group.
  • one or more C-C bonds from (CRR 1 ) n are replaced with a double or a triple bond.
  • an R and/or an R 1 group are absent.
  • (CRR 1 ) n is selected from the group consisting of an o-arylene, an m-arylene and a p-arylene, wherein each group has none or up to 6 substituents.
  • (CRR 1 ) n is selected from the group consisting of a carbocyclic, a bicyclic and a tricyclic fragment, wherein the fragment has up to 8 atoms in the cycle with or without a heteroatom selected from the group consisting of an O atom, a N atom and an S atom.
  • two or more labeling reagents have the same structure but a different isotope composition.
  • Z A can have the same structure as Z B , but Z A has a different isotope composition than Z B .
  • the isotope can be boron-10 and boron-11, carbon-12 and carbon-13, nitrogen-14 and nitrogen-15, or sulfur-32 and sulfur-34.
  • the isotope with the lower mass can be x and the isotope with the higher mass is y, and x and y are integers, x is greater than y. In one aspect, x and y are between 1 and about 11, between 1 and about 21, between 1 and about 31, between 1 and about 41, or between 1 and about 51.
  • the labeling reagent of step (b) comprises the general formulae selected from the group consisting of: i. Z A OH for labeling at least a first sample and Z B OH for labeling at least a second sample to esterify peptide C-terminals; ii. Z A NH 2 for labeling at least a first sample and Z B NH 2 for labeling at least a second sample to form an amide bond with peptide C-terminals; and iii.
  • a single C-C bond in a (CRR 1 ) n group is replaced with a double or a triple bond.
  • R and R 1 are absent.
  • (CRR 1 ) n comprises a moiety selected from the group consisting of an o-arylene, an m-arylene and a p-arylene, wherein the group has none or up to 6 substituents.
  • the (CRR 1 ) n group comprises a carbocyclic, a bicyclic, or a tricyclic fragment with up to 8 atoms in the cycle, with or without a heteroatom selected from the group consisting of an O atom, an N atom and an S atom.
  • R, R 1 independently from other R and R 1 in Z 1 - Z 4 and independently from other R and R 1 in A 1 - A 4 , are selected from the group consisting of a hydrogen atom, a halogen and an alkyl group.
  • the alkyl group is selected from the group consisting of an alkenyl, an alkynyl and an aryl group.
  • n in Z 1 - Z 4 is independent of n in A 1 - A 4 and is an integer selected from the group consisting of about 51; about 41; about 31; about 21, about 11 and about 6.
  • Z A has the same structure as Z B but Z A further comprises x number of -CH 2 - fragment(s) in one or more A 1 - A 4 fragments, wherein x is an integer. In one aspect, Z A has the same structure as Z B but Z A further comprises x number of -CF 2 - fragment(s) in one or more A 1 - A 4 fragments, wherein x is an integer. In one aspect, Z A comprises x number of protons and Z B comprises y number of halogens in the place of protons, wherein x and y are integers.
  • Z A contains x number of protons and Z B contains y number of halogens, and there are x - y number of protons remaining in one or more A 1 - A 4 fragments, wherein x and y are integers.
  • Z A further comprises x number of -O- fragment(s) in one or more A 1 - A 4 fragments, wherein x is an integer.
  • Z A further comprises x number of -S- fragment(s) in one or more A 1 - A 4 fragments, wherein x is an integer.
  • Z A further comprises x number of -O- fragment(s) and Z B further comprises y number of -S- fragment(s) in the place of -O- fragment(s), wherein x and y are integers.
  • Z A further comprises x - y number of -O- fragment(s) in one or more A 1 - A 4 fragments, wherein x and y are integers.
  • x and y are integers independently selected from the group consisting of between 1 and about 51; between 1 and about 41; between 1 and about 31; between 1 and about 21, between 1 and about 11 and between 1 and about 6, wherein x is greater than y.
  • the labeling reagent pair used in the method is N,N-dimethyl-iodoacetamide and N,N-dimethyl-d6-iodoacetamide.
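  • As an illustration of how a d0/d6 isotope-coded pair could be recognized in mass spectrometry data, the following is a minimal sketch and not the method of the invention. It assumes that neutral monoisotopic peptide masses are already available, that each peptide carries a fixed number of labels (one by default), and that a 0.01 Da tolerance is adequate. The function name `find_label_pairs` and its parameters are hypothetical. The mass shift is computed from the monoisotopic masses of hydrogen and deuterium.

```python
# Hypothetical helper: pair "light" (d0) and "heavy" (d6) peptide masses.
# The tolerance and the one-label-per-peptide default are assumptions.
H_MASS = 1.007825   # monoisotopic mass of 1H, in Da
D_MASS = 2.014102   # monoisotopic mass of 2H (deuterium), in Da
D6_SHIFT = 6 * (D_MASS - H_MASS)  # shift per d6 label, about 6.0377 Da


def find_label_pairs(masses, n_labels=1, tol=0.01):
    """Return (light, heavy) pairs whose mass difference matches the d6 shift."""
    shift = n_labels * D6_SHIFT
    ordered = sorted(masses)
    pairs = []
    for i, light in enumerate(ordered):
        for heavy in ordered[i + 1:]:
            delta = heavy - light
            if delta > shift + tol:
                break  # masses are sorted, so later candidates are too heavy
            if abs(delta - shift) <= tol:
                pairs.append((light, heavy))
    return pairs


if __name__ == "__main__":
    print(find_label_pairs([1000.5000, 1006.5377, 1200.0000]))
```

  • In this sketch the relative abundance of the two samples would then be estimated from the intensity ratio of each returned pair, a step that is left out here.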
  • the invention provides methods for separating and detecting a hydrophobic protein (e.g., membrane protein) or a hydrophobic compound, the method comprising the following steps: (a) providing a sample comprising the hydrophobic protein (e.g., membrane protein) or the hydrophobic compound; (b) solubilizing the hydrophobic protein (e.g., membrane protein) or the hydrophobic compound in a detergent or urea; (c) loading the detergent or urea solubilized hydrophobic protein or hydrophobic compound into a chromatography system of the invention or
  • the hydrophobic protein is a membrane protein such as an integral membrane protein, e.g., a protein expressed on the surface of a pathogenic cell or a cancer cell.
  • the hydrophobic compound can be a lipid or a steroid.
  • the invention provides computer program products comprising a computer useable medium having computer program logic recorded thereon for analyzing data generated by a chromatography system, said computer program logic comprising computer program code logic configured to perform operations as set forth in Figure 17, Figure 18, Figure 19, Figure 20 or Figure 21.
  • the invention provides computer program products wherein the chromatography system comprises a system of the invention or a mixed bed multi-dimensional liquid chromatograph of the invention.
  • the invention provides computer-implemented methods for analyzing data generated by a chromatography system comprising the following steps: providing a chromatography system capable of outputting data to a computer; providing a computer capable of storing and analyzing data input from the chromatography system comprising a computer program product embodied therein, wherein the computer program product comprises a computer program product of the invention; and, inputting the data from the chromatography system into the computer and analyzing data input from the chromatography system.
  • the chromatography system comprises a system of the invention or a mixed bed multi-dimensional liquid chromatograph of the invention.
  • an exemplary computer-implemented method comprises an LC-MS data file operatively linked to a component extraction file, operatively linked to precursor integration and series reconstruction files, operatively linked to a progression file, as schematically illustrated in Figure 17.
  • the component extraction aspect of the computer-implemented method is schematically illustrated in Figure 18.
  • the invention provides quantitative proteomics systems comprising a chromatography system comprising a system of the invention or a mixed bed multidimensional liquid chromatograph of the invention, wherein the system is capable of outputting data to a processor; a processor; and a computer program product of the invention embodied within the processor.
  • the invention provides methods for fractionating a proteome of a cell comprising (a) providing a chromatography system comprising a system as set forth in claim 1 or a mixed bed multi-dimensional liquid chromatograph of claim 25; (b) providing a proteome preparation; and (c) fractionating the proteome preparation with the chromatography system, wherein 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%
  • the invention also provides methods of the invention comprising use of a computer-implemented method for analyzing data generated by a chromatography system comprising the following steps: (a) providing a chromatography system capable of outputting data to a computer; (b) providing a computer capable of storing and analyzing data input from the chromatography system comprising a computer program product embodied therein, wherein the computer program product comprises a computer program product of the invention; (c) inputting the data from the chromatography system into the computer and analyzing data input from the chromatography system.
  • the invention provides quantitative proteomics systems comprising: (a) a chromatography system of the invention or a mixed bed multi-dimensional liquid chromatograph of the invention, and a mass spectrometer, wherein the system is capable of outputting data to a processor; (b) a processor; and (c) a computer program product (e.g., a computer program product of the invention) embodied within the processor.
  • the mass spectrometer comprises an ion trap mass spectrometer, such as a Finnigan LCQ Deca XP MAX™, a Finnigan MDLC LTQ™ or a Finnigan LTQ FT™.
  • FIG. 1 illustrates an exemplary process of the invention wherein samples are combined, separated by multidimensional chromatography, and analyzed by mass spectrometry methods, as described in detail, below.
  • Figure 2 is an illustration of a MALDI MS spectrum of peptide pairs, as described in detail, below.
  • Figure 3 illustrates an exemplary 3D LC set-up and process, as described in detail, below.
  • Figure 4 illustrates an exemplary multi-dimensional chromatography apparatus of the invention, as described in detail in Example 3, below.
  • Figure 5 graphically depicts the statistics of an exemplary mixed resin chromatography analysis and protein identification (Figure 5A graphically depicts the # MS/MS spectra; Figure 5B graphically depicts the annotated spectra (%); Figure 5C graphically depicts the # protein ID), as described in detail in Example 3, below.
  • Figure 6 gives a three-dimensional view of proteins identified using the exemplary apparatus and methods of the invention, as described in detail in Example 3, below.
  • Figure 6A shows an overlay of the predicted (and also observed) membrane proteins (solid circles) over the total population (open circles).
  • Certain functional classes are depicted by the overlays in Figures 6B, 6C, and 6D, illustrating the class of proteins belonging to "protein synthesis", "glycolysis" and "protein glycosylation", respectively, as described in detail in Example 3, below.
  • Figure 7 illustrates the sequence of pyruvate decarboxylase set forth in SEQ ID NO: 1 as generated using an exemplary chromatography system and method of the invention, as described in detail in Example 3, below.
  • Figure 8 illustrates an exemplary method of the invention, as described in detail in Examples 3 and 4, below.
  • Figure 9 illustrates an exemplary sample preparation protocol of the invention, see Example 4, below.
  • Figure 10 illustrates the results of salt extraction subfractions in a reverse phase sub-fraction for analysis of the B.
  • Figure 11 illustrates the results of an analysis of a B. anthracis proteome using a chromatography system of the invention, as described in Example 4, below.
  • Figure 12 summarizes a "matrix" of protein distribution from different B. anthracis samples, as described in Example 4, below.
  • Figure 13 summarizes the discovered protein distribution by "role" category.
  • Figure 14 illustrates an exemplary multi-dimensional chromatography apparatus of the invention, as described in detail in Example 3, below.
  • Figure 15 describes the metabolic pathways identified in the yeast proteome using an exemplary multi-dimensional chromatography apparatus and methods of the invention, as described in detail in Example 3, below.
  • Figure 16 illustrates proteins (highlighted in blue) from the glycolysis pathway identified using this system.
  • Figure 17 is a schematic, a flow chart, illustrating an exemplary data analysis algorithm of the invention for quantitative proteomics.
  • Figure 18 is a schematic, a flow chart, illustrating the "component extraction" section of the exemplary data analysis algorithm for quantitative proteomics illustrated in Figure 17.
  • Figure 19 is a schematic, a flow chart, illustrating the "precursor integration" section of the exemplary data analysis algorithm for quantitative proteomics illustrated in Figure 17.
  • Figure 20 is a schematic, a flow chart, illustrating the "spectra comparison" section of the exemplary data analysis algorithm for quantitative proteomics as illustrated in Figure 19.
  • Figure 21 is a schematic, a flow chart, illustrating the "identity and merge of duplicates LC-MS spectra" section of the exemplary data analysis algorithm for quantitative proteomics as illustrated in Figure 19.
  • Figure 22 illustrates an exemplary automated chromatograph system of the invention.
  • Figure 23 illustrates the results of an MS/MS of the separated peptides in a proteome analysis, as discussed in Example 3, below.
  • Figure 24 schematically illustrates the design of an oxidative stress experiment, as discussed in Example 5, below.
  • Figure 25 schematically illustrates the design of a sample preparation protocol used in oxidative stress experiments, as discussed in Example 5, below.
  • Figure 26 graphically illustrates data representing the number of protein identifications in 3D LC-MS/MS analyses in oxidative stress experiments, as discussed in Example 5, below.
  • Figure 27 summarizes data representing differences in the number of proteins identified in non-stressed and stressed cell samples in oxidative stress experiments, as discussed in Example 5, below.
  • Figure 28 summarizes data representing a down-regulation in superoxide reductase ("Sor") protein levels after oxidative stress of Desulfovibrio vulgaris cells, as discussed in Example 5, below.
  • Figure 29 illustrates that after oxidative stress of Desulfovibrio vulgaris cells a concerted down-regulation of proteins along the polyglucose utilization pathway (schematically illustrated) was found, as discussed in Example 5, below.
  • Figure 30 summarizes the results of proteome analysis from different organisms using an exemplary 3D LC LCQ MS/MS system of the invention, as discussed in Example 6, below.
  • Figure 31 summarizes the results of proteome analysis comparing two exemplary 3D LC LCQ MS/MS systems of the invention: 3D LC LCQ MS/MS versus 3D LC LTQ MS/MS, as discussed in Example 6, below.
  • Figure 32 illustrates the results of an LTQ and LCQ MS/MS Human Embryonic Kidney HEK293 proteome analysis, as discussed in Example 7, below.
  • Like reference symbols in the various drawings indicate like elements.
  • the invention provides a number of strategies for comparing a polynucleotide of known sequence (a reference sequence) with variants of that sequence (target sequences).
  • the comparison can be performed at the level of entire genomes, chromosomes, genes, exons or introns, or can focus on individual mutant sites and immediately adjacent bases.
  • the strategies allow detection of variations, such as mutations or polymorphisms, in the target sequence irrespective of whether a particular variant has previously been characterized.
  • the strategies both define the nature of a variant and identify its location in a target sequence.
  • the strategies employ arrays of oligonucleotide probes immobilized to a solid support.
  • Target sequences are analyzed by determining the extent of hybridization at particular probes in the array.
  • the strategy in selection of probes facilitates distinction between perfectly matched probes and probes showing single-base or other degrees of mismatches.
  • the strategy usually entails sampling each nucleotide of interest in a target sequence several times, thereby achieving a high degree of confidence in its identity. This level of confidence is further increased by sampling of adjacent nucleotides in the target sequence to nucleotides of interest.
  • the number of probes on the chip can be quite large (e.g., 10^5-10^6). However, usually only a small proportion of the total number of probes of a given length are represented.
  • Some advantages of the use of only a small proportion of all possible probes of a given length include: (i) each position in the array is highly informative, whether or not hybridization occurs; (ii) nonspecific hybridization is minimized; (iii) it is straightforward to correlate hybridization differences with sequence differences, particularly with reference to the hybridization pattern of a known standard; and (iv) the ability to address each probe independently during synthesis, using high resolution photolithography, allows the array to be designed and optimized for any sequence. For example, the length of any probe can be varied independently of the others.
  • the present tiling strategies result in sequencing and comparison methods suitable for routine large-scale practice with a high degree of confidence in the sequence output.
  • the chips can be designed to contain probes exhibiting complementarity to one or more selected reference sequences whose sequence is known.
  • the chips are used to read a target sequence comprising either the reference sequence itself or variants of that sequence.
  • Target sequences may differ from the reference sequence at one or more positions but show a high overall degree of sequence identity with the reference sequence (e.g., at least 75, 90, 95, 99, 99.9 or 99.99%).
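  • The degree of sequence identity mentioned above can be illustrated with a minimal sketch. It assumes the target and reference have already been aligned to the same length without gaps; real comparisons would normally require an alignment step first. The function name `percent_identity` is hypothetical.

```python
def percent_identity(reference, target):
    """Percentage of positions at which two pre-aligned sequences agree.

    Assumption: both sequences have equal, non-zero length and contain no gaps.
    """
    if len(reference) != len(target) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(r == t for r, t in zip(reference.upper(), target.upper()))
    return 100.0 * matches / len(reference)


if __name__ == "__main__":
    # One mismatch in ten positions
    print(percent_identity("ACGTACGTAC", "ACGTACGTAT"))
```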
  • Any polynucleotide of known sequence can be selected as a reference sequence.
  • Reference sequences of interest include sequences known to include mutations or polymorphisms associated with phenotypic changes having clinical significance in human patients.
  • the CFTR gene and P53 gene in humans have been identified as the location of several mutations resulting in cystic fibrosis or cancer respectively.
  • Other reference sequences of interest include those that serve to identify pathogenic microorganisms and/or are the site of mutations by which such microorganisms acquire drug resistance (e.g., the HIV reverse transcriptase gene).
  • Other reference sequences of interest include regions where polymorphic variations are known to occur (e.g., the D-loop region of mitochondrial DNA). These reference sequences have utility for, e.g., forensic or epidemiological studies.
  • Reference sequences of interest include p34 (related to p53), p65 (implicated in breast, prostate and liver cancer), and DNA segments encoding cytochromes P450 (see Meyer et al., Pharmac. Ther. 46, 349-355 (1990)).
  • Reference sequences of interest include those from the genome of pathogenic viruses (e.g., hepatitis A, B, or C, herpes virus (e.g., VZV, HSV-1, HAV-6, HSV-II, and CMV, Epstein Barr virus), adenovirus, influenza virus, flaviviruses, echovirus, rhinovirus, coxsackie virus, coronavirus, respiratory syncytial virus, mumps virus, rotavirus, measles virus, rubella virus, parvovirus, vaccinia virus, HTLV virus, dengue virus, papillomavirus, molluscum virus, poliovirus, rabies virus, JC virus and arboviral encephalitis virus).
  • Other reference sequences of interest are from genomes or episomes of pathogenic bacteria, particularly regions that confer drug resistance or allow phylogenic characterization of the host (e.g., 16S rRNA or corresponding DNA).
  • pathogenic bacteria include Chlamydia, rickettsial bacteria, mycobacteria, staphylococci, streptococci, pneumococci, meningococci and gonococci, klebsiella, proteus, serratia, pseudomonas, legionella, diphtheria, salmonella, bacilli, cholera, tetanus, botulism, anthrax, plague, leptospirosis, and Lyme disease bacteria.
  • reference sequences of interest include those in which mutations result in the following autosomal recessive disorders: sickle cell anemia, beta-thalassemia, phenylketonuria, galactosemia, Wilson's disease, hemochromatosis, severe combined immunodeficiency, alpha-1-antitrypsin deficiency, albinism, alkaptonuria, lysosomal storage diseases and Ehlers-Danlos syndrome.
  • Reference sequences of interest include those in which mutations result in X-linked recessive disorders: hemophilia, glucose-6-phosphate dehydrogenase deficiency, agammaglobulinemia, diabetes insipidus, Lesch-Nyhan syndrome, muscular dystrophy, Wiskott-Aldrich syndrome, Fabry's disease and fragile X syndrome.
  • Reference sequences of interest include those in which mutations result in the following autosomal dominant disorders: familial hypercholesterolemia, polycystic kidney disease, Huntington's disease, hereditary spherocytosis, Marfan's syndrome, von Willebrand's disease, neurofibromatosis, tuberous sclerosis, hereditary hemorrhagic telangiectasia, familial colonic polyposis, Ehlers-Danlos syndrome, myotonic dystrophy, muscular dystrophy, osteogenesis imperfecta, acute intermittent porphyria, and von Hippel-Lindau disease.
  • the length of a reference sequence can vary widely from a full-length genome, to an individual chromosome, episome, gene, component of a gene, such as an exon, intron or regulatory sequences, to a few nucleotides.
  • a reference sequence of between about 2, 5, 10, 20, 50, 100, 500, 1000, 5,000 or 10,000, 20,000 or 100,000 nucleotides is common.
  • regions of a sequence e.g., exons of a gene
  • the particular regions can be considered as separate reference sequences or can be considered as components of a single reference sequence, as matter of arbitrary choice.
  • a reference sequence can be any naturally occurring, mutant, consensus or purely hypothetical sequence of nucleotides, RNA or DNA.
  • sequences can be obtained from computer data bases, publications or can be determined or conceived de novo.
  • a reference sequence is selected to show a high degree of sequence identity to envisaged target sequences.
  • more than one reference sequence is selected.
  • Combinations of wildtype and mutant reference sequences are employed in several applications of the tiling strategy.
  • the basic tiling strategy provides an array of immobilized probes for analysis of target sequences showing a high degree of sequence identity to one or more selected reference sequences.
  • the strategy is first illustrated for an exemplary array that is subdivided into four probe sets, although it will be apparent that in some situations, satisfactory results are obtained from only two probe sets.
  • a first probe set comprises a plurality of probes exhibiting perfect complementarity with a selected reference sequence. The perfect complementarity usually exists throughout the length of the probe. However, probes having a segment or segments of perfect complementarity that is/are flanked by leading or trailing sequences lacking complementarity to the reference sequence can also be used.
  • each probe in the first probe set has at least one interrogation position that corresponds to a nucleotide in the reference sequence. That is, the interrogation position is aligned with the corresponding nucleotide in the reference sequence, when the probe and reference sequence are aligned to maximize complementarity between the two. If a probe has more than one interrogation position, each corresponds with a respective nucleotide in the reference sequence. The identity of an interrogation position and corresponding nucleotide in a particular probe in the first probe set cannot be determined simply by inspection of the probe in the first set.
  • an interrogation position and corresponding nucleotide is defined by the comparative structures of probes in the first probe set and corresponding probes from additional probe sets.
  • a probe can have an interrogation position at each position in the segment complementary to the reference sequence.
  • An interrogation position can be located away from the ends of a segment of complementarity. Interrogation positions may provide more accurate data when located away from the ends of a segment of complementarity.
  • a probe having a segment of complementarity of length x does not contain more than x-2 interrogation positions. Since probes are typically 9-21 nucleotides, and usually all of a probe is complementary, a probe typically has 1-19 interrogation positions.
  • the probes can contain a single interrogation position, at or near the center of the probe.
  • For each probe in the first set there can be three corresponding probes from three additional probe sets. Thus, there can be four probes corresponding to each nucleotide of interest in the reference sequence. Each of the four corresponding probes has an interrogation position aligned with that nucleotide of interest.
  • the probes from the three additional probe sets can be identical to the corresponding probe from the first probe set with one exception.
  • the corresponding probe from the first probe set has its interrogation position occupied by a T
  • the corresponding probes from the additional three probe sets have their respective interrogation positions occupied by A, C, or G, a different nucleotide in each probe.
  • if a probe from the first probe set comprises trailing or flanking sequences lacking complementarity to the reference sequences, these sequences need not be present in corresponding probes from the three additional sets.
  • corresponding probes from the three additional sets can contain leading or trailing sequences outside the segment of complementarity that are not present in the corresponding probe from the first probe set.
  • the probes from the additional three probe sets are identical (with the exception of interrogation position(s)) to a contiguous subsequence of the full complementary segment of the corresponding probe from the first probe set.
  • the subsequence includes the interrogation position and usually differs from the full-length probe only in the omission of one or both terminal nucleotides from the termini of a segment of complementarity. That is, if a probe from the first probe set has a segment of complementarity of length n, corresponding probes from the other sets will usually include a subsequence of the segment of at least length n-2.
  • the subsequence is usually at least 3, 4, 7, 9, 15, 21, or 25 nucleotides long, most typically, in the range of 9-21 nucleotides.
  • the subsequence should be sufficiently long to allow a probe to hybridize detectably more strongly to a variant of the reference sequence mutated at the interrogation position than to the reference sequence.
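  • The construction of the four probe sets described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the array design of the invention: every probe has the same odd length, a single interrogation position at its center, and no flanking sequences, and probes are written in alignment orientation so that position i of a probe pairs with position i of its reference window. The function name `tile_probes` and the returned dictionary layout are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def tile_probes(reference, probe_len=11):
    """Build four corresponding probes for each position with a full window.

    Each returned column holds the reference position, the interrogation base
    of the perfectly complementary probe (first probe set), and the four
    probes keyed by the base occupying the central interrogation position.
    """
    if probe_len % 2 == 0:
        raise ValueError("this sketch assumes an odd probe length")
    half = probe_len // 2
    columns = []
    for pos in range(half, len(reference) - half):
        window = reference[pos - half: pos + half + 1]
        perfect = "".join(COMPLEMENT[base] for base in window)
        probes = {
            base: perfect[:half] + base + perfect[half + 1:]
            for base in "ACGT"
        }
        columns.append(
            {"position": pos, "perfect_base": perfect[half], "probes": probes}
        )
    return columns


if __name__ == "__main__":
    for column in tile_probes("ACGTACGTACGTACG"):
        print(column["position"], column["perfect_base"], column["probes"])
```

  • Successive columns are built from windows shifted by one base, so the perfectly complementary probes overlap and tile across the reference sequence.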
  • the probes can be oligodeoxyribonucleotides or oligoribonucleotides, or any modified forms of these polymers that are capable of hybridizing with a target nucleic sequence by complementary base-pairing.
  • Complementary base pairing means sequence-specific base pairing which includes e.g., Watson-Crick base pairing as well as other forms of base pairing such as Hoogsteen base pairing.
  • Modified forms include 2'-O-methyl oligoribonucleotides and so-called PNAs, in which oligodeoxyribonucleotides are linked via peptide bonds rather than phosphodiester bonds.
  • the probes can be attached by any linkage to a support (e.g., 3', 5' or via the base). 3' attachment is more usual as this orientation is compatible with a chemistry for solid phase synthesis of oligonucleotides.
  • the number of probes in the first probe set depends on the length of the reference sequence, the number of nucleotides of interest in the reference sequence and the number of interrogation positions per probe.
  • each nucleotide of interest in the reference sequence requires the same interrogation position in the four sets of probes.
  • a reference sequence can have 100 nucleotides, 50 of which are of interest, and probes each having a single interrogation position.
  • the first probe set requires fifty probes, each having one interrogation position corresponding to a nucleotide of interest in the reference sequence.
  • the second, third and fourth probe sets each have a corresponding probe for each probe in the first probe set, and so each also contains a total of fifty probes.
  • each nucleotide of interest in the reference sequence is determined by comparing the relative hybridization signals at four probes having interrogation positions corresponding to that nucleotide from the four probe sets.
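  • The comparison of relative hybridization signals at four corresponding probes can be sketched as follows. This is an illustrative assumption about how a call might be made, not the procedure of the invention: the probe with the strongest signal is taken as the perfect match, the target nucleotide is read as the complement of that probe's interrogation base, and a call is rejected when the strongest signal does not exceed the runner-up by a chosen ratio. The function name `call_base` and the threshold `min_ratio` are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def call_base(signals, min_ratio=1.2):
    """Call the target nucleotide from four hybridization intensities.

    signals maps each probe's interrogation base (A, C, G, T) to its measured
    intensity. Returns the complementary target base, or "N" when the call is
    ambiguous under the assumed ratio threshold.
    """
    ranked = sorted(signals.items(), key=lambda item: item[1], reverse=True)
    (best_base, best), (_, second) = ranked[0], ranked[1]
    if best <= 0:
        return "N"  # no usable signal in this column
    if second > 0 and best / second < min_ratio:
        return "N"  # two probes hybridize about equally well
    return COMPLEMENT[best_base]


if __name__ == "__main__":
    print(call_base({"A": 120.0, "C": 900.0, "G": 100.0, "T": 95.0}))
```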
  • every nucleotide is of interest.
  • only certain portions in which variants (e.g., mutations or polymo ⁇ hisms) are concentrated are of interest.
  • only particular mutations or polymo ⁇ hisms and immediately adjacent nucleotides are of interest.
  • the first probe set has intenogation positions selected to conespond to at least a nucleotide (e.g., representing a point mutation) and one immediately adjacent nucleotide.
  • the probes in the first set have inte ⁇ ogation positions conesponding to at least 3, 10, 50, 100, 1000, or 20,000 contiguous nucleotides.
  • the probes usually have intenogation positions conesponding to at least 5, 10, 30, 50, 75, 90, 99 or sometimes 100%, of the nucleotides in a reference sequence.
  • The probes in the first probe set can completely span the reference sequence and overlap with one another relative to the reference sequence. For example, in one common arrangement each probe in the first probe set differs from another probe in that set by the omission of a 3' base complementary to the reference sequence and the acquisition of a 5' base complementary to the reference sequence.
  • The probes in a set can be arranged in order of the sequence in a lane across the chip.
  • A lane contains a series of overlapping probes, which represent, or tile across, the selected reference sequence.
  • The components of the four sets of probes are usually laid down in four parallel lanes, collectively constituting a row in the horizontal direction and a series of 4-member columns in the vertical direction. Corresponding probes from the four probe sets (i.e., complementary to the same subsequence of the reference sequence) occupy a column.
  • Each probe in a lane usually differs from its predecessor in the lane by the omission of a base at one end and the inclusion of an additional base at the other end.
  • Probe sets can be laid down in lanes such that all probes having an interrogation position occupied by an A form an A-lane, all probes having an interrogation position occupied by a C form a C-lane, all probes having an interrogation position occupied by a G form a G-lane, and all probes having an interrogation position occupied by a T (or U) form a T-lane (or a U-lane).
  • The probe from the first probe set is laid down in the A-lane, C-lane, A-lane, A-lane and T-lane for the five columns.
  • The interrogation position on a column of probes corresponds to the position in the target sequence whose identity is determined from analysis of hybridization to the probes in that column.
  • The interrogation position can be anywhere in a probe but is usually at or near the central position of the probe to maximize differential hybridization signals between a perfect match and a single-base mismatch. For example, for an 11 mer probe, the central position is the sixth nucleotide.
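The four-lane layout described in the preceding points can be sketched in code. This is an illustration only: the function names `reverse_complement` and `tile_lanes`, the 15-base reference string, and the choice to write each probe as the reverse complement of a reference window are assumptions made for the example and are not taken from the text.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def tile_lanes(reference, probe_len=11):
    """Build A-, C-, G- and T-lanes of probes tiling across `reference`.

    Each column corresponds to one window of the reference. The four probes
    in a column are identical except at the central interrogation position.
    """
    mid = probe_len // 2  # 0-based index; the sixth nucleotide of an 11 mer
    lanes = {base: [] for base in "ACGT"}
    for start in range(len(reference) - probe_len + 1):
        window = reference[start:start + probe_len]
        wildtype = reverse_complement(window)
        for base in "ACGT":
            lanes[base].append(wildtype[:mid] + base + wildtype[mid + 1:])
    return lanes


reference = "ACGTACGTACGTACG"  # hypothetical reference sequence
lanes = tile_lanes(reference)
columns = len(lanes["A"])
for col in range(columns):
    print(col, {base: lanes[base][col] for base in "ACGT"})
```

In each printed column exactly one of the four probes is perfectly complementary to the reference window, and the lane holding it changes from column to column.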
  • The array of probes is usually laid down in rows and columns as described above, but such a physical arrangement of probes on the chip is not essential.
  • The data from the probes can be collected and processed to yield the sequence of a target irrespective of the physical arrangement of the probes on a chip.
  • The hybridization signals from the respective probes can be reassorted into any conceptual array desired for subsequent data reduction, whatever the physical arrangement of probes on the chip.
  • A range of lengths of probes can be employed in the chips.
  • A probe may consist exclusively of complementary segments, or may have one or more complementary segments juxtaposed by flanking, trailing and/or intervening segments.
  • The total length of complementary segment(s) is more important than the length of the probe.
  • The complementarity segment(s) of the first probe sets should be sufficiently long to allow the probe to hybridize detectably more strongly to a reference sequence compared with a variant of the reference including a single base mutation at the nucleotide corresponding to the interrogation position of the probe.
  • The complementarity segment(s) in corresponding probes from additional probe sets can be sufficiently long to allow a probe to hybridize detectably more strongly to a variant of the reference sequence having a single nucleotide substitution at the interrogation position relative to the reference sequence.
  • A probe can have a single complementary segment having a length of at least 3 nucleotides, and more usually at least 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 or bases exhibiting perfect complementarity (other than possibly at the interrogation position(s) depending on the probe set) to the reference sequence.
  • Each segment provides at least three complementary nucleotides to the reference sequence and the combined segments provide at least two segments of three or a total of six complementary nucleotides.
  • The combined length of complementary segments is typically from 6-30 nucleotides, or from about 9-21 nucleotides. The two segments are often approximately the same length.
  • The probes (or segment of complementarity within probes) have an odd number of bases, so that an interrogation position can occur in the exact center of the probe.
  • All probes are the same length.
  • Other chips employ different groups of probe sets, in which case the probes are of the same size within a group, but differ between different groups. For example, some chips have one group comprising four sets of probes as described above in which all the probes are 11 mers, together with a second group comprising four sets of probes in which all of the probes are 13 mers. Of course, additional groups of probes can be added. Thus, some chips contain, e.g., four groups of probes having sizes of 11 mers, 13 mers, 15 mers and 17 mers.
  • The probes in the first set can vary in length independently of each other. Probes in the other sets are usually the same length as the probe occupying the same column from the first set. However, occasionally different lengths of probes can be included at the same column position in the four lanes.
  • The different length probes are included to equalize hybridization signals from probes irrespective of whether A-T or C-G bonds are formed at the interrogation position.
  • The length of probe can be important in distinguishing between a perfectly matched probe and probes showing a single-base mismatch with the target sequence. The discrimination is usually greater for short probes. Shorter probes are usually also less susceptible to formation of secondary structures.
  • The absolute amount of target sequence bound, and hence the signal, is greater for larger probes.
  • The probe length representing the optimum compromise between these competing considerations may vary depending on, inter alia, the GC content of a particular region of the target DNA sequence, secondary structure, synthesis efficiency and cross-hybridization. In some regions of the target, depending on hybridization conditions, short probes (e.g., 11 mers) may provide information that is inaccessible from longer probes (e.g., 19 mers) and vice versa.
  • Maximum sequence information can be read by including several groups of different sized probes on the chip as noted above. However, for many regions of the target sequence, such a strategy provides redundant information in that the same sequence is read multiple times from the different groups of probes.
  • Equivalent information can be obtained from a single group of different sized probes in which the sizes are selected to maximize readable sequence at particular regions of the target sequence.
  • the strategy of customizing probe length within a single group of probe sets minimizes the total number of probes required to read a particular target sequence. This leaves ample capacity for the chip to include probes to other reference sequences.
  • The invention provides an optimization block which allows systematic variation of probe length and interrogation position to optimize the selection of probes for analyzing a particular nucleotide in a reference sequence.
  • The block comprises alternating columns of probes complementary to the wildtype target and probes complementary to a specific mutation.
  • The interrogation position is varied between columns and probe length is varied down a column.
  • Hybridization of the chip to the reference sequence or the mutant form of the reference sequence identifies the probe length and interrogation position providing the greatest differential hybridization signal.
  • The probes are designed to be complementary to either strand of the reference sequence (e.g., coding or non-coding). Some chips contain separate groups of probes, one complementary to the coding strand, the other complementary to the noncoding strand. Independent analysis of coding and noncoding strands provides largely redundant information. However, the regions of ambiguity in reading the coding strand are not always the same as those in reading the noncoding strand. Thus, combination of the information from coding and noncoding strands increases the overall accuracy of sequencing. Some chips contain additional probes or groups of probes designed to be complementary to a second reference sequence.
  • The second reference sequence can often be a subsequence of the first reference sequence bearing one or more commonly occurring mutations or interstrain variations.
  • The second group of probes is designed by the same principles as described above except that the probes exhibit complementarity to the second reference sequence.
  • The inclusion of a second group is particularly useful for analyzing short subsequences of the primary reference sequence in which multiple mutations are expected to occur within a short distance commensurate with the length of the probes (i.e., two or more mutations within 9 to 21 bases).
  • The same principle can be extended to provide chips containing groups of probes for any number of reference sequences.
  • The chips may contain additional probe(s) that do not form part of a tiled array as noted above, but rather serve as probe(s) for a conventional reverse dot blot.
  • The presence of a mutation can be detected from binding of a target sequence to a single oligomeric probe harboring the mutation.
  • An additional probe containing the equivalent region of the wildtype sequence can be included as a control.
  • The chips can be read by comparing the intensities of labeled target bound to the probes in an array. In one aspect, a comparison is performed between each lane of probes (e.g., A, C, G and T lanes) at each columnar position (physical or conceptual).
  • The lane showing the greatest hybridization signal is called as the nucleotide present at the position in the target sequence corresponding to the interrogation position in the probes.
  • The corresponding position in the target sequence is that aligned with the interrogation position in corresponding probes when the probes and target are aligned to maximize complementarity.
  • Of the four probes in a column, only one can exhibit a perfect match to the target sequence, whereas the others usually exhibit at least a one base pair mismatch.
  • The probe exhibiting a perfect match usually produces a substantially greater hybridization signal than the other three probes in the column and is thereby easily identified.
  • A call ratio is established to define the ratio of signal from the best hybridizing probe to the second best hybridizing probe that must be exceeded for a particular target position to be read from the probes.
  • A high call ratio ensures that few if any errors are made in calling target nucleotides, but can result in some nucleotides being scored as ambiguous which could in fact be accurately read.
  • A lower call ratio can result in fewer ambiguous calls, but can result in more erroneous calls. It has been found that at a call ratio of 1.2 virtually all calls are accurate.
  • A small but significant number of bases may have to be scored as ambiguous.
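The call-ratio rule described above can be sketched as follows. The function name `call_base`, the use of "N" for an ambiguous call, the handling of zero signals and the example intensities are all assumptions made for illustration; only the threshold of 1.2 comes from the text.

```python
def call_base(signals, call_ratio=1.2):
    """Call the lane with the strongest signal, or "N" if ambiguous.

    `signals` maps lane labels ("A", "C", "G", "T") to hybridization
    intensities measured for one column of the array.
    """
    ranked = sorted(signals.items(), key=lambda item: item[1], reverse=True)
    (best_lane, best), (_, second) = ranked[0], ranked[1]
    if best <= 0:
        return "N"  # no usable signal in this column
    if second <= 0 or best / second > call_ratio:
        return best_lane
    return "N"  # best and second-best signals are too close to call


# Hypothetical intensities for two columns.
print(call_base({"A": 100.0, "C": 10.0, "G": 12.0, "T": 8.0}))  # prints A
print(call_base({"A": 100.0, "C": 90.0, "G": 5.0, "T": 5.0}))   # prints N
```

Raising `call_ratio` trades more "N" calls for fewer erroneous calls, which is the compromise the text describes.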
  • Small regions of the target sequence can sometimes be ambiguous, but these regions usually occur at the same or similar segments in different target sequences.
  • An array of probes is most useful for analyzing the reference sequence from which the probes were designed and variants of that sequence exhibiting substantial sequence similarity with the reference sequence (e.g., several single-base mutants spaced over the reference sequence).
  • When an array is used to analyze the exact reference sequence from which it was designed, one probe exhibits a perfect match to the reference sequence, and the other three probes in the same column exhibit single-base mismatches. Thus, discrimination between hybridization signals is usually high and accurate sequence is obtained. High accuracy is also obtained when an array is used for analyzing a target sequence comprising a variant of the reference sequence that has a single mutation relative to the reference sequence, or several widely spaced mutations relative to the reference sequence. At different mutant loci, one probe exhibits a perfect match to the target, and the other three probes occupying the same column exhibit single-base mismatches, the difference (with respect to analysis of the reference sequence) being the lane in which the perfect match occurs.
  • a single group of probes i.e., designed with respect to a single reference sequence
  • Such a comparison does not always allow the target nucleotide corresponding to that columnar position to be called.
  • Deletions in target sequences can be detected by loss of signal from probes having interrogation positions encompassed by the deletion.
  • Signal may also be lost from probes having interrogation positions closely proximal to the deletion, resulting in some regions of the target sequence that cannot be read.
  • Target sequence bearing insertions will also exhibit short regions including and proximal to the insertion that usually cannot be read.
  • The presence of short regions of difficult-to-read target because of closely spaced mutations, insertions or deletions does not prevent determination of the remaining sequence of the target, as different regions of a target sequence are determined independently.
  • Such ambiguities as might result from analysis of diverse variants with a single group of probes can be avoided by including multiple groups of probe sets on a chip.
  • One group of probes can be designed based on a full-length reference sequence, and the other groups on subsequences of the reference sequence incorporating frequently occurring mutations or strain variations.
  • The sequencing strategy of the invention has the capacity to simultaneously detect and quantify proportions of multiple target sequences. Such capacity is valuable, e.g., for diagnosis of patients who are heterozygous with respect to a gene or who are infected with a virus, such as HIV, which is usually present in several polymorphic forms. Such capacity is also useful in analyzing targets from biopsies of tumor cells and surrounding tissues.
  • The presence of multiple target sequences is detected from the relative signals of the four probes at the array columns corresponding to the target nucleotides at which diversity occurs.
  • The relative signals at the four probes for the mixture under test are compared with the corresponding signals from a homogeneous reference sequence.
  • The extent of the shift in hybridization signals of the probes is related to the proportion of a target sequence in the mixture.
  • Shifts in relative hybridization signals can be quantitatively related to proportions of reference and mutant sequence by prior calibration of the chip with seeded mixtures of the mutant and reference sequences.
  • A chip can be used to detect variant or mutant strains constituting as little as 1, 5, 20, or 25% of a mixture of strains.
  • Similar principles allow the simultaneous analysis of multiple target sequences even when none is identical to the reference sequence. For example, with a mixture of two target sequences bearing first and second mutations, there would be a variation in the hybridization patterns of probes having interrogation positions corresponding to the first and second mutations relative to the hybridization pattern with the reference sequence.
  • One of the probes having a mismatched interrogation position relative to the reference sequence would show an increase in hybridization signal, and the probe having a matched interrogation position relative to the reference sequence would show a decrease in hybridization signal.
  • Analysis of the hybridization pattern of the mixture of mutant target sequences indicates the presence of two mutant target sequences, the position and nature of the mutation in each strain, and the relative proportions of each strain.
  • The different components in a mixture of target sequences are differentially labeled before being applied to the array. For example, a variety of fluorescent labels emitting at different wavelengths are available.
  • The use of differential labels allows independent analysis of different targets bound simultaneously to the array.
  • The methods permit comparison of target sequences obtained from a patient at different stages of a disease.
  • Omission of Probes. The general strategy of the aspects of the invention outlined above employs four probes to read each nucleotide of interest in a target sequence. One probe (from the first probe set) shows a perfect match to the reference sequence and the other three probes (from the second, third and fourth probe sets) exhibit a mismatch with the reference sequence and a perfect match with a target sequence bearing a mutation at the nucleotide of interest. The provision of three probes from the second, third and fourth probe sets allows detection of each of the three possible nucleotide substitutions of any nucleotide of interest.
  • Probes that would detect silent mutations are omitted.
  • The probes from the first probe set are omitted corresponding to some or all positions of the reference sequences.
  • Such chips comprise at least two probe sets.
  • The first probe set has a plurality of probes. Each probe comprises a segment exactly complementary to a subsequence of a reference sequence except in at least one interrogation position.
  • A second probe set has a corresponding probe for each probe in the first probe set.
  • The corresponding probe in the second probe set is identical to a sequence comprising the corresponding probe from the first probe set or a subsequence thereof that includes the at least one (and usually only one) interrogation position, except that the at least one interrogation position is occupied by a different nucleotide in each of the two corresponding probes from the first and second probe sets.
  • A third probe set, if present, also comprises a corresponding probe for each probe in the first probe set except at the at least one interrogation position, which
  • The presence of a mutation is detected by a shift in the background hybridization intensity of the reference sequence to a perfectly matched hybridization signal of the target sequence, rather than by a comparison of the hybridization intensities of probes from the first set with corresponding probes from the second, third and fourth sets.
  • Wildtype Probe Lane. When the chips comprise four probe sets, as discussed supra, and the probe sets are laid down in four lanes, an A-lane, a C-lane, a G-lane and a T- or U-lane, the probe having a segment exhibiting perfect complementarity to a reference sequence varies between the four lanes from one column to another. This does not present any significant difficulty in computer analysis of the data from the chip.
  • Each probe has a segment exhibiting perfect complementarity to the reference sequence.
  • This segment is identical to a segment from one of the probes in the other four lanes (which lane depending on the column position).
  • The extra lane of probes (designated the wildtype lane) hybridizes to a target sequence at all nucleotide positions except those in which deviations from the reference sequence occur.
  • The hybridization pattern of the wildtype lane thereby provides a simple visual indication of mutations.
  • The chips provide an additional probe set specifically designed for analyzing deletion mutations.
  • The additional probe set comprises a probe corresponding to each probe in the first probe set as described above.
  • A probe from the additional probe set differs from the corresponding probe in the first probe set in that the nucleotide occupying the interrogation position is deleted in the probe from the additional probe set.
  • The probe from the additional probe set bears an additional nucleotide at one of its termini relative to the corresponding probe from the first probe set.
  • The probe from the additional probe set will hybridize more strongly than the corresponding probe from the first probe set to a target sequence having a single base deletion at the nucleotide corresponding to the interrogation position.
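The construction of a deletion probe described above can be sketched as follows. The function name `deletion_probe`, the example reference string, and the choice of which terminus receives the extra base are assumptions made for illustration, not details taken from the text.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def deletion_probe(reference, start, probe_len=11):
    """Build a probe matching the reference with one central base deleted.

    The window is extended by one reference base so that the probe keeps
    its full length after the interrogation position is removed.
    """
    mid = probe_len // 2
    extended = reference[start:start + probe_len + 1]
    if len(extended) != probe_len + 1:
        raise ValueError("window plus one extra base must fit in the reference")
    with_deletion = extended[:mid] + extended[mid + 1:]
    return reverse_complement(with_deletion)


reference = "ACGTACGTACGTACG"  # hypothetical reference sequence
probe = deletion_probe(reference, start=0)
mutant_target = reference[:5] + reference[6:]  # reference lacking base index 5
print(probe)
print(reverse_complement(probe) in mutant_target)  # perfect match to the mutant
print(reverse_complement(probe) in reference)      # no perfect match to wildtype
```

The probe is fully complementary to the target carrying the single-base deletion but not to the unmodified reference, which is why it is expected to hybridize more strongly to the deleted target.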
  • Additional probe sets are provided in which not only the interrogation position, but also an adjacent nucleotide is deleted.
  • Other chips provide additional probe sets for analyzing insertions.
  • One additional probe set has a probe corresponding to each probe in the first probe set as described above.
  • The probe in the additional probe set has an extra T nucleotide inserted adjacent to the interrogation position.
  • The probe has one fewer nucleotide at one of its termini relative to the corresponding probe from the first probe set.
  • The probe from the additional probe set hybridizes more strongly than the corresponding probe from the first probe set to a target sequence having an A nucleotide inserted in a position adjacent to that corresponding to the interrogation position.
  • Similar additional probe sets are constructed having C, G or T/U nucleotides inserted adjacent to the interrogation position. Usually, four such probe sets, one for each nucleotide, are used in combination.
  • Other chips provide additional probes (multiple-mutation probes) for analyzing target sequences having multiple closely spaced mutations.
  • A multiple-mutation probe is usually identical to a corresponding probe from the first set as described above, except in the base occupying the interrogation position, and except at one or more additional positions, corresponding to nucleotides in which substitution may occur in the reference sequence.
  • The one or more additional positions in the multiple-mutation probe are occupied by nucleotides complementary to the nucleotides occupying corresponding positions in the reference sequence when the possible substitutions have occurred.
  • Block Tiling. As noted in the discussion of the general tiling strategy, in one aspect, a probe in the first probe set can have more than one interrogation position.
  • a probe in the first probe set is sometimes matched with multiple groups of at least one, and usually, three additional probe sets.
  • Three additional probe sets are used to allow detection of the three possible nucleotide substitutions at any one position. If only certain types of substitution are likely to occur (e.g., transitions), only one or two additional probe sets are required (analogous to the use of probes in the basic tiling strategy).
  • a group comprises three additional probe sets
  • A first such group comprises second, third and fourth probe sets, each of which has a probe corresponding to each probe in the first probe set.
  • The corresponding probes from the second, third and fourth probe sets differ from the corresponding probe in the first set at a first of the interrogation positions.
  • The relative hybridization signals from corresponding probes from the first, second, third and fourth probe sets indicate the identity of the nucleotide in a target sequence corresponding to the first interrogation position.
  • In a second group of three probe sets (designated fifth, sixth and seventh probe sets), each set also has a probe corresponding to each probe in the first probe set. These corresponding probes differ from that in the first probe set at a second interrogation position.
  • The relative hybridization signals from corresponding probes from the first, fifth, sixth, and seventh probe sets indicate the identity of the nucleotide in the target sequence corresponding to the second interrogation position.
  • The probes in the first probe set often have seven or more interrogation positions. If there are seven interrogation positions, there are seven groups of three additional probe sets, each group of three probe sets serving to identify the nucleotide corresponding to one of the seven interrogation positions.
  • Each block of probes allows short regions of a target sequence to be read. For example, for a block of probes having seven interrogation positions, seven nucleotides in the target sequence can be read.
  • a chip can contain any number of blocks depending on how many nucleotides of the target are of interest.
  • the hybridization signals for each block can be analyzed independently of any other block.
  • the block tiling strategy can also be combined with other tiling strategies, with different parts of the same reference sequence being tiled by different strategies.
  • The block tiling strategy offers two advantages over the basic strategy in which each probe in the first set has a single interrogation position. One advantage is that the same sequence information can be obtained from fewer probes.
  • A second advantage is that each of the probes constituting a block (i.e., a probe from the first probe set and a corresponding probe from each of the other probe sets) can have identical 3' and 5' sequences, with the variation confined to a central segment containing the interrogation positions.
  • the identity of 3' sequence between different probes simplifies the strategy for solid phase synthesis of the probes on the chip and results in more uniform deposition of the different probes on the chip, thereby in turn increasing the uniformity of signal to noise ratio for different regions of the chip.
  • a third advantage is that greater signal uniformity is achieved within a block.
  • the identity of a nucleotide in a target or reference sequence is determined by comparison of hybridization patterns of one probe having a segment showing a perfect match with that of other probes (usually three other probes) showing a single base mismatch.
  • the identity of at least two nucleotides in a reference or target sequence is determined by comparison of hybridization signal intensities of four probes, two of which have a segment showing perfect complementarity or a single base mismatch to the reference sequence, and two of which have a segment showing perfect complementarity or a double-base mismatch to a segment.
  • The four probes whose hybridization patterns are to be compared each have a segment that is exactly complementary to a reference sequence except at two interrogation positions, in which the segment may or may not be complementary to the reference sequence.
  • The interrogation positions correspond to the nucleotides in a reference or target sequence which are determined by the comparison of intensities.
  • The nucleotides occupying the interrogation positions in the four probes are selected according to the following rule.
  • The first interrogation position is occupied by a different nucleotide in each of the four probes.
  • The second interrogation position is also occupied by a different nucleotide in each of the four probes.
  • The segment is exactly complementary to the reference sequence except at not more than one of the two interrogation positions.
  • One of the interrogation positions is occupied by a nucleotide that is complementary to the corresponding nucleotide from the reference sequence and the other interrogation position may or may not be so occupied.
  • The segment is exactly complementary to the reference sequence except that both interrogation positions are occupied by nucleotides which are non-complementary to the respective corresponding nucleotides in the reference sequence.
  • The conditions noted above are satisfied by each of the interrogation positions in any one of the four probes being occupied by complementary nucleotides.
  • For example, in the first probe the interrogation positions could be occupied by A and T, in the second probe by C and G, in the third probe by G and C and in the fourth probe by T and A.
  • When the four probes are hybridized to a target that is the same as the reference sequence or differs from the reference sequence at one (but not both) of the interrogation positions, two of the four probes show a double mismatch with the target and two probes show a single mismatch.
  • The identity of probes showing these different degrees of mismatch can be determined from the different hybridization signals. From the identity of the probes showing the different degrees of mismatch, the nucleotides occupying both of the interrogation positions in the target sequence can be deduced.
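The mismatch pattern described for the four multiplexed probes can be checked with a short sketch. The function name `mismatch_counts` and the example target nucleotides are assumptions made for illustration; the four probe pairs are those of the A/T, C/G, G/C, T/A example in the text.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Nucleotides at the two interrogation positions of the four probes.
PROBE_PAIRS = [("A", "T"), ("C", "G"), ("G", "C"), ("T", "A")]


def mismatch_counts(target_pair):
    """Count mismatches (0, 1 or 2) of each probe against the target pair.

    `target_pair` holds the two target nucleotides aligned with the first
    and second interrogation positions.
    """
    first, second = target_pair
    counts = []
    for probe_first, probe_second in PROBE_PAIRS:
        mismatches = int(probe_first != COMPLEMENT[first])
        mismatches += int(probe_second != COMPLEMENT[second])
        counts.append(mismatches)
    return counts


# Hypothetical target with A and C at the two positions of interest.
print(mismatch_counts(("A", "C")))  # prints [2, 1, 2, 1]
```

In this sketch the pattern of two single and two double mismatches arises whenever the two target nucleotides are not complementary to one another; which probes fall into which class depends on the target nucleotides.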
  • The multiplex strategy has been initially described for the situation where there are two nucleotides of interest in a reference sequence and only four probes in an array.
  • The strategy can be extended to analyze any number of nucleotides in a target sequence by using additional probes.
  • Each pair of interrogation positions is read from a unique group of four probes.
  • Helper mutations serve to break up regions of internal complementarity within a probe and thereby prevent annealing.
  • One or two helper mutations are quite sufficient for this purpose.
  • The inclusion of helper mutations can be beneficial in any of the tiling strategies noted above.
  • Each probe having a particular interrogation position has the same helper mutation(s).
  • Such probes have a segment in common which shows perfect complementarity with a reference sequence, except that the segment contains at least one helper mutation (the same in each of the probes) and at least one interrogation position.
  • A probe from the first probe set comprises a segment containing an interrogation position and showing perfect complementarity with a reference sequence except for one or two helper mutations.
  • The corresponding probes from the second, third and fourth probe sets usually comprise the same segment (or sometimes a subsequence thereof including the helper mutation(s) and interrogation position), except that the base occupying the interrogation position varies in each probe.
  • the helper mutation tiling strategy is used in conjunction with one of the tiling strategies described above.
  • the probes containing helper mutations are used to tile regions of a reference sequence otherwise giving low hybridization signal (e.g., because of self-complementarity), and the alternative tiling strategy is used to tile intervening regions.
  • Pooling Strategies. Pooling strategies of the invention can also employ arrays of immobilized probes. Probes can be immobilized in cells of an array, and the hybridization signal of each cell can be determined independently of any other cell. A particular cell may be occupied by a pooled mixture of probes. Although the identity of each probe in the mixture is known, the individual probes in the pool are not separately addressable. Thus, the hybridization signal from a cell is the aggregate of that of the different probes occupying the cell.
  • A cell is scored as hybridizing to a target sequence if at least one probe occupying the cell comprises a segment exhibiting perfect complementarity to the target sequence.
  • A simple strategy to show the increased power of pooled strategies over a standard tiling is to create three cells each containing a pooled probe having a single pooled position, the pooled position being the same in each of the pooled probes. At the pooled position, there are two possible nucleotides, allowing the pooled probe to hybridize to two target sequences. In tiling terminology, the pooled position of each probe is an interrogation position.
  • The identity of the nucleotide in the target sequence corresponding to the interrogation position (i.e., that is matched with the interrogation position when the target sequence and pooled probes are maximally aligned for complementarity).
  • The three cells are assigned probe pools that are perfectly complementary to the target except at the pooled position, which is occupied by a different pooled nucleotide in each probe. With 3 pooled probes, all 4 possible single base pair states (wildtype and 3 mutants) are detected.
  • a pool hybridizes with a target if some probe contained within that pool is complementary to that target.
  • a cell containing a pair (or more) of oligonucleotides lights up when a target complementary to any of the oligonucleotides in the cell is present.
  • each of the four possible targets yields a unique hybridization pattern among the three cells. Since a different pattern of hybridizing pools is obtained for each possible nucleotide in the target sequence corresponding to the pooled interrogation position in the probes, the identity of the nucleotide can be determined from the hybridization pattern of the pools.
  • a standard tiling requires four cells to detect and identify the possible single-base substitutions at one location, whereas this simple pooled strategy requires only three cells.
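The three-cell scheme described above can be sketched as follows. This is a minimal illustration, not the patent's own implementation: the choice of the IUPAC pools M, R and W for the three cells is an assumption made so that every target base gives a distinct pattern, and hybridization is modeled simply as perfect Watson-Crick pairing at the pooled position.

```python
# Watson-Crick complements: a probe base pairs with its complementary target base.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Assumed pooled nucleotides for the three cells (IUPAC codes M, R, W).
POOLS = {"M": {"A", "C"}, "R": {"A", "G"}, "W": {"A", "T"}}


def cell_pattern(target_base):
    """Return which of the three cells light up for a given target base."""
    probe_base = COMPLEMENT[target_base]
    return tuple(probe_base in POOLS[code] for code in ("M", "R", "W"))


# One pattern per possible target nucleotide at the interrogated location.
patterns = {base: cell_pattern(base) for base in "ACGT"}
for base, pattern in patterns.items():
    print(base, pattern)
```

Because the four patterns are all different, the target nucleotide can be read back from three cells instead of four.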
  • pooling strategy for sequence analysis is the 'Trellis' strategy.
  • each pooled probe has a segment of perfect complementarity to a reference sequence except at three pooled positions.
  • One pooled position is an N pool.
  • the three pooled positions may or may not be contiguous in a probe.
  • the other two pooled positions are selected from the group of three pools consisting of (1) M or K, (2) R or Y and (3) W or S, where the single letters are IUPAC standard ambiguity codes.
  • the sequence of a pooled probe is thus, of the form XXXN[(M/K) or (R/Y) or (W/S)][(M/K) or (R/Y) or (W/S)]XXXXX, where XXX represents bases complementary to the reference sequence.
  • the three pooled positions may be in any order, and may be contiguous or separated by intervening nucleotides. For the two positions occupied by [(M/K) or (R/Y) or (W/S)], two choices must be made. First, one must select one of the following three pairs of pooled nucleotides: (1) M/K, (2) R/Y and (3) W/S.
  • the one of three pooled nucleotides selected may be the same or different at the two pooled positions.
  • This choice should result in selection of a pooled nucleotide comprising a nucleotide that complements the corresponding nucleotide in a reference sequence, when the probe and reference sequence are maximally aligned.
  • the same principle governs the selection between R and Y, and between W and S.
  • a trellis pool probe has one pooled position with four possibilities, and two pooled positions, each with two possibilities.
  • a trellis pool probe comprises a mixture of 16 (4 x 2 x 2) probes.
  • each pooled position includes one nucleotide that complements the corresponding nucleotide from the reference sequence
  • one of these 16 probes has a segment that is the exact complement of the reference sequence.
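The 4 x 2 x 2 count can be checked by expanding a pooled probe into its component probes. The sketch below uses the standard IUPAC ambiguity codes; the probe sequence and the placement of the three pooled positions are hypothetical values chosen only for illustration.

```python
from itertools import product

# Standard IUPAC ambiguity codes used by the trellis pools.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "K": "GT", "R": "AG", "Y": "CT",
    "W": "AT", "S": "CG", "N": "ACGT",
}


def expand_pool(pooled_probe):
    """List every component probe of a pooled probe written with IUPAC codes."""
    choices = [IUPAC[symbol] for symbol in pooled_probe]
    return ["".join(combo) for combo in product(*choices)]


# Hypothetical example: 'exact' is the exact complement of the reference.
exact = "ACGTACGTA"
# Three contiguous pooled positions: N covers T, M covers A, Y covers C.
pooled = "ACGNMYGTA"

components = expand_pool(pooled)
print(len(components))
print(exact in components)
```

Each pooled code was chosen to include the base found in the exact complement, which is why the exact complement is always one of the component probes.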
  • a target sequence that is the same as the reference sequence i.e., a wildtype target
  • the segment of complementarity should be sufficiently long to permit specific hybridization of a pooled probe to a reference sequence to be detected relative to a variant of that reference sequence.
  • the segment of complementarity is about 9-21 nucleotides.
  • a target sequence is analyzed by comparing hybridization intensities at three pooled probes, each having the structure described above.
  • the segments complementary to the reference sequence present in the three pooled probes show some overlap. Sometimes the segments are identical (other than at the interrogation positions). However, this need not be the case.
  • the segments can tile across a reference sequence in increments of one nucleotide (i.e., one pooled probe differs from the next by the acquisition of one nucleotide at the 5' end and loss of a nucleotide at the 3' end).
  • the three interrogation positions may or may not occur at the same relative positions within each pooled probe (i.e., spacing from a probe terminus).
  • one of the three interrogation positions from each of the three pooled probes aligns with the same nucleotide in the reference sequence, and this interrogation position is occupied by a different pooled nucleotide in each of the three probes.
  • the interrogation position is occupied by an N.
  • the interrogation position is occupied by one of (M/K) or (R/Y) or (W/S).
  • three pooled probes are used to analyze a single nucleotide in the reference sequence.
  • Still another combination of three pooled probes from the set of five have an interrogation position that aligns with a third nucleotide in the reference sequence and these probes are used to analyze that nucleotide.
  • three nucleotides in the reference sequence are fully analyzed from only five pooled probes.
  • the basic tiling strategy would require 12 probes for a similar analysis.
  • the trellis strategy can employ an array of probes having at least three cells, each of which is occupied by a pooled probe as described above.
  • Three cells are occupied by pooled probes having a pooled interrogation position corresponding to the position of possible substitution in the target sequence, one cell with an N', one cell with one of M' or K', and one cell with R' or Y'.
  • the cell with the N' in the interrogation position lights up for the wildtype sequence and any of the three single base substitutions of the target sequence.
  • a further class of strategies involving pooled probes are termed coding strategies. These strategies assign code words from some set of numbers to variants of a reference sequence. Any number of variants can be coded. The variants can include multiple closely spaced substitutions, deletions or insertions.
  • the designation letters or other symbols assigned to each variant may be any arbitrary set of numbers, in any order. For example, a binary code is often used, but codes to other bases are entirely feasible. The numbers are often assigned such that each variant has a designation having at least one digit and at least one nonzero value for that digit.
  • a variant assigned the number 101 has a designation of three digits, with one possible nonzero value for each digit.
  • the designations of the variants are coded into an array of pooled probes comprising a pooled probe for each nonzero value of each digit in the numbers assigned to the variants. For example, if the variants are assigned successive numbers in a numbering system of base m, and the highest number assigned to a variant has n digits, the array would have about n x (m - 1) pooled probes.
  • $\log_m(3N+1)$ probes are required to analyze all variants of N locations in a reference sequence, each having three possible mutant substitutions.
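The probe counts above can be worked through numerically. The sketch below is an illustration under stated assumptions: it counts 3N + 1 code words (three substitutions at each of N locations plus the reference), finds the number of digits n needed in base m, and applies the n x (m - 1) rule from the text. The function names are hypothetical.

```python
def digits_needed(num_codes, base):
    """Smallest n such that base**n >= num_codes (integer arithmetic only)."""
    n, capacity = 1, base
    while capacity < num_codes:
        capacity *= base
        n += 1
    return n


def pooled_probe_count(num_locations, base=2):
    """Pools needed: one per nonzero value of each digit, i.e. n * (base - 1)."""
    # Three possible substitutions per location, plus the reference itself.
    num_codes = 3 * num_locations + 1
    n = digits_needed(num_codes, base)
    return n * (base - 1)


print(pooled_probe_count(10, base=2))
print(pooled_probe_count(10, base=4))
```

For ten locations and a binary code, 31 code words fit in five binary digits, so five pooled probes suffice, compared with 40 probes in a basic tiling of the same ten locations.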
  • each pooled probe has a segment exactly complementary to the reference sequence except that certain positions are pooled.
  • the segment should be sufficiently long to allow specific hybridization of the pooled probe to the reference sequence relative to a mutated form of the reference sequence.
  • segment lengths of 9-21 nucleotides are typical.
  • the probe has no nucleotides other than the 9-21 nucleotide segment.
  • the pooled positions comprise nucleotides that allow the pooled probe to hybridize to every variant assigned a particular nonzero value in a particular digit.
  • the pooled positions further comprise a nucleotide that allows the pooled probe to hybridize to the reference sequence.
  • a wildtype target or reference sequence
  • when a target is hybridized to the pools, only those pools comprising a component probe having a segment that is exactly complementary to the target light up.
  • the identity of the target is then decoded from the pattern of hybridizing pools.
  • Each pool that lights up is correlated with a particular value in a particular digit.
  • the aggregate hybridization patterns of each lighting pool reveal the value of each digit in the code defining the identity of the target hybridized to the array.
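The decoding step can be illustrated with a small simulation. This is a simplified sketch, not the patent's procedure: each pool is represented as a hypothetical (digit index, nonzero value) pair, a variant lights exactly the pools matching the nonzero digits of its code number, and the reference-matching component of each pool is deliberately left out.

```python
def to_digits(number, base, width):
    """Write a code number as a fixed-width list of digits, most significant first."""
    digits = []
    for _ in range(width):
        digits.append(number % base)
        number //= base
    return digits[::-1]


def lit_pools(code, base, width):
    """Pools, as (digit index, value) pairs, that light up for a variant's code."""
    return {(i, d) for i, d in enumerate(to_digits(code, base, width)) if d != 0}


def decode(pools, base, width):
    """Recover the variant's code number from the set of lighting pools."""
    digits = [0] * width
    for index, value in pools:
        digits[index] = value
    number = 0
    for digit in digits:
        number = number * base + digit
    return number


# Variant number 21 in a five-digit binary code is 10101.
pattern = lit_pools(21, base=2, width=5)
print(sorted(pattern))
print(decode(pattern, base=2, width=5))
```

Any digit position with no lighting pool is read as zero, which is why only the nonzero values of each digit need a pool of their own.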
  • Probes that contain partial matches to two separate (i.e., non contiguous) subsequences of a target sequence sometimes hybridize strongly to the target sequence. In certain instances, such probes have generated stronger signals than probes of the same length which are perfect matches to the target sequence. It is believed (but not necessary to the invention) that this observation results from interactions of a single target sequence with two or more probes simultaneously.
  • This invention exploits this observation to provide arrays of probes having at least first and second segments, which are respectively complementary to first and second subsequences of a reference sequence. Optionally, the probes may have a third or more complementary segments. These probes can be employed in any of the strategies noted above.
  • the two segments of such a probe can be complementary to disjoint subsequences of the reference sequences or contiguous subsequences. If the latter, the two segments in the probe are inverted relative to the order of the complement of the reference sequence.
  • the two subsequences of the reference sequence each typically comprise about 3 to 30 contiguous nucleotides.
  • the subsequences of the reference sequence are sometimes separated by 0, 1, 2 or 3 bases. Often the sequences are adjacent and nonoverlapping.
  • the bridging strategy can offer the following advantages: (1) Higher discrimination between matched and mismatched probes, (2) The possibility of using longer probes in a bridging tiling, thereby increasing the specificity of the hybridization, without sacrificing discrimination, (3) The use of probes in which an interrogation position is located very off-center relative to the regions of target complementarity. This may be of particular advantage when, for example, a probe centered about one region of the target gives low hybridization signal. The low signal is overcome by using a probe centered about an adjoining region giving a higher hybridization signal. (4) Disruption of secondary structure that might result in annealing of certain probes (see previous discussion of helper mutations).
  • the invention also provides a deletion tiling strategy.
  • Deletion tiling is related to both the bridging and helper mutant strategies described above.
  • comparisons are performed between probes sharing a common deletion but differing from each other at an interrogation position located outside the deletion.
  • a first probe comprises first and second segments, each exactly complementary to respective first and second subsequences of a reference sequence, wherein the first and second subsequences of the reference sequence are separated by a short distance (e.g., 1 or 2 nucleotides).
  • the order of the first and second segments in the probe is usually the same as that of the complement to the first and second subsequences in the reference sequence.
  • Such tilings sometimes offer superior discrimination in hybridization intensities between the probe having an interrogation position complementary to the target and other probes.
  • the difference between the hybridizations to matched and mismatched targets for the probe set shown above is the difference between a single-base bulge, and a large asymmetric loop (e.g., two bases of target, one of probe). This often results in a larger difference in stability than the comparison of a perfectly matched probe with a probe showing a single base mismatch in the basic tiling strategy.
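A deletion probe of the kind described above can be sketched as follows. The reference sequence, coordinates and helper names are hypothetical; the sketch assumes the reference is written 5' to 3' and that the probe is the reverse complement of the two flanking subsequences joined together, so the skipped bases have no partner in the probe.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def deletion_probe(reference, start, first_len, gap, second_len):
    """Probe complementary to two subsequences separated by 'gap' skipped bases."""
    first = reference[start:start + first_len]
    second_start = start + first_len + gap
    second = reference[second_start:second_start + second_len]
    # Joining before complementing keeps the two segments in reference order.
    return reverse_complement(first + second)


def probe_set(probe, interrogation_index):
    """Four probes sharing the deletion but differing at one interrogation position."""
    prefix, suffix = probe[:interrogation_index], probe[interrogation_index + 1:]
    return {base: prefix + base + suffix for base in "ACGT"}


reference = "ACGTTGCAAGGCTA"  # hypothetical reference sequence
probe = deletion_probe(reference, start=2, first_len=4, gap=1, second_len=4)
print(probe)
for base, variant in probe_set(probe, interrogation_index=1).items():
    print(base, variant)
```

The interrogation index is chosen outside the deletion junction, matching the text's requirement that the compared probes share a common deletion.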
  • the use of deletion or bridging probes is quite general. These probes can be used in any of the tiling strategies of the invention.
  • the target polynucleotide whose sequence is to be determined, is usually isolated from a tissue sample. If the target is genomic, the sample may be from any tissue (except exclusively red blood cells). For example, whole blood, peripheral blood lymphocytes or PBMC, skin, hair or semen are convenient sources of clinical samples. These sources are also suitable if the target is RNA. Blood and other body fluids are also a convenient source for isolating viral nucleic acids.
  • the sample is obtained from a tissue in which the mRNA is expressed.
  • the polynucleotide in the sample is RNA, it is usually reverse transcribed to DNA.
  • DNA samples or cDNA resulting from reverse transcription are usually amplified, e.g., by PCR.
  • the amplification product can be RNA or DNA. Paired primers are selected to flank the borders of a target polynucleotide of interest. More than one target can be simultaneously amplified by multiplex PCR in which multiple paired primers are employed.
  • the target can be labeled at one or more nucleotides during or after amplification.
  • target polynucleotides e.g., episomal DNA
  • sufficient DNA is present in the tissue sample to dispense with the amplification step.
  • the sense of the strand should of course be complementary to that of the probes on the chip. This is achieved by appropriate selection of primers.
  • the target can be fragmented before application to the chip to reduce or eliminate the formation of secondary structures in the target.
  • the average size of target segments following hybridization is usually larger than the size of the probes on the chip.
  • genome sequencing can be accomplished according to the enzymatic/Sanger method (described in F. Sanger, S. Nicklen, and A. R. Coulson, Proc. Natl. Acad. Sci, USA, 74:5463-5467 (1977)) and involve cloning and subcloning (described in U.S. Patent No. 4725677; Chen and Seeburg, DNA 4, 165-170 (1985); Lim et al., Gene Anal., Techn. 5, 32-39 (1988); PCR Protocols- A Guide to Methods and Applications. Innis et al., editors, Academic Press, San Diego (1990); Innis et al., Proc. Nat. Acad. Sci.
  • sequencing can be accomplished according to the chemical/Maxam and Gilbert method which is described in references: A. M. Maxam, and W. Gilbert, Proc. Nat. Acad. of Sci., USA, 74:560-564 (1977) and
  • genome sequencing can be accomplished by methodology described by Guo and Wu (Guo and Wu, Nucleic Acids Res., 10:2065 (1982); and Meth. Enz., 100:60 (1983)) or those methods that utilize 3'hydroxy-protected and labeled nucleotides as exemplified in the following references: Churchich, J.E., Eur. J. Biochem., 231 :736 (1995);
  • sequencing may be read by autoradiography using radioisotopes (as described in Ornstein et al., Biotechniques 2, 476 (1985)) or by using non-radioactive labeling strategies that have been integrated into partly automated DNA sequencing procedures (Smith et al., Nature 321, 674-679 (1986) and EPO Patent No. 873 00998.9; Du Pont De Nemours EPO Application No. 03 59225; Ansorge et al., J. Biochem. Biophys. Method 13, 325-32 (1986); Prober et al.
  • this invention provides for various methods of reading sequencing data such as capillary zone electrophoresis (described in Jorgenson et al., J. Chromatography 352, 337 (1986); Gesteland et al., Nucleic Acids Res. 18, 1415- 1419 (1990)), mass spectrometry (including ES [described in Fenn et al. J. Phys. Chem. 18, 4451-59 (1984); PCT Application No.
  • the invention provides a method of performing whole cell engineering comprising the step of cell screening.
  • the method includes DNA amplification.
  • DNA can be amplified by a variety of procedures including cloning (Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, 1989), polymerase chain reaction (PCR) (C. R. Newton and A. Graham, PCR, BIOS Publishers, 1994; Bevan et al., "Sequencing of PCR-Amplified DNA" PCR Meth. App. 4:222 (1992)), ligase chain reaction (LCR) (F. Barany, Proc. Natl. Acad. Sci. USA 88, 189-93 (1991)), strand displacement amplification (SDA) (G.
  • PCR polymerase chain reaction
  • LCR ligase chain reaction
  • SDA strand displacement amplification
  • this invention provides for additional sequencing methods (as described in Labeit et al., DNA 5, 173-177 (1986); Amersham, PCT Application GB86/00349; Eckstein et al., Nucleic Acids Res. 16, 9947 (1988); Max-Planck-Gesellschaft, DE 3930312 A1; Saiki, R. et al., Science 239:487-491 (1988); Sarkar, G. and Bolander, Mark E., Semi Exponential Cycle Sequencing, Nucleic Acids Research, 1995, Vol. 23, No. 7, p. 1269-1270).
  • This invention also provides for the following sequencing strategies: shotgun sequencing, transposon-mediated directed sequencing (Strathmann, M. et al.
  • the step of genomic sequencing includes constructing ordered clone maps of DNA sequencing (as described in sections of U.S. Patent Publication No. 5604100 and PCT Patent Publication No. WO9627025). This invention provides that the method of genome sequencing be achieved by various steps that may utilize modifications of certain methods mentioned above (described in the following patents: PCT Publication Nos.
  • this invention provides for the use of a relational database system for storing and manipulating biomolecular sequence information and storing and displaying genetic information
  • the database including genomic libraries for a plurality of types of organisms, the libraries having multiple genomic sequences, at least some of which represent open reading frames located along a contiguous sequence on each of the plurality of organisms' genomes, and a user interface capable of receiving a selection of two or more of the genomic libraries for comparison and displaying the results of the comparison.
  • Associated with the database is a software system that allows a user to determine the relative position of a selected gene sequence within a genome. The system allows execution of a method of displaying the genetic locus of a biomolecular sequence.
  • the method involves providing a database including multiple biomolecular sequences, at least some of which represent open reading frames located along a contiguous sequence on an organism's genome.
  • the system also provides a user interface capable of receiving a selection of one or more probe open reading frames for use in determining homologous matches between such probe open reading frame(s) and the open reading frames in the genomic libraries, and displaying the results of the determination.
  • An open reading frame for the sequence is selected and displayed together with adjacent open reading frames located upstream and downstream in the relative positions in which they occur on the contiguous sequence.
  • the invention provides a relational database system for storing biomolecular sequence information in a manner that allows sequences to be catalogued and searched according to one or more protein function hierarchies.
  • the hierarchies allow searches for sequences based upon a protein's biological function or molecular function. Also disclosed is a mechanism for automatically grouping new sequences into protein function hierarchies. This mechanism uses descriptive information obtained from "external hits" which are matches of stored sequences against gene sequences stored in an external database such as GenBank. The descriptive information provided with the external database is evaluated according to a specific algorithm and used to automatically group the external hits (or the sequences associated with the hits) in the categories. Ultimately, the biomolecular sequences stored in databases of this invention are provided with both descriptive information from the external hit and category information from a relevant hierarchy or hierarchies.
  • a relational database system for storing biomolecular sequence information in a manner that allows sequences to be catalogued and searched according to association with one or more projects for obtaining full-length biomolecular sequences from shorter sequences.
  • the relational database has sequence records containing information identifying one or more projects to which each of the sequence records belong. Each project groups together one or more biomolecular sequences generated during work to obtain a full-length gene sequence from a shorter sequence.
  • the computer system has a user interface allowing a user to selectively view information regarding one or more projects.
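One way to picture the project-based records is a small relational schema. The sketch below uses Python's built-in sqlite3 module; every table name, column name and sample row is hypothetical and is not taken from the patent. A link table lets one sequence record belong to more than one project.

```python
import sqlite3

# In-memory database with a hypothetical schema for project-grouped sequences.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sequence (id INTEGER PRIMARY KEY, name TEXT, residues TEXT);
    CREATE TABLE project (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sequence_project (
        sequence_id INTEGER REFERENCES sequence(id),
        project_id INTEGER REFERENCES project(id)
    );
    """
)
conn.executemany(
    "INSERT INTO sequence (id, name, residues) VALUES (?, ?, ?)",
    [(1, "est_001", "ACGT"), (2, "est_002", "ACGTAC"), (3, "est_003", "GGCC")],
)
conn.executemany(
    "INSERT INTO project (id, name) VALUES (?, ?)",
    [(1, "project_alpha"), (2, "project_beta")],
)
conn.executemany(
    "INSERT INTO sequence_project (sequence_id, project_id) VALUES (?, ?)",
    [(1, 1), (2, 1), (2, 2), (3, 2)],
)


def sequences_in_project(connection, project_name):
    """Names of all sequence records grouped under the given project."""
    rows = connection.execute(
        """
        SELECT s.name
        FROM sequence AS s
        JOIN sequence_project AS sp ON sp.sequence_id = s.id
        JOIN project AS p ON p.id = sp.project_id
        WHERE p.name = ?
        ORDER BY s.name
        """,
        (project_name,),
    ).fetchall()
    return [name for (name,) in rows]


print(sequences_in_project(conn, "project_alpha"))
```

A user interface that lets a user "selectively view information regarding one or more projects" would sit on top of queries of this kind.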
  • the relational database also provides interfaces and methods for accessing and manipulating and analyzing project-based information. Polymer sequences can be assembled into bins. A first number of bins are populated with polymer sequences.
  • the polymer sequences in each bin are assembled into one or more consensus sequences representative of the polymer sequences of the bin.
  • the consensus sequences of the bins are compared to determine relationships, if any, between the consensus sequences of the bins.
  • the bins are modified based on the relationships between the consensus sequences of the bins.
  • the polymer sequences are reassembled in the modified bins to generate one or more modified consensus sequences for each bin representative of the modified bins.
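The bin, consensus, compare and modify cycle described above can be sketched in a few lines. This is a loose illustration under strong assumptions: sequences within and across bins are taken to be pre-aligned and of equal length, the consensus is a column-wise majority vote, and bins are merged when their consensus sequences reach a hypothetical identity threshold. None of these choices is specified by the source.

```python
from collections import Counter


def consensus(sequences):
    """Column-wise majority vote over equal-length, pre-aligned sequences."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*sequences)
    )


def identity(a, b):
    """Fraction of aligned positions at which two sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def merge_similar_bins(bins, threshold=0.9):
    """Merge bins whose consensus sequences are at least 'threshold' identical."""
    merged = []
    for sequences in bins:
        candidate = consensus(sequences)
        for existing in merged:
            if identity(consensus(existing), candidate) >= threshold:
                existing.extend(sequences)
                break
        else:
            merged.append(list(sequences))
    return merged


bins = [
    ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGAAC"],
    ["ACGTACGTAC", "ACGTACGTAC"],
    ["TTTTGGGGCC", "TTTTGGGGCC"],
]
modified = merge_similar_bins(bins)
# Reassemble: recompute one consensus per modified bin.
print([consensus(sequences) for sequences in modified])
```

After merging, the consensus is recomputed for each modified bin, mirroring the reassembly step in the text.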
  • sequence similarities and dissimilarities are analyzed in a set of polymer sequences. Pairwise alignment data is generated for pairs of the polymer sequences.
  • the pairwise alignment data defines regions of similarity between the pairs of polymer sequences with boundaries.
  • ANNOTATING - GENERAL METHODOLOGY In one aspect the invention provides relational databases for storing and retrieving biological information. More particularly the invention relates to systems and methods for providing sequences of biological molecules in a relational format allowing retrieval in a client-server environment and for providing full-length cDNA sequences in a relational format allowing retrieval in a client-server environment.
  • the present invention provides relational database systems for storing and analyzing biomolecular sequence information together with biological annotations detailing the source and interpretation of the sequence data.
  • the present invention provides a powerful database tool for drug development and other research and development purposes.
  • the present invention provides relational database systems for storing and analyzing biomolecular sequence information together with biological annotations detailing the source and interpretation of the sequence data.
  • Disclosed is a relational database system for storing and displaying genetic information.
  • a software system that allows a user to determine the relative position of a selected gene sequence within a genome.
  • the system allows execution of a method of displaying the genetic locus of a biomolecular sequence.
  • the method involves providing a database including multiple biomolecular sequences, at least some of which represent open reading frames located along a contiguous sequence on an organism's genome. An open reading frame for the sequence is selected and displayed together with adjacent open reading frames located upstream and downstream in the relative positions in which they occur on the contiguous sequence.
  • the invention provides a method of displaying the genetic locus of a biomolecular sequence.
  • the method involves providing a database including multiple biomolecular sequences, at least some of which represent open reading frames located along a contiguous sequence on an organism's genome.
  • the method further involves identifying a selected open reading frame, and displaying the selected open reading frame together with adjacent open reading frames located upstream and downstream from the selected open reading frame.
  • the adjacent open reading frames and the selected open reading frame are displayed in the relative positions in which they occur on the contiguous sequence, textually and/or graphically.
  • the method of the invention may be practiced with sequences from microbial organisms, and the sequences may include nucleic acid or protein sequences.
  • the invention also provides a computer system including a database having multiple biomolecular sequences, at least some of which represent open reading frames located along a contiguous sequence on an organism's genome.
  • the computer system also includes a user interface capable of identifying a selected open reading frame, and displaying the selected open reading frame together with adjacent open reading frames located upstream and downstream from the selected open reading frame. The adjacent open reading frames and the selected open reading frame are displayed in the relative positions in which they occur on the contiguous sequence.
  • the user interface may also be capable of detecting a scrolling command, and based upon the direction and magnitude of the scrolling command, identifying a new selected open reading frame from the contiguous sequence.
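The locus display and scrolling behavior can be illustrated with a short sketch. The `Orf` record, the `flank` parameter and the sample coordinates are all hypothetical; the sketch only shows how a selected open reading frame and its upstream and downstream neighbors could be picked out in contig order, and how a scroll of a given direction and magnitude could pick a new selection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Orf:
    name: str
    start: int
    end: int


def locus_view(orfs, selected, flank=2):
    """Selected ORF plus up to 'flank' neighbors on each side, in contig order."""
    ordered = sorted(orfs, key=lambda orf: orf.start)
    index = [orf.name for orf in ordered].index(selected)
    return ordered[max(0, index - flank):index + flank + 1]


def scroll(orfs, selected, steps):
    """New selected ORF after scrolling; the sign of 'steps' gives the direction."""
    ordered = sorted(orfs, key=lambda orf: orf.start)
    index = [orf.name for orf in ordered].index(selected) + steps
    index = min(max(index, 0), len(ordered) - 1)  # clamp to the contig ends
    return ordered[index].name


contig = [
    Orf("orf3", 900, 1500),
    Orf("orf1", 10, 400),
    Orf("orf2", 450, 880),
    Orf("orf4", 1600, 2100),
    Orf("orf5", 2200, 2500),
]
view = locus_view(contig, "orf3", flank=1)
print(" | ".join(f"{orf.name}[{orf.start}-{orf.end}]" for orf in view))
print(scroll(contig, "orf3", steps=2))
```

Sorting by start coordinate keeps the displayed frames in the relative positions in which they occur on the contiguous sequence.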
  • the invention further provides a computer program product comprising a computer-usable medium having computer-readable program code embodied thereon relating to a database including multiple biomolecular sequences, at least some of which represent open reading frames located along a contiguous sequence on an organism's genome.
  • the computer program product includes computer-readable program code for identifying a selected open reading frame, and displaying the selected open reading frame together with adjacent open reading frames located upstream and downstream from the selected open reading frame. The adjacent open reading frames and the selected open reading frame are displayed in the relative positions in which they occur on the contiguous sequence.
  • Comparative Genomics is a feature of the database system of the present invention which allows a user to compare the sequence data of sets of different organism types.
  • Comparative searches may be formulated in a number of ways using the Comparative Genomics feature. For example, genes common to a set of organisms may be identified through a "commonality" query, and genes unique to one of a set of organisms may be identified through a "subtraction” query.
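The "commonality" and "subtraction" queries map naturally onto set operations. The sketch below assumes that each library has already been reduced to a set of shared gene-family identifiers (an assumption, since in practice matching genes across organisms requires a homology search); the library names and identifiers are hypothetical.

```python
def commonality(libraries, names):
    """Genes present in every one of the selected libraries."""
    return set.intersection(*(set(libraries[name]) for name in names))


def subtraction(libraries, target, others):
    """Genes present in the target library but in none of the other libraries."""
    other_genes = set().union(*(libraries[name] for name in others))
    return set(libraries[target]) - other_genes


# Hypothetical libraries keyed by organism, holding gene-family identifiers.
libraries = {
    "organism_a": {"dnaA", "recA", "toxX"},
    "organism_b": {"dnaA", "recA", "nifH"},
    "organism_c": {"dnaA", "nifH"},
}

print(sorted(commonality(libraries, ["organism_a", "organism_b", "organism_c"])))
print(sorted(subtraction(libraries, "organism_a", ["organism_b", "organism_c"])))
```

A subtraction query of this kind is one way to surface genes unique to a single organism in the selected set.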
  • Electronic Southern is a feature of the present database system which is useful for identifying genomic libraries in which a given gene or ORF exists.
  • a Southern analysis is a conventional molecular biology technique in which a nucleic acid of known sequence is used to identify matching (complementary) sequences in a sample of nucleic acid to be analyzed.
  • Electronic Southerns may be used to locate homologous matches between a "probe" DNA sequence and a large number of DNA sequences in one or more libraries.
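An Electronic Southern can be sketched as a search of a probe against every sequence in every library. In this simplified illustration, exact substring matching on either strand stands in for a real homology search, which the source does not specify; the library contents and names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def electronic_southern(probe, libraries):
    """Map each library name to the IDs of sequences matching the probe."""
    forms = (probe, reverse_complement(probe))  # search both strands
    hits = {}
    for library_name, sequences in libraries.items():
        matched = [
            seq_id
            for seq_id, seq in sequences.items()
            if any(form in seq for form in forms)
        ]
        if matched:
            hits[library_name] = matched
    return hits


libraries = {
    "lib_soil": {"s1": "TTGACGTACCA", "s2": "GGGGCCCC"},
    "lib_marine": {"m1": "AATGGTACGTCAA"},
    "lib_gut": {"g1": "CCCCAAAA"},
}
hits = electronic_southern("GACGTACC", libraries)
print(hits)
```

Libraries with no matching sequence are omitted from the result, so the output directly lists the libraries in which the probe's gene or ORF was found.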
  • the present invention provides a method of comparing genetic complements of different types of organisms. The method involves providing a database having sequence libraries with multiple biomolecular sequences for different types of organisms, where at least some of the sequences represent open reading frames located along one or more contiguous sequences on each of the organisms' genomes. The method further involves receiving a selection of two or more of the sequence libraries for comparison, determining open reading frames common or unique to the selected sequence libraries, and displaying the results of the determination. The invention also provides a method of comparing genomic complements of different types of organisms.
  • the method involves providing a database having genomic sequence libraries with multiple biomolecular sequences for different types of organisms, where at least some of the sequences represent open reading frames located along one or more contiguous sequences on each of the organisms' genomes.
  • the method further involves receiving a selection of two or more of the sequence libraries for comparison, determining sequences common or unique to the selected sequence libraries, and displaying the results of the determination.
  • the invention further provides a computer system including a database containing genomic libraries for different types of organisms, which libraries have multiple genomic sequences, at least some of which represent open reading frames located along one or more contiguous sequences on each of the organisms' genomes.
  • the system also includes a user interface capable of receiving a selection of two or more genomic libraries for comparison and displaying the results of the comparison.
  • Another aspect of the present invention provides a method of identifying libraries in which a given gene exists.
  • the method involves providing a database including genomic libraries for one or more types of organisms.
  • the libraries have multiple genomic sequences, at least some of which represent open reading frames located along one or more contiguous sequences on each of the organisms' genomes.
  • the method further involves receiving a selection of one or more probe sequences, determining homologous matches between the selected probe sequences and the sequences in the genomic libraries, and displaying the results of the determination.
  • the invention also provides a computer system including a database including genomic libraries for one or more types of organisms, which libraries have multiple genomic sequences, at least some of which represent open reading frames located along one or more contiguous sequences on each of the organisms' genomes.
  • the system also includes a user interface capable of receiving a selection of one or more probe sequences for use in determining homologous matches between one or more probe sequences and the sequences in the genomic libraries, and displaying the results of the determination.
  • a computer program product including a computer- usable medium having computer-readable program code embodied thereon relating to a database including genomic libraries for one or more types of organisms.
  • the libraries have multiple genomic sequences, at least some of which represent open reading frames located along one or more contiguous sequences on each of the organisms' genomes.
  • the computer program product includes computer-readable program code for providing, within a computing system, an interface for receiving a selection of two or more genomic libraries for comparison, determining sequences common or unique to the selected genomic libraries, and displaying the results of the determination. Additionally provided is a computer program product including a computer-usable medium having computer-readable program code embodied thereon relating to a database including genomic libraries for one or more types of organisms.
  • the libraries have multiple genomic sequences, at least some of which represent open reading frames located along one or more contiguous sequences on each of the organisms' genomes.
  • the computer program product includes computer-readable program code for providing, within a computing system, an interface for receiving a selection of one or more probe open reading frames, determining homologous matches between the probe sequences and the sequences in the genomic libraries, and displaying the results of the determination.
  • the invention further provides a method of presenting the genetic complement of an organism. The method involves providing a database including sequence libraries for a plurality of types of organisms, where the libraries have multiple biomolecular sequences, at least some of which represent open reading frames.
  • the method further involves receiving a selection of one of the sequence libraries, determining open reading frames within the selected sequence library, and displaying the results as one or more unique identifiers for groups of related open reading frames.
  • the present invention provides relational database systems for storing biomolecular sequence information in a manner that allows sequences to be catalogued and searched according to one or more protein function hierarchies.
  • the hierarchies are provided to allow carefully tailored searches for sequences based upon a protein's biological function or molecular function. To make this capability available in large sequence databases, the invention provides a mechanism for automatically grouping new sequences into protein function hierarchies.
  • the mechanism takes advantage of descriptive information obtained from "external hits" which are matches of stored sequences against gene sequences stored in an external database such as GenBank.
  • the descriptive information provided with GenBank is evaluated according to a specific algorithm and used to automatically group the external hits (or the sequences associated with the hits) in the categories.
  • the biomolecular sequences stored in databases of this invention are provided with both descriptive information from the external hit and category information from a relevant hierarchy or hierarchies.
  • the invention provides a computer system having a database containing records pertaining to a plurality of biomolecular sequences. At least some of the biomolecular sequences are grouped into a first hierarchy of protein function categories, the protein function categories specifying biological functions of proteins corresponding to the biomolecular sequences and the first hierarchy.
  • the hierarchy includes a first set of protein function categories specifying biological functions at a cellular level, and a second set of protein function categories specifying biological functions at a level above the cellular level.
  • the computer system of the invention also includes a user interface allowing a user to selectively view information regarding the plurality of biomolecular sequences as it relates to the first hierarchy.
  • the computer system may also include additional protein function categories based, for example, on molecular or enzymatic function of proteins.
  • the biomolecular sequences may include nucleic acid or amino acid sequences. Some of said biomolecular sequences may be provided as part of one or more projects for obtaining full-length gene sequences from shorter sequences, and the database records may contain information about such projects.
  • the invention also provides a method of using a computer system to present information pertaining to a plurality of biomolecular sequence records stored in a database.
  • the method involves displaying a list of the records or a field for entering information identifying one or more of the records, identifying one or more of the records that a user has selected from the list or field, matching the one or more selected records with one or more protein function categories from a first hierarchy of protein function categories into which at least some of the biomolecular sequence records are grouped, and displaying the one or more categories matching the one or more selected records.
  • the protein function categories specify biological functions of proteins corresponding to the biomolecular sequences and the first hierarchy includes a first set of protein function categories specifying biological functions at a cellular level, and a second set of protein function categories specifying biological functions at a tissue level.
  • the method may also involve matching the records against other protein function hierarchies, such as hierarchies based on molecular and/or enzymatic function, and displaying the results.
  • At least some of the biomolecular sequences may be provided as part of one or more projects for obtaining full-length gene sequences from shorter sequences, and the database records may contain information about those projects. Additionally, the invention provides a method of using a computer system to present information pertaining to a plurality of biomolecular sequence records stored in a database.
  • the method involves displaying a list of one or more protein biological function categories from a first hierarchy of protein biological function categories into which at least some of the biomolecular sequence records are grouped, identifying one or more of the protein biological function categories that a user has selected from the list, matching the one or more selected protein biological function categories with one or more biomolecular sequence records which are grouped in the selected protein biological function categories, and displaying the one or more sequence records matching the one or more selected protein biological function categories.
  • the protein biological function categories specify biological functions of proteins corresponding to the biomolecular sequences and the first hierarchy includes a first set of protein biological function categories specifying biological functions at a cellular level, and a second set of protein biological function categories specifying biological functions at a tissue level.
  • the method may also involve matching the records against other protein function hierarchies, such as hierarchies based on molecular and/or enzymatic function, and displaying the results.
  • At least some of the biomolecular sequences may be provided as part of one or more projects for obtaining full-length gene sequences from shorter sequences, and the database records may contain information about those projects.
  • Another aspect of the invention provides a database system having a plurality of internal records.
  • the database includes a plurality of sequence records specifying biomolecular sequences, at least some of which records reference hits to an external database, which hits specify genes having sequences that at least partially match those of the biomolecular sequences.
  • the database also includes a plurality of external hit records specifying the hits to the external database, and at least some of the records reference protein function hierarchy categories which specify at least one of biological functions of proteins or molecular functions of proteins. At least some of the biomolecular sequences may be provided as part of one or more projects for obtaining full-length gene sequences from shorter sequences, and the database records may contain information about those projects. Further aspects of the present invention provide a method of using a computer system and a computer readable medium having program instructions to automatically categorize biomolecular sequence records into protein function categories in an internal database.
  • the method and program involve receiving descriptive information about a biomolecular sequence in the internal database from a record in an external database pertaining to a gene having a sequence that at least partially matches that of the biomolecular sequence.
  • a determination is made whether the descriptive information contains one or more terms matching one or more keywords associated with a first protein function category, the keywords being terms consistent with a classification in the first protein function category.
  • a determination is made whether the descriptive information contains a term matching one or more anti-keywords associated with the first protein function category, the anti-keywords being terms inconsistent with a classification in the first protein function category.
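The keyword and anti-keyword determinations described above can be sketched as a small classifier. This is an illustrative sketch only: the text states that both determinations are made but does not spell out how they are combined, so the rule used here (assign a category when at least one keyword matches and no anti-keyword matches) is an assumption, as are the function name `categorize`, the substring matching, and the sample category.

```python
def categorize(description, categories):
    """Return the names of categories whose rules accept `description`.

    `categories` maps a category name to a dict with two lists:
    "keywords" (terms consistent with the category) and
    "anti_keywords" (terms inconsistent with the category).
    Assumed rule: keyword hit AND no anti-keyword hit.
    """
    text = description.lower()
    matched = []
    for name, rule in categories.items():
        has_keyword = any(k.lower() in text for k in rule["keywords"])
        has_anti = any(a.lower() in text for a in rule["anti_keywords"])
        if has_keyword and not has_anti:
            matched.append(name)
    return matched


# Hypothetical category definition and external-hit descriptions.
CATEGORIES = {
    "kinase": {"keywords": ["kinase"], "anti_keywords": ["kinase inhibitor"]},
}
hit_a = categorize("Human protein kinase C alpha", CATEGORIES)
hit_b = categorize("Cyclin-dependent kinase inhibitor 1B", CATEGORIES)
```

The second description contains the keyword but also an anti-keyword, so under the assumed rule it is left uncategorized.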
  • the present invention provides relational database systems for storing biomolecular sequence information in a manner that allows sequences to be catalogued and searched according to one or more characteristics.
  • the sequence information of the database is generated by one or more "projects" which are concerned with identifying the full-length coding sequence of a gene (i.e., mRNA).
  • the projects involve the extension of an initial sequenced portion of a clone of a gene of interest (e.g., an EST) by a variety of methods which use conventional molecular biological techniques, recently developed adaptations of these techniques, and certain novel database applications.
  • Data accumulated in these projects may be provided to the database of the present invention throughout the course of the projects and may be available to database users (subscribers) throughout the course of these projects for research, product (i.e., drug) development, and other purposes.
  • the database of the present invention and its associated projects may provide sequence and related data in amounts and forms not previously available.
  • the present invention can make partial and full-length sequence information for a given gene available to a user both during the course of the data acquisition and once the full-length sequence of the gene has been elucidated.
  • the database can provide a variety of tools for analysis and manipulation of the data, including Northern analysis and Expression summaries.
  • the present invention should permit more complete and accurate annotation of sequence data, as well as the study of relationships between genes of different tissues, systems or organisms, and ultimately detailed expression studies of full-length gene sequences.
  • the invention provides a computer system including a database having sequence records containing information identifying one or more projects to which each of the sequence records belong. Each project groups together one or more biomolecular sequences generated during work to obtain a full-length gene sequence from a shorter sequence.
  • the computer system also has a user interface allowing a user to selectively view information regarding one or more projects.
  • the biomolecular sequences may include nucleic acid or amino acid sequences.
  • the user interface may allow users to view at least three levels of project information including a project information results level listing at least some of the projects in said database, a sequence information results level listing at least some of the sequences associated with a given project, and a sequence retrieval results level sequentially listing monomers which comprise a given sequence.
  • a method of using a computer system and a computer program product to present information pertaining to a plurality of sequence records stored in a database are also provided by the present invention.
  • the sequence records contain information identifying one or more projects to which each of the sequence records belong.
  • Each of the projects groups one or more biomolecular sequences generated during work to obtain a full-length gene sequence from a shorter sequence.
  • the method and program involve providing an interface for entering query information relating to one or more projects, locating data corresponding to the entered query information, and displaying the data corresponding to the entered query information. Additionally, the invention provides a method of using a computer system to present information pertaining to a plurality of sequence records stored in a database.
  • the sequence records contain information identifying one or more projects to which each of the sequence records belong.
  • Each of the projects groups one or more biomolecular sequences generated during work to obtain a full-length gene sequence from a shorter sequence.
  • the method involves displaying a list of one or more project identifiers, determining which project identifier or identifiers from the list is selected by a user, then displaying a second list of one or more biomolecular sequence identifiers associated with the selected project identifier or identifiers, determining which sequence identifier or identifiers from the second list has been selected by a user, and displaying a third list of one or more sequences corresponding to the selected sequence identifier or identifiers. Following the display of the third list, a determination may be made whether and which sequence from the third list has been selected by a user.
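The three successive lists in this method (project identifiers, then sequence identifiers, then the sequences themselves) can be sketched as three lookups over a nested mapping. The data layout, the function names and the identifiers below are hypothetical and chosen only for illustration; the text does not specify a schema.

```python
# Hypothetical in-memory stand-in for the database:
# project identifier -> {sequence identifier -> sequence}.
EXAMPLE_DB = {
    "PRJ1": {"SEQ1": "ATGGCC", "SEQ2": "ATGGCCTTA"},
    "PRJ2": {"SEQ3": "ATGAAA"},
}


def list_projects(db):
    """First list: all project identifiers."""
    return sorted(db)


def list_sequences(db, project_ids):
    """Second list: sequence identifiers belonging to the selected projects."""
    return sorted(sid for pid in project_ids for sid in db[pid])


def get_sequences(db, sequence_ids):
    """Third list: the sequences for the selected sequence identifiers."""
    wanted = set(sequence_ids)
    return {
        sid: seq
        for members in db.values()
        for sid, seq in members.items()
        if sid in wanted
    }
```

Each function takes the user's selection from the previous list as its input, mirroring the display, select, display sequence of the method.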
  • the invention further provides a computer system including a database having sequence records containing information identifying one or more projects to which each of the sequence records belong, each of said projects grouping one or more biomolecular sequences generated during work to obtain a full-length gene sequence from a shorter sequence.
  • the system also has a user interface capable of allowing a user to select one or more project identifiers or project member identifiers specifying one or more sequences to be compared with one or more cDNA sequence libraries, and displaying matches resulting from that comparison.
  • a method of using a computer system to present comparative information pertaining to a plurality of sequence records stored in a database is also provided by the present invention.
  • the sequence records contain information identifying one or more projects to which each of the sequence records belong, each of the projects grouping one or more biomolecular sequences generated during work to obtain a full-length gene sequence from a shorter sequence.
  • the method involves providing an interface capable of allowing a user to select one or more project identifiers or project member identifiers specifying one or more sequences, comparing the one or more specified sequences with one or more cDNA sequence libraries, and displaying matches resulting from the comparison.
  • the invention provides a computer system including a database having sequence records containing information identifying one or more projects to which each of the sequence records belong, each of the projects grouping one or more biomolecular sequences generated during work to obtain a full-length gene sequence from a shorter sequence.
  • the system also has a user interface allowing a user to view expression information pertaining to the projects by selecting one or more expression categories for a query, and displaying the result of the query.
  • a method of using a computer system to view expression information pertaining to one or more projects, each of the projects grouping one or more biomolecular sequences generated during work to obtain a full-length gene sequence from a shorter sequence is also provided by the invention.
  • the computer system includes a database storing a plurality of sequence records, the sequence records containing information identifying one or more projects to which each of the sequence records belong.
  • the method involves providing an interface which allows a user to select one or more expression categories as a query, locating projects belonging to the selected one or more expression categories, and displaying a list of located projects.
  • the present invention provides a computer system including a database having sequence records containing information identifying one or more projects to which each of the sequence records belong, each of the projects grouping one or more biomolecular sequences generated during work to obtain a full-length gene sequence from a shorter sequence.
  • This computer system has a user interface allowing a user to selectively view information regarding said one or more projects and which displays information to a user in a format common to one or more other sequence databases.
  • Polymer sequences are assembled into bins. A first number of bins are populated with polymer sequences. The polymer sequences in each bin are assembled into one or more consensus sequences representative of the polymer sequences of the bin. The consensus sequences of the bins are compared to determine relationships, if any, between the consensus sequences. The bins are modified based on the relationships between the consensus sequences. The polymer sequences are reassembled in the modified bins to generate one or more modified consensus sequences for each bin representative of the modified bins.
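The bin, consensus, compare, modify and reassemble cycle described above can be sketched as follows. This is a simplified sketch under several assumptions that the text does not state: sequences within a bin are taken to be pre-aligned and of equal length, the consensus is a column-wise majority vote, "related" means the consensus sequences share at least a threshold fraction of identical positions, and the only bin modification shown is merging. All function names are hypothetical.

```python
from collections import Counter


def consensus(seqs):
    """Column-wise majority vote over equal-length, pre-aligned sequences."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def identity(a, b):
    """Fraction of positions at which two sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))


def merge_related_bins(bins, threshold=0.9):
    """Merge bins whose consensus sequences are related, then reassemble.

    Returns a list of (modified_consensus, member_sequences) pairs.
    """
    merged = []
    for seqs in bins:
        cons = consensus(seqs)
        for target in merged:
            if identity(consensus(target), cons) >= threshold:
                target.extend(seqs)  # modify the bin: absorb the related bin
                break
        else:
            merged.append(list(seqs))
    # Reassemble: recompute the consensus of every modified bin.
    return [(consensus(b), b) for b in merged]


# Hypothetical toy bins of aligned sequences.
result = merge_related_bins(
    [["ACGT", "ACGT", "ACGA"], ["ACGT", "ACGT"], ["TTTT", "TTTA", "TTTT"]]
)
```

In practice the assembly step would use a real sequence assembler and the relationships could also trigger splitting of bins; the sketch only shows the control flow.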
  • sequence similarities and dissimilarities are analyzed in a set of polymer sequences.
  • Pairwise alignment data is generated for pairs of the polymer sequences.
  • the pairwise alignment data defines regions of similarity between the pairs of polymer sequences with boundaries. Additional boundaries in particular polymer sequences are determined by applying at least one boundary from at least one pairwise alignment for one pair of polymer sequences to at least one other pairwise alignment for another pair of polymer sequences including one of the particular polymer sequences. Additional regions of similarity are generated based on the boundaries.
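The boundary propagation described above can be illustrated with a short sketch. It assumes ungapped pairwise alignments, so that a position in one sequence maps to the partner sequence by a simple offset; the tuple layout, the function name `additional_boundaries` and the example coordinates are hypothetical. Only a single propagation pass is shown.

```python
from collections import defaultdict


def additional_boundaries(alignments):
    """Project boundaries from one pairwise alignment onto another.

    Each alignment is a tuple (a, a_start, a_end, b, b_start, b_end) saying
    that region [a_start, a_end] of sequence a aligns, without gaps, to
    region [b_start, b_end] of sequence b.
    Returns {sequence name: set of newly derived boundary positions}.
    """
    # Boundaries that come directly from the pairwise alignment data.
    known = defaultdict(set)
    for a, a_start, a_end, b, b_start, b_end in alignments:
        known[a].update((a_start, a_end))
        known[b].update((b_start, b_end))

    extra = defaultdict(set)
    for a, a_start, a_end, b, b_start, b_end in alignments:
        # Apply boundaries in both directions: a -> b and b -> a.
        for src, s_start, s_end, dst, d_start in (
            (a, a_start, a_end, b, b_start),
            (b, b_start, b_end, a, a_start),
        ):
            for p in known[src]:
                # A boundary from another alignment that falls strictly
                # inside this aligned region splits the region in two.
                if s_start < p < s_end:
                    q = d_start + (p - s_start)
                    if q not in known[dst]:
                        extra[dst].add(q)
    return dict(extra)


# Hypothetical data: A aligns to B over 0..100 and to C over 50..150.
new_bounds = additional_boundaries(
    [("A", 0, 100, "B", 0, 100), ("A", 50, 150, "C", 0, 100)]
)
```

Here the boundary at position 50 on A (from the A/C alignment) is applied to the A/B alignment, and the boundary at 100 on A (from the A/B alignment) is applied to the A/C alignment; each new boundary splits an aligned region into additional regions of similarity.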
  • ANNOTATING - RELATIONAL DATABASES: The present invention provides an improved relational database for storing and manipulating genomic sequence information. While the invention is described in terms of a database optimized for microbial data, it is by no means so limited.
  • the invention may be employed to investigate data from various sources.
  • the invention covers databases optimized for other sources of sequence data, such as animal sequences (e.g., human, primate, rodent, amphibian, insect, etc.), plant sequences and microbial sequences.
  • SAGE: Serial Analysis of Gene Expression
  • RNA molecules with the ability to bind a predetermined protein or a predetermined dye molecule were selected by alternate rounds of selection and PCR amplification (Tuerk and Gold, 1990; Ellington and Szostak, 1990).
  • Proteomics: In another aspect, this invention relates to the emerging field of proteomics. Proteomics involves the qualitative and quantitative measurement of gene activity by detecting and quantitating expression at the protein level, rather than at the messenger RNA level. Proteomics also involves the study of non-genome encoded events, including the post-translational modification of proteins (including glycosylation or other modifications), interactions between proteins, and the location of proteins within a cell. The structure, function, and/or level of activity of the proteins expressed by the cell are also of interest.
  • proteomics involves the study of part or all of the status of the total protein contained within or secreted by a cell.
  • Proteomics requires means of separating proteins in complex mixtures and identifying both low- and high-abundance species. Examples of powerful methods currently used to resolve complex protein mixtures are 2D gel electrophoresis, reverse phase HPLC, capillary electrophoresis, isoelectric focusing and related hybrid techniques.
  • Commonly used protein identification techniques include N-terminal Edman sequencing and mass spectrometry (electrospray [ESI] or matrix-assisted laser desorption ionization [MALDI] MS) and sophisticated database search programs, such as SEQUEST™ (see, e.g., U.S. Patent Nos.
  • SEQUEST™ correlates uninterpreted tandem mass spectra of peptides with amino acid sequences from protein and nucleotide databases. SEQUEST™ can determine the amino acid sequence and thus the protein(s) and organism(s) that correspond to the mass spectrum being analyzed. SEQUEST™ uses algorithms described in U.S. Patent Nos. 6,017,693 and 5,538,897. Using a computer, the output of the mass spectrometry can be analyzed so as to link a gene and the particular protein for which it codes.
  • the present invention is further directed to a method for generating a selected mutant polynucleotide sequence (or a population of selected polynucleotide sequences) typically in the form of amplified and/or cloned polynucleotides, whereby the selected polynucleotide sequence(s) possess at least one desired phenotypic characteristic (e.g., encodes a polypeptide, promotes transcription of linked polynucleotides, binds a protein, and the like) which can be selected for.
  • One method for identifying hybrid polypeptides that possess a desired structure or functional property involves the screening of a large library of polypeptides for individual library members which possess the desired structure or functional property conferred by the amino acid sequence of the polypeptide.
  • One method of screening peptides involves the display of a peptide sequence, antibody, or other protein on the surface of a bacteriophage particle or cell. Generally, in these methods each bacteriophage particle or cell serves as an individual library member displaying a single species of displayed peptide in addition to the natural bacteriophage or cell protein sequences.
  • Each bacteriophage or cell contains the nucleotide sequence information encoding the particular displayed peptide sequence; thus, the displayed peptide sequence can be ascertained by nucleotide sequence determination of an isolated library member.
  • a well-known peptide display method involves the presentation of a peptide sequence on the surface of a filamentous bacteriophage, typically as a fusion with a bacteriophage coat protein.
  • the bacteriophage library can be incubated with an immobilized, predetermined macromolecule or small molecule (e.g., a receptor) so that bacteriophage particles which present a peptide sequence that binds to the immobilized macromolecule can be differentially partitioned from those that do not present peptide sequences that bind to the predetermined macromolecule.
  • the bacteriophage particles (i.e., library members) which are bound to the immobilized macromolecule are then recovered and replicated to amplify the selected bacteriophage sub-population for a subsequent round of affinity enrichment and phage replication.
  • the bacteriophage library members that are thus selected are isolated and the nucleotide sequence encoding the displayed peptide sequence is determined, thereby identifying the sequence(s) of peptides that bind to the predetermined macromolecule (e.g., receptor).
  • the fusion protein/vector DNA complexes can be screened against a predetermined macromolecule in much the same way as bacteriophage particles are screened in the phage-based display system, with the replication and sequencing of the DNA vectors in the selected fusion protein/vector DNA complexes serving as the basis for identification of the selected library peptide sequence(s).
  • the displayed peptide sequences can be of varying lengths, typically from 3-5000 amino acids long or longer, frequently from 5-100 amino acids long, and often from about 8-15 amino acids long.
  • a library can comprise library members having varying lengths of displayed peptide sequence, or may comprise library members having a fixed length of displayed peptide sequence.
  • Portions or all of the displayed peptide sequence(s) can be random, pseudorandom, defined set kernel, fixed, or the like.
  • the present display methods include methods for in vitro and in vivo display of single-chain antibodies, such as nascent scFv on polysomes or scFv displayed on phage, which enable large-scale screening of scFv libraries having broad diversity of variable region sequences and binding specificities.
  • the present invention also provides random, pseudorandom, and defined sequence framework peptide libraries and methods for generating and screening those libraries to identify useful compounds (e.g., peptides, including single-chain antibodies) that bind to receptor molecules or epitopes of interest or gene products that modify peptides or RNA in a desired fashion.
  • the random, pseudorandom, and defined sequence framework peptides are produced from libraries of peptide library members that comprise displayed peptides or displayed single-chain antibodies attached to a polynucleotide template from which the displayed peptide was synthesized.
  • the mode of attachment may vary according to the specific aspect of the invention selected, and can include encapsulation in a phage particle or incorporation in a cell.
  • Screening that utilizes in vitro translation systems: An aspect of this invention provides for the use of in vitro translation during the step of screening. In vitro translation has been used to synthesize proteins of interest and has been proposed as a method for generating large libraries of peptides.
  • Affinity enrichment: The invention provides for the use of affinity enrichment, which allows a very large library of peptides and single-chain antibodies to be screened and the polynucleotide sequence encoding the desired peptide(s) or single-chain antibodies to be selected.
  • the polynucleotide can then be isolated and shuffled to recombine combinatorially the amino acid sequence of the selected peptide(s) (or predetermined portions thereof) or single-chain antibodies (or just VHI, VLI or CDR portions thereof).
  • Using these methods, one can identify a peptide or single-chain antibody as having a desired binding affinity for a molecule and can exploit the process of shuffling to converge rapidly to a desired high-affinity peptide or scFv.
  • the peptide or antibody can then be synthesized in bulk by conventional means for any suitable use (e.g., as a therapeutic or diagnostic agent).
  • a significant advantage of the present invention is that no prior information regarding an expected ligand structure is required to isolate peptide ligands or antibodies of interest.
  • the peptide identified can have biological activity, which is meant to include at least specific binding affinity for a selected receptor molecule and, in some instances, will further include the ability to block the binding of other compounds, to stimulate or inhibit metabolic pathways, to act as a signal or messenger, to stimulate or inhibit cellular activity, and the like.
  • the present invention also provides a method for shuffling a pool of polynucleotide sequences selected by affinity screening a library of polysomes displaying nascent peptides (including single-chain antibodies) for library members which bind to a predetermined receptor (e.g., a mammalian proteinaceous receptor such as, for example, a peptidergic hormone receptor, a cell surface receptor, an intracellular protein which binds to other protein(s) to form intracellular protein complexes such as hetero-dimers and the like) or epitope (e.g., an immobilized protein, glycoprotein, oligosaccharide, and the like).
  • the invention also provides peptide libraries comprising a plurality of individual library members of the invention, wherein (1) each individual library member of said plurality comprises a sequence produced by shuffling of a pool of selected sequences, and (2) each individual library member comprises a variable peptide segment sequence or single-chain antibody segment sequence which is distinct from the variable peptide segment sequences or single-chain antibody sequences of other individual library members in said plurality (although some library members may be present in more than one copy per library due to uneven amplification, stochastic probability, or the like).
  • Antibody Display: The present method can be used to shuffle, by in vitro and/or in vivo recombination by any of the disclosed methods, and in any combination, polynucleotide sequences selected by antibody display methods, wherein an associated polynucleotide encodes a displayed antibody which is screened for a phenotype (e.g., for affinity for binding a predetermined antigen (ligand)).
  • Various prokaryotic expression systems have been developed that can be manipulated to produce combinatorial antibody libraries which may be screened for high-affinity antibodies to specific antigens.
  • a bacteriophage antibody display library is screened with a receptor (e.g., polypeptide, carbohydrate, glycoprotein, nucleic acid) that is immobilized (e.g., by covalent linkage to a chromatography resin to enrich for reactive phage by affinity chromatography) and/or labeled (e.g., to screen plaque or colony lifts).
  • scFv: single-chain fragment variable
  • Intracellular expression of an anti-Rev scFv has been shown to inhibit HIV-1 virus replication in vitro (Duan et al, 1994), and intracellular expression of an anti-p21ras scFv has been shown to inhibit meiotic maturation of Xenopus oocytes (Biocca et al, 1993). Recombinant scFv which can be used to diagnose HIV infection have also been reported, demonstrating the diagnostic utility of scFv (Lilley et al, 1994). Fusion proteins wherein an scFv is linked to a second polypeptide, such as a toxin or fibrinolytic activator protein, have also been reported (Holvost et al, 1992; Nicholls et al, 1993).
  • Enzymatic inverse PCR mutagenesis has been shown to be a simple and reliable method for constructing relatively large libraries of scFv site-directed hybrids (Stemmer et al, 1993), as has error-prone PCR and chemical mutagenesis (Deng et al, 1994).
  • Barbas (Barbas et al, 1992) attempted to circumvent the problem of limited repertoire sizes resulting from using biased variable region sequences by randomizing the sequence in a synthetic CDR region of a human tetanus toxoid-binding Fab.
  • Displayed peptide/polynucleotide complexes which encode a variable segment peptide sequence of interest or a single-chain antibody of interest are selected from the library by an affinity enrichment technique. This is accomplished by means of an immobilized macromolecule or epitope specific for the peptide sequence of interest, such as a receptor, other macromolecule, or other epitope species.
  • the affinity selection procedure provides an enrichment of library members encoding the desired sequences, which may then be isolated for pooling and shuffling, for sequencing, and/or for further propagation and affinity enrichment.
  • the library members without the desired specificity are removed by washing.
  • the degree and stringency of washing required will be determined for each peptide sequence or single-chain antibody of interest and the immobilized predetermined macromolecule or epitope. A certain degree of control can be exerted over the binding characteristics of the nascent peptide/DNA complexes recovered by adjusting the conditions of the binding incubation and the subsequent washing.
  • the temperature, pH, ionic strength, divalent cation concentration, and the volume and duration of the washing will select for nascent peptide/DNA complexes within particular ranges of affinity for the immobilized macromolecule. Selection based on slow dissociation rate, which is usually predictive of high affinity, is often the most practical route. This may be done either by continued incubation in the presence of a saturating amount of free predetermined macromolecule, or by increasing the volume, number, and length of the washes. In each case, the rebinding of dissociated nascent peptide/DNA or peptide/RNA complex is prevented, and with increasing time, nascent peptide/DNA or peptide/RNA complexes of higher and higher affinity are recovered.
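The reasoning behind selection on slow dissociation rate can be made concrete with a small calculation. The sketch below assumes simple first-order dissociation with rebinding fully prevented, so that the fraction of a complex still bound after a wash of duration t is exp(-k_off * t). That kinetic model, the function names and the rate constants in the example are assumptions for illustration; they are not values from the text.

```python
import math


def fraction_retained(k_off, wash_time):
    """Fraction of complexes still bound after `wash_time` seconds.

    Assumes first-order dissociation with rate constant k_off (per second)
    and no rebinding of dissociated complexes.
    """
    return math.exp(-k_off * wash_time)


def enrichment(k_off_slow, k_off_fast, wash_time):
    """Ratio by which the slow-dissociating complex is enriched over the
    fast-dissociating one after the wash."""
    return fraction_retained(k_off_slow, wash_time) / fraction_retained(
        k_off_fast, wash_time
    )


# Hypothetical rate constants (1/s) and two wash durations (s).
short_wash = enrichment(1e-4, 1e-2, 60)
long_wash = enrichment(1e-4, 1e-2, 600)
```

Under this model the enrichment grows exponentially with wash time, which is consistent with the statement that complexes of higher and higher affinity are recovered with increasing time, at the cost of lower overall recovery.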
  • affinities of some peptides are dependent on ionic strength or cation concentration. This is a useful characteristic for peptides that will be used in affinity purification of various proteins when gentle conditions for removing the protein from the peptides are required.
  • One variation involves the use of multiple binding targets (multiple epitope species, multiple receptor species), such that a scFv library can be simultaneously screened for a multiplicity of scFv which have different binding specificities. Given that the size of a scFv library often limits the diversity of potential scFv sequences, it is typically desirable to use scFv libraries of as large a size as possible.
  • multiple predetermined epitope species can be concomitantly screened in a single library, or sequential screening against a number of epitope species can be used.
  • multiple target epitope species, each encoded on a separate bead (or subset of beads), can be mixed and incubated with a polysome-display scFv library under suitable binding conditions.
  • the collection of beads, comprising multiple epitope species, can then be used to isolate, by affinity selection, scFv library members.
  • subsequent affinity screening rounds can include the same mixture of beads, subsets thereof, or beads containing only one or two individual epitope species.
  • This approach affords efficient screening, and is compatible with laboratory automation, batch processing, and high throughput screening methods.
  • Expression systems will typically include an expression control DNA sequence operably linked to the coding sequences, including naturally-associated or heterologous promoter regions.
  • the expression control sequences can be eukaryotic promoter systems in vectors capable of transforming or transfecting eukaryotic host cells. Once the vector has been incorporated into the appropriate host, the host is maintained under conditions suitable for high level expression of the nucleotide sequences, and the collection and purification of the mutant "engineered" antibodies.
  • the DNA sequences will be expressed in hosts after the sequences have been operably linked to an expression control sequence (i.e., positioned to ensure the transcription and translation of the structural gene).
  • These expression vectors are typically replicable in the host organisms either as episomes or as an integral part of the host chromosomal DNA.
  • expression vectors will contain selection markers, e.g., tetracycline or neomycin, to permit detection of those cells transformed with the desired DNA sequences (see, e.g., USPN 4,704,362).
  • mammalian tissue cell culture may also be used to produce the polypeptides of the present invention (see Winnacker, 1987, which is incorporated herein by reference).
  • Eukaryotic cells can be used because a number of suitable host cell lines capable of secreting intact immunoglobulins have been developed in the art, and include the CHO cell lines, various COS cell lines, HeLa cells, and myeloma cell lines, or transformed B cells or hybridomas.
  • Expression vectors for these cells can include expression control sequences, such as an origin of replication, a promoter, an enhancer (Queen et al, 1986), and necessary processing information sites, such as ribosome binding sites, RNA splice sites, polyadenylation sites, and transcriptional terminator sequences.
  • Expression control sequences can be promoters derived from immunoglobulin genes, cytomegalovirus, SV40, Adenovirus, Bovine Papilloma Virus, and the like.
  • Eukaryotic DNA transcription can be increased by inserting an enhancer sequence into the vector.
  • Enhancers are cis-acting sequences of between 10 and 300 bp that increase transcription by a promoter. Enhancers can effectively increase transcription when either 5' or 3' to the transcription unit.
  • viral enhancers are used, including SV40 enhancers, cytomegalovirus enhancers, polyoma enhancers, and adenovirus enhancers. Enhancer sequences from mammalian systems are also commonly used, such as the mouse immunoglobulin heavy chain enhancer. Mammalian expression vector systems will also typically include a selectable marker gene. Examples of suitable markers include the dihydrofolate reductase gene (DHFR), the thymidine kinase gene (TK), or prokaryotic genes conferring drug resistance. The first two marker genes can use mutant cell lines that lack the ability to grow without the addition of thymidine to the growth medium.
  • Transformed cells can then be identified by their ability to grow on non-supplemented media.
  • prokaryotic drug resistance genes useful as markers include genes conferring resistance to G418, mycophenolic acid and hygromycin.
  • the vectors containing the DNA segments of interest can be transferred into the host cell by well-known methods, depending on the type of cellular host. For example, calcium chloride transfection is commonly utilized for prokaryotic cells, whereas calcium phosphate treatment, lipofection, or electroporation may be used for other cellular hosts. Other methods used to transform mammalian cells include the use of Polybrene, protoplast fusion, liposomes, electroporation, and micro-injection (see, generally, Sambrook et al, 1982 and 1989).
  • the antibodies, individual mutated immunoglobulin chains, mutated antibody fragments, and other immunoglobulin polypeptides of the invention can be purified according to standard procedures of the art, including ammonium sulfate precipitation, fraction column chromatography, gel electrophoresis and the like; see, e.g., Scopes, 1982.
  • the polypeptides may then be used therapeutically or in developing and performing assay procedures, immunofluorescent stainings, and the like (see, generally, Lefkovits and Pernis, 1979 and 1981; Lefkovits, 1997).
  • This invention provides a two-hybrid screening system to identify library members which bind a predetermined polypeptide sequence.
  • the selected library members are pooled and shuffled by in vitro and/or in vivo recombination.
  • the shuffled pool can then be screened in a yeast two-hybrid system to select library members which bind said predetermined polypeptide sequence (e.g., an SH2 domain) or which bind an alternate predetermined polypeptide sequence (e.g., an SH2 domain from another protein species).
  • Polynucleotides encoding two hybrid proteins, one consisting of the yeast Gal4 DNA-binding domain fused to a polypeptide sequence of a known protein and the other consisting of the Gal4 activation domain fused to a polypeptide sequence of a second protein, are constructed and introduced into a yeast host cell. Intermolecular binding between the two fusion proteins reconstitutes the Gal4 DNA-binding domain with the Gal4 activation domain, which leads to the transcriptional activation of a reporter gene (e.g., lacZ, HIS3) which is operably linked to a Gal4 binding site.
  • the two-hybrid method is used to identify novel polypeptide sequences which interact with a known protein (Silver and Hunt, 1993; Durfee et al, 1993; Yang et al, 1992; Luban et al, 1993; Hardy et al, 1992; Bartel et al, 1993; and Vojtek et al, 1993).
  • variations of the two-hybrid method have been used to identify mutations of a known protein that affect its binding to a second known protein (Li and Fields, 1993; Lalo et al, 1993; Jackson et al, 1993; and Madura et al, 1993).
  • Two-hybrid systems have also been used to identify interacting structural domains of two known proteins (Bardwell et al, 1993; Chakrabarty et al, 1992; Staudinger et al, 1993; and Milne and Weaver 1993) or domains responsible for oligomerization of a single protein (Iwabuchi et al, 1993; Bogerd et al, 1993). Variations of two-hybrid systems have been used to study the in vivo activity of a proteolytic enzyme (Dasmahapatra et al, 1992).
  • Alternatively, an E. coli BCCP interactive screening system (Germino et al, 1993; Guarente, 1993) can be used to identify interacting protein sequences (i.e., protein sequences which heterodimerize or form higher order heteromultimers). Sequences selected by a two-hybrid system can be pooled and shuffled and introduced into a two-hybrid system for one or more subsequent rounds of screening to identify polypeptide sequences which bind to the hybrid containing the predetermined binding sequence. The sequences thus identified can be compared to identify consensus sequence(s) and consensus sequence kernels.
  • Improved methods for cellular engineering, protein expression profiling, differential labeling of peptides.
  • the invention relates to peptide chemistry, proteomics, and mass spectrometry technology.
  • the invention provides novel methods for determining polypeptide profiles and protein expression variations, as with proteome analyses.
  • the present invention provides methods of simultaneously identifying and quantifying individual proteins in complex protein mixtures by selective differential labeling of amino acid residues followed by chromatographic and mass spectrographic analysis.
  • the diagnosis and treatment, as well as the predisposition of, a variety of diseases and disorders may often be accomplished through identification and quantitative measurement of polypeptide expression variations between different cell types and cell states.
  • Biochemical pathways and metabolic networks can also be analyzed by globally and quantitatively measuring protein expression in various cell types and biological states (see, e.g., Ideker (2001) Science 292:929-934).
  • State-of-the-art techniques such as liquid-chromatography-electrospray-ionization tandem mass spectrometry have, in conjunction with database-searching computer algorithms, revolutionized the analysis of biochemical species from complex biological mixtures. With these techniques, it is now possible to perform high-throughput protein identification at picomolar to subpicomolar levels from complex mixtures of biological molecules (see, e.g., Dongre (1997) Trends Biotechnol. 15:418-425).
  • ICATs isotope-coded affinity tags
  • tandem mass spectrometry or ion trap mass spectrometry or a combination thereof. The method labels multiple cysteinyl residues and uses stable isotope dilution techniques. For example, Gygi (1999) Nat. Biotechnol. 10:994-999, compared protein expression in a yeast using ethanol or galactose as a carbon source. The measured differences in protein expression correlated with known yeast metabolic function under glucose-repressed conditions.
  • two different protein mixtures for quantitative comparison are digested to peptide mixtures, the peptide mixtures are separately methylated using either d0- or d3-methanol, and the mixtures of methylated peptides are combined and subjected to microcapillary HPLC-MS/MS (see, e.g., Goodlett, D. R., et al., (2000) "Differential stable isotope labeling of peptides for quantitation and de novo sequence derivation," 49th ASMS; Zhou, H; Watts, JD; Aebersold, R. A systematic approach to the analysis of protein phosphorylation.; Comment In: Nat Biotechnol.
  • Parent proteins of methylated peptides are identified by correlative database searching of fragment ion spectra using computer program assisted paradigms or automated de novo sequencing that compares all tandem mass spectra of d0- and d3-methylated peptide ion pairs. In Goodlett (2000) supra, ratios of proteins in two different mixtures were calculated for d0- to d3-methylated peptide pairs.
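  • The d0/d3 methylation scheme described above can be sketched numerically. The example below assumes that every carboxyl group (the C-terminus plus each Asp and Glu side chain) is fully esterified, so each site contributes one CH3 or CD3 group; real labeling may be incomplete. The peptide sequence, the intensities, and the function names are hypothetical.

```python
# Monoisotopic masses in daltons (standard values, rounded).
H_MASS = 1.00782503
D_MASS = 2.01410178

# Mass added by replacing one CH3 ester group with CD3.
D3_SHIFT = 3 * (D_MASS - H_MASS)


def carboxyl_sites(peptide: str) -> int:
    """Count esterifiable carboxyl groups: C-terminus plus Asp (D) and Glu (E)."""
    return 1 + peptide.count("D") + peptide.count("E")


def pair_mz_spacing(peptide: str, charge: int) -> float:
    """Expected m/z gap between the d0- and d3-methylated forms of a peptide."""
    return carboxyl_sites(peptide) * D3_SHIFT / charge


def abundance_ratio(intensity_d0: float, intensity_d3: float) -> float:
    """Relative abundance of the d0-labeled sample versus the d3-labeled sample."""
    return intensity_d0 / intensity_d3


if __name__ == "__main__":
    peptide = "PEPTIDE"  # hypothetical sequence
    print(carboxyl_sites(peptide), round(pair_mz_spacing(peptide, 2), 4))
    print(abundance_ratio(2.0e6, 1.0e6))
```

  • Because the spacing depends on the number of carboxyl groups, the observed gap between paired ions also constrains the Asp and Glu content of the peptide.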
  • differential labeling reagents which relied on stable isotopes, which are expensive, and not flexible to differential labeling of more than two mixtures of peptides
  • labeling methods limited only to methylation of carboxy-termini
  • protein expression profiling limited to duplex comparison
  • one-dimensional capillary HPLC chromatography was employed to separate peptides, which does not have enough capacity and resolving power for complex mixtures of peptides.
  • this invention provides a method for identifying proteins by differential labeling of peptides, the method comprising the following steps: (a) providing a sample comprising a polypeptide; (b) providing a plurality of labeling reagents which differ in molecular mass that can generate differential labeled peptides that do not differ in chromatographic retention properties and do not differ in ionization and detection properties in mass spectrographic analysis, wherein the differences in molecular mass are distinguishable by mass spectrographic analysis; (c) fragmenting the polypeptide into peptide fragments by enzymatic digestion or by non- enzymatic fragmentation; (d) contacting the labeling reagents of step (b) with the peptide fragments of step (c), thereby labeling the peptides with the differential labeling reagents; (e) separating the peptides by chromatography to generate an eluate; (f) feeding the eluate of step (e) into a mass spectrometer and quantifying the
  • the sample of step (a) comprises a cell or a cell extract.
  • the method can further comprise providing two or more samples comprising a polypeptide.
  • One or more of the samples can be derived from a wild type cell and one sample can be derived from an abnormal or a modified cell.
  • the abnormal cell can be a cancer cell.
  • the modified cell can be a cell that is mutagenized and/or treated with a chemical, a physiological factor, or the presence of another organism (including, e.g., a eukaryotic organism, prokaryotic organism, virus, vector, prion, or part thereof), and/or exposed to an environmental factor or change or physical force (including, e.g., sound, light, heat, sonication, and radiation).
  • the modification can be genetic change (including, for example, a change in DNA or RNA sequence or content) or otherwise.
  • the method further comprises purifying or fractionating the polypeptide before the fragmenting of step (c).
  • the method can further comprise purifying or fractionating the polypeptide before the labeling of step (d).
  • the method can further comprise purifying or fractionating the labeled peptide before the chromatography of step (e).
  • the purifying or fractionating comprises a method selected from the group consisting of size exclusion chromatography, HPLC, reverse phase HPLC and affinity purification.
  • the method further comprises contacting the polypeptide with a labeling reagent of step (b) before the fragmenting of step (c).
  • the labeling reagent of step (b) comprises the general formulae selected from the group consisting of: Z^A OH and Z^B OH, to esterify peptide C-terminals and/or Glu and Asp side chains; Z^A NH2 and Z^B NH2, to form an amide bond with peptide C-terminals and/or Glu and Asp side chains; and Z^A CO2H and Z^B CO2H.
  • Z^A and Z^B independently of one another comprise the general formula R-Z^1-A^1-Z^2-A^2-Z^3-A^3-Z^4-A^4-; Z^1, Z^2, Z^3, and Z^4, independently of one another, are selected from the group consisting of nothing, O, OC(O), OC(S), OC(O)O, OC(O)NR, OC(S)NR, OSiRR^1, S, SC(O), SC(S), SS, S(O), S(O2), NR, NRR^1+, C(O), C(O)O,
  • R^1 is an alkyl group
  • A^1, A^2, A^3, and A^4, independently of one another, are selected from the group consisting of nothing or (CRR^1)_n, wherein R, R^1, independently from other R and R^1 in Z^1 to Z^4 and independently from other R and R^1 in A^1 to A^4, are selected from the group consisting of a hydrogen atom, a halogen atom and an alkyl group; "n" in Z^1 to Z^4, independent of n in A^1 to A^4, is an integer having a value selected from the group consisting of
  • the alkyl group (see definition below) is selected from the group consisting of an alkenyl, an alkynyl and an aryl group.
  • One or more C-C bonds from (CRR^1)_n can be replaced with a double or a triple bond; thus, in alternative aspects, an R or an R^1 group is deleted.
  • the (CRR^1)_n can be selected from the group consisting of an o-arylene, an m-arylene and a p-arylene, wherein each group has none or up to 6 substituents.
  • the (CRR^1)_n itself can be selected from the group consisting of a carbocyclic, a bicyclic and a tricyclic fragment, wherein the fragment has up to 8 atoms in the cycle with or without a heteroatom selected from the group consisting of an O atom, a N atom and an S atom.
  • two or more labeling reagents have the same structure but a different isotope composition.
  • Z^A has the same structure as Z^B
  • Z^A has a different isotope composition than Z^B.
  • the isotope is boron-10 and boron-11; carbon-12 and carbon-13; nitrogen-14 and nitrogen-15; and, sulfur-32 and sulfur-34.
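  • For reagents that share a structure but differ in isotope composition, the mass difference between the two labels follows directly from the isotope pair and the number of substituted atoms. The sketch below uses standard monoisotopic masses (rounded) for the isotope pairs listed above; the function names and the example of six substituted carbons are illustrative assumptions only.

```python
# Monoisotopic masses in daltons (standard values, rounded) for each
# light/heavy isotope pair named in the text.
ISOTOPE_PAIRS = {
    "B": (10.0129370, 11.0093054),   # boron-10 / boron-11
    "C": (12.0000000, 13.0033548),   # carbon-12 / carbon-13
    "N": (14.0030740, 15.0001089),   # nitrogen-14 / nitrogen-15
    "S": (31.9720707, 33.9678669),   # sulfur-32 / sulfur-34
}


def isotope_shift(element: str, substituted_atoms: int) -> float:
    """Mass difference between heavy and light labels for one element."""
    light, heavy = ISOTOPE_PAIRS[element]
    return substituted_atoms * (heavy - light)


def mz_spacing(element: str, substituted_atoms: int, charge: int) -> float:
    """Observed m/z gap for a labeled peptide pair at a given charge state."""
    return isotope_shift(element, substituted_atoms) / charge


if __name__ == "__main__":
    # Hypothetical label carrying six carbon-13 atoms.
    print(round(isotope_shift("C", 6), 4), round(mz_spacing("C", 6, 2), 4))
```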
  • x is greater than y.
  • x and y are between 1 and about 11, between 1 and about 21, between 1 and about 31, between 1 and about 41, or between 1 and about 51.
  • the labeling reagent of step (b) can comprise the general formulae selected from the group consisting of: Z^A OH and Z^B OH to esterify peptide C-terminals; Z^A NH2 / Z^B NH2 to form an amide bond with peptide C-terminals; and, Z^A CO2H / Z^B CO2H to form an amide bond with peptide N-terminals; wherein Z^A and Z^B have the general formula R-Z^1-A^1-Z^2-A^2-Z^3-A^3-Z^4-A^4-; Z^1, Z^2, Z^3, and Z^4, independently of one another, are selected from the group consisting of nothing, O, OC(O), OC(S), OC(O)O, OC(O)NR, OC(S)NR, OSiRR^1, S, SC(O), SC(S), SS, S(O), S(O2), NR, NRR
  • a single C-C bond in a (CRR^1)_n group is replaced with a double or a triple bond; thus, the R and R^1 can be absent.
  • the (CRR^1)_n can comprise a moiety selected from the group consisting of an o-arylene, an m-arylene and a p-arylene, wherein the group has none or up to 6 substituents.
  • the group can comprise a carbocyclic, a bicyclic, or a tricyclic fragment with up to 8 atoms in the cycle, with or without a heteroatom selected from the group consisting of an O atom, an N atom and an S atom.
  • R and R^1, independently from other R and R^1 in Z^1 - Z^4 and independently from other R and R^1 in A^1 - A^4, are selected from the group consisting of a hydrogen atom, a halogen and an alkyl group.
  • the alkyl group (see definition below) can be an alkenyl, an alkynyl or an aryl group.
  • the "n" in Z^1 - Z^4 is independent of n in A^1 - A^4 and is an integer selected from the group consisting of about 51; about 41; about 31; about 21; about 11 and about 6.
  • Z^A has the same structure as Z^B but Z^A further comprises x number of -CH2- fragment(s) in one or more A^1 - A^4 fragments, wherein x is an integer. In one aspect, Z^A has the same structure as Z^B but Z^A further comprises x number of -CF2- fragment(s) in one or more A^1 - A^4 fragments, wherein x is an integer. In one aspect, Z^A comprises x number of protons and Z^B comprises y number of halogens in the place of protons, wherein x and y are integers.
  • Z^A contains x number of protons and Z^B contains y number of halogens, and there are x - y number of protons remaining in one or more A^1 - A^4 fragments, wherein x and y are integers.
  • Z^A further comprises x number of -O- fragment(s) in one or more A^1 - A^4 fragments, wherein x is an integer.
  • Z^A further comprises x number of -S- fragment(s) in one or more A^1 - A^4 fragments, wherein x is an integer.
  • Z^A further comprises x number of -O- fragment(s) and Z^B further comprises y number of -S- fragment(s) in the place of -O- fragment(s), wherein x and y are integers.
  • Z^A further comprises x - y number of -O- fragment(s) in one or more A^1 - A^4 fragments, wherein x and y are integers.
  • x and y are integers selected from the group consisting of between 1 and about 51; between 1 and about 41; between 1 and about 31; between 1 and about 21; between 1 and about 11 and between 1 and about 6, wherein x is greater than y.
  • n, m and y are integers selected from the group consisting of about 51; about 41; about 31; about 21, about 11; about 6 and between about 5 and 51.
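  • Labels that differ structurally, rather than isotopically, produce mass differences that can be estimated from the extra or substituted fragments. The sketch below computes the shift for added -CH2- or -CF2- fragments, for hydrogen replaced by a halogen (fluorine is assumed here purely as an example), and for -O- replaced by -S-. Atomic masses are standard monoisotopic values (rounded); the dictionary keys and function name are hypothetical.

```python
# Monoisotopic atomic masses in daltons (standard values, rounded).
H, C, O, F, S = 1.00782503, 12.0, 15.99491462, 18.99840316, 31.97207117

# Mass change per modification, light label -> heavy label.
FRAGMENT_DELTAS = {
    "CH2": C + 2 * H,   # one additional -CH2- fragment
    "CF2": C + 2 * F,   # one additional -CF2- fragment
    "H_to_F": F - H,    # one proton replaced by fluorine (assumed halogen)
    "O_to_S": S - O,    # one -O- fragment replaced by -S-
}


def mass_delta(kind: str, count: int) -> float:
    """Mass difference between two labels differing by `count` modifications."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return count * FRAGMENT_DELTAS[kind]


if __name__ == "__main__":
    for kind in FRAGMENT_DELTAS:
        print(f"{kind:>7}: {mass_delta(kind, 1):.5f} Da per modification")
```

  • Choosing different counts for each sample gives a ladder of distinct masses, which is one way more than two samples could be distinguished in a single run.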
  • the separating of step (e) comprises a liquid chromatography system, such as a multidimensional liquid chromatography or a capillary chromatography system.
  • the mass spectrometer comprises a tandem mass spectrometry device or an ion trap mass spectrometer or a combination thereof.
  • the method further comprises quantifying the amount of each polypeptide or each peptide.
  • the invention provides a method for defining the expressed proteins associated with a given cellular state, the method comprising the following steps: (a) providing a sample comprising a cell in the desired cellular state; (b) providing a plurality of labeling reagents which differ in molecular mass that can generate differential labeled peptides that do not differ in chromatographic retention properties and do not differ in ionization and detection properties in mass spectrographic analysis, wherein the differences in molecular mass are distinguishable by mass spectrographic analysis; (c) fragmenting polypeptides derived from the cell into peptide fragments by enzymatic digestion or by non-enzymatic fragmentation; (d) contacting the labeling reagents of step (b) with the peptide fragments of step (c
  • the invention provides a method for quantifying changes in protein expression between at least two cellular states, the method comprising the following steps: (a) providing at least two samples comprising cells in a desired cellular state; (b) providing a plurality of labeling reagents which differ in molecular mass that can generate differential labeled peptides that do not differ in chromatographic retention properties and do not differ in ionization and detection properties in mass spectrographic analysis, wherein the differences in molecular mass are distinguishable by mass spectrographic analysis; (c) fragmenting polypeptides derived from the cells into peptide fragments by enzymatic digestion or by non-enzymatic fragmentation;
  • (d) contacting the labeling reagents of step (b) with the peptide fragments of step (c), thereby labeling the peptides with the differential labeling reagents, wherein the labels used in one sample are different from the labels used in other samples; (e) separating the peptides by chromatography to generate an eluate; (f) feeding the eluate of step
  • the invention provides a method for identifying proteins by differential labeling of peptides, the method comprising the following steps: (a) providing a sample comprising a polypeptide; (b) providing a plurality of labeling reagents which differ in molecular mass but do not differ in chromatographic retention properties and do not differ in ionization and detection properties in mass spectrographic analysis, wherein the differences in molecular mass are distinguishable by mass spectrographic analysis; (c) fragmenting the polypeptide into peptide fragments by enzymatic digestion or by non-enzymatic fragmentation; (d) contacting the labeling reagents of step (b) with the peptide fragments of step (c), thereby labeling the peptides with the differential labeling reagents; (e) separating the peptides by multidimensional liquid chromatography to generate an eluate; (f) feeding the eluate of step (e) into a tandem mass spectrometer or an ion trap mass spectrometer or a
  • the invention provides a chimeric labeling reagent comprising (a) a first domain comprising a biotin; and (b) a second domain comprising a reactive group capable of covalently binding to an amino acid, wherein the chimeric labeling reagent comprises at least one isotope.
  • the isotope(s) can be in the first domain or the second domain.
  • the isotope(s) can be in the biotin.
  • the isotope can be a deuterium isotope, a boron-10 or boron-11 isotope, a carbon-12 or a carbon-13 isotope, a nitrogen-14 or a nitrogen-15 isotope, or, a sulfur-32 or a sulfur-34 isotope.
  • the chimeric labeling reagent can comprise two or more isotopes.
  • the chimeric labeling reagent reactive group capable of covalently binding to an amino acid can be a succinimide group, an isothiocyanate group or an isocyanate group.
  • the reactive group capable of covalently binding to an amino acid can bind to a lysine or a cysteine.
  • the chimeric labeling reagent can further comprise a linker moiety linking the biotin group and the reactive group.
  • the linker moiety can comprise at least one isotope.
  • the linker is a cleavable moiety that can be cleaved by, e.g., enzymatic digest or by reduction.
  • the invention provides a method of comparing relative protein concentrations in a sample comprising (a) providing a plurality of differential small molecule tags, wherein the small molecule tags are structurally identical but differ in their isotope composition, and the small molecules comprise reactive groups that covalently bind to cysteine or lysine residues or both; (b) providing at least two samples comprising polypeptides; (c) attaching covalently the differential small molecule tags to amino acids of the polypeptides; (d) determining the protein concentrations of each sample in a tandem mass spectrometer or an ion trap mass spectrometer or a combination thereof; and, (e) comparing relative protein concentrations of each sample.
  • the sample comprises a complete or a fractionated cellular sample.
  • the differential small molecule tags comprise a chimeric labeling reagent comprising (a) a first domain comprising a biotin; and, (b) a second domain comprising a reactive group capable of covalently binding to an amino acid, wherein the chimeric labeling reagent comprises at least one isotope.
  • the isotope can be a deuterium isotope, a boron-10 or boron-11 isotope, a carbon-12 or a carbon-13 isotope, a nitrogen-14 or a nitrogen-15 isotope, or, a sulfur-32 or a sulfur-34 isotope.
  • the chimeric labeling reagent can comprise two or more isotopes.
  • the reactive group capable of covalently binding to an amino acid can be selected from the group consisting of a succinimide group, an isothiocyanate group and an isocyanate group.
  • the invention provides a method of comparing relative protein concentrations in a sample comprising (a) providing a plurality of differential small molecule tags, wherein the differential small molecule tags comprise a chimeric labeling reagent comprising (i) a first domain comprising a biotin; and, (ii) a second domain comprising a reactive group capable of covalently binding to an amino acid, wherein the chimeric labeling reagent comprises at least one isotope; (b) providing at least two samples comprising polypeptides; (c) attaching covalently the differential small molecule tags to amino acids of the polypeptides; (d) isolating the tagged polypeptides on a biotin-binding column by binding tagged polypeptides to the column, washing non-bound materials off the column, and eluting tagged polypeptides off the column; (e) determining the protein concentrations of each sample in a tandem mass spectrometer or an ion trap mass spectrometer or a combination thereof;
  • the invention provides methods for simultaneously identifying individual proteins in complex mixtures of biological molecules and quantifying the expression levels of those proteins, e.g., proteome analyses.
  • the methods compare two or more samples of proteins, one of which can be considered as the standard sample and all others can be considered as samples under investigation.
  • the proteins in the standard and investigated samples are subjected separately to a series of chemical modifications, i.e., differential chemical labeling, and fragmentation, e.g., by proteolytic digestion and/or other enzymatic reactions or physical fragmenting methodologies.
  • the chemical modifications can be done before, or after, or before and after fragmentation/ digestion of the polypeptide into peptides.
  • Peptides derived from the standard and the investigated samples are labeled with chemical residues of different mass, but of similar properties, such that peptides with the same sequence from both samples are eluted together in the separation procedure and their ionization and detection properties regarding the mass spectrometry are very similar.
  • Differential chemical labeling can be performed on reactive functional groups on some or all of the carboxy- and/or amino- termini of proteins and peptides and/or on selected amino acid side chains.
  • a combination of chemical labeling, proteolytic digestion and other enzymatic reaction steps, physical fragmentation and/or fractionation can provide access to a variety of residues to generate different specifically labeled peptides to enhance the overall selectivity of the procedure.
  • Mass spectrometry data is processed by special software, which allows for identification and quantification of peptides and proteins.
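  • The comparison of investigated samples against a standard sample can be sketched as a ratio calculation over co-eluting labeled forms of the same peptide. The example assumes each peptide has already been matched across labels and reduced to one intensity per label, and it summarizes a protein by the median of its peptide ratios; both the data layout and the use of the median are assumptions for illustration, not the software described in the source.

```python
from statistics import median
from typing import Dict, List


def peptide_ratios(intensities: Dict[str, float], standard_label: str) -> Dict[str, float]:
    """Ratio of each investigated sample to the standard for one peptide."""
    reference = intensities[standard_label]
    if reference <= 0:
        raise ValueError("standard intensity must be positive")
    return {
        label: value / reference
        for label, value in intensities.items()
        if label != standard_label
    }


def protein_ratio(peptides: List[Dict[str, float]], standard_label: str,
                  sample_label: str) -> float:
    """Median peptide ratio for one investigated sample versus the standard."""
    return median(peptide_ratios(p, standard_label)[sample_label] for p in peptides)


if __name__ == "__main__":
    # Hypothetical intensities for three peptides of one protein.
    peptides = [
        {"std": 100.0, "A": 200.0},
        {"std": 50.0, "A": 150.0},
        {"std": 10.0, "A": 10.0},
    ]
    print(protein_ratio(peptides, "std", "A"))
```

  • Because the dictionaries can hold any number of labels, the same calculation extends to more than two samples measured in one run.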
  • combined mixtures of peptides are first separated by a chromatography method, such as a multidimensional liquid chromatography system of the invention, before being fed into a coupled mass spectrometry device, such as a tandem mass spectrometry device or an ion trap mass spectrometer or a combination thereof.
  • the combination of multidimensional liquid chromatography and tandem mass spectrometry can be called "LC-LC-MS/MS.”
  • LC-LC-MS/MS was first developed by Link A. and Yates J.
  • Another exemplary system of the invention comprises the combination of multidimensional liquid chromatography and tandem mass spectrometry and an ion trap mass spectrometry, designated 3D LC LCQ MS/MS or 3D LC LTQ MS/MS, as described herein (e.g., comprising Finnigan MDLC LTQ™ or LTQ FT™, Thermo Electron Corporation, San Jose, CA, or Agilent's LC/MSD Trap (Agilent Technologies, Palo Alto, CA), or an equivalent mass spectrometer).
  • proteins can be first substantially or partially isolated from the biological samples of interest.
  • the polypeptides can be treated before selective differential labeling; for example, they can be denatured, reduced, preparations can be desalted, and the like. Conversion of samples of proteins into mixtures of differentially labeled peptides can include preliminary chemical and/or enzymatic modification of side groups and/or termini; proteolytic digestion or fragmentation; post-digestion or post-fragmentation chemical and/or enzymatic modification of side groups and/or termini. The differentially modified polypeptides and peptides are then combined into one or more peptide mixtures. Solvent or other reagents can be removed, neutralized or diluted, if desired or necessary.
  • the buffer can be modified, or, the peptides can be re-dissolved in one or more different buffers, such as a "MudPIT" (see below) loading buffer.
  • the peptide mixture is then loaded onto a chromatography column, such as a liquid chromatography column, a 2D capillary column or a multidimensional chromatography column, to generate an eluate.
  • the eluate is fed into a mass spectrometer, such as a tandem mass spectrometer, an ion trap mass spectrometer (LCQ or LTQ) or a combination thereof.
  • data output is processed by appropriate software using database searching and data analysis.
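  • One step such data analysis software might perform is locating differentially labeled peptide pairs in a spectrum by their expected m/z spacing. The sketch below is a simple illustration under stated assumptions: peaks are given as a plain list of m/z values, the expected spacing is known in advance, and a fixed absolute tolerance is used. The function name, tolerance, and peak values are hypothetical.

```python
from typing import List, Tuple


def find_label_pairs(mz_values: List[float], spacing: float,
                     tol: float = 0.01) -> List[Tuple[float, float]]:
    """Return (light, heavy) peak pairs separated by `spacing` within `tol`."""
    peaks = sorted(mz_values)
    pairs = []
    for i, light in enumerate(peaks):
        for heavy in peaks[i + 1:]:
            gap = heavy - light
            if gap > spacing + tol:
                break  # peaks are sorted, so later gaps only grow
            if abs(gap - spacing) <= tol:
                pairs.append((light, heavy))
    return pairs


if __name__ == "__main__":
    # Hypothetical peak list; 3.0188 approximates one CD3-for-CH3 swap at charge 1.
    spectrum = [500.0, 503.019, 600.0, 610.0]
    print(find_label_pairs(spectrum, spacing=3.0188))
```

  • Pairs found this way could then be passed to a ratio calculation; real spectra would also need charge-state assignment and intensity filtering, which are omitted here.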
  • high yields of peptides can be generated for mass spectrograph analysis.
  • Two or more samples can be differentially labeled by selective labeling of each sample.
  • Peptide modifications, i.e., labeling, are stable.
  • Reagents having differing masses or reactive groups can be chosen to maximize the number of reactive groups and differentially labeled samples, thus allowing for a multiplex analysis of sample, polypeptides and peptides.
  • a "MudPIT" protocol is used for peptide analysis, as described herein.
  • the methods of the invention can be fully automated and can essentially analyze every protein in a sample.
  • alkyl is used to refer to a genus of compounds including branched or unbranched, saturated or unsaturated, monovalent hydrocarbon radicals, including substituted derivatives and equivalents thereof.
  • the hydrocarbons have from about 1 to about 100 carbons, about 1 to about 50 carbons or about 1 to about 30 carbons, about 1 to about 20 carbons, about 1 to about 10 carbons.
  • When the alkyl group has from about 1 to 6 carbon atoms, it is referred to as a "lower alkyl."
  • Suitable alkyl radicals include, e.g., structures containing one or more methylene, methine and/or methyne groups arranged in acyclic and/or cyclic forms. Branched structures have a branching motif similar to isopropyl, tert-butyl, isobutyl, 2-ethylpropyl, etc.
  • substituted alkyl refers to alkyl as just described including one or more functional groups such as lower alkyl, aryl, acyl, halogen (i.e., alkylhalos, e.g., CF3), hydroxy, amino, alkoxy, alkylamino, acylamino, thioamido, acyloxy, aryloxy, arylamino, aryloxyalkyl, mercapto, thia, aza, oxo, both saturated and unsaturated cyclic hydrocarbons, heterocycles and the like. These groups may be attached to any carbon of the alkyl moiety. Additionally, these groups may be pendent from, or integral to, the alkyl chain.
  • alkoxy is used herein to refer to the -OR group, where R is a lower alkyl, substituted lower alkyl, aryl, substituted aryl, arylalkyl or substituted arylalkyl, wherein the alkyl, aryl, substituted aryl, arylalkyl and substituted arylalkyl groups are as described herein.
  • Suitable alkoxy radicals include, for example, methoxy, ethoxy, phenoxy, substituted phenoxy, benzyloxy, phenethyloxy, tert-butoxy, etc.
  • aryl is used herein to refer to an aromatic substituent that may be a single aromatic ring or multiple aromatic rings which are fused together, linked covalently, or linked to a common group such as a methylene or ethylene moiety.
  • the common linking group may also be a carbonyl as in benzophenone.
  • the aromatic ring(s) may include phenyl, naphthyl, biphenyl, diphenylmethyl and benzophenone among others.
  • aryl encompasses "arylalkyl."
  • substituted aryl refers to aryl as just described including one or more functional groups such as lower alkyl, acyl, halogen, alkylhalos (e.g., CF3), hydroxy, amino, alkoxy, alkylamino, acylamino, acyloxy, phenoxy, mercapto and both saturated and unsaturated cyclic hydrocarbons which are fused to the aromatic ring(s), linked covalently or linked to a common group such as a methylene or ethylene moiety.
  • the linking group may also be a carbonyl such as in cyclohexyl phenyl ketone.
  • substituted aryl encompasses “substituted arylalkyl.”
  • arylalkyl is used herein to refer to a subset of “aryl” in which the aryl group is further attached to an alkyl group, as defined herein.
  • biotin refers to any natural or synthetic biotin or variant thereof, which are well known in the art; ligands for biotin, and ways to modify the affinity of biotin for a ligand, are also well known in the art; see, e.g., U.S. Patent Nos.
  • labeling reagents which ... do not differ in ionization and detection properties in mass spectrographic analysis means that the amount and/or mass sequence of the labeling reagents can be detected using the same mass spectrographic conditions and detection devices.
  • polypeptide includes natural and synthetic polypeptides, or mimetics, which can be either entirely composed of synthetic, non-natural analogues of amino acids, or, they can be chimeric molecules of partly natural peptide amino acids and partly non-natural analogs of amino acids.
  • polypeptide as used herein includes proteins and peptides of all sizes.
  • sample as used herein includes any polypeptide-containing sample, including samples from natural sources, or, entirely synthetic samples.
  • column as used herein means any substrate surface, including beads, filaments, arrays, tubes and the like.
  • chromatographic retention properties as used herein means that two compositions have substantially, but not necessarily exactly, the same retention properties in a chromatograph, such as a liquid chromatograph. For example, two compositions do not differ in chromatographic retention properties if they elute together, i.e., they elute in what a skilled artisan would consider the same elution fraction.
  • proteins and peptides are subjected to a series of chemical modifications, i.e., differential chemical labeling.
  • the chemical modifications can be done before, or after, or before and after fragmentation/ digestion of the polypeptide into peptides.
  • Differential labeling reagents can differ in their isotope composition (i.e., isotopical reagents) or in their structural composition (i.e., homologous reagents), but only by a rather small fragment, a change which does not alter the properties stated above, i.e., the labeling reagents differ in molecular mass but do not differ in chromatographic retention properties and do not differ in ionization and detection properties in mass spectrographic analysis, and the differences in molecular mass are distinguishable by mass spectrographic analysis.
  • mixtures of polypeptides and/or peptides coming from the "standard" protein sample and the "investigated" protein sample(s) are labeled separately with differential reagents, or, one sample is labeled and the other sample remains unlabeled.
  • differential reagents differ in molecular mass, but do not differ in retention properties regarding the separation method used (e.g., chromatography) and the mass spectrometry methods used will not detect different ionization and detection properties.
  • differential reagents differ either in their isotope composition (i.e., they are isotopical reagents) or they differ structurally by a rather small fragment which change does not alter the properties stated above (i.e., they are homologous reagents).
  • Differential chemical labeling can include esterification of C-termini, amidation of C-termini and/or acylation of N-termini. Esterification targets C-termini of peptides and carboxylic acid groups in amino acid side chains. Amidation targets C-termini of peptides and carboxylic acid groups in amino acid side chains. Amidation may require protection of amine groups first.
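The targeting rules above determine how many label molecules a given peptide can carry. The following is a minimal sketch, assuming one-letter amino acid codes and complete labeling of every available site; the function name and the `amine_residues` parameter are illustrative and do not come from the specification.

```python
def count_labeling_sites(peptide, amine_residues="K"):
    """Count candidate labeling sites on a peptide given in one-letter code.

    Carboxyl sites (esterification or amidation): the C-terminus plus
    Asp (D) and Glu (E) side chains.
    Amine sites (acylation): the N-terminus plus the residues listed in
    amine_residues (Lys by default; pass "KR" to also count Arg).
    """
    peptide = peptide.upper()
    carboxyl = 1 + sum(peptide.count(res) for res in "DE")
    amine = 1 + sum(peptide.count(res) for res in amine_residues)
    return {"carboxyl_sites": carboxyl, "amine_sites": amine}


if __name__ == "__main__":
    print(count_labeling_sites("DEKR"))
```

The total mass difference between two differentially labeled copies of a peptide scales with the number of sites, so a count like this is what a downstream pairing step would multiply the per-label mass difference by.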
  • reagents comprise the general formulae: $Z^A$OH and $Z^B$OH to esterify peptide C-terminals and/or Glu and Asp side chains; $Z^A$NH$_2$ / $Z^B$NH$_2$ to form an amide bond with peptide C-terminals and/or Glu and Asp side chains; or $Z^A$CO$_2$H / $Z^B$CO$_2$H to form an amide bond with peptide N-terminals and/or Lys and Arg side chains; wherein $Z^A$ and $Z^B$ independently of one another can be R-$Z^1$-$A^1$-$Z^2$-$A^2$-$Z^3$-$A^3$-$Z^4$-$A^4$-, and $Z^1$, $Z^2$, $Z^3$, and $Z^4$ independently of one another can be selected from O, OC(O), OC(S), OC(O)O, OC(O)NR, OC(S)NR, OSiRR', S, SC(O), SC(S), SS, S(O), S(O$_2$), NR, NRR'$^+$, C(O), C(O)O, C(S), C(S)O, C(O)S, C(O)NR, C(S)NR, SiRR', (Si(RR')O)n, SnRR', Sn(RR')O, BR(OR'), BRR', B(OR)(OR'), OBR(OR'), OBRR', OB(OR)(OR'), or, $Z^1$, $Z^2$, $Z^3$, and $Z^4$ independently of one another may be
  • some single C-C bonds from (CRR')n may be replaced with double or triple bonds, in which case some groups R and R' will be absent
  • (CRR')n can be an o-arylene, an m-arylene, or a p-arylene with up to 6 substituents, carbocyclic, bicyclic, or tricyclic fragments with up to 8 atoms in the cycle with or without heteroatoms (O, N, S) and with or without substituents, or $A^1$, $A^2$, $A^3$, and $A^4$ independently of one another can be absent;
  • R, R 1 independently from other R and R 1 in Z 1 - Z 4 and independently from other R and R 1 in A 1 - A 4
  • n in Z 1 - Z 4 independent of n in A 1 - A 4 , is an integer that can have value from 0 to about 51
  • $Z^A$ contains x number of protons
  • $Z^B$ may contain y number of deuterons in the place of protons, and, correspondingly, x - y number of protons remaining
  • $Z^A$ contains x number of borons-10
  • $Z^B$ may contain y number of borons-11 in the place of borons-10, and, correspondingly, x - y number of borons-10 remaining
  • $Z^A$ contains x number of carbons-12
  • $Z^B$ may contain y number of carbons-13 in the place of carbons-12, and, correspondingly, x - y number of carbons-12 remaining
  • $Z^A$ contains x number of nitrogens-14
  • $Z^B$ may contain y number of nitrogens-15 in the place of nitrogens-14, and, correspondingly, x - y number of nitrogens-14 remaining
  • Z may contain y number of sulfurs-32
  • Z may contain y number of
  • x and y are between 1 and about 11, between 1 and about 21, between 1 and about 31, between 1 and about 41, between 1 and about 51.
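The isotope substitutions listed above translate directly into a mass difference between the two labeling reagents. The sketch below is illustrative only: the dictionary keys and function names are hypothetical, and the isotope masses are standard monoisotopic values rather than figures taken from the specification.

```python
# Monoisotopic mass gained per single-atom substitution, in daltons.
ISOTOPE_SHIFT_DA = {
    "H->D": 2.01410178 - 1.00782503,
    "10B->11B": 11.0093054 - 10.0129370,
    "12C->13C": 13.00335484 - 12.0,
    "14N->15N": 15.00010890 - 14.00307401,
}


def label_mass_shift(substitutions):
    """Mass difference (Da) between the heavy and light reagent.

    substitutions maps a substitution kind to y, the number of atoms
    replaced by the heavier isotope.
    """
    return sum(ISOTOPE_SHIFT_DA[kind] * y for kind, y in substitutions.items())


def mz_spacing(mass_shift, charge):
    """Observed m/z spacing between the labeled pair at a given charge state."""
    return mass_shift / charge


if __name__ == "__main__":
    shift = label_mass_shift({"H->D": 8})
    print(round(shift, 4), round(mz_spacing(shift, 2), 4))
```

Because the spacing shrinks with charge state, a larger y makes the pair easier to resolve for multiply charged ions.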
  • $Z^A$OH and $Z^B$OH to esterify peptide C-terminals;
  • $Z^A$NH$_2$ / $Z^B$NH$_2$ to form an amide bond with peptide C-terminals;
  • $Z^A$CO$_2$H / $Z^B$CO$_2$H to form an amide bond with peptide N-terminals;
  • $Z^A$ and $Z^B$ can be R-$Z^1$-$A^1$-$Z^2$-$A^2$-$Z^3$-$A^3$-$Z^4$-$A^4$- and $Z^1$, $Z^2$, $Z^3$, and $Z^4$, independently of one another, can be selected from O, OC(O), OC(S), OC(O)O, OC(O)NR, OC(S)NR, OSiRR', S, SC(O), SC(S), SS, S(O), S(O$_2$), NR, NRR'$^+$, C(O), C(O)O, C(S), C(S)O, C
  • single C-C bonds in some (CRR')n groups may be replaced with double or triple bonds, in which case some groups R and R' will be absent, or (CRR')n can be an o-arylene, an m-arylene, or a p-arylene with up to 6 substituents, or a carbocyclic, a bicyclic, or a tricyclic fragment with up to 8 atoms in the cycle, with or without heteroatoms (e.g., O, N or S atoms), or, with or without substituents, or, $A^1$ - $A^4$ independently of one another may be absent;
  • R, R' independently from other R and R' in $Z^1$ - $Z^4$ and independently from other R and R' in $A^1$ - $A^4$, can be a hydrogen atom, a halogen or an alkyl group, such as an alkenyl, an alkynyl or an aryl group;
  • n in Z 1 - Z 4 is independent of n
  • $Z^A$ has a similar structure to that of $Z^B$, but $Z^A$ has x extra -CH$_2$- fragment(s) in one or more $A^1$ - $A^4$ fragments, and/or $Z^A$ has x extra -CF$_2$- fragment(s) in one or more $A^1$ - $A^4$ fragments.
  • $Z^A$ can contain x number of protons and $Z^B$ may contain y number of halogens in the place of protons.
  • $Z^A$ contains x number of protons and $Z^B$ contains y number of halogens
  • x and y are integers that can have a value of between 1 and about 51; of between 1 and about 41; of between 1 and about 31; of between 1 and about 21; of between 1 and about 11; of between 1 and about 6, such that x is greater than y.
  • a liquid chromatography is used, e.g., a multidimensional liquid chromatography, such as the mixed bed multidimensional liquid chromatograph of the invention.
  • a chromatogram eluate is coupled to a mass spectrometer, such as a tandem mass spectrometry device (e.g., a "3D LC-LC-MS/MS" system of the invention, as described herein), or an ion trap mass spectrometer (e.g., 3D LC LCQ MS/MS or 3D LC LTQ MS/MS systems of the invention, as described herein), or a combination of LC-LTQ-MS/MS or LC-LCQ-MS/MS and LC-LC-MS/MS. Any variation and equivalent thereof can be used to separate and detect peptides.
  • LC-LC-MS/MS was first developed by Link A. and Yates J.
  • LC-LC-MS/MS as described, e.g., in Link (1999) Nature Biotechnology 17:676-682; Link (1997) Electrophoresis 18:1314-1334.
  • the LC-LC-MS/MS technique is used; it is effective for complex peptide separation and it is easily automated.
  • LC-LC-MS/MS is commonly known by the acronym "MudPIT," for "Multi-dimensional Protein Identification Technology."
  • Variations and equivalents of LC-LC-MS/MS and LC-LCQ-MS/MS or LC-LTQ-MS/MS systems of the invention used in the methods of the invention include methodologies involving reverse phase columns coupled to either cation exchange columns (as described, e.g., by Opiteck (1997) Anal. Chem.
  • an LC-LC-MS/MS or LC-LCQ-MS/MS or LC-LTQ- MS/MS technique uses a mixed bed microcapillary column containing strong cation exchange (SCX) and reverse phase (RPC) resins.
  • Other exemplary alternatives include protein fractionation combined with one-dimensional LC-ESI MS/MS or peptide fractionation combined with MALDI MS/MS.
  • any protein fractionation method including size exclusion chromatography, ion exchange chromatography, reverse phase chromatography, or any of the possible affinity purifications, can be introduced prior to labeling and proteolysis. In some circumstances, use of several different methods may be necessary to identify all proteins or specific proteins in a sample.
  • both quantity and sequence identity of the protein from which the modified peptide originated is determined by a mass spectrometry device, such as a "multistage mass spectrometry" (MS), including 3D LC-LC-MS/MS or LC-LCQ-MS/MS or LC-LTQ-MS/MS systems of the invention, as described herein.
  • This can be achieved by the operation of the mass spectrometer in a dual mode in which it alternates in successive scans between measuring the relative quantities of peptides eluting from the capillary column and recording the sequence information of selected peptides.
  • Peptides are quantified by measuring in the MS mode the relative signal intensities for pairs or series of peptide ions of identical sequence that are tagged differentially, which therefore differ in mass by the mass differential encoded within the differential labeling reagents.
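The pairing logic described above can be sketched as follows. This is an illustration under stated assumptions, not the software used with the invention: peaks are given as (m/z, intensity) tuples from a single MS scan, the label mass difference and charge state are known in advance, and the function name and tolerance are hypothetical.

```python
def find_label_pairs(peaks, mass_delta, charge, tol=0.01):
    """Find light/heavy peak pairs and report heavy/light intensity ratios.

    peaks: iterable of (mz, intensity) tuples from one MS scan.
    mass_delta: mass difference (Da) encoded by the differential labels.
    charge: assumed charge state of the peptide ions.
    tol: allowed deviation (in m/z units) from the expected spacing.
    """
    spacing = mass_delta / charge
    ordered = sorted(peaks)
    pairs = []
    for i, (mz_light, int_light) in enumerate(ordered):
        if int_light <= 0:
            continue  # cannot form a ratio against a zero-intensity peak
        for mz_heavy, int_heavy in ordered[i + 1:]:
            if abs((mz_heavy - mz_light) - spacing) <= tol:
                pairs.append((mz_light, mz_heavy, int_heavy / int_light))
    return pairs


if __name__ == "__main__":
    scan = [(500.00, 1000.0), (504.025, 2000.0), (650.30, 300.0)]
    print(find_label_pairs(scan, mass_delta=8.0502, charge=2))
```

In practice the ratio would usually be computed from peak areas integrated over the elution profile rather than from a single scan; single-scan intensities are used here only to keep the sketch short.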
  • Peptide sequence information can be automatically generated by selecting peptide ions of a particular mass-to-charge (m/z) ratio for collision-induced dissociation (CID) in the mass spectrometer operating in the tandem MS mode, as described, e.g., by Link (1997) Electrophoresis 18:1314-1334; Gygi (1999) Nature Biotechnol. 17:994-999; Gygi (1999) Mol. Cell. Biol. 19:1720-1730.
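The text does not state how peptide ions are chosen for CID. One common scheme, assumed here purely for illustration, is to pick the most intense ions in each survey scan while skipping m/z values that were fragmented recently. The function and parameter names below are hypothetical.

```python
def select_precursors(survey_scan, top_n=3, min_intensity=0.0,
                      exclude=(), tol=0.5):
    """Pick precursor m/z values for CID from one MS survey scan.

    survey_scan: iterable of (mz, intensity) tuples.
    top_n: number of precursors to fragment before the next survey scan.
    exclude: m/z values already fragmented recently (dynamic exclusion).
    tol: m/z window around each excluded value.
    """
    candidates = [
        (mz, intensity)
        for mz, intensity in survey_scan
        if intensity >= min_intensity
        and not any(abs(mz - ex) <= tol for ex in exclude)
    ]
    # Most intense ions first.
    candidates.sort(key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _ in candidates[:top_n]]


if __name__ == "__main__":
    scan = [(400.2, 50.0), (512.3, 900.0), (733.4, 400.0), (650.1, 700.0)]
    print(select_precursors(scan, top_n=2))
    print(select_precursors(scan, top_n=2, exclude=[512.3]))
```

The exclusion list is what lets the instrument alternate between quantifying eluting peptides and sequencing new ones instead of repeatedly fragmenting the same abundant ion.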
  • tandem mass spectra can be correlated to sequence databases to identify the protein from which the sequenced peptide originated.
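As a rough illustration of correlating a tandem mass spectrum with candidate sequences, the sketch below computes singly charged b- and y-ion masses for each candidate and counts how many are observed. This simple shared-peak count is an assumption chosen for brevity; it is not the scoring used by the commercial search programs named in this document, and the function names are hypothetical. Residue masses are standard monoisotopic values for unmodified amino acids.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as singly charged m/z lists."""
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(sum(RESIDUE_MASS[a] for a in peptide[:i]) + PROTON)
        y_ions.append(sum(RESIDUE_MASS[a] for a in peptide[i:]) + WATER + PROTON)
    return b_ions, y_ions


def count_matches(observed, peptide, tol=0.5):
    """Number of theoretical fragments found among the observed m/z values."""
    b_ions, y_ions = fragment_ions(peptide)
    return sum(
        any(abs(obs - theo) <= tol for obs in observed)
        for theo in b_ions + y_ions
    )


def best_candidate(observed, candidates, tol=0.5):
    """Candidate peptide sharing the most fragments with the spectrum."""
    return max(candidates, key=lambda pep: count_matches(observed, pep, tol))


if __name__ == "__main__":
    spectrum = [58.03, 129.07, 147.11, 218.15]
    print(best_candidate(spectrum, ["GAK", "AGK", "GSK"]))
```

For differentially labeled peptides, the label mass would have to be added to the residues or termini that carry it before the fragment masses are computed.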
  • Exemplary commercially available software includes TURBO SEQUEST™ by Thermo Finnigan, San Jose, CA; MASCOT™ by Matrix Science; and SONAR MS/MS™ by Proteometrics. Routine software modifications may be necessary for automated relative quantification.
  • Mass spectrometry devices can use mass spectrometry to identify and quantify differentially labeled peptides and polypeptides. Any mass spectrometry system can be used.
  • combined mixtures of peptides are separated by a chromatography system of the invention comprising multidimensional liquid chromatography coupled to tandem mass spectrometry, or, "LC-LC-MS/MS," see, e.g., Link (1999) Nature Biotechnology 17:676-682; Link (1997) Electrophoresis 18:1314-1334.
  • combined mixtures of peptides are separated by a chromatography method comprising a multidimensional liquid chromatography system of the invention coupled to a combination tandem mass spectrometry and an ion trap mass spectrometry device of the invention, or, LC-LCQ-MS/MS or LC-LTQ-MS/MS, as described herein.
  • Exemplary ion trap mass spectrometry devices that can be used in the systems and methods of the invention include, for example, the LCQ Deca XP™ electrospray ionization/ion trap mass spectrometer, including a Finnigan LCQ Deca XP™ or LCQ Deca XP MAX™, or MDLC LTQ™, from Thermo Electron Corporation, San Jose, CA, or Agilent's LC/MSD Trap (Agilent Technologies, Palo Alto, CA), or an equivalent mass spectrometer.
  • a sample can be introduced by direct infusion using a syringe pump, by flow injection using an injection valve and an LC pump, or by LC fitted with a column (LC/MS).
  • exemplary mass spectrometry devices also include those incorporating matrix-assisted laser desorption-ionization-time-of-flight (MALDI-TOF) mass spectrometry (see, e.g., Isola (2001) Anal. Chem. 73:2126-2131; Van de Water (2000) Methods Mol. Biol. 146:453-459; Griffin (2000) Trends Biotechnol. 18:77-84; Ross (2000) Biotechniques 29:620-626, 628-629).
  • The inherent high molecular weight resolution of MALDI-TOF MS conveys high specificity and good signal-to-noise ratio for performing accurate quantitation.
  • Use of mass spectrometry, including MALDI-TOF MS, and its use in detecting nucleic acid hybridization and in nucleic acid sequencing, is well known in the art, see, e.g., U.S. Patent Nos. 6,258,538; 6,238,871; 6,238,869; 6,235,478; 6,232,066; 6,228,654; 6,225,450; 6,051,378; 6,043,031.
  • polypeptides can be fragmented, e.g., by proteolytic, i.e., enzymatic, digestion and/or other enzymatic reactions or physical fragmenting methodologies.
  • the fragmentation can be done before and/or after reacting the peptides/ polypeptides with the labeling reagents used in the methods of the invention.
  • Methods for proteolytic cleavage of polypeptides are well known in the art, e.g., enzymes include trypsin (see, e.g., U.S. Patent No. 6,177,268; 4,973,554), chymotrypsin (see, e.g., U.S. Patent No.
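To illustrate what proteolytic fragmentation with trypsin produces, the following sketch performs an in silico digest. It assumes the widely used rule that trypsin cleaves on the C-terminal side of Lys (K) or Arg (R) except when the next residue is Pro (P); that rule and the `missed_cleavages` parameter are assumptions of this example and are not stated in the specification.

```python
def tryptic_digest(protein, missed_cleavages=0):
    """Return peptides from an in silico tryptic digest of a protein.

    protein: sequence in one-letter code.
    missed_cleavages: maximum number of internal cleavage sites a
        reported peptide may still contain.
    """
    if not protein:
        return []
    # Cleavage positions: after K or R, unless followed by P.
    sites = [
        i + 1
        for i, residue in enumerate(protein[:-1])
        if residue in "KR" and protein[i + 1] != "P"
    ]
    bounds = [0] + sites + [len(protein)]
    peptides = []
    for start in range(len(bounds) - 1):
        last = min(start + 2 + missed_cleavages, len(bounds))
        for end in range(start + 1, last):
            peptides.append(protein[bounds[start]:bounds[end]])
    return peptides


if __name__ == "__main__":
    print(tryptic_digest("MKRPAKG"))
    print(tryptic_digest("MKRPAKG", missed_cleavages=1))
```

Whether labeling is done before or after digestion changes which termini exist to be labeled, so a digest like this would be run on the appropriate form of the sequence.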
  • a chimeric labeling reagent of the invention includes a cleavable linker.
  • cleavable linker sequences include, e.g., Factor Xa or enterokinase (Invitrogen, San Diego CA).
  • purification facilitating domains can be used, such as metal chelating peptides, e.g., polyhistidine tracts and histidine-tryptophan modules that allow purification on immobilized metals, protein A domains that allow purification on immobilized immunoglobulin, and the domain utilized in the FLAGS extension/affinity purification system (Immunex Corp, Seattle WA).
  • the invention provides a method for quantifying changes in protein expression between at least two cellular states, such as, an activated cell versus a resting cell, a normal cell versus a cancerous cell, a stem cell versus a differentiated cell, an injured cell or infected cell versus an uninjured cell or uninfected cell; or, for defining the expressed proteins associated with a given cellular state.
  • Sample can be derived from any biological source, including cells from, e.g., bacteria, insects, yeast, mammals and the like.
  • the proteome of the Bacillus anthracis microbe is analyzed using the mixed bed multi-dimensional liquid chromatographs and methods of the invention.
  • Cells can be harvested from any body fluid or tissue source, or, they can be in vitro cell lines or cell cultures.
  • Detection Devices and Methods can also incorporate in whole or in part designs of detection devices as described, e.g., in U.S. Patent Nos.
  • Protein expression profiling using selective differential labeling. The use of mass spectrometry to identify proteins whose sequences are present in either DNA or protein databases is well established and integral to the field of Proteomics. Protein and peptide masses can be determined at high accuracy by several mass spectrometric techniques. Peptides can be further fragmented in a tandem or ion trap mass spectrometer, yielding sequence information for the peptide. Both types of mass information can be used to identify a protein in a sequence database.
  • One goal of Proteomics is to define the expressed proteins associated with a given cellular state and another is to quantify changes in protein expression between cellular states.
  • ICAT: isotope-coded affinity tag.
  • the method is based on a newly synthesized class of chemical reagents (ICATs) used in combination with tandem mass spectrometry.
  • the ICAT reagent contains a biotin affinity tag and a thiol specific reactive group, which are joined by a spacer domain that is available in two forms: regular and isotopically heavy, which includes eight deuterium atoms.
  • a reduced protein mixture representing one cell state is derivatized with the isotopically light version of the ICAT reagent, while the corresponding reduced protein mixture representing a second cell state is derivatized with the isotopically heavy version of the ICAT reagent.
  • the labeled samples are combined and proteolytically digested to produce peptide fragments.
  • the present invention provides a method for simultaneous identification and quantification of expression levels of individual proteins carrying certain functional groups in their side chains.
  • the proteins may be analyzed in complex mixtures.
  • the method is based on comparison of two or more samples of proteins, one of which can be considered as the standard sample and all others can be considered as samples under investigation.
  • the samples of proteins are subjected to a sequence of manipulations including (i) proteolytic digestion into mixtures of peptides, (ii) treatment of the mixtures of peptides with chemical probes, (iii) washing away and discarding the unbound peptides from the mixtures, (iv) cleaving the chemical probes and the consequential release of the peptides still carrying parts of the chemical probes into solution.
  • This sequence of manipulations may also include one or more auxiliary chemical and/or enzymatic modifications of functional groups in side chains and/or in the free termini of the proteins and/or peptides in order to achieve selective and the most favorable modification for the next steps in the protocol.
  • the auxiliary modifications may be performed between any steps of the main sequence.
  • the core structure of the chemical probe consists of (i) a solid support,
  • the chemical probes perform three functions: (i) they attach peptides carrying specific functional groups in their side chains and/or termini to a solid support by forming covalent chemical bonds to the reactive group of the probe, (ii) they provide means for selective cleavage of the attached peptide from the solid support such that a part of the probe still remains attached to the peptide, and (iii) they serve as differential labeling reagents.
  • Differential labeling results from attaching of chemical moieties of different mass but of similar properties to a protein or a peptide such that peptides with the same sequence but with different labels are eluted together in the separation procedure and their ionization and detection properties regarding mass spectrometrical analysis are very similar.
  • the differential mass labeling unit remains covalently bound to the peptide after it is cleaved from the solid support part of the probe. Signals corresponding to peptides with the same sequence but marked with differential mass labels are assigned to different original protein samples.
  • the auxiliary chemical and/or enzymatic modification can be used to introduce additional differential mass labels into the peptides.
  • the reactive group on the chemical probe may be activated or modified by a bridging reagent prior to a reaction with mixtures of peptides. Such activation or modification provides for a greater flexibility in design of the chemical probe since the same core structure of a chemical probe may be tuned to increase reactivity and/or selectivity towards different functional groups in side chains and/or in termini of the peptides.
  • the differentially labeled peptide mixtures are combined, subjected to multidimensional chromatographic separation, and analyzed by mass spectrometry methods. Mass spectrometry data is processed by special software, which determines the composition and sequence of the peptides in the mixture and traces them back to the original proteins for identification and quantification.
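The final step, tracing quantified peptides back to proteins, can be sketched as a simple roll-up. The text does not say how peptide-level ratios are combined; taking the median per protein is an assumption made here for illustration, and all names in the example are hypothetical.

```python
from collections import defaultdict
from statistics import median


def protein_ratios(peptide_ratios, peptide_to_protein):
    """Summarize peptide-level abundance ratios per protein.

    peptide_ratios: dict mapping peptide sequence to its measured
        investigated/standard ratio.
    peptide_to_protein: dict mapping peptide sequence to the protein it
        was identified from by the database search.
    Peptides without a protein assignment are ignored.
    """
    grouped = defaultdict(list)
    for peptide, ratio in peptide_ratios.items():
        protein = peptide_to_protein.get(peptide)
        if protein is not None:
            grouped[protein].append(ratio)
    return {protein: median(ratios) for protein, ratios in grouped.items()}


if __name__ == "__main__":
    ratios = {"AAK": 2.0, "GGR": 4.0, "LLK": 3.0, "TTR": 0.5}
    mapping = {"AAK": "P1", "GGR": "P1", "LLK": "P1", "TTR": "P2"}
    print(protein_ratios(ratios, mapping))
```

A median is less sensitive than a mean to a single mis-paired peptide, which is why it is used in this sketch.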
  • This approach can be used for duplex or potentially multiplex protein expression profiling.
  • the complexity of the sample is reduced by targeting peptides containing particular amino acids, which are selected by a reaction with chemical probes.
  • Alternative aspects of this invention include: (i) design of solid phase-based differential mass labeling reagents for selective peptide modification; (ii) design of various kinds of differential mass units; (iii) combination of differential mass probes with various bridge reagents to target certain amino acids specifically; (iv) multiplex analysis; (v) combination of proteolytic digestion and chemical and/or enzymatic modifications in side chains and/or in termini of proteins and peptides in order to achieve selective and the most favorable modifications for the next steps in the protocol; (vi) combination of differential chemical labeling with MudPIT, and possibly all other protein or peptide separation or purification technologies if necessary.
  • One aspect of this invention provides reagents and procedures for quantification of protein expression using a combination of selective differential peptide labeling and the mixed bed multi-dimensional liquid chromatographs of the invention, e.g., 3D LC MS/MS, 3D LC-LC MS/MS or LC-LCQ-MS/MS or LC-LTQ-MS/MS systems of the invention, as described herein.
  • This invention overcomes the limitations inherent in traditional techniques.
  • the basic approach described can be employed for quantitative analysis of protein expression in complex samples (such as cells, tissues, and fractions, etc.), the detection and quantitation of specific proteins in complex samples, and quantitative measurement of specific enzymatic activities in complex samples.
  • the solid support part of the chemical probe may consist of any of the following materials or any combination of them: gel, glass beads, magnetic beads, polymers, silicon wafer, membrane, or resin.
  • the spacer between the solid phase part and the cleavable unit of the chemical probe may be included for convenience and improved yields in synthetic preparation of the chemical probe.
  • the spacer may consist of a chain of 2 to 8 atoms, which can be C, O, N, B, Si, S, P, Se ..., covalently bound to each other.
  • the atoms may carry hydrogen atoms, halogens, or one of the following groups containing up to 25 atoms: alkyl, hydroxy, alkoxy, amino, alkylamino...
  • the spacer may contain cyclic moieties with or without heteroatoms and with or without substituents.
  • the cleavable moiety provides means for selective detachment of the solid phase part of the chemical probe from the differential mass label attached to the peptide. It is designed such that it can be cleaved by treating the probe with a chemical reagent or any kind of electromagnetic irradiation, photochemically, enzymatically, or thermally.
  • Differential mass labeling units differ in molecular mass, but do not differ in retention properties regarding the separation method used and in ionization and detection properties regarding the mass spectrometry methods used.
  • a R Z can have the same structure as Z , but they have different isotope A R composition. For instance, if Z contains x number of protons, Z may contain ⁇ number of deuterons in the place of protons, and, conespondingly, x -y number of protons remaining; and/or if Z A contains x number of borons- 10, Z B may contain y number of borons- 11 in the place of borons- 10, and, conespondingly, x - y number of borons- 10 remaining; and/or if Z A contains x number of carbons-12, Z B may contain y number of carbons- 13 in the place of carbons- 12, and, conespondingly, x -y number of carbons-12 remaining; and/or if Z A contains x number of nitrogens- 14, Z B may contain y number of nitrogens- 15 in the place of nitrogens- 14, and, conespondingly, x - y number of nitrogens- 14 remaining; and/or if Z A contains
  • Z A and Z B R-Z 1 -A 1 -Z 2 -A 2 -Z 3 -A 3 -Z 4 -A 4 - Z 1 , Z 2 , Z 3 , and Z 4 independently of one another can be selected from O, OC(O), OC(S), OC(O)O, OC(O)NR, OC(S)NR, OSiRR 1 , S, SC(O), SC(S), SS, S(O), S(O 2 ), NR, NRR 1+ , C(O), C(O)O, C(S), C(S)O, C(O)S, C(O)NR, C(S)NR, SiRR 1 , (Si(
  • OBRR 1 , OB(OR)(OR') or Z 1 - Z 4 may be absent; A 1 , A 2 , A 3 , and A 4 independently of one another can be selected from
  • R and R in which some single C-C bonds may be replaced with double or triple bonds, in which case some groups R and R will be absent, o-arylene, m-arylene, p-arylene with up to 6 substituents, carbocyclic, bicyclic, or tricyclic fragments with up to 8 atoms in the cycle with or without heteroatoms (O, N, S) and with or without substituents, or A 1 - A 4 may be absent; R, R 1 independently from other R and R' in Z 1 - Z 4 and independently from other R and R 1 in A 1 - A 4 is hydrogen, halogen, an alkyl, alkenyl, alkynyl, or aryl group; n in Z 1 - Z 4 is independent of n in A 1 - A 4 and is a whole number that can have value from 0 to 21.
  • Z A can have a similar structure to that of Z B , but Z A has x extra -CH 2 - fragment(s) in one or more A 1 - A 4 fragments, and/or Z A has x extra -CF 2 - fragment(s) in one or more A 1 - A 4 fragments; and/or if Z A contains x number of protons, Z B may contain y number of halogens in the place of protons, and, correspondingly, x - y number of protons remaining in one or more A 1 - A 4 fragments; and/or Z A has x extra -O- fragment(s) in one or more A 1 - A 4 fragments; and/or Z A has x extra -S- fragment(s) in one or more A 1 - A 4 fragments; and/or if Z A contains x number of -O- fragment(s), Z B may contain y number of -S- fragment(s) in the place of -O- fragment(s), and, correspondingly
  • Sequence analysis and quantification: Peptides are quantified by measuring in the MS mode the relative signal intensities for pairs or series of peptide ions of identical sequence that are tagged differentially, which therefore differ in mass by the mass differential encoded within the differential labeling reagents.
  • Peptide sequence information is automatically generated by selecting peptide ions of a particular mass-to-charge (m/z) ratio for collision-induced dissociation (CID) in the mass spectrometer operating in the tandem MS mode.
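As a rough illustration of the m/z values involved, the following Python sketch computes the mass-to-charge ratio of a protonated peptide ion and the m/z spacing expected between two forms of the same peptide that differ only in isotope composition. The function names and the dictionary keys are hypothetical and do not come from the patent; only the physical constants (proton mass and isotope mass differences, rounded to six decimals) are standard values. It assumes positive-ion mode with protons as the only charge carriers.

```python
PROTON_MASS = 1.007276  # Da, mass of the proton that carries each charge

# Mass gained per atom when a light isotope is replaced by the heavy one (Da).
ISOTOPE_SHIFT = {
    "H->D": 1.006277,      # protium to deuterium
    "10B->11B": 0.996368,  # boron-10 to boron-11
    "12C->13C": 1.003355,  # carbon-12 to carbon-13
    "14N->15N": 0.997035,  # nitrogen-14 to nitrogen-15
}


def mz(neutral_mass, charge):
    """m/z of an ion formed by adding `charge` protons to a neutral molecule."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def label_mass_shift(substitutions):
    """Total mass difference for a dict such as {"H->D": 8} (y substituted atoms)."""
    return sum(ISOTOPE_SHIFT[kind] * count for kind, count in substitutions.items())


def mz_spacing(substitutions, charge):
    """Expected m/z distance between the light and heavy forms of one peptide ion."""
    return label_mass_shift(substitutions) / charge
```

For example, a label pair differing by eight deuterons would separate doubly charged ions by roughly 4.03 m/z units, which is the spacing a quantification step would look for.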
  • the resulting tandem mass spectra can be correlated to sequence databases to identify the protein from which the sequenced peptide originated.
  • Currently available commercial software packages include TurboSEQUEST™ by ThermoFinnigan, MASCOT™ by Matrix Science, and SONAR™ MS/MS by ProteoMetrics. Special software development will be necessary for automated relative quantification.
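The relative quantification described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the software the text says must be developed: the function name `pair_ratios`, the peak-list format (a list of `(m/z, intensity)` tuples at a single known charge state), and the matching tolerance are all hypothetical. The sketch pairs each peak with any peak lying one label spacing above it and reports the heavy-to-light intensity ratio.

```python
def pair_ratios(peaks, mass_shift, charge, tol=0.01):
    """Pair light and heavy peptide ions and return their intensity ratios.

    peaks      -- list of (mz, intensity) tuples from one MS scan
    mass_shift -- mass difference between heavy and light labels (Da)
    charge     -- assumed charge state shared by both ions
    tol        -- allowed deviation from the expected m/z spacing
    Returns a list of (light_mz, heavy_mz, heavy_intensity / light_intensity).
    """
    spacing = mass_shift / charge
    ordered = sorted(peaks)
    results = []
    for light_mz, light_int in ordered:
        if light_int <= 0:
            continue  # a ratio is undefined without a light signal
        for heavy_mz, heavy_int in ordered:
            if abs(heavy_mz - light_mz - spacing) <= tol:
                results.append((light_mz, heavy_mz, heavy_int / light_int))
    return results
```

A real workflow would also need charge-state determination, isotope-envelope handling, and integration over the chromatographic peak, none of which this sketch attempts.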
  • Exemplary approaches for practicing the invention:
      1. Protein sample preparation, which may include protein denaturation, reduction, and proteolytic digestion
      2. Treatment of the probe with a desired activating or bridging reagent
      3. Treatment of the activated probe with a mixture of peptides
      4. Wash off unbound peptides, which don't have the targeted amino acid
      5. Combining modified differential labeled peptide mixture
      6.
  • Metabolomics and lipidomics: The invention also incorporates holistic monitoring approaches, metabolomics and lipidomics, including profiling metabolite pools, carbohydrates, lipids, glycoproteins, and glycolipids. Various chromatographic methods and other qualitative and/or quantitative methods could be utilized to characterize lipid profiles.
  • FACS: fluorescence-activated cell sorting
  • Desired products can be detected by incubating the encapsulated cells with fluorescent antibodies (Powell et al. Bio/Technology 8:333-337 (1990)). FACS sorting can also be used by this technique to assay resistance to toxic compounds and antibiotics by selecting droplets that contain multiple cells (i.e., the product of continued division in the presence of a cytotoxic compound; Goguen et al. Nature 363:189-190 (1995)). This method can select for any enzyme that can change the fluorescence of a substrate that can be immobilized in the agarose droplet. Reporter molecule: In some aspects of the invention, screening can be accomplished by assaying reactivity with a reporter molecule reactive with a desired feature of, for example, a gene product.
  • Cell-cell indicator: In other aspects of the invention, screening is done with a cell-cell indicator assay. In this assay format, separate library cells (Cell A, the cell being assayed) and reporter cells (Cell B, the assay cell) are used. Only one component of the system, the library cells, is allowed to evolve. The screening is generally carried out in a two-dimensional immobilized format, such as on plates. The products of the metabolic pathways encoded by these genes (in this case, usually secondary metabolites such as antibiotics, polyketides, carotenoids, etc.) diffuse out of the library cell to the reporter cell.
  • the product of the library cell may affect the reporter cell in one of a number of ways.
  • the assay system (indicator cell) can have a simple readout (e.g., green fluorescent protein, luciferase, beta-galactosidase) which is induced by the library cell product but which does not affect the library cell.
  • the desired product can be detected by colorimetric changes in the reporter cells adjacent to the library cell.
  • indicator cells can in turn produce something that modifies the growth rate of the library cells via a feedback mechanism.
  • Growth rate feedback can detect and accumulate very small differences. For example, if the library and reporter cells are competing for nutrients, library cells producing compounds to inhibit the growth of the reporter cells will have more available nutrients, and thus will have more opportunity for growth. This is a useful screen for antibiotics or a library of polyketide synthesis gene clusters where each of the library cells is expressing and exporting a different polyketide gene product.
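To illustrate why growth rate feedback can accumulate very small differences, the following Python sketch models two cell populations growing side by side, one of which gains a small fractional growth advantage per generation. The model (simple discrete-generation exponential growth with no nutrient limit) and the name `enriched_fraction` are simplifying assumptions introduced here, not part of the described assay.

```python
def enriched_fraction(initial_fraction, advantage, generations):
    """Fraction of the culture made up of the favored cells after some generations.

    initial_fraction -- starting share of the favored library cells (0 to 1)
    advantage        -- relative growth advantage per generation (0.01 = 1%)
    generations      -- number of generations of competitive growth
    """
    if not 0.0 <= initial_fraction <= 1.0:
        raise ValueError("initial_fraction must lie between 0 and 1")
    # Favored cells multiply by (1 + advantage) per generation relative to the rest.
    favored = initial_fraction * (1.0 + advantage) ** generations
    others = 1.0 - initial_fraction
    return favored / (favored + others)
```

Under this model, cells that start at 1% of the culture with a 10% per-generation advantage make up more than 99% of it after 100 generations, which is the sense in which a weak phenotype becomes detectable.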
  • the reporter cell for an antibiotic selection can itself secrete a toxin or antibiotic that inhibits growth of the library cell. Production by the library cell of an antibiotic that is able to suppress growth of the reporter cell will thus allow uninhibited growth of the library cell.
  • the library cell may supply nutrients such as amino acids to an auxotrophic reporter, or growth factors to a growth-factor-dependent reporter. The reporter cell in turn should produce a compound that stimulates the growth of the library cell. Interleukins, growth factors, and nutrients are possibilities.
  • Further possibilities include competition based on ability to kill surrounding cells, positive feedback loops in which the desired product made by the evolved cell stimulates the indicator cell to produce a positive growth factor for cell A, thus indirectly selecting for increased product formation.
  • a different organism or genetic background
  • markers can be added to DNA constructs used for recursive sequence recombination to make the microorganism dependent on the constructs during the improvement process, even though those markers may be undesirable in the final recombinant microorganism.
  • Evnin et al. selected trypsin variants with altered substrate specificity by requiring that variant trypsin generate an essential amino acid for an arginine auxotroph by cleaving arginine beta-naphthylamide. This is thus a selection for arginine-specific trypsin, with the growth rate of the host being proportional to the enzyme activity.
  • the pool of cells surviving screening and/or selection is enriched for recombinant genes conferring the desired phenotype (e.g. altered substrate specificity, altered biosynthetic ability, etc.).
  • recombinant gene or pool of such genes surviving one round of screening/selection forms one or more of the substrates for a second round of recombination.
  • recombination can be performed in vivo or in vitro by any of the recursive sequence recombination formats described above. If recursive sequence recombination is performed in vitro, the recombinant gene or genes to form the substrate for recombination should be extracted from the cells in which screening/selection was performed. Optionally, a subsequence of such gene or genes can be excised for more targeted subsequent recombination.
  • If the recombinant gene(s) are contained within episomes, their isolation presents no difficulties. If the recombinant genes are chromosomally integrated, they can be isolated by amplification primed from known sequences flanking the regions in which recombination has occurred. Alternatively, whole genomic DNA can be isolated, optionally amplified, and used as the substrate for recombination. Small samples of genomic DNA can be amplified by whole genome amplification with degenerate primers (Barrett et al. Nucleic Acids Research 23:3488-3492 (1995)). These primers result in a large amount of random 3' ends, which can undergo homologous recombination when reintroduced into cells.
  • If the second round of recombination is to be performed in vivo, as is often the case, it can be performed in the cell surviving screening/selection, or the recombinant genes can be transferred to another cell type (e.g., a cell type having a high frequency of mutation and/or recombination).
  • recombination can be effected by introducing additional DNA segment(s) into cells bearing the recombinant genes.
  • the cells can be induced to exchange genetic information with each other by, for example, electroporation.
  • the second round of recombination is performed by dividing a pool of cells surviving screening/selection in the first round into two subpopulations.
  • DNA from one subpopulation is isolated and transfected into the other population, where the recombinant gene(s) from the two subpopulations recombine to form a further library of recombinant genes.
  • the second round of recombination is sometimes performed exclusively among the recombinant molecules surviving selection. However, in other aspects, additional substrates can be introduced.
  • the additional substrates can be of the same form as the substrates used in the first round of recombination, i.e., additional natural or induced mutants of the gene or cluster of genes, forming the substrates for the first round.
  • the additional substrate(s) in the second round of recombination can be exactly the same as the substrate(s) in the first round of replication.
  • recombinant genes conferring the desired phenotype are again selected. The selection process proceeds essentially as before. If a suicide vector bearing a selective marker was used in the first round of selection, the same vector can be used again. Again, a cell or pool of cells surviving selection is selected. If a pool of cells, the cells can be subject to further enrichment.
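The cycle of recombination followed by screening/selection described above can be pictured with a toy in-silico analogy. The Python sketch below is only an illustration under loud assumptions: sequences are equal-length strings of at least two characters, recombination is modeled as a single random crossover point, and `score` is a hypothetical stand-in for whatever screen or selection is applied. None of these names or choices come from the patent, and real recombination formats are far richer. At least two substrates and two survivors are required.

```python
import random


def crossover(a, b, rng):
    """Join the head of sequence a to the tail of sequence b at a random point."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:]


def recursive_recombination(substrates, score, rounds=3,
                            library_size=50, survivors=5, seed=0):
    """Alternate recombination and selection for a fixed number of rounds.

    substrates -- starting sequences (at least two, all of equal length)
    score      -- function mapping a sequence to a number (higher is better)
    Returns the sequences surviving the final round, best first.
    """
    rng = random.Random(seed)
    pool = list(substrates)
    for _ in range(rounds):
        # Survivors of the previous round are themselves substrates for this one.
        library = list(pool)
        while len(library) < library_size:
            a, b = rng.sample(pool, 2)
            library.append(crossover(a, b, rng))
        # Screening/selection: keep only the best-scoring recombinants.
        pool = sorted(library, key=score, reverse=True)[:survivors]
    return pool
```

Because the survivors of each round are carried into the next library, the best score in the pool can never decrease from one round to the next; additional substrates could be introduced in later rounds by extending `pool` before recombination.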
  • Screening for various potential applications. Novel drugs: identifying targets
  • the invention relates to procedures that can be applied to identifying compounds that bind to and modulate the function of target components of a cell whose function is known or unknown, and cell components that are not amenable to other screening methods.
  • the invention relates to generating and/or identifying a compound that binds to and modulates (inhibits or enhances) the function of a component of a cell, thereby producing a phenotypic effect in the cell.
  • Such a screen may involve identifying a biomolecule that 1) binds to, in vitro, a component of a cell that has been isolated from other constituents of the cell and that 2) causes, in vivo, as seen in an assay upon intracellular expression of the biomolecule, a phenotypic effect in the cell which is the usual producer and host of the target cell component.
  • intracellular production of the biomolecule can be in cells grown in culture or in cells introduced into an animal.
  • target cell component in this aspect and in other aspects not limited to pathogens can be one that is found in mammalian cells, especially cells of a type found to cause or contribute to disease or the symptoms of disease (e.g., cells of tumors or cells of other types of hyperproliferative disorders).
  • the invention provides a process for identifying one or more compounds that produce a phenotypic effect on a cell. The process is at the same time a method for target validation.
  • the process is characterized by identifying a biomolecule which binds an isolated target cell component, constructing cells comprising the target cell component and further comprising a gene encoding the biomolecular binder which can be expressed to produce the biomolecular binder, testing the constructed cells for their ability to produce, upon expression of the gene encoding the biomolecular binder, a phenotypic effect in the cells (e.g., inhibition of growth), wherein the test of the constructed cells can be a test of the cells in culture or a test of the cells after introducing them into host animals, or both, and further, identifying, for a biomolecular binder that caused the phenotypic effect, one or more compounds that compete with the biomolecular binder for binding to the target cell component.
  • a test of the constructed cells after introducing them into host animals is especially well-suited to assessing whether a biomolecular binder can produce a particular phenotype by the expression (regulatable by the researcher) of a gene encoding the biomolecular binder.
  • cells are constructed which have a gene encoding the biomolecular binder, and wherein the biomolecular binder can be produced by regulation of expression of the gene.
  • the constructed cells are introduced into a set of animals. Expression of the gene encoding the biomolecular binder is regulated in one group of the animals (test animals) such that the biomolecular binder is produced.
  • the gene encoding the biomolecular binder is regulated such that the biomolecular binder is not produced (control animals).
  • the cells in the two groups of animals are monitored for a phenotypic change (for example, a change in growth rate). If the phenotypic change is observed in cells in the test animals and not in the cells in the control animals, or to a lesser extent in the control animals, then the biomolecular binder has been proven to be effective in binding to its target cell component under in vivo conditions.
  • a target cell component of a particular cell type (a "first cell") is essential to producing a phenotypic effect on the first cell
  • the method having the steps: isolating the target component of the first cell; identifying a biomolecular binder of the isolated target component of the first cell; constructing a second type of cells ("second cell") comprising the target component and a regulable, exogenous gene encoding the biomolecular binder; and testing the second cell in culture for an altered phenotypic effect, upon production of the biomolecular binder in the second cell; whereby, if the second cell shows the altered phenotypic effect upon production of the biomolecular binder, then the target component of the first cell is essential to producing the phenotypic effect on the first cell.
  • the target cell component in this aspect and in other aspects not limited to pathogens can be one that is found in mammalian cells, especially cells of a type found to cause or contribute to disease or the symptoms of disease (e.g., cells of tumors or cells of other types of hyperproliferative disorders).
  • One aspect of the invention is a method for identifying a biomolecular inhibitor of growth of pathogen cells by using cell culture techniques, comprising contacting one or more types of biomolecules with isolated target cell component of the pathogen, applying a means of detecting bound complexes of biomolecules and target cell component, whereby, if the bound complexes are detected, one or more types of biomolecules have been identified as a biomolecular binder of the target cell component, constructing a pathogen strain having a regulatable gene encoding the biomolecular binder, regulating expression of the gene encoding the biomolecular binder to express the gene; and monitoring growth of the pathogen cells in culture relative to suitable control cells, whereby, if growth of the pathogen cells is decreased compared to growth of suitable control cells, then the biomolecule is a biomolecular inhibitor of growth of the pathogen cells.
  • Identifying compounds that inhibit infection of a mammal by a pathogen: Another aspect of the invention is a method, employing an animal test, for identifying one or more compounds that inhibit infection of a mammal by a pathogen by binding to a target cell component, comprising constructing a pathogen comprising a regulatable gene encoding a biomolecule which binds to the target cell component, infecting test animals with the pathogen, regulating expression of the regulatable gene to produce the biomolecule, monitoring the test animals and suitable control animals for signs of infection, wherein observing fewer or less severe signs of infection in the test animals than in suitable control animals indicates that the biomolecule is a biomolecular inhibitor of infection, and identifying one or more compounds that compete with the biomolecular inhibitor of growth for binding to the target cell component (as by employing a competitive binding assay), then the compound inhibits infection of a mammal by a pathogen by binding to a target.
  • the competitive binding assay to identify binding analogs of biomolecular binders which have been proven to bind to their targets in an intracellular test of binding, can be applied to any target for which a biomolecular binder has been identified, including targets whose function is unknown or targets for which other types of assays are not easily developed and performed. Therefore, the method of the invention offers the advantage of decreasing assay development time when using a gene product of known function as a target cell component and the advantage of bypassing the major hurdle of gene function identification when using a gene product of unknown function as a target cell component.
  • cells comprising a biomolecule and a target cell component, wherein the biomolecule is produced by expression of a regulable gene, and wherein the biomolecule modulates function of the target cell component, thereby causing a phenotypic change in the cells.
  • cells comprising a biomolecule and a target cell component, wherein the biomolecule is a biomolecular binder of the target cell component, and is encoded by a regulatable gene.
  • the cells can include mammalian cells or cells of a pathogen, for instance, and the phenotypic change can be a change in growth rate.
  • the pathogen can be a species of bacteria, yeast, fungus, or parasite, for example.
  • Intracellular validation of a biomolecule provides methods that result in the identification of compounds that cause a phenotypic effect on a cell.
  • the general steps described herein to find a compound for drug development can be thought of as these: (1) identifying a biomolecule that can bind to an isolated target cell component in vitro, (2) confirming that the biomolecule, when produced in cells with the target cell component, can cause a desired phenotypic effect and (3) identifying, by an in vitro screening method, for example, compounds that compete with the biomolecule for binding to the target cell component.
  • a biomolecule is a gene product (e.g., polypeptide, RNA, peptide or RNA oligonucleotide) of an exogenous gene — a gene which has been introduced in the course of construction of the cell. Biomolecules that bind to and alter the function of a candidate target are identified by various in vitro methods.
  • Upon production of the biomolecule within a cell, either in vitro or within an animal model system, the biomolecule binds to a specific site on the target, alters its intracellular function, and hence produces a phenotypic change (e.g., cessation of growth, cell death).
  • cessation of growth or death of the engineered pathogen cells leads to the clearing of infection and animal survival, demonstrating the importance of the target in infection and thereby validating the target.
  • a further aspect of this invention provides for (1) identifying a biomolecule that produces a phenotypic effect on a cell (wherein the cell can be, for instance, a pathogen cell or a mammalian cell) and (2) simultaneous intracellular target validation.
  • the invention includes methods for identifying compounds that inhibit the growth of cells having a target cell component.
  • the target cell component can first be identified as essential to the growth of the cells in culture and/or under conditions in which it is desired that the growth of the cells be inhibited. These methods can be applied, for example, to various types of cells that undergo abnormal or undesirable proliferation, including cells of neoplasms (tumors or growths, either benign or malignant) which, as known in the art, can originate from a variety of different cell types. Such cells can be referred to, for example, as being from adenomas, carcinomas, lymphomas or leukemias.
  • the method can also be applied to cells that proliferate abnormally in certain other diseases, such as arthritis, psoriasis or autoimmune diseases. If intracellular expression of the biomolecular binder inhibits the function of a target essential for growth (presumably by binding to the target at a biologically relevant site) cells monitored in step (2) will exhibit a slow growth or no growth phenotype. Targets found to be essential for growth by these methods are validated starting points for drug discovery, and can be incorporated into assays to identify more stable compounds that bind to the same site on the target as the biomolecule. Where the cells are pathogen cells and the desired phenotypic change to be monitored is inhibition of growth, the invention provides a procedure to examine the activity of target (pathogen) cell components in an animal infection model.
  • a target cell component a gene product of a particular cell type (e.g., a type of pathogenic bacteria), wherein the target cell component is already known as being encoded by a characterized gene, as a potential target for a modulator to be identified.
  • the target cell component can be isolated directly from the cell type of interest, assuming suitable culture methods are available to grow a sufficient number of cells, using methods appropriate to the type of cell component to be isolated (e.g., protein purification methods such as differential precipitation, ion exchange chromatography, gel chromatography, affinity chromatography, HPLC).
  • Alternatively, the target cell component can be produced recombinantly, which requires that the gene encoding the target cell component be isolated from the cell type of interest. This can be done by any number of methods, for example known methods such as PCR, using template DNA isolated from the pathogen or a DNA library produced from the pathogen DNA, and using primers based on known sequences or combinations of known and unknown sequences within or external to the chosen gene. See, for example, methods described in "The Polymerase Chain Reaction," Chapter 15 of Current Protocols in Molecular Biology, (Ausubel, F.M. et al., eds), John Wiley & Sons, New York, 1998.
  • Other methods include cloning a gene from a DNA library (e.g., a cDNA library from a eukaryotic pathogen) into a vector (e.g., plasmid, phage, phagemid, virus, etc.) and applying a means of selection or screening, to clones resulting from a transformation of vectors (including a population of vectors now having inserted genes) into appropriate host cells.
  • the screening method can take advantage of properties given to the host cells by the expression of the inserted chosen gene (e.g., detection of the gene product by antibodies directed against it, detection of an enzymatic activity of the gene product), or can detect the presence of the gene itself (for instance, by methods employing nucleic acid hybridization).
  • Target proteins can be expressed with E. coli or other prokaryotic gene expression systems, or in eukaryotic gene expression systems. Since many eukaryotic proteins carry unique modifications that are required for their activities, e.g. glycosylation and methylation, protein expression can in some cases be better carried out in eukaryotic systems, such as yeast, insect, or mammalian cells that can perform these modifications. Examples of these expression systems have been reviewed in the following literature: Methods in Enzymology, Volume 185, eds D.V.
  • the gene can be identified and cloned by a method such as that used in Shiba et al., US 5,759,833, Shiba et al., US 5,629,188, Martinis et al., US 5,656,470 and Sassanfar et al., US 5,756,327.
  • It is an advantage of the target validation method that it can be used with target cell components which have not been previously isolated or characterized and whose functions are unknown.
  • a segment of DNA containing an open reading frame (ORF; a cDNA can also be used, as appropriate to a eukaryotic cell) which has been isolated from a cell of a type that is to be an object of drug action (e.g., tumor cell, pathogen cell) can be cloned into a vector, and the target gene product of the ORF can be produced in host cells harboring the vector.
  • the gene product can be purified and further studied in a manner similar to that of a gene product that has been previously isolated and characterized.
  • the open reading frame (in some cases, cDNA) can be isolated from a source of DNA of the cells of interest (genomic DNA or a library, as appropriate), and inserted into a fusion protein or fusion polypeptide construct.
  • This construct can be a vector comprising a nucleic acid sequence which provides a control region (e.g., promoter, ribosome binding site) and a region which encodes a peptide or polypeptide portion of the fusion polypeptide wherein the polypeptide encoded by the fusion vector endows the fusion polypeptide with one or more properties that allow for the purification of the fusion polypeptide.
  • the vector can be one from the pGEX series of plasmids (Pharmacia) designed to produce fusions with glutathione S-transferase.
  • Host cells: The isolated DNA having an open reading frame, whether encoding a known or an as yet unidentified gene product, when inserted into an expression construct, can be expressed to produce the target cell component in host cells.
  • Host cells can be, for example, Gram-negative or Gram-positive bacterial cells such as Escherichia coli or Bacillus subtilis, respectively, e.g., Bacillus anthracis, or yeast cells such as Saccharomyces cerevisiae, Schizosaccharomyces pombe or Pichia pastoris.
  • the target cell component to be used in target validation studies is preferably produced in a host that is genetically related to the pathogen from which the gene encoding it was isolated.
  • an E. coli host is preferred over a Pichia pastoris host.
  • the target cell component so produced can then be isolated from the host cells.
  • Many protein purification methods are known that separate proteins on the basis of, for instance, size, charge, or affinity for a binding partner (e.g., for an enzyme, a binding partner can be a substrate or substrate analog), and these methods can be combined in a sequence of steps by persons of skill in the art to produce an effective purification scheme.
  • An isolated cell component or a fusion protein comprising the cell component can be used in a test to identify one or more biomolecular binders of the isolated product (general step (1)).
  • a biomolecular binder of a target cell component can be identified by in vitro assays that test for the formation of complexes of target and biomolecular binder noncovalently bound to each other.
  • the isolated target can be contacted with one or more types of biomolecules under conditions conducive to binding, the unbound biomolecules can be removed from the targets, and a means of detecting bound complexes of biomolecules and targets can be applied.
  • the detection of the bound complexes can be facilitated by having either the potential biomolecular binders or the target labeled or tagged with an adduct that allows detection or separation (e.g., radioactive isotope or fluorescent label; streptavidin, avidin or biotin affinity label).
  • both the potential biomolecular binders and the target can be differentially labeled. For examples of such methods see, e.g., WO 98/19162.
  • Biomolecules to be tested and means for detection: The biomolecules to be tested for binding to a target can be from a library of candidate biomolecular binders (e.g., a peptide or oligonucleotide library).
  • a peptide library can be displayed on the coat protein of a phage (see, for examples of the use of genetic packages such as phage display libraries, Koivunen, E. et al., J Biol. Chem. 268:20205-20210 (1993)).
  • the biomolecules can be detected by means of a chemical tag or label attached to or integrated into the biomolecules before they are screened for binding properties.
  • the label can be a radioisotope, a biotin tag, or a fluorescent label.
  • Those molecules that are found to bind to the target molecule can be called biomolecular binders.
  • Fusion proteins: An isolated target cell component, an antigenically similar portion thereof, or a suitable fusion protein comprising all or a portion of the target can be used in a method to select and identify biomolecules which bind specifically to the target.
  • the target cell component comprises a protein
  • fusion proteins comprising all of, or a portion of, the target linked to a second moiety not occurring in the target as found in nature, can be prepared for use in another aspect of the method.
  • Suitable fusion proteins for this purpose include those in which the second moiety comprises an affinity ligand (e.g., an enzyme, antigen, epitope).
  • the fusion proteins can be produced by the insertion of a gene encoding a target or a suitable portion of such gene into a suitable expression vector, which encodes an affinity ligand (e.g., pGEX-4T-2 and pET-15b, encoding glutathione S-transferase and His-Tag affinity ligands, respectively).
  • the expression vector can be introduced into a suitable host cell for expression.
  • Host cells are lysed and the lysate, containing fusion protein, can be bound to a suitable affinity matrix by contacting the lysate with an affinity matrix under conditions sufficient for binding of the affinity ligand portion of the fusion protein to the affinity matrix.
  • Fusion protein can be immobilized
  • the fusion protein can be immobilized on a suitable affinity matrix under conditions sufficient to bind the affinity ligand portion of the fusion protein to the matrix, and is contacted with one or more candidate biomolecules (e.g., a mixture of peptides) to be tested as biomolecular binders, under conditions suitable for binding of the biomolecules to the target portion of the bound fusion protein.
  • the affinity matrix with bound fusion protein can be washed with a suitable wash buffer to remove unbound biomolecules and non-specifically bound biomolecules. Biomolecules which remain bound can be released by contacting the affinity matrix with fusion protein bound thereto with a suitable elution buffer. Wash buffer can be formulated to permit binding of the fusion protein to the affinity matrix, without significantly disrupting binding of specifically bound biomolecules. In this aspect, elution buffer can be formulated to permit retention of the fusion protein by the affinity matrix, but can be formulated to interfere with binding of the test biomolecule(s) to the target portion of the fusion protein.
  • a change in the ionic strength or pH of the elution buffer can lead to release of biomolecules
  • the elution buffer can comprise a release component or components designed to disrupt binding of biomolecules to the target portion of the fusion protein.
  • Immobilization can be performed prior to, simultaneous with, or after contacting, the fusion protein with biomolecule, as appropriate.
  • Various permutations of the method are possible, depending upon factors such as the biomolecules tested, the affinity matrix-ligand pair selected, and elution buffer formulation.
  • A suitable elution buffer can be a matrix elution buffer, such as glutathione for a GST fusion.
  • the fusion protein comprises a cleavable linker, such as a thrombin cleavage site
  • cleavage from the affinity ligand can release a portion of the fusion with the biomolecules bound thereto.
  • Bound biomolecule can then be released from the fusion protein or its cleavage product by an appropriate method, such as extraction.
  • one or more candidate biomolecular binders can be tested simultaneously. Where a mixture of biomolecules is tested, the biomolecules selected by the foregoing processes can be separated (as appropriate) and identified by suitable methods (e.g., PCR, sequencing, chromatography).
  • Random sequence RNA libraries can also be screened according to the present method to select RNA molecules which bind to a target. Where biomolecules selected from a combinatorial library by the present method carry unique tags, identification of individual biomolecules by chromatographic methods is possible.
  • biomolecules do not carry tags
  • chromatographic separation followed by mass spectrometry to ascertain structure
  • Other methods to identify biomolecular binders of a target cell component can be used.
  • the two-hybrid system or interaction trap is an in vivo system that can be used to identify polypeptides, peptides or proteins (candidate biomolecular binders) that bind to a target protein.
  • both candidate biomolecular binders and target cell component proteins are produced as fusion proteins.
  • the two-hybrid system and variations on it have been described (US 5,283,173 and US 5,468,614; Golemis, E.A.
  • Once one or more biomolecular binders of a cell component have been identified, further steps can be combined with those taken to identify the biomolecular binder, to identify those biomolecular binders that produce a phenotypic effect on a cell (where "a cell" can mean cells of a cell strain or cell line).
  • a method for identifying a biomolecule that produces a phenotypic effect on a first cell can comprise the steps of identifying a biomolecular binder of an isolated target cell component of the first cell, constructing a second cell comprising the target cell component and a regulable exogenous gene encoding the biomolecular binder, and testing the second cell for the phenotypic effect, upon production of the biomolecular binder in the second cell, where the second cell can be maintained in culture or introduced into an experimental animal. If the second cell shows the phenotypic effect upon intracellular production of the biomolecular binder, then a biomolecule that produces a phenotypic effect on the first cell has been identified.
  • Host cells (also, "second cells" in the terminology used above) of the cell type (e.g., species of pathogenic bacteria) the target was isolated from (or the gene encoding the target was originally isolated from, if the target is produced by recombinant methods), can be engineered to harbor a gene that can regulatably express the biomolecular binder (e.g., under an inducible or repressible promoter). The ability to regulate the expression of the biomolecular binder is desirable because constitutive expression of the biomolecular binder could be lethal to the cell.
  • inducible or regulated expression gives the researcher the ability to control if and when the biomolecular binder is expressed.
  • the gene expressing the biomolecular binder can be present in one or more copies, either on an extrachromosomal structure, such as on a single or multicopy plasmid, or integrated into the host cell genome. Plasmids that provide an inducible gene expression system in pathogenic organisms can be used. For example, plasmids allowing tetracycline-inducible expression of a gene in Staphylococcus aureus have been developed.
  • For intracellular expression of a biomolecule to be tested for its phenotypic effect in a eukaryotic cell (e.g., mammalian cell), the genes for expression can be carried on plasmid-based or virus-based vectors, or on a linear piece of DNA or RNA.
  • the genetic material can be introduced into cells using a variety of techniques, including whole cell or protoplast transformation, electroporation, calcium phosphate-DNA precipitation or DEAE-Dextran transfection, liposome-mediated DNA or RNA transfer, or transduction with recombinant viral or retroviral vectors.
  • Expression of the gene can be constitutive (e.g., ADH1 promoter for expression in S. cerevisiae (Bennetzen, J.L. and Hall, B.D., J. Biol. Chem. 257:3026-3031 (1982)), or CMV immediate early promoter and RSV LTR for mammalian expression) or inducible, as the inducible GAL1 promoter in yeast (Davis, L.I.
  • E. coli Lac repressor/operator system and Tn10 Tet repressor/operator systems have been engineered to govern regulated expression in organisms from bacterial to mammalian cells. Regulated gene expression can also be achieved by activation.
  • gene expression governed by HIV LTR can be activated by HIV or SIV Tat proteins in human cells;
  • GAL4 promoter can be activated by galactose in a nonglucose-containing medium.
  • the location of the biomolecule binder genes can be extrachromosomal or chromosomally integrated.
  • the chromosome integration can be mediated through homologous or nonhomologous recombination.
  • For proper localization in the cells, it may be desirable to tag the biomolecule binders with certain peptide signal sequences (for example, nuclear localization signal (NLS) sequences, mitochondria localization sequences).
  • For presentation of the biomolecular binders in the intracellular system, they can be fused N-terminally, C-terminally, or internally in a carrier protein (if the biomolecular binder is a peptide), and can be fused (5', 3' or internally) in a carrier RNA or DNA molecule (if the biomolecular binder is a nucleic acid).
  • the biomolecular binder can be presented with a protein or nucleic acid structural scaffold.
  • Certain linkages (e.g., a 4-glycine linker for a peptide or a stretch of A's for an RNA) can be inserted between the biomolecular binder and the carrier proteins or nucleic acids.
  • the effect of this biomolecular binder on the phenotype of the cells can be tested, as a manifestation of the binding (implying binding to a functionally relevant site, thus, an activator, or more likely, an inhibitory) effect of the biomolecular binder on the target used in an in vitro binding assay as described above.
  • An intracellular test can not only determine which biomolecular binders have a phenotypic effect on the cells, but at the same time can assess whether the target in the cells is essential for maintaining the normal phenotype of the cells.
  • a culture of the engineered cells expressing a biomolecular binder can be divided into two aliquots.
  • the first aliquot (“test” cells) can be treated in a suitable manner to regulate (e.g., induce or release repression of, as appropriate) the gene encoding the biomolecular binder, such that the biomolecular binder is produced in the cells.
  • the second aliquot (“control” cells) can be left untreated so that the biomolecular binder is not produced in the cells.
  • a different strain of cells not having a gene that can express the biomolecular binder, can be used as control cells.
  • the phenotype of the cells in each culture ("test" and "control" cells grown under the same conditions, other than the expression of the biomolecular binder), can then be monitored by a suitable means (e.g., enzymatic activity, monitoring a product of a biosynthetic pathway, antibody to test for presence of cell surface antigen, etc.).
  • the growth of the cells in each culture (“test” and “control” cells grown under the same conditions, other than the expression of the biomolecular binder), can be monitored by a suitable means (e.g., turbidity of liquid cultures, cell count, etc). If the extent of growth, or rate of growth of the test cells is less than the extent of growth or rate of growth of the control cells, then the biomolecular binder can be concluded to be an inhibitor of the growth of the cells, or a biomolecular inhibitor. If the phenotype of the test cells is altered relative to that of the control cells, then the biomolecular binder can be concluded to be one that causes a phenotypic effect.
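The comparison of test and control cultures described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the disclosed method: the function names, the use of two turbidity (OD) readings per culture, the exponential-growth assumption, and the 0.8 cutoff are all assumptions chosen for the example.

```python
import math

def growth_rate(od_start, od_end, hours):
    """Specific growth rate (per hour) from two turbidity readings,
    assuming exponential growth over the interval."""
    if od_start <= 0 or od_end <= 0 or hours <= 0:
        raise ValueError("readings and interval must be positive")
    return math.log(od_end / od_start) / hours

def is_growth_inhibitor(test_rate, control_rate, cutoff=0.8):
    """True when the test culture grows more slowly than the control.
    The cutoff is a hypothetical margin so small differences are not called."""
    return test_rate < cutoff * control_rate

# Illustrative readings only (not data from the source).
control = growth_rate(0.1, 0.8, 3.0)  # uninduced "control" cells
test = growth_rate(0.1, 0.2, 3.0)     # induced "test" cells
print(is_growth_inhibitor(test, control))
```

In practice the cutoff would be set from replicate cultures rather than fixed in advance.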
  • isolated target cell component having a known function can be tested for modulation of this known function in the presence of biomolecular binder under conditions conducive to binding of the biomolecular binder to the target cell component. Positive results in these tests should encourage the investigator to continue in the drug discovery process with efforts to find a more stable compound (than a peptide, polypeptide or RNA biomolecule) that mimics the binding properties of the biomolecular binder on the tested target cell component.
  • A further test can, again, employ an engineered strain of cells that comprise both the target cell component and one or more genes encoding a biomolecule tested to be a biomolecular binder of the target cell component.
  • the cells of the cell strain can be tested in animals to see if regulable expression of the biomolecular binder in the engineered cells produces an observable or testable change in phenotype of the cells.
  • Both the "in culture" test for the effect of intracellular expression of the biomolecular binder and the "in animal" test (described below) for the effect of intracellular expression of the biomolecular binder can be applied not only towards drug discovery in the categories of antimicrobials and anticancer agents, but also towards the discovery of therapeutic agents to treat inflammatory diseases, cardiovascular diseases, diseases associated with metabolic pathways, and diseases associated with the central nervous system, for example.
  • the object of the test is to see whether production of the biomolecular binder in the engineered strain inhibits growth of these cells after their introduction into an animal by the engineered pathogen.
  • Such a test can not only determine which biomolecular binders are inhibitors of growth of the cells, but at the same time can assess whether the target in the cells is essential for maintaining growth of the cells (infection, for a pathogenic organism) in a host mammal.
  • Suitable animals for such an experiment are, for example, mammals such as mice, rats, rabbits, guinea pigs, dogs, pigs, and the like. Small mammals can be used for reasons of convenience.
  • the engineered cells are introduced into one or more animals ("test" animals) and into one or more animals in a separate group ("control" animals) by a route appropriate to cause symptoms of systemic or local growth of the engineered cells.
  • the route of introduction may be, for example, by oral feeding, by inhalation, by subdermal, intramuscular, intravenous, or intraperitoneal injection as appropriate to the desired result.
  • expression of the gene encoding the biomolecular binder is regulated to allow production of the biomolecular binder in the engineered pathogen cells.
  • the treatment to express the gene encoding the biomolecular binder can be the administration of an inducer substance (where expression of the biomolecular binder gene is under the control of an inducible promoter) or the functional removal of a repressor substance (where expression of the biomolecular binder gene is under the control of a repressible promoter).
  • the animals can be monitored for signs of infection (as the simplest endpoint, death of the animal, but also e.g., lethargy, lack of grooming behavior, hunched posture, not eating, diarrhea or other discharges; bacterial titer in samples of blood or other cultured fluids or tissues).
  • the test and control animals can be monitored for the development of tumors or for other indicators of the proliferation of the introduced engineered cells.
  • the biomolecule can be also called a biomolecular inhibitor of growth, or biomolecular inhibitor of infection, as appropriate, as it can be concluded that the expression in vivo of the biomolecular inhibitor is the cause of the relative reduction in growth of the introduced cells in the test animals.
  • further steps of the procedure involve in vitro assays to identify one or more compounds that have binding and activating or inhibitory properties that are similar to those of the biomolecules which have been found to have a phenotypic effect, such as inhibition of growth. That is, compounds that compete for binding to a target cell component with the biomolecule would then be structural analogs of the biomolecules. Assays to identify such compounds can take advantage of known methods to identify competing molecules in a binding assay. These steps comprise general step (3) of the method.
  • a biomolecular inhibitor (or activator) can be contacted with the isolated target-cell component to allow binding, one or more compounds can be added to the milieu comprising the biomolecular inhibitor and the cell component under conditions that allow interaction and binding between the cell component and the biomolecular inhibitor, and any biomolecular inhibitor that is released from the cell component can be detected.
  • One suitable system that allows the detection of released biomolecular inhibitor (or activator) is one in which fluorescence polarization of molecules in the milieu can be measured.
  • the biomolecular inhibitor can have bound to it a fluorescent tag or label such as fluorescein or fluorescein attached to a linker.
  • Assays for inhibition of the binding of the biomolecular inhibitor to the cell component can be done in microtiter plates to conveniently test a set of compounds at the same time.
  • a majority of the fluorescently labeled biomolecular inhibitor must bind to the protein in the absence of competitor compound to allow for the detection of small changes in the bound versus free probe population when a compound which is a competitor with a biomolecular inhibitor is added (B.A. Lynch, et al., Analytical Biochemistry 247:77-82 (1997)). If a compound competes with the biomolecular inhibitor for a binding site on the target cell component, then fluorescently labeled biomolecular inhibitor is released from the target cell component, lowering the polarization measured in the milieu.
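The polarization readout in such a competition assay can be illustrated with a short Python sketch. It uses the standard definition of fluorescence polarization from the parallel and perpendicular emission intensities; the function names, the bound and free reference values, and the linear interpolation used to estimate displacement are assumptions made for illustration (the interpolation is only an approximation, reasonable when binding does not change the fluorophore's brightness).

```python
def polarization_mP(i_parallel, i_perpendicular):
    """Fluorescence polarization in millipolarization units (mP):
    P = (I_par - I_perp) / (I_par + I_perp)."""
    total = i_parallel + i_perpendicular
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return 1000.0 * (i_parallel - i_perpendicular) / total

def percent_displaced(p_observed, p_bound, p_free):
    """Approximate share of labeled binder released by a competitor,
    by linear interpolation between bound and free reference values."""
    if p_bound == p_free:
        raise ValueError("bound and free references must differ")
    fraction_bound = (p_observed - p_free) / (p_bound - p_free)
    fraction_bound = min(1.0, max(0.0, fraction_bound))  # clamp to [0, 1]
    return 100.0 * (1.0 - fraction_bound)
```

A well whose polarization falls from the bound reference toward the free reference after a compound is added would be flagged as containing a candidate competitor.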
  • the target cell component can be attached to a solid support, contacted with one or more compounds, and contacted with the biomolecular inhibitor.
  • One or more washing steps can be employed to remove biomolecular inhibitor and compound not bound to the cell component. Either the biomolecular inhibitor bound to the target cell component or the compound bound to the target cell component can be measured.
  • Detection of biomolecular inhibitor or compound bound to the cell component can be facilitated by the use of a label on either molecule type, wherein the label can be, for instance, a radioactive isotope either incorporated into the molecule itself or attached as an adduct, streptavidin or biotin, a fluorescent label or a substrate for an enzyme that can produce from the substrate a colored or fluorescent product.
  • a scintillation counter can be used to measure radioactivity.
  • Radiolabeled streptavidin or biotin can be allowed to bind to biotin or streptavidin, respectively, and the resulting complexes detected in a scintillation counter.
  • Alkaline phosphatase conjugated to streptavidin can be added to a biotin-labeled biomolecular inhibitor or compound.
  • Detection and quantitation of a biotin-labeled complex can then be by addition of pNPP substrate of alkaline phosphatase and detection, by spectrophotometry, of a product which absorbs light at a wavelength of 405 nm.
  • a fluorescent label can also be used, in which case detection of fluorescent complexes can be by a fluorometer.
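For the alkaline phosphatase/pNPP readout, the absorbance at 405 nm can be converted to a product concentration with the Beer-Lambert law. The sketch below is illustrative only: the function name is hypothetical, and the default extinction coefficient of 18,000 M^-1 cm^-1 is an assumed, commonly quoted value for the product under alkaline conditions that should be replaced by a value calibrated for the actual buffer and instrument.

```python
def product_concentration_molar(absorbance_405,
                                epsilon_per_M_cm=18000.0,
                                path_cm=1.0):
    """Beer-Lambert law: c = A / (epsilon * l).

    epsilon_per_M_cm is an assumed extinction coefficient; path_cm is the
    optical path length (1 cm cuvette by default, shorter in microplates).
    """
    if epsilon_per_M_cm <= 0 or path_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    return absorbance_405 / (epsilon_per_M_cm * path_cm)

# An absorbance of 0.9 in a 1 cm path corresponds to 50 micromolar product
# under the assumed extinction coefficient.
print(product_concentration_molar(0.9) * 1e6)
```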
  • the method for identifying compounds comprises attaching the target cell component to a solid support, contacting the biomolecular inhibitor with the target cell component under conditions suitable for binding of the biomolecular inhibitor to the cell component, removing unbound biomolecular inhibitor from the solid support, contacting one or more compounds (e.g., a mixture of compounds) with the cell component under conditions suitable for binding of the biomolecular inhibitor to the cell component, and testing for unbound biomolecular inhibitor released from the cell component, whereby if unbound biomolecular inhibitor is detected, one or more compounds that displace or compete with the biomolecular inhibitor for a particular site on the target cell component have been identified.
  • Derivatives of these compounds having modifications to confer improved solubility, stability, etc. can also be tested for a desired phenotypic effect.
  • Combining steps for testing the phenotypic effects of a biomolecule, as can be produced in an intracellular test, with steps for identifying compounds that compete with the biomolecule for sites on a target cell component, yields a method for identifying a compound which is a functional analog of a biomolecule which produces a phenotypic effect on a cell.
  • steps can be to test, for the phenotypic effect, either in culture or in an animal model, or in both, a cell which produces a biomolecule by regulatable expression of an exogenous gene in the cell, and to identify, if the biomolecule caused the phenotypic effect, one or more compounds that compete with the biomolecule for binding to a target cell component. If a compound is found to compete with the biomolecule for binding to the target cell component, then the compound is a functional analog of a biomolecule which produces a phenotypic effect on the cell. Such a functional analog can cause qualitatively a similar effect on the cell, but to a similar degree, lesser degree or greater degree than the biomolecule.
  • a further aspect of the invention is a method for determining whether a target component of a cell is essential to producing a phenotypic effect on the cell, comprising isolating the target component from the cell, identifying a biomolecular binder of the isolated target component of the cell, constructing a second cell comprising the target component and a regulable, exogenous gene encoding the biomolecular binder, and testing the second cell in culture for an altered phenotypic effect, upon production of the biomolecular binder in the second cell, whereby, if the second cell shows the altered phenotypic effect upon production of the biomolecular binder, then the target component of the first cell is essential to producing the phenotypic effect on the first cell.
  • DHFR Mammalian dihydrofolate reductase
  • MTX Methotrexate
  • NIH 3T3 is a mouse fibroblast cell line that is able to develop spontaneous transformed cells when cultured in low concentration (2%) of calf serum in molecular, cellular and developmental biology medium 402 (MCDB) (M. Chow and H. Rubin, Proc. Natl. Acad. Sci. USA 95(8):4550-4555 (1998)).
  • the transformed cells which can be selectively inhibited by MTX (Chow and Rubin), are isolated.
  • Both the normal and transformed NIH 3T3 cells are transfected with pTet-On plasmid (Clontech; Palo Alto, CA).
  • Stable cell lines that express high levels of reverse tetracycline-controlled activator (rtTA) are isolated and characterized for their normal or transformed phenotype (Chow and Rubin).
  • the DHFR gene (Genbank Accession # L26316) from the NIH 3T3 cell line is amplified by reverse transcription-PCR (RT-PCR) using poly A+ RNA isolated from NIH 3T3 cells (Sambrook, J. et al., Molecular Cloning: A Laboratory Manual, 2nd edition, Cold Spring Harbor Laboratory Press, 1989). Active DHFR is expressed using the BacPAK Baculovirus Expression System (Clontech) or other appropriate systems.
  • the expressed DHFR is purified and biotinylated and subjected to peptide binder identification as exemplified for bacterial proteins.
  • the identified peptides are biochemically characterized for in vitro inhibition of DHFR activity.
  • Peptides that inhibit DHFR are identified.
  • a nucleic acid encoding each peptide can be cloned into a vector such as pGEX-4T2 (Pharmacia) to yield a vector which encodes a fusion polypeptide having the peptide fused to the N-terminus of GST. This can also be done by PCR amplification as exemplified herein for the peptide Pro-3.
  • the fusion genes are cloned into plasmid pTRE (Clontech) for regulated expression.
  • the constructed plasmid or the vector is co-transfected with pTK-Hyg into the stable NIH 3T3 cell line that expresses rtTA.
  • 3T3N-VITA normal 3T3 cells that express rtTA and the DHFR inhibitory peptides
  • 3T3T-VITA transformed 3T3 cells that express rtTA and the DHFR inhibitory peptides
  • 3T3T-VITA control transformed 3T3 cells that express rtTA and GST
  • 10^2 to 10^[?] 3T3T-VITA or 3T3T-VITA control cells are mixed with 10^5 3T3N-VITA cells and are grown in MCDB 402 medium with 10% calf serum at 37°C for three days.
  • Tetracycline is added to the medium to a final concentration of 0 to 1 µg/ml. In a control, 200 nM of MTX is added. The cultures are incubated for an additional eight days, and the number of foci formed is counted as described by M. Chow and H. Rubin, Proc. Natl. Acad. Sci. USA 95(8):4550-4555 (1998). Peptides that specifically inhibit foci formation of 3T3 transformed cells are identified. A murine model of fibroblastoma (Kogerman, P. et al., Oncogene
  • 3T3T-VITA or 3T3T-VITA control cells (10^3, 10^4, 10^5, 10^6 cells) are injected subcutaneously into 5 groups (10 in each group) of athymic nude mice (4-6 weeks old, 18-22 g) to determine the minimal dose needed for development of fibroblastomas in all of the tested animals.
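The selection of the minimal tumorigenic dose from such a titration can be expressed as a short Python sketch. The function name and the outcome counts are hypothetical placeholders used only to show the selection rule (the smallest cell number at which every injected animal develops a tumor); they are not results from the source.

```python
def minimal_tumorigenic_dose(outcomes):
    """Return the smallest cell dose at which all injected animals
    developed tumors, or None if no dose qualifies.

    outcomes maps injected cell number -> (animals with tumors, animals injected).
    """
    qualifying = [
        dose
        for dose, (with_tumor, injected) in outcomes.items()
        if injected > 0 and with_tumor == injected
    ]
    return min(qualifying) if qualifying else None

# Placeholder counts for groups of 10 mice (illustrative, not measured).
example = {10**3: (2, 10), 10**4: (7, 10), 10**5: (10, 10), 10**6: (10, 10)}
print(minimal_tumorigenic_dose(example))  # prints 100000
```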
  • 6 groups of athymic nude mice (10 each) are injected subcutaneously (s.c.) with the minimal tumorigenic dose for 3T3T-VITA or 3T3T-VITA control cells to develop fibroblastoma.
  • One week after injection, group 1 mice start receiving MTX s.c. at 2 mg/kg/day as positive control, groups 2 to 5 start receiving 1, 2, 5, or 10 mg/kg/day of tetracycline, and group 6 starts receiving saline (vehicle) as control. Five weeks after the introduction of cells, all of the mice are sacrificed and tumors are removed from them. Tumor mass is measured and compared among the groups. An effective peptide identified by these in vivo experiments can be used for screening libraries of compounds to identify those compounds that competitively bind to DHFR.
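The six treatment arms and the per-animal daily amounts they imply can be tabulated with a small Python sketch. The group assignments and mg/kg/day values are taken from the text; the function names and the 20 g body weight (chosen from within the stated 18-22 g range) are assumptions for illustration.

```python
# Treatment arms as listed in the text: (group, agent, dose in mg/kg/day).
GROUPS = [
    (1, "MTX", 2.0),            # positive control
    (2, "tetracycline", 1.0),
    (3, "tetracycline", 2.0),
    (4, "tetracycline", 5.0),
    (5, "tetracycline", 10.0),
    (6, "saline", 0.0),         # vehicle control
]

def daily_dose_mg(dose_mg_per_kg, body_weight_g):
    """Amount of agent (mg) given per day to one animal of the given weight."""
    return dose_mg_per_kg * body_weight_g / 1000.0

def schedule(body_weight_g):
    """Map each group number to (agent, mg per animal per day)."""
    return {
        group: (agent, daily_dose_mg(dose, body_weight_g))
        for group, agent, dose in GROUPS
    }

for group, (agent, mg) in schedule(20.0).items():
    print(group, agent, mg)
```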
  • One mechanism of tumorigenesis is overexpression of proto-oncogenes such as Ha-ras (Reviewed by Suarez, H.G., Anticancer Research 9(5):1331-1343 (1989)).
  • Transgenic mice that overexpress human Ha-ras have been produced. Such transgenic mice develop salivary and/or mammary adenocarcinomas (Nielsen, L.L. et al, In Vivo 8(5):1331-1343 (1994)). Secondary transgenic mice that express rtTA can be generated using the pTet-On plasmid from Clontech.
  • Human Ha-ras open reading frame cDNA (Genbank Accession #GO0277) is amplified by RT-PCR using polyA+ RNA isolated from human mammary gland or other tissues. Active Ha-ras is expressed using the BacPAK Baculovirus Expression System (Clontech) or other appropriate systems. The expressed Ha-ras is purified and biotinylated and subjected to peptide binder identification as exemplified herein for bacterial proteins as target cell components. The identified peptides are biochemically characterized for in vitro inhibition of Ha-ras GTPase activity.
  • Peptides that inhibit Ha-ras are cloned into plasmid pTRE (Clontech) for regulated expression as an N-terminal fusion of GST. Such constructs are used to generate tertiary transgenic mice using the secondary transgenic mice. Transgenic mice that are able to overexpress peptide genes are identified by Northern and Western analysis. Control mice that express GST are also identified. Various doses of tetracycline are administered to the tertiary transgenic mice by s.c. or i.p. injection before or after tumor onset. Prevention or regression of tumors resulting from expression of the peptide genes is analyzed as described above for murine fibroblastoma.
  • Peptides found to be effective in in vivo experiments will be used to screen compounds that inhibit human Ha-ras activity for cancer therapy.
  • The method of the invention can be applied more generally to mammalian diseases caused by: (1) loss or gain of protein function, (2) over-expression or loss of regulation of protein activity. In each case the starting point is the identification of a putative protein target or metabolic pathway involved in the disease.
  • the protocol can sometimes vary with the disease indication, depending on the availability of cell culture and animal model systems to study the disease. In all cases the process can deliver a validated target and assay combination to support the initiation of drug discovery.
  • Appropriate disease indications include, but are not limited to, Alzheimer's, arthritis, cancer, cardiovascular diseases, central nervous system disorders, diabetes, depression, hypertension, inflammation, obesity and pain.
  • Appropriate protein targets putatively linked to disease indications include, but are not limited to (1) the leptin protein, putatively linked to obesity and diabetes; (2) a mitogen-activated protein kinase putatively linked to arthritis, osteoporosis and atherosclerosis; (3) the interleukin-1 beta converting protein putatively linked to arthritis, asthma and inflammation; (4) the caspase proteins putatively linked to neurodegenerative diseases such as Alzheimer's, Parkinson's and stroke, and (5) the tumor necrosis factor protein putatively linked to obesity and diabetes.
  • Appropriate protein targets include also, but are not limited to, enzymes catalyzing the following types of reactions: (1) oxido-reductases, (2) transferases, (3) hydrolases, (4) lyases, (5) isomerases, and (6) ligases.
  • the arachidonic acid pathway constitutes one of the main mechanisms for the production of pain and inflammation.
  • the pathway produces different classes of end products, including the prostaglandins, thromboxane and leukotrienes.
  • Prostaglandins, an end product of cyclooxygenase metabolism, modulate immune function, mediate vascular phases of inflammation and are potent vasodilators.
  • COX cyclooxygenase
  • Anti-inflammatory potencies of different NSAIDs have been shown to be proportional to their action as COX inhibitors. It has also been shown that COX inhibition produces toxic side effects such as erosive gastritis and renal toxicity. The knowledge base regarding the toxic side effects of COX inhibitors has been gained through years of monitoring human therapies and human suffering. Two kinds of COX enzymes are now known to exist, with inhibition of COX1 related to toxicity, and inhibition of COX2 related to reduction of inflammation. Thus, selective COX2 inhibition is a desirable characteristic of new anti-inflammatory drugs.
  • the method of the invention can provide a route from identification of potential drug targets to validating these targets (for example, COX1 and COX2) as playing a role in disease (pain and inflammation) to an examination of the phenotype for the inhibition of one or both target isozymes without human suffering. Importantly, this information can be collected in vivo.
  • the method of the invention can be used to define the phenotype of "genes of unknown function" obtained from various human genome sequencing projects or to assess the phenotype resulting from inhibition of one isozyme subtype or one member of a family of related protein targets.
  • Target (also, "target component of a cell," or "target cell component"): a constituent of a cell which contributes to and is necessary for the production or maintenance of a phenotype of the cell in which it is found.
  • a target can be a single type of molecule or can be a complex of molecules.
  • a target can be the product of a single gene, but can also be a complex comprising more than one gene product (for example, an enzyme comprising alpha and beta subunits, mRNA, tRNA, ribosomal RNA or a ribonucleoprotein particle such as a snRNP).
  • Targets can be the product of a characterized gene (gene of known function) or the product of an uncharacterized gene (gene of unknown function).
  • Target Validation: the process of determining whether a target is essential to the maintenance of a phenotype of the cell type in which the target normally occurs. For example, for pathogenic bacteria, researchers developing antimicrobials want to know if a compound which is potentially an antimicrobial agent not only binds to a target in vitro, but also binds to, and modulates the function of, a target in the bacteria in vivo, and especially under the conditions in which the bacteria are producing an infection, that is, those conditions under which the antimicrobial agent must work to inhibit bacterial growth in an infected animal or human.
  • Phenotypic Effect: a change in an observable characteristic of a cell which can include, e.g., growth rate, level or activity of an enzyme produced by the cell, sensitivity to various agents, antigenic characteristics, and level of various metabolites of the cell.
  • a phenotypic effect can be a change away from wild type (normal) phenotype, or can be a change towards wild type phenotype, for example.
  • a phenotypic effect can be the causing or curing of a disease state, especially where mammalian cells are referred to herein.
  • a phenotypic effect can be the slowing of growth rate or cessation of growth.
  • Biomolecule: a molecule which can be produced as a gene product in cells that have been appropriately constructed to comprise one or more genes encoding the biomolecule. Production of the biomolecule can be turned on, when desired, by an inducible promoter.
  • a biomolecule can be a peptide, polypeptide, or an RNA or RNA oligonucleotide, a DNA or DNA oligonucleotide, but is preferably a peptide.
  • biomolecules can also be made synthetically.
  • For peptides, see Merrifield, J., J. Am. Chem. Soc. 85: 2140-2154 (1963).
  • an Applied Biosystems 431A Peptide Synthesizer (Perkin Elmer)
  • Biomolecules produced as gene products intracellularly are tested for their interaction with a target in the intracellular steps described herein (tests performed with cells in culture and tests performed with cells that have been introduced into animals).
  • the same biomolecules produced synthetically are tested for their binding to an isolated target in an initial in vitro method described herein.
  • Synthetically produced biomolecules can also be used for a final step of the method for finding compounds that are competitive binders of the target.
  • Biomolecular Binder (of a target): a biomolecule which has been tested for its ability to bind to an isolated target cell component in vitro and has been found to bind to the target.
  • Biomolecular Inhibitor of Growth a biomolecule which has been tested for its ability to inhibit the growth of cells constructed to produce the biomolecule in an "in culture" test of the effect of the biomolecule on growth of the cells, and has been found, in fact, to inhibit the growth of the cells in this test in culture.
  • Biomolecular Inhibitor of Infection a biomolecule which has been tested for its ability to ameliorate the effects of infection, and has been found to do so.
  • pathogen cells constructed to regulably express the biomolecule are introduced into one or more animals, the gene encoding the biomolecule is regulated so as to allow production of the biomolecule in the cells, and the effects of production of the biomolecule are observed in the infected animals compared to one or more suitable control animals.
  • Isolated term used herein to indicate that the material in question exists in a physical milieu distinct from that in which it occurs in nature.
  • an isolated target cell component of the invention may be substantially isolated with respect to the complex cellular milieu in which it naturally occurs.
  • the absolute level of purity is not critical, and those skilled in the art can readily determine appropriate levels of purity according to the use to which the material is to be put.
  • the isolated material will form part of a composition (for example, a more or less crude extract containing other substances), buffer system or reagent mix.
  • the material may be purified to essential homogeneity, for example as determined by PAGE or column chromatography (for example, HPLC).
  • Pathogen or Pathogenic Organism an organism which is capable of causing disease, detectable by signs of infection or symptoms characteristic of disease.
  • Pathogens can include prokaryotes (which include, for example, medically significant Gram-positive bacteria such as Streptococcus pneumoniae, Enterococcus faecalis and Staphylococcus aureus, Gram-negative bacteria such as Escherichia coli, Pseudomonas aeruginosa and Klebsiella pneumoniae, and "acid-fast" bacteria such as Mycobacteria, especially M. tuberculosis), eukaryotes such as yeast and fungi (for example, Candida albicans and Aspergillus fumigatus) and parasites.
  • pathogens can include such organisms as soil-dwelling organisms and "normal flora" of the skin, gut and orifices, if such organisms colonize and cause symptoms of infection in a human or other mammal, by abnormal proliferation or by growth at a site from which the organism cannot usually be cultured.
  • compositions e.g., mixed bed multidimensional liquid chromatographs
  • methods for simultaneously identifying individual proteins in complex mixtures of biological molecules and quantifying the expression levels of those proteins e.g., proteome analyses.
  • the methods compare two or more samples of proteins, one of which can be considered as the standard sample and all others can be considered as samples under investigation.
  • the proteins in the standard and investigated samples are subjected separately to a series of chemical modifications, i.e., differential chemical labeling, and fragmentation, e.g., by proteolytic digestion and/or other enzymatic reactions or physical fragmenting methodologies.
  • the chemical modifications can be done before, or after, or before and after fragmentation/ digestion of the polypeptide into peptides.
  • Peptides derived from the standard and the investigated samples are labeled with chemical residues of different mass, but of similar properties, such that peptides with the same sequence from both samples are eluted together in the separation procedure and their ionization and detection properties regarding the mass spectrometry are very similar.
  • Differential chemical labeling can be performed on reactive functional groups on some or all of the carboxy- and/or amino- termini of proteins and peptides and/or on selected amino acid side chains.
  • a combination of chemical labeling, proteolytic digestion and other enzymatic reaction steps, physical fragmentation and/or fractionation can provide access to a variety of residues to generate different specifically labeled peptides to enhance the overall selectivity of the procedure.
  • the standard and the investigated samples are combined, subjected to multidimensional chromatographic separation, and analyzed by mass spectrometry methods. Mass spectrometry data is processed by special software, which allows for identification and quantification of peptides and proteins.
  • LC-LC-MS/MS The combination of multidimensional liquid chromatography and tandem mass spectrometry can be called "LC-LC-MS/MS.” LC-LC-MS/MS was first developed by Link A. and Yates J.
  • proteins can be first substantially or partially isolated from the biological samples of interest.
  • the polypeptides can be treated before selective differential labeling; for example, they can be denatured, reduced, preparations can be desalted, and the like.
  • Conversion of samples of proteins into mixtures of differentially labeled peptides can include preliminary chemical and/or enzymatic modification of side groups and/or termini; proteolytic digestion or fragmentation; post-digestion or post-fragmentation chemical and/or enzymatic modification of side groups and/or termini.
  • the differentially modified polypeptides and peptides are then combined into one or more peptide mixtures. Solvent or other reagents can be removed, neutralized or diluted, if desired or necessary.
  • the buffer can be modified, or, the peptides can be redissolved in one or more different buffers, such as a "MudPIT" (see below) loading buffer.
  • the peptide mixture is then loaded onto a chromatography column, such as a liquid chromatography column, a 2D capillary column or a multidimensional chromatography column, to generate an eluate.
  • the eluate is fed into a mass spectrograph, such as a tandem mass spectrograph.
  • an LC ESI MS and MS/MS analysis is complete.
  • peptides can be generated for mass spectrograph analysis.
  • Two or more samples can be differentially labeled by selective labeling of each sample.
  • Peptide modifications, i.e., labeling are stable.
  • Reagents having differing masses or reactive groups can be chosen to maximize the number of reactive groups and differentially labeled samples, thus allowing for a multiplex analysis of sample, polypeptides and peptides.
  • a "MudPIT" protocol is used for peptide analysis, as described herein.
  • the methods of the invention can be fully automated and can essentially analyze every protein in a sample.
  • the invention provides apparatus (e.g., mixed bed multi-dimensional liquid chromatographs) and methods for high throughput, comparative proteome characterization.
  • the invention provides a broad-based method for global profiling of protein expression, which is a combination of differential peptide labeling and multidimensional chromatography coupled with mass spectrometry for separation, identification and quantification. Proteins are identified in complex mixtures with rapid speed, high sensitivity and accurate quantitative information. Using sets of labeling tags and modification methods, proteins are differentially and efficiently modified with stable and flexible labeling.
  • the invention provides methods for accurate and sensitive comparative proteomics in complex systems.
  • the invention provides compositions (e.g., mixed bed multidimensional liquid chromatographs) and methods for high throughput, comparative proteome characterization.
  • the goal is to provide a broad-based method for global profiling of protein expression, which is a combination of differential peptide labeling and multi-dimensional chromatography coupled with mass spectrometry for separation, identification and quantification. This method significantly improves over traditional methods. Proteins are identified in complex mixtures with rapid speed, high sensitivity and accurate quantitative information.
  • the invention provides novel approaches for modifying proteins differentially and efficiently with stable and flexible labeling.
  • the methods provide the speed and sensitivity for accurate comparative proteomics in complex systems.
  • invention provides: Differential peptide labeling Compare various modifications and identify the top candidate(s) Optimize reaction conditions for desired peptide/protein modification Method validation
  • MudPIT Multi-dimensional Protein Identification Technique
  • the invention provides a high throughput proteomics technology with high speed, high efficiency and accurate quantitation, which can be employed for quantitative analysis of global protein expression in complex samples, and the detection and quantitation of specific proteins in complex samples.
  • An exemplary high throughput, comparative proteomics method uses a model pathway study of Streptomyces diversa (S. diversa).
  • the use of mass spectrometry to identify proteins whose sequences are present in either DNA or protein databases is well established and integrated to the field of Proteomics.
  • One goal of Proteomics is to define the expressed proteins associated with a given cellular state, and another goal is to quantify changes in protein expression between cellular states. Many techniques have been developed to achieve these goals (see below).
  • the present invention provides a non-gel based method of identifying individual proteins in complex protein mixtures simultaneously and quantifying protein expression level globally. It overcomes the limitations inherent in traditional techniques. Comparative Proteomics Techniques 2D gel electrophoresis (2D GE) is the most commonly used technique in proteomics.
  • In 2D GE, proteins are separated by isoelectric focusing according to their pI difference in the first dimension and by electrophoretic mobility according to their molecular weight difference in the second dimension. Separated proteins are usually visualized by staining. Quantitation is achieved by comparing the spot density.
  • For spot identification, the method involves spot cutting, in-gel digestion and peptide extraction. The next stage is analyzing these peptides using mass spectrometry or tandem mass spectrometry and database searching for identifications.
  • the disadvantages of the 2D GE approach are that it is very time consuming and labor intensive, and it does not work well for hydrophobic proteins, proteins with extreme pI, and non-abundant proteins.
  • Isotope-coded affinity tag is one of the new non-gel based methodologies that have a great impact on proteome research 1 .
  • the method is based on a newly synthesized class of chemical reagents (ICAT) used in combination with tandem mass spectrometry.
  • the ICAT reagent contains a biotin affinity tag and a thiol specific reactive group (cysteine side chain), which are joined by a spacer domain available in two forms: regular (light), and isotopically heavy which includes eight deuterium atoms.
  • a reduced protein mixture representing one cell state is derivatized with the isotopically light version of the ICAT reagent, while the corresponding reduced protein mixture representing a second cell state is derivatized with the isotopically heavy version of the ICAT reagent.
  • the labeled samples are combined and proteolytically digested to produce peptide fragments.
  • the tagged cysteine containing peptide fragments are isolated by avidin affinity chromatography.
  • the isolated tagged peptides are separated and analyzed by microcapillary tandem mass spectrometry.
  • Differential isotopic labeling of peptides for global quantification of proteins 2 is another method used currently, in which two different protein mixtures for quantitative comparison were digested to peptide mixtures.
  • the peptide mixtures were separately methylated using either d0- or d3-methanol, the mixtures of methylated peptides were combined, and subjected to microcapillary HPLC-MS/MS.
  • Parent proteins of methylated peptides were identified by correlative database searching of fragment ion spectra using SEQUEST or automated de novo sequencing that compared all tandem mass spectra of d0- and d3-methylated peptide ion pairs.
  • Ratios of proteins in the two original mixtures were calculated by normalization of the area under the curve for d0- to d3-methylated peptide pairs.
  • differential labeling reagents relied on stable isotopes which are expensive and not flexible to differential labeling of more than two mixtures of peptides;
  • labeling methods are limited to methylation of the C-terminus;
  • protein expression profiling is limited to duplex comparison;
  • one dimensional capillary HPLC chromatography was employed to separate peptides, which doesn't have enough capacity and resolving power for complex mixtures of peptides.
  • the invention overcomes the shortcomings of the currently available quantitative proteomics methods described above.
  • the technology of the present method has speed, high efficiency and accurate quantitation, which is employed for quantitative analysis of global protein expression in complex samples.
  • the basic approach described is employed for: (i) quantitative analysis of global protein expression in complex samples (such as cells, tissues, fractions, etc.), (ii) the detection and quantitation of specific proteins in complex samples, and (iii) quantitative measurement of specific enzymatic activities in complex samples.
  • Novelties of this approach include: (i) design of differential labeling reagents for peptides and methods for efficient peptide modification; (ii) multiplex analysis; (iii) combination of labeling by chemical modifications of termini and/or side chains of peptides; (iv) combination of chemical modification and proteolytic digestions in order to achieve the most favorable and selective chemical modification of peptides; (v) improvement of multidimensional chromatography for better protein peptide separation and identification.
  • Experimental Design and Methods The present application provides a non-gel based method of identifying individual proteins in complex protein mixtures simultaneously and quantifying protein expression level globally. It overcomes the limitations inherent in traditional techniques.
  • two or more samples of proteins are compared, one of which is considered as the standard sample and all others are considered as samples under investigation.
  • the proteins in the standard and investigated samples are subjected to a sequence of proteolytic digestion and/or other enzymatic reaction in separate tubes. Then, these digested peptides are modified (novel differential chemical labeling). Peptides derived from the standard and the investigated samples are labeled with chemical residues of different mass, but they have similar properties such that the differential labeled peptides are eluted together in the separation procedure and their ionization and fragmentation properties regarding the mass spectrometry are very similar.
  • the samples are combined, separated by multidimensional chromatography, and analyzed by mass spectrometry methods.
  • the combined mixtures of peptides are separated using an improved version of a current chromatography method called Multidimensional Protein Identification Technique (MudPIT) 3.
  • Chemical transformations involved in differential labeling (1) Esterification of C-termini of the peptides and carboxylic acid groups in the side chains; (2) Amidation of C-termini of the peptides and carboxylic acid groups in the side chains; (might require protection of amine groups first); (3) Acylation of N-termini of the peptides and amino and hydroxyl groups in the side chains.
  • the esterification, amidation, and acylation reactions are performed on the mixtures of peptides in a fashion similar to other reactions of the types already described in previous part, or modified as needed in each particular case.
  • Reagents for differential labeling Mixtures of peptides coming from the standard protein samples and the investigated protein samples are labeled separately with differential reagents. These differential reagents differ in molecular mass, but do not differ in retention properties regarding the separation method used and in ionization and detection properties regarding the mass spectrometry methods used. Thus, these differential reagents differ either in their isotope composition (isotopical reagents) or they differ structurally by a rather small fragment, which change does not alter the properties stated above (homologous reagents). The obvious choices for such reagents are aliphatic alcohols, aliphatic amines, and aliphatic acids.
  • Isotopic reagents based on aliphatic alcohols, amines, or acids contain different numbers of protons and deuterons in different reagents, e.g., CH3CH2OH and CD3CD2OH (mass difference is 5 Da) or CH3CH2CO2H and CD3CD2CO2H (mass difference is 5 Da).
  • the homologous reagents differ from each other by the number of CH2 moieties in their molecules, e.g., CH3OH and CH3CH2OH (mass difference is 14 Da) or CH3CO2H and CH3CH2CO2H (mass difference is 14 Da).
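The nominal 5 Da and 14 Da mass differences quoted for the isotopic and homologous reagent pairs can be checked with a short calculation. The following is a minimal sketch, not part of the disclosed method: the `MONO` table and the `mono_mass` helper are illustrative names, and the monoisotopic masses are standard rounded values. It gives shifts of roughly 5.03 Da for the d0/d5 ethanol pair and roughly 14.02 Da for the methanol/ethanol pair.

```python
# Monoisotopic masses in daltons (standard values, rounded); "D" is deuterium.
MONO = {"C": 12.0, "H": 1.00782503, "D": 2.01410178, "O": 15.99491462}

def mono_mass(composition):
    """Sum monoisotopic masses for a dict such as {"C": 2, "H": 6, "O": 1}."""
    return sum(MONO[element] * count for element, count in composition.items())

ethanol = {"C": 2, "H": 6, "O": 1}             # CH3CH2OH
ethanol_d5 = {"C": 2, "H": 1, "D": 5, "O": 1}  # CD3CD2OH
methanol = {"C": 1, "H": 4, "O": 1}            # CH3OH

# Isotopic pair: five H atoms replaced by five D atoms.
isotopic_shift = mono_mass(ethanol_d5) - mono_mass(ethanol)
# Homologous pair: the reagents differ by one CH2 unit.
homolog_shift = mono_mass(ethanol) - mono_mass(methanol)

print(f"isotopic pair shift:   {isotopic_shift:.3f} Da")
print(f"homologous pair shift: {homolog_shift:.3f} Da")
```

The same shift applies per labeled site, so a peptide with several esterified carboxyl groups would be expected to show a multiple of this value.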
  • the alcohol reagents esterify peptide C-terminals and/or Glu and Asp side chains, the amines form amide bond with peptide C-terminals and/or Glu and Asp side chains, and the acids form amide bond with peptide N-terminals and/or Lys and Arg side chains.
  • Substituents may be introduced into the mass-labeling reagents in order to tune their retention, ionization, and detection properties.
  • Differential labeling progress The peptide esterification is performed using different alcohols.
  • Figure 2 shows one example: a peptide is differentially labeled by one of the homologous reagent pairs, in this case methanol and ethanol. The physical/chemical properties of those differentially labeled peptide pairs were further tested, and it was found that they are very similar in terms of reverse phase LC elution and ionization efficiency. Differentially labeled peptide pairs with a methyl group difference serve as ideal mutual internal standards for quantification.
  • FIG. 1 is an illustration of a MALDI MS spectrum of a peptide pair.
  • peptides are differentially esterified by either methanol or ethanol. They have the identical sequence before the labeling.
  • Methods for peptide/protein separation, detection and analysis a. Peptide separation and detection
  • the cutting edge methodology that represents a significant step forward in proteome analysis is the use of multidimensional liquid chromatography coupled to tandem mass spectrometry (LC-LC-MS/MS), which was first developed by Link A. and Yates J. R. 4, 5, 6 and further improved by Washburn M., Wolters D., and Yates J. R. 3.
  • the existence and further improvement of this technique are critical factors in the present approach for the application of complex peptide separation and full automation, which makes it the most ideal technology for high throughput proteomics.
  • MudPIT has been previously reported in various incarnations involving reverse phase columns coupled to either cation exchange columns or size exclusion columns 8 . However, it was only when the technique was employed with a mixed bed microcapillary column containing strong cation exchange (SCX) and reverse phase chromatography (RPC) resins that the true utility of MudPIT was demonstrated.
  • a discrete fraction of the absorbed peptides are displaced from the SCX column onto the RPC column using a step gradient of salt, causing the peptides to be retained on the RPC column while contaminating salts and buffers are washed through.
  • Peptides are then eluted from the RPC column using an acetonitrile gradient, and analyzed by MS/MS. This process is repeated using increasing salt concentration to displace additional fractions from the SCX column. This is applied in an iterative manner, typically involving 10-20 steps, and the MS/MS data from all of the fractions are analyzed by database searching 9, 10 and combined to give an overall picture of the protein components present in the initial sample.
  • the MudPIT technique can be run in a fully automated system.
  • the three-dimensional microcapillary columns of the invention are operably linked to tandem mass spectrographs (3D LC LC MS/MS), ion trap mass spectrographs or a combination of tandem mass spectrographs and ion trap mass spectrographs (LC-LCQ-MS/MS or LC-LTQ- MS/MS), as described herein.
  • a three-dimensional microcapillary system of the invention can provide rapid metabolite identification and proteomic profiling to accelerate drug discovery and development. See Example 3, below, and Figures 4, 14 and 22 for exemplary 3D LC apparatus of the invention.
  • the novel three-dimensional microcapillary columns of the invention can be used to improve on MudPIT techniques.
  • the three-dimensional microcapillary columns of the invention also comprise tandem mass spectrometers ("3D LC LC MS/MS", as described herein), an ion trap mass spectrometer (LCQ or LTQ), such as a Finnigan LCQ Deca XP™ or MDLC LTQ™ (Thermo Electron Corporation, San Jose, CA) ion trap mass spectrometer, or Agilent's LC/MSD Trap (Agilent Technologies, Palo Alto, CA), or an equivalent mass spectrometer, or a combination of tandem mass spectrometry and ion trap mass spectrometry ("3D LC LCQ MS/MS" or "3D LC LTQ MS/MS", as described herein).
  • the MDLC LTQ™ is the Finnigan LTQ FT
  • the Agilent LC/MSD Trap is an 1100 series LC/MSD TRAP™, or, the LC/MSD Trap SL™, or, the LC/MSD Trap XCT™ (Agilent Technologies, Palo Alto, CA).
  • using the 3D LC MS/MS apparatus and methods of the invention, the invention provides a rapid one-fraction protocol for protein extraction, e.g., a rapid one-fraction protocol for extraction, fractionation and/or isolation of proteins of a proteome.
  • the 3D LC MS/MS, 3D LC LCQ MS/MS or 3D LC LTQ MS/MS apparatus and methods of the invention can be used to fractionate/ isolate 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%,
  • the 3D LC MS/MS, 3D LC LCQ MS/MS or 3D LC LTQ MS/MS apparatus and methods of the invention provide a one-fraction protocol to fractionate/ isolate 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21 %, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%,
  • a denatured and reduced protein mixture is digested with trypsin to produce peptide fragments. Without desalting, the mixture is loaded directly onto a microcapillary column containing, in order, RPC resin, SCX resin and RPC resin, which elutes directly into a tandem mass spectrometer. A discrete fraction of the absorbed peptides is displaced from the first RPC section to the SCX section using a reverse phase gradient (0-X%).
  • This fraction of peptides is retained on the SCX section and then sub-fractionated from the SCX section onto the last RPC section using a step gradient of salt, causing part of the peptides to be eluted and retained on the last RPC section while contaminating salts and buffers are washed through.
  • Peptides are then eluted from the last RPC section using the same reverse phase gradient (0-X%), and analyzed by MS/MS. This process is repeated using increasing salt concentration to displace additional sub-fractions from the SCX section, following each salt step with a reverse phase gradient. Once the whole sequence of salt steps is complete, the next cycle begins with a higher reverse phase gradient (0-Y%, Y>X).
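The nested structure of this run (an outer loop over reverse phase gradient ceilings, an inner loop over salt steps, each salt step followed by a reverse phase gradient) can be sketched as a simple schedule generator. This is an illustrative sketch only: the function name `schedule_3d_lc`, the gradient ceilings, the salt concentrations and the mM unit are all assumptions chosen for the example, not values from the description.

```python
def schedule_3d_lc(rp_ceilings, salt_steps):
    """Yield (cycle, step, action) tuples for a nested RP / salt-step run.

    rp_ceilings: increasing organic-solvent ceilings in percent, e.g. [X, Y].
    salt_steps:  increasing salt concentrations for the SCX displacement steps.
    """
    for cycle, ceiling in enumerate(rp_ceilings, start=1):
        # Move one fraction from the first RPC section onto the SCX section.
        yield cycle, 0, f"RP gradient 0-{ceiling}% (first RPC -> SCX)"
        for step, salt in enumerate(salt_steps, start=1):
            # Displace a sub-fraction from SCX onto the last RPC section.
            yield cycle, step, f"salt step {salt} mM (SCX -> last RPC)"
            # Elute that sub-fraction into the mass spectrometer.
            yield cycle, step, f"RP gradient 0-{ceiling}% (last RPC -> MS/MS)"

# Hypothetical values: two RP ceilings (X=20, Y=40) and three salt steps.
plan = list(schedule_3d_lc(rp_ceilings=[20, 40], salt_steps=[50, 100, 250]))
for row in plan:
    print(row)
```

With two ceilings and three salt steps the plan contains 14 actions, which shows how quickly the number of MS/MS acquisitions grows as either dimension is extended.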
  • FIG. 3 illustrates 3D LC set-up and process.
  • the mixed bed multi-dimensional liquid chromatographs of the invention (designated 3D LC MS, or, 3D LC MS/MS; see Example 3, below, and Figures 3, 4, 14 and 22 for exemplary 3D LC apparatus of the invention) are fully automated apparatus techniques using LC in combination with mass spectrometry and database search for highly complex mixtures.
  • the 3D LC MS, or, 3D LC MS/MS of the invention is competitive with the 2D GE technique in the following terms. It is universal, identifying proteins with extremes in pI and MW and from a wide variety of protein classes. It can access hydrophobic proteins. It has high sensitivity and peak capacity and gives a dynamic range greater than 10,000 to 1. It is time and labor efficient with its automatic workflow.
  • the mixed bed multi-dimensional liquid chromatographs (e.g., 3D LC) of the invention play an important role on both qualitative proteomics as well as quantitative proteomics with the combination of novel tagging method (see Examples 3, 4, and 5, below).
  • the chromatographs and methods of the invention are used to analyze the entire proteome of a cell, e.g., a microorganism, such as Bacillus anthracis and Desulfovibrio vulgaris.
  • Sequence analysis and quantification Both quantity and sequence identity of the protein from which the modified peptide originated is determined by multistage MS. This is achieved by the operation of the mass spectrometer in a dual mode in which it alternates in successive scans between measuring the relative quantities of peptides eluting from the capillary column and recording the sequence information of selected peptides.
  • Peptides are quantified by measuring in the MS mode the relative signal intensities for pairs or series of peptide ions of identical sequence that are tagged differentially, which therefore differ in mass by the mass differential encoded within the differential labeling reagents.
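The pairing logic described here (find ions of identical sequence whose m/z values differ by the label mass differential, then compare their signal intensities) can be illustrated as follows. This is a minimal sketch under stated assumptions: the function name `pair_ratios`, the tolerance, the single-label-per-peptide simplification and the example peak list are all hypothetical, and raw intensities stand in for integrated peak areas.

```python
def pair_ratios(peaks, mass_shift, charge=1, tol=0.02):
    """Find light/heavy peak pairs and return their intensity ratios.

    peaks:      list of (mz, intensity) tuples from one MS scan.
    mass_shift: mass difference between the two labels, in Da.
    charge:     assumed charge state; the m/z spacing is mass_shift / charge.
    """
    peaks = sorted(peaks)
    delta = mass_shift / charge
    results = []
    for mz_light, i_light in peaks:
        for mz_heavy, i_heavy in peaks:
            # A pair is any two peaks separated by the expected spacing.
            if abs((mz_heavy - mz_light) - delta) <= tol:
                results.append((mz_light, mz_heavy, i_heavy / i_light))
    return results

# Hypothetical scan: a methyl/ethyl ester pair (about 14.016 Da apart)
# plus one unrelated peak.
scan = [(1000.50, 2.0e5), (1014.52, 4.0e5), (1200.70, 1.0e5)]
pairs = pair_ratios(scan, mass_shift=14.016)
print(pairs)
```

A peptide carrying several labeled groups would show a multiple of the single-label shift, so a fuller version would scan over the plausible number of labels as well as over charge states.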
  • Peptide sequence information is automatically generated by selecting peptide ions of a particular mass-to-charge (m/z) ratio for collision-induced dissociation (CID) in the mass spectrometer operating in the tandem MS mode 6 ' 1 ' ,12 .
  • a program such as SEQUEST™ (Thermo Finnigan, San Jose, CA) or equivalent, e.g., U.S. Patent Nos. 6,017,693 and 5,538,897, can be used to inspect/analyze the spectra with multiple peaks (e.g., more than 7 peaks/spectrum) for potential duplicates (see discussion in Example 3, below).
  • the spectra comparisons are carried out using a dot-product criterion, e.g., as in Stein (1997) Am. Soc. Mass Spectrom. 5:859, in combination with the retention time, precursor m/z constraints, and index-peak matching.
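A dot-product comparison combined with retention-time and precursor m/z constraints can be sketched as below. This is not the cited criterion itself: it uses a plain normalized dot product (cosine similarity) on unit-width m/z bins, omits index-peak matching, and all names, tolerances, thresholds and example spectra are assumptions made for illustration.

```python
import math

def binned(peaks, bin_width=1.0):
    """Sum (mz, intensity) peaks into fixed-width m/z bins."""
    bins = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        bins[key] = bins.get(key, 0.0) + intensity
    return bins

def dot_product_score(peaks_a, peaks_b, bin_width=1.0):
    """Normalized dot product (cosine similarity) of two binned spectra."""
    a, b = binned(peaks_a, bin_width), binned(peaks_b, bin_width)
    shared = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return shared / norm if norm else 0.0

def likely_duplicates(spec_a, spec_b, mz_tol=0.5, rt_tol=1.0, min_score=0.9):
    """Apply precursor m/z, retention-time and dot-product constraints."""
    return (
        abs(spec_a["precursor_mz"] - spec_b["precursor_mz"]) <= mz_tol
        and abs(spec_a["rt"] - spec_b["rt"]) <= rt_tol
        and dot_product_score(spec_a["peaks"], spec_b["peaks"]) >= min_score
    )

# Hypothetical MS/MS spectra: a and b share fragments, c does not.
a = {"precursor_mz": 650.3, "rt": 31.2, "peaks": [(200.1, 30.0), (350.2, 100.0), (480.3, 55.0)]}
b = {"precursor_mz": 650.4, "rt": 31.6, "peaks": [(200.1, 28.0), (350.2, 95.0), (480.3, 60.0)]}
c = {"precursor_mz": 650.3, "rt": 31.3, "peaks": [(150.0, 80.0), (610.4, 40.0)]}
print(likely_duplicates(a, b), likely_duplicates(a, c))
```

Fixed-width binning is the simplest choice; peaks that fall near a bin edge can land in different bins, so a tolerance-based peak matcher may be preferable for high-resolution data.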
  • data acquired from the differentially labeled peptides are subjected to the following exemplary data analysis algorithm of the invention: 1.
  • Component extraction comprising the following sub-steps: a. For every MS spectrum from the beginning of the LC elution, select the "significant" ions, which are above the local noise background and contain predominantly 12C isotopes. b. For every "significant" ion, generate a "selected ion chromatogram" using the neighboring MS spectra. In one aspect, the width of the region is at least 2X the expected width of the peptide elution (DO). c. Determine the peak location, quality, area and baseline level based on the "selected ion chromatogram". d.
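Sub-steps b and c (building a selected ion chromatogram for one ion and deriving peak location, baseline and area from it) can be sketched as follows. This is a simplified illustration: the function names, the m/z tolerance, the use of the trace minimum as the baseline and the trapezoid rule for the area are assumptions, and no peak-quality measure is computed.

```python
def selected_ion_chromatogram(scans, target_mz, mz_tol=0.5):
    """Return [(time, intensity)] for one ion across consecutive MS scans.

    scans: list of (time, [(mz, intensity), ...]) ordered by time.
    """
    trace = []
    for time, peaks in scans:
        signal = sum(i for mz, i in peaks if abs(mz - target_mz) <= mz_tol)
        trace.append((time, signal))
    return trace

def summarize_peak(trace):
    """Estimate apex time, baseline and baseline-subtracted trapezoid area."""
    baseline = min(i for _, i in trace)  # crude baseline: lowest point
    apex_time = max(trace, key=lambda point: point[1])[0]
    area = 0.0
    for (t0, i0), (t1, i1) in zip(trace, trace[1:]):
        area += 0.5 * ((i0 - baseline) + (i1 - baseline)) * (t1 - t0)
    return apex_time, baseline, area

# Hypothetical scans around the elution of an ion at m/z 500.3.
scans = [
    (0.0, [(500.3, 10.0)]),
    (1.0, [(500.3, 50.0), (612.1, 5.0)]),
    (2.0, [(500.3, 110.0)]),
    (3.0, [(500.3, 50.0)]),
    (4.0, [(500.3, 10.0)]),
]
trace = selected_ion_chromatogram(scans, target_mz=500.3)
apex_time, baseline, area = summarize_peak(trace)
print(apex_time, baseline, area)
```

In practice the window of scans passed in would be limited to the region around the ion, for example at least twice the expected elution width as stated in sub-step b.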
  • the spectra equivalency is declared if the spectra pair satisfy the following requirements: 1. Their precursor m/z values are within the pre-defined tolerance; 2. Their elution times are within a pre-defined tolerance; 3. Their "signature" peaks achieved a pre-defined degree of match; and 4. Their "dot-products" in both forward and backward direction exceed pre-defined thresholds, b.
  • the duplicated spectra are merged based on the m/z position of the peaks.
  • the elution times of the first (T1) and last (T2) spectra are stored as a part of the description of the merged spectrum.
  • the intensity of the precursor ions is calculated from the MS1 spectra by integrating the region where the precursor ions are detected.
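The merging of duplicated spectra by peak m/z position, with the first and last elution times kept as part of the merged record, can be sketched as below. This is an illustrative sketch only: the function name `merge_duplicates`, the matching tolerance, the choice to sum intensities and to keep the first-seen m/z value for each merged peak are assumptions, as the description does not specify them.

```python
def merge_duplicates(spectra, mz_tol=0.5):
    """Merge duplicate spectra into one, keeping first/last elution times.

    spectra: list of dicts with "rt" (elution time) and "peaks" [(mz, intensity)].
    """
    ordered = sorted(spectra, key=lambda s: s["rt"])
    merged = []  # list of [mz, summed_intensity]
    for spectrum in ordered:
        for mz, intensity in sorted(spectrum["peaks"]):
            for entry in merged:
                if abs(entry[0] - mz) <= mz_tol:
                    entry[1] += intensity  # same peak position: accumulate
                    break
            else:
                merged.append([mz, intensity])  # new peak position
    return {
        "t1": ordered[0]["rt"],   # elution time of the first spectrum
        "t2": ordered[-1]["rt"],  # elution time of the last spectrum
        "peaks": sorted((mz, i) for mz, i in merged),
    }

# Hypothetical duplicates of the same peptide acquired 0.5 time units apart.
dupes = [
    {"rt": 12.6, "peaks": [(300.2, 3.0), (600.0, 1.0)]},
    {"rt": 12.1, "peaks": [(300.1, 5.0), (450.2, 8.0)]},
]
result = merge_duplicates(dupes)
print(result)
```

The stored T1 and T2 values then delimit the region of the MS1 data over which the precursor ion signal would be integrated.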
  • FIG. 17 is a schematic, a flow chart, illustrating an exemplary data analysis algorithm of the invention for quantitative proteomics.
  • Figure 18 is a schematic, a flow chart, illustrating the "component extraction" section of the exemplary data analysis algorithm for quantitative proteomics illustrated in Figure 17.
  • Figure 19 is a schematic, a flow chart, illustrating the "precursor integration" section of the exemplary data analysis algorithm for quantitative proteomics illustrated in Figure 17.
  • Figure 20 is a schematic, a flow chart, illustrating the "spectra comparison" section of the exemplary data analysis algorithm for quantitative proteomics as illustrated in Figure 19.
  • Figure 21 is a schematic, a flow chart, illustrating the "identity and merge of duplicates LC-MS spectra" section of the exemplary data analysis algorithm for quantitative proteomics as illustrated in Figure 19.
  • the invention provides data analysis algorithms as illustrated in Figure 17, and further described in Figures 18 to 21, in whole, and/or, in part.
  • the data analysis algorithm described in Figure 17, and further described in Figures 18 to 21, in whole, or, in part, can be used to analyze data generated by the systems and methods of the invention. For example, this analysis can be used to reconstruct a series of differentially labeled peptides based on a predictable elution behavior in combination with the predicted mass differences, which can be generated by the systems and methods of the invention.
  • this algorithm in whole, or, in part, can be used to analyze data generated by other applications, e.g., to analyze data generated by any LC, MS, LC-MS or other analytical system.
  • Computer Systems and computer program products In one aspect, the invention provides computer program products comprising computer-implemented methods and/or programs comprising data analysis algorithms as described in Figure 17, and further described in Figures 18 to 21, in whole, or, in part.
  • the invention provides computer systems, e.g., comprising computer program products, operably linked to the multidimensional columns of the invention, or the 3D LC LC MS/MS or 3D LC LCQ MS/MS systems of the invention.
  • the invention provides a storage medium (e.g., a diskette, a tape, a CD, a hard drive, a memory chip) with a computer program of the invention (e.g., a computer-implemented method, a data-analysis algorithm of the invention) stored thereon.
  • The invention provides computer program products comprising a computer-usable medium having computer program logic recorded thereon, where the computer program logic is configured to perform operations comprising the computer-implemented methods, including the data-analysis algorithms, of the invention.
  • The invention provides computer systems comprising a processor and a computer program product of the invention.
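
The description above says the analysis can reconstruct a series of differentially labeled peptides from predictable elution behavior combined with predicted mass differences. The following Python sketch is one way such a matching step could look; it is not the algorithm of Figures 17 to 21, whose details are not reproduced here. The names `Peak`, `neutral_mass` and `pair_labeled_peaks`, the tolerances, and the per-label mass shift of 3.0188 Da (roughly what a three-deuterium label would add) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from itertools import combinations

PROTON_MASS_DA = 1.007276  # mass of a proton, used to convert m/z to neutral mass


@dataclass(frozen=True)
class Peak:
    """One detected LC-MS feature (hypothetical structure for this sketch)."""
    mz: float      # observed mass-to-charge ratio
    charge: int    # assigned charge state
    rt_min: float  # retention time in minutes


def neutral_mass(peak: Peak) -> float:
    """Neutral (uncharged) mass of a positively charged peak."""
    return (peak.mz - PROTON_MASS_DA) * peak.charge


def pair_labeled_peaks(peaks, label_delta_da, max_labels=3,
                       mass_tol_da=0.02, rt_tol_min=0.5,
                       rt_shift_per_label_min=0.0):
    """Pair light and heavy forms of the same peptide.

    Two peaks are paired when they share a charge state, their neutral masses
    differ by n * label_delta_da (1 <= n <= max_labels), and their retention
    times differ by the predicted shift for n labels, within tolerance.
    Returns a list of (light_peak, heavy_peak, n_labels) tuples.
    """
    pairs = []
    ordered = sorted(peaks, key=neutral_mass)
    for light, heavy in combinations(ordered, 2):
        if light.charge != heavy.charge:
            continue
        mass_diff = neutral_mass(heavy) - neutral_mass(light)
        rt_diff = heavy.rt_min - light.rt_min
        for n in range(1, max_labels + 1):
            mass_ok = abs(mass_diff - n * label_delta_da) <= mass_tol_da
            rt_ok = abs(rt_diff - n * rt_shift_per_label_min) <= rt_tol_min
            if mass_ok and rt_ok:
                pairs.append((light, heavy, n))
                break
    return pairs


if __name__ == "__main__":
    # Assumed example data: a light peptide, its doubly labeled heavy partner,
    # and an unrelated peak with the same m/z that elutes much later.
    observed = [
        Peak(mz=503.0188, charge=2, rt_min=30.1),
        Peak(mz=503.0188, charge=2, rt_min=45.0),
        Peak(mz=500.0000, charge=2, rt_min=30.0),
    ]
    for light, heavy, n in pair_labeled_peaks(observed, label_delta_da=3.0188):
        print(f"{light.mz:.4f} <-> {heavy.mz:.4f} ({n} label(s))")
```

Requiring both the mass criterion and the elution criterion is what rejects the late-eluting peak in the example: it has the right m/z for a heavy partner but the wrong retention time. A real implementation would also need charge-state assignment and isotope-envelope handling, which this sketch leaves out.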

Landscapes

  • Chemical & Material Sciences (AREA)
  • Organic Chemistry (AREA)
  • Analytical Chemistry (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Biophysics (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)

Abstract

The invention provides mixed bed multi-dimensional liquid chromatographs and methods of making and using them. The invention provides systems comprising the mixed bed multi-dimensional liquid chromatographs of the invention operably linked to mass spectrometry devices. The invention provides novel systems and methods for determining polypeptide profiles and changes in protein expression, such as proteome analyses. The invention also provides systems and methods for simultaneously identifying and quantifying individual proteins in complex protein mixtures by selective differential labeling of amino acid residues followed by chromatographic and mass spectrometric analysis. The invention also provides computer program products and computer-implemented methods for practicing the systems and methods of the invention.
PCT/US2004/017647 2003-06-06 2004-06-04 Mixed bed multi-dimensional chromatography systems and methods of making and using them WO2005000226A2 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US47654003P 2003-06-06 2003-06-06
US60/476,540 2003-06-06
US49202703P 2003-08-01 2003-08-01
US60/492,027 2003-08-01

Publications (2)

Publication Number Publication Date
WO2005000226A2 true WO2005000226A2 (fr) 2005-01-06
WO2005000226A3 WO2005000226A3 (fr) 2005-05-06

Family

ID=33555420

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2004/017647 WO2005000226A2 (fr) 2003-06-06 2004-06-04 Mixed bed multi-dimensional chromatography systems and methods of making and using them

Country Status (1)

Country Link
WO (1) WO2005000226A2 (fr)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007139471A2 (fr) * 2006-05-31 2007-12-06 Ge Healthcare Bio-Sciences Ab Chromatography method
EP2215460A1 (fr) * 2007-11-26 2010-08-11 Waters Technologies Corporation Internal standards and methods for use in quantitatively measuring analytes in a sample
CN102498394A (zh) * 2009-09-07 2012-06-13 邦尼克本鲁克公司 Separation body for three-dimensional chromatography
CN107607642A (zh) * 2017-09-06 2018-01-19 上海烟草集团有限责任公司 Multidimensional liquid chromatography-mass spectrometry method for identifying proteins and the proteome in tobacco
CN109813824A (zh) * 2017-11-22 2019-05-28 中国科学院大连化学物理研究所 Plant sample pretreatment method
WO2019152352A1 (fr) * 2018-01-31 2019-08-08 Regeneron Pharmaceuticals, Inc. Dual-column LC-MS system and methods of use thereof
CN110869768A (zh) * 2017-08-01 2020-03-06 安进公司 Systems and methods for real-time preparation of polypeptide samples for mass spectrometry analysis
US10865224B2 (en) 2012-06-29 2020-12-15 Emd Millipore Corporation Purification of biological molecules

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020155614A1 (en) * 2001-02-21 2002-10-24 Tomlinson Andrew J. Peptide esterification
US20020164809A1 (en) * 2000-10-23 2002-11-07 Genetics Institute, Inc. Acid-labile isotope-coded extractant (ALICE) and its use in quantitative mass spectrometric analysis of protein mixtures

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020164809A1 (en) * 2000-10-23 2002-11-07 Genetics Institute, Inc. Acid-labile isotope-coded extractant (ALICE) and its use in quantitative mass spectrometric analysis of protein mixtures
US20020155614A1 (en) * 2001-02-21 2002-10-24 Tomlinson Andrew J. Peptide esterification

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007139471A3 (fr) * 2006-05-31 2008-01-24 Ge Healthcare Bio Sciences Ab Chromatography method
WO2007139471A2 (fr) * 2006-05-31 2007-12-06 Ge Healthcare Bio-Sciences Ab Chromatography method
EP2215460A1 (fr) * 2007-11-26 2010-08-11 Waters Technologies Corporation Internal standards and methods for use in quantitatively measuring analytes in a sample
EP2215460A4 (fr) * 2007-11-26 2010-12-29 Waters Technologies Corp Internal standards and methods for use in quantitatively measuring analytes in a sample
CN102498394A (zh) * 2009-09-07 2012-06-13 邦尼克本鲁克公司 Separation body for three-dimensional chromatography
CN102498394B (zh) * 2009-09-07 2014-11-26 邦尼克本鲁克公司 Separation body for three-dimensional chromatography
US10865224B2 (en) 2012-06-29 2020-12-15 Emd Millipore Corporation Purification of biological molecules
CN110869768A (zh) * 2017-08-01 2020-03-06 安进公司 Systems and methods for real-time preparation of polypeptide samples for mass spectrometry analysis
CN110869768B (zh) * 2017-08-01 2023-11-21 安进公司 Systems and methods for real-time preparation of polypeptide samples for mass spectrometry analysis
CN107607642B (zh) * 2017-09-06 2020-12-29 上海烟草集团有限责任公司 Multidimensional liquid chromatography-mass spectrometry method for identifying proteins and the proteome in tobacco
CN107607642A (zh) * 2017-09-06 2018-01-19 上海烟草集团有限责任公司 Multidimensional liquid chromatography-mass spectrometry method for identifying proteins and the proteome in tobacco
CN109813824A (zh) * 2017-11-22 2019-05-28 中国科学院大连化学物理研究所 Plant sample pretreatment method
CN109813824B (zh) * 2017-11-22 2021-11-26 中国科学院大连化学物理研究所 Plant sample pretreatment method
CN111699392A (zh) * 2018-01-31 2020-09-22 瑞泽恩制药公司 Dual-column LC-MS system and methods of use thereof
WO2019152352A1 (fr) * 2018-01-31 2019-08-08 Regeneron Pharmaceuticals, Inc. Dual-column LC-MS system and methods of use thereof
US10908166B2 (en) 2018-01-31 2021-02-02 Regeneron Pharmaceuticals, Inc. Dual-column LC-MS system and methods of use thereof
JP2021513060A (ja) * 2018-01-31 2021-05-20 リジェネロン・ファーマシューティカルズ・インコーポレイテッド Dual-column LC-MS system and methods of use thereof
US11435359B2 (en) 2018-01-31 2022-09-06 Regeneron Pharmaceuticals, Inc. Dual-column LC-MS system and methods of use thereof
US11740246B2 (en) 2018-01-31 2023-08-29 Regeneron Pharmaceuticals, Inc. Dual-column LC-MS system and methods of use thereof

Also Published As

Publication number Publication date
WO2005000226A3 (fr) 2005-05-06

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 EP: the EPO has been informed by WIPO that EP was designated in this application
122 EP: PCT application non-entry in European phase