WO2004108901A2 - A library of phylogenetically related sequences - Google Patents
- Publication number
- WO2004108901A2 (PCT/US2004/017903)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- amino acid
- sequences
- acid sequence
- peptides
- sequence combinations
Classifications
- G01N33/6803—General methods of protein analysis not limited to specific proteins or families of proteins
- A61P25/00—Drugs for disorders of the nervous system
- A61P25/04—Centrally acting analgesics, e.g. opioids
- A61P25/08—Antiepileptics; Anticonvulsants
- A61P25/14—Drugs for disorders of the nervous system for treating abnormal movements, e.g. chorea, dyskinesia
- A61P25/16—Anti-Parkinson drugs
- A61P25/18—Antipsychotics, i.e. neuroleptics; Drugs for mania or schizophrenia
- A61P25/20—Hypnotics; Sedatives
- A61P25/22—Anxiolytics
- A61P25/24—Antidepressants
- A61P43/00—Drugs for specific purposes, not provided for in groups A61P1/00-A61P41/00
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/10—Sequence alignment; Homology search
- G16B30/20—Sequence assembly
- G16B35/00—ICT specially adapted for in silico combinatorial libraries of nucleic acids, proteins or peptides
- G16B35/20—Screening of libraries
- G16C20/60—In silico combinatorial chemistry
- G16B10/00—ICT specially adapted for evolutionary bioinformatics, e.g. phylogenetic tree construction or analysis
Definitions
- The invention relates to biotechnology generally, and more particularly to a method of preparing a library of nucleic acid or amino acid sequences and to the resulting libraries.
- Sequence homology is a versatile tool that assists in numerous tasks, from establishing the function of a gene to determining the evolutionary development of an organism. Numerous specialized tools for aligning homologous sequences are available in the public domain.
- Sequence alignments have been used to identify conserved regions (i.e., regions of a nucleic acid or protein where all members of the alignment have the same nucleotide or amino acid). More recently, conservative substitutions (non-identical nucleotides or amino acids) have also been identified and included in the analysis of conserved regions.
- Conserved amino acid positions may be viewed as positions that generally cannot be altered without loss or reduction of biological activity, although conservative substitutions may still be made at conserved positions. Conversely, non-conserved positions are traditionally viewed as lacking a clear or necessary role in the biological function of the protein. As a result, assays directed at identifying the function of a domain traditionally ignore, or only tangentially address, the biological role of the non-conserved positions.
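To make the conserved/non-conserved distinction concrete, the sketch below labels each column of an alignment. It assumes the sequences are already aligned to equal length with `-` as the gap character; the function name `classify_columns` and the toy alignment are hypothetical illustrations, not part of the patent.

```python
def classify_columns(alignment: list[str]) -> list[str]:
    """Label each alignment column 'conserved' (one member only) or 'variable'.

    Assumes pre-aligned, equal-length sequences; '-' (gap) counts as a member.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    labels = []
    for i in range(length):
        members = {seq[i] for seq in alignment}  # distinct members in column i
        labels.append("conserved" if len(members) == 1 else "variable")
    return labels


# Hypothetical toy alignment (not taken from the patent)
print(classify_columns(["CKSGC", "CRSGC", "CKTGC"]))
# ['conserved', 'variable', 'variable', 'conserved', 'conserved']
```

This exact-identity test does not treat conservative substitutions as equivalent; a grouping of similar residues would have to be layered on top of it.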
- Nucleic acid sequences that serve, for example, as binding sites for transcription factors, or that perform other similar functions (as tRNAs and ribozymes do), are expected to show conservation at the nucleotide level.
- Protein-coding nucleic acid sequences may be considered homologous when the non-conserved nucleotides do not change the encoded amino acid. Because the genetic code is degenerate, such silent mutations in the coding sequence may be ignored when considering the effect on the gene product.
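The effect of code degeneracy can be checked mechanically: two coding sequences that differ only by silent mutations translate to the same peptide. This is a minimal sketch using the standard genetic code, built from the conventional TCAG-ordered 64-letter string; `translate` and `is_silent` are illustrative helper names, not functions defined in the patent.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, ... GGG ('*' = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence with the standard code."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def is_silent(dna_a: str, dna_b: str) -> bool:
    """True if two coding sequences encode the same peptide."""
    return translate(dna_a) == translate(dna_b)


print(translate("TGTAAATGC"))         # CKC
print(is_silent("TGTAAA", "TGCAAG"))  # True: TGT/TGC are both Cys, AAA/AAG both Lys
print(is_silent("TGT", "TGG"))        # False: Cys versus Trp
```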
- Sequence homology identifies conserved positions across multiple sequences. Where the function of the conserved positions is unknown, conservation between sequences may be used to infer the presence of a functional domain; however, the function of the presumed domain must still be determined.
- Traditionally, a function has been assigned to a molecular domain by studying an individual member sequence and then imputing that function to the other members of the family.
- This approach suffers from the limitations imposed by the use of a single member sequence.
- For example, an individual member sequence may not be compatible with a particular assay used to determine function.
- Traditionally, a potential family of ligands of unknown function is tested by assaying an individual member ligand for binding to a class of receptors.
- The individual member may exhibit a binding preference inconsistent with the receptor source used in the assay (e.g., mouse versus human), which prevents the identification of function.
- Alternatively, the function of a family of ligands may be screened using a combinatorial approach.
- In this approach, a library is generated and screened for binding to a class of receptors.
- The library is generated by holding constant the positions having identity and randomizing the non-conserved positions. This approach requires the screening of a large number of sequences.
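The scale problem of the combinatorial approach follows from simple arithmetic: if each of k non-conserved positions is randomized over the 20 standard amino acids, the library holds 20^k sequences. The sketch below assumes full randomization over 20 residues at every variable column (real designs may restrict the residue set); the helper name is hypothetical.

```python
def combinatorial_library_size(alignment: list[str], alphabet_size: int = 20) -> int:
    """Size of a library that holds identical columns constant and fully
    randomizes every non-conserved column over `alphabet_size` residues."""
    n_variable = sum(
        1 for i in range(len(alignment[0]))
        if len({seq[i] for seq in alignment}) > 1
    )
    return alphabet_size ** n_variable


print(combinatorial_library_size(["CKSGC", "CRSGC", "CKTGC"]))  # 400 (two variable columns)
print(f"{20 ** 10:.3e}")  # 1.024e+13 for ten randomized positions
```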
- This combinatorial approach suffers from a number of limitations.
- The invention overcomes the limitations of both the individual member and the combinatorial approaches.
- The invention includes a method of identifying conserved and non-conserved substitutions within a set of sequences, wherein one or more non-conserved substitutions exhibit desired properties.
- The invention further relates to a method of assaying phylogenetically related sequences, including conservative substitutions.
- Amino acid and nucleic acid sequences of the invention interact with specific molecules.
- Examples include amino acid sequences that specifically bind to a receptor or receptor subtype, and nucleic acid sequences that specifically bind to a ligand binding molecule.
- The invention further relates to nucleic acids that encode polypeptide sequences.
- The invention provides an alignment of phylogenetically related sequences, integrated with methods of generating a set of sequences based on the composition of those phylogenetically related sequences.
- The set of sequences (a library, or cladistic library) uses the members observed at each position (conserved positions and allowed substitutions, i.e., conservative and non-conservative substitutions, including indels) to reduce the complexity of the library.
- The members observed at each position of the sequence alignment are identified, and from this information a set of sequence combinations composed of the union of observed members at each position is generated. Because only observed members are combined, rather than every possible residue, the number of sequences that must be generated is reduced.
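A minimal sketch of the union-of-observed-members library, assuming a gap-free, equal-length alignment; `observed_members` and `cladistic_library` are hypothetical names. The output size is the product of the number of observed members per column, and the output includes combinations absent from the input sequences.

```python
from itertools import product
from math import prod


def observed_members(alignment: list[str]) -> list[list[str]]:
    """Sorted distinct members observed in each column of the alignment."""
    return [sorted({seq[i] for seq in alignment}) for i in range(len(alignment[0]))]


def cladistic_library(alignment: list[str]) -> list[str]:
    """All combinations formed from the observed members at each position."""
    return ["".join(combo) for combo in product(*observed_members(alignment))]


alignment = ["CKSGC", "CRSGC", "CKTGC"]  # hypothetical toy alignment
library = cladistic_library(alignment)
print(library)  # ['CKSGC', 'CKTGC', 'CRSGC', 'CRTGC']
print(prod(len(m) for m in observed_members(alignment)))  # 4
```

For the same toy alignment, fully randomizing the two variable columns over 20 residues would give 20**2 = 400 sequences instead of 4, and `CRTGC` is a combination not present among the three inputs.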
- A library or set of sequences prepared by the methods of the invention extends the possible benefits conferred by each allowed substitution to combinations not present in the natural sequences.
- Conservative substitutions occupying a position may be treated as identical to a single residue occupying that position. Accordingly, a representative of the conservative substitutions is selected, and the conservative substitutions at that position are deemed equivalent to the chosen representative.
- The residue chosen to represent the conservative substitutions may be selected based on its frequency of occurrence in the sequence alignment or on other criteria known in the art.
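The representative-selection step can be sketched as follows. The patent does not fix a grouping of conservative substitutions, so `CONSERVATIVE_GROUPS` below is an assumption chosen only for illustration; the representative is the most frequent member present in the column (ties broken alphabetically), which corresponds to the frequency criterion mentioned in the text.

```python
from collections import Counter

# Hypothetical grouping of conservative substitutions (illustrative assumption)
CONSERVATIVE_GROUPS = [set("ILVM"), set("DE"), set("KR"), set("ST"), set("FYW")]


def collapse_column(column: str) -> str:
    """Replace each conservative group in a column by its most frequent member."""
    counts = Counter(column)
    representative = {}
    for group in CONSERVATIVE_GROUPS:
        present = [aa for aa in counts if aa in group]
        if present:
            # Most frequent wins; alphabetical order breaks ties deterministically
            chosen = min(present, key=lambda aa: (-counts[aa], aa))
            for aa in present:
                representative[aa] = chosen
    return "".join(representative.get(aa, aa) for aa in column)


def collapse_alignment(alignment: list[str]) -> list[str]:
    """Apply collapse_column to every column and rebuild the sequences."""
    columns = ["".join(seq[i] for seq in alignment) for i in range(len(alignment[0]))]
    collapsed = [collapse_column(col) for col in columns]
    return ["".join(col[j] for col in collapsed) for j in range(len(alignment))]


print(collapse_alignment(["CKSGC", "CRSGC", "CKTGC"]))  # ['CKSGC', 'CKSGC', 'CKSGC']
```

After collapsing, the toy alignment yields a single sequence, showing how treating conservative substitutions as equivalent further shrinks the library.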
- The invention further relates to sequences (peptides and/or nucleic acids) identified from sets of sequences generated by the methods disclosed herein.
- In some embodiments, the sequence is a polypeptide, such as a conopeptide.
- In some embodiments, the identified sequence specifically binds to a target receptor.
- The invention also relates to nucleic acids that encode peptides, for example, nucleic acids encoding peptides that specifically bind to a receptor.
- The invention also relates to nucleic acid sequences having a function other than encoding a peptide, for example, ribozymes, promoter elements, regulatory elements, splicing signals, polyadenylation signals and tRNAs.
- The invention relates to a method of generating a set of possible amino acid sequence combinations, wherein amino acid sequences are analyzed to create an alignment and a set of phylogenetically related sequences is selected.
- The observed amino acid residue(s) or indel(s) occupying each position in the alignment of the selected phylogenetically related sequences are identified and used to generate a set of possible amino acid sequence combinations, wherein the sequence combinations are composed of the union of the observed amino acid residues or indels identified at each position.
- The invention relates to a method of generating a set of possible nucleic acid sequence combinations, wherein nucleic acid sequences are analyzed to create an alignment and a set of phylogenetically related sequences is selected from the analyzed sequences.
- The observed nucleotide(s) or indel(s) occupying each position in the alignment of the selected phylogenetically related sequences are identified and used to generate a set of possible nucleic acid sequence combinations, wherein the sequence combinations are composed of the union of the observed nucleotides or indels identified at each position.
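Because the union step does not depend on the alphabet, one sketch covers both the amino acid and the nucleic acid cases. Here an indel is assumed to be written as the gap character `-` in the alignment and is treated as an ordinary observed member; gaps are stripped from each generated combination and any resulting duplicates are merged. The function name and toy alignments are hypothetical.

```python
from itertools import product

GAP = "-"  # assumed gap symbol in the alignment


def union_library(alignment: list[str]) -> list[str]:
    """Union of observed members (residues, nucleotides or gaps) per column,
    expanded into ungapped sequences with duplicates removed."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    columns = [sorted({seq[i] for seq in alignment}) for i in range(length)]
    return sorted({"".join(combo).replace(GAP, "") for combo in product(*columns)})


# Nucleic acid alignment with a one-base indel (hypothetical)
print(union_library(["ACG-T", "AGGAT"]))  # ['ACGAT', 'ACGT', 'AGGAT', 'AGGT']
# Amino acid alignment with a one-residue indel (hypothetical)
print(union_library(["CK-C", "CRGC"]))    # ['CKC', 'CKGC', 'CRC', 'CRGC']
```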
- The invention further relates to a method of generating a set of possible nucleic acid sequence combinations, wherein amino acid sequences are analyzed to create an alignment and a set of phylogenetically related sequences is selected.
- The observed amino acid residue(s) or indel(s) occupying each position in the alignment of the selected phylogenetically related sequences are identified and used to generate a set of possible nucleic acid sequence combinations, wherein the nucleic acid sequence combinations encode a set of polypeptides composed of the union of the observed amino acid residues or indels identified at each position.
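For the reverse direction, each column's set of observed amino acids can be mapped to every codon of the standard genetic code that encodes one of them. The sketch assumes a gap-free alignment and enumerates all synonymous codons. A practical design might instead pick one preferred codon per residue or use degenerate (mixed-base) codons, which this sketch does not attempt. Names are illustrative.

```python
from itertools import product
from math import prod

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, ... GGG ('*' = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODONS_FOR: dict[str, list[str]] = {}
for codon_tuple, aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
    CODONS_FOR.setdefault(aa, []).append("".join(codon_tuple))


def codon_sets(alignment: list[str]) -> list[list[str]]:
    """For each column, all codons encoding any observed amino acid."""
    sets = []
    for i in range(len(alignment[0])):
        observed = {seq[i] for seq in alignment}
        sets.append(sorted(c for aa in observed for c in CODONS_FOR[aa]))
    return sets


columns = codon_sets(["CK", "CR"])  # hypothetical two-residue alignment
print(columns[0])  # ['TGC', 'TGT']
print(prod(len(c) for c in columns))  # 16 DNA sequences (2 x 8)
```

Enumerating every synonymous codon makes the DNA library (16 members here) larger than the peptide set it encodes (2 members), which is why codon choice is a design decision in practice.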
- The invention also relates to screening or selecting a set of possible amino acid or nucleic acid sequence combinations to identify ligand binding pair members, and to isolating an identified individual ligand binding pair member or a mixed population of identified ligand binding pair members. The identified ligand binding pair member may be produced by chemical synthesis or in a recombinant host. A set of possible amino acid or nucleic acid sequence combinations may be screened using phage display, arrays or other ligand presentation systems.
- The invention also relates to a set of sequences that is the union of observed members at each position of an alignment.
- The phylogenetically related sequences have a functional relationship (for example, ⁇-, ⁇-, ⁇-, ⁇-, ⁇-, ⁇-, ⁇- and/or ⁇-conopeptides) and form a clade or part of a clade.
- The invention further relates to computer programs that execute the methods of the invention.
- The invention also relates to sets of sequences (amino acid or nucleic acid) wherein individual sequences of the set occupy known and/or isolated locations, for example, on microarrays, biochips or chips.
- FIG. 1 shows the phylogenetic relationship of numerous conotoxin sequences from the extensive Cognetix conotoxin database. Six of the amino acid sequences are highlighted along with their relationship to the other sequences; they correspond to SEQ ID NOs: 1, 2, 3, 4, 5 and 6, respectively.
- FIG. 2 shows the six amino acid sequences illustrated in FIG. 1 and the phylogenetic relationship of the sequences.
- Conopeptides, which constitute a large source of the phylogenetically related sequences of the invention, are small gene products (8-60, more typically 10-40 residues) derived from Conus snail venoms and are often stabilized by disulfide bonds between highly conserved cysteine residues (Norton and Pallaghy, 1998). Conopeptide precursors are expected to conform to a three-part structure (signal-propeptide-mature). The disulfide bonds of the conopeptides provide a structural scaffold that tolerates high variability in the intercysteine loops, and this variability allows diverse receptors to be targeted.
- The "six-cysteine, four-loop" scaffold (C...C...CC...C...C) is shared by conopeptides targeting multiple subtypes of voltage-gated sodium, calcium and potassium channels, including three different sites on sodium channels (McIntosh et al., 1999).
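The framework can be expressed as a pattern over the mature peptide: six cysteines separated by four loops of non-cysteine residues, with the third and fourth cysteines adjacent. The regular expression below is an illustrative sketch that assumes each loop contains at least one residue; the test strings are synthetic, not real conopeptides, and `loop_lengths` is a hypothetical helper.

```python
import re
from typing import Optional

# C...C...CC...C...C: four intercysteine loops of >= 1 non-Cys residue each
SIX_CYS_FOUR_LOOP = re.compile(r"[^C]*C([^C]+)C([^C]+)CC([^C]+)C([^C]+)C[^C]*")


def loop_lengths(peptide: str) -> Optional[list[int]]:
    """Return the four loop lengths if the peptide fits the scaffold, else None."""
    match = SIX_CYS_FOUR_LOOP.fullmatch(peptide.upper())
    return [len(loop) for loop in match.groups()] if match else None


print(loop_lengths("CAAAACAAACCAAACAAAC"))  # [4, 3, 3, 3]
print(loop_lengths("CAACAACAAC"))           # None (no adjacent CC pair)
```

Returning loop lengths rather than a bare yes/no makes it easy to group sequences by loop architecture before aligning them.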
- Conopeptides are highly selective receptor ligands, which has facilitated their use as pharmacological tools (McIntosh et al., 1999) and generated substantial interest in their potential as neuronal drugs (Bowersox and Luther, 1998).
- The spacing of cysteine residues appears to be important for productive folding of such peptides (Drakopoulou et al., 1998), and the number of naturally occurring conopeptide scaffolds identified so far is limited.
- These scaffolds define large hypervariable families that may share a common evolutionary origin (Conticello et al., 2001).
- One representative embodiment of the invention provides a method of identifying conopeptides that uses the natural variation of the peptides to identify combinations of allowed substitutions having a desired property. As demonstrated herein, the invention is also applicable to a wide range of sequence alignments.
- Conus venom systems are ideally suited for such an approach, since conopeptide-encoding transcripts are relatively short (about 0.5 kb) and highly expressed.
- Sequencing of over 2,000 cDNA clones and PCR products from five different Conus species provided a data set of 170 distinct conopeptide precursor sequences from eight gene families representing three cysteine scaffold superfamilies (Conticello et al., 2001).
- Numerous conopeptide precursor sequences have now been identified from all eight superfamilies and a multitude of families within each superfamily (Jones et al., 2001).
- Conopeptide diversity reflects a targeted mutagenic mechanism that generates high variability, followed by diversifying selection.
- Ds, the number of synonymous substitutions per synonymous site (Nei and Gojobori, 1986), is an adequate measure of the mutation rate, because synonymous changes do not alter the encoded peptide and are therefore largely unaffected by selection on it.
- Ds in the mature peptide region is significantly higher than in the signal domain, with the propeptide region in most families exhibiting an intermediate value.
- The apparent mutation rate for the mature domain of conopeptides is elevated by about an order of magnitude relative to the signal peptide. Thus, the intercysteine residues in the mature region of conopeptides are hypervariable.
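The text does not spell out how Ds is computed. In the standard Nei and Gojobori (1986) method, the proportion of synonymous differences is corrected for multiple substitutions at the same site with the Jukes-Cantor formula. The block below restates that standard calculation for reference, where S is the number of synonymous sites (averaged over the two sequences compared) and S_d is the number of observed synonymous differences:

```latex
p_S = \frac{S_d}{S}, \qquad
D_S = -\frac{3}{4}\,\ln\!\left(1 - \frac{4}{3}\,p_S\right)
```

On this scale, an order-of-magnitude elevation corresponds to a ratio D_S(mature) / D_S(signal) of roughly 10.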
- A single amino acid change can have a significant impact on receptor subtype specificity (Luo et al., 1990).
- For example, a single amino acid substitution is sufficient to alter the receptor subtype selectivity profile by two orders of magnitude (Luo et al., 1990).
- An amino acid substitution may therefore result in different biological activity.
- Wells et al. (1987) showed that exchanging a limited set of amino acids from one protein into another can transfer the function of those amino acids to the new chimeric protein. The invention therefore uses the allowed amino acid substitutions observed at each position as a source of new, improved or altered function, such as the targeting of particular molecules or receptors.
- Cysteines, which are necessary for disulfide bond formation and protein architecture, are very highly conserved. This conservation extends to the individual cysteine codons, TGT and TGC, with a strong bias toward one or the other codon at different positions in the alignments (Conticello et al., 2001). Because different codons are conserved at different positions, a simple codon bias cannot be responsible, especially given the extremely hypervariable environment of the mature domain. Thus, the cysteine organization and/or codon preference may serve as a basis for assigning conopeptides to particular superfamilies and families.
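Per-position cysteine codon preference can be tabulated from codon-aligned coding sequences. This sketch assumes the sequences are in frame and of equal length (no indels) and simply counts TGT versus TGC wherever a cysteine codon occurs; `cysteine_codon_usage` and the toy sequences are hypothetical.

```python
from collections import Counter

CYS_CODONS = ("TGT", "TGC")


def cysteine_codon_usage(coding_seqs: list[str]) -> dict[int, dict[str, int]]:
    """Count TGT vs TGC at each codon position where any sequence has Cys."""
    n_codons = len(coding_seqs[0]) // 3
    usage = {}
    for k in range(n_codons):
        codons = [seq[3 * k:3 * k + 3].upper() for seq in coding_seqs]
        counts = Counter(c for c in codons if c in CYS_CODONS)
        if counts:  # skip positions with no cysteine codon
            usage[k] = {c: counts.get(c, 0) for c in CYS_CODONS}
    return usage


print(cysteine_codon_usage(["TGTAAATGC", "TGTAGATGC", "TGCAAATGC"]))
# {0: {'TGT': 2, 'TGC': 1}, 2: {'TGT': 0, 'TGC': 3}}
```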
- The ability to sequence hundreds or thousands of conopeptide genes makes it possible to generate very large phylogenetic trees. However, this information does not address the molecular biology or the peptide chemistry necessary to ascertain the function of the gene products.
- The invention combines bioinformatics data and phylogenetic relationships to construct a peptide or nucleic acid library that is both practical and able to address the molecular function of the gene products. In this manner, molecular biology, bioinformatics and peptide chemistry are effectively integrated.
- One approach to addressing function involves generating a library of peptides in which all non-conserved amino acids are randomized, which is referred to as a combinatorial approach (Jeffrey D.
- A peptide pool having more than about 1 X 10 1 different sequences will contain only one copy of each sequence per 0.3 µg of peptide (COMBINATORIAL PEPTIDE AND NONPEPTIDE LIBRARIES: A HANDBOOK 238 (Günther Jung ed., 1996)).
- Consequently, a 4.0 µmol peptide synthesis can generate only a limited number of individual sequences. For example, it is estimated that a 4.0 µmol peptide synthesis can
- The library should contain fewer than about 8 X 10 individual sequences per synthesis. Because a single copy of any one member is frequently insufficient, the number of individual sequences that can be synthesized in any single synthesis reaction decreases rapidly. While multiple synthesis reactions may be conducted and combined, peptide solubility imposes an upper limit, which likewise limits the number of individual sequences that can be screened or selected per unit volume. Furthermore, as the number of unique peptide sequences increases, the ability to properly fold any one sequence decreases. As a hypothetical example, an alignment of highly conserved peptides
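The copy-number constraint follows from Avogadro's number. The sketch below estimates the average number of molecules of each distinct sequence in a synthesis, assuming equimolar representation and no coupling or purification losses (both optimistic assumptions). It does not attempt to reproduce the specific limits quoted above, and the 20**10-member library is a hypothetical example.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI definition)


def copies_per_sequence(synthesis_umol: float, n_sequences: int) -> float:
    """Average molecules per distinct sequence, assuming an equimolar pool
    and no synthesis losses."""
    total_molecules = synthesis_umol * 1e-6 * AVOGADRO
    return total_molecules / n_sequences


print(f"{copies_per_sequence(4.0, 1):.3e}")         # 2.409e+18 molecules in total
print(f"{copies_per_sequence(4.0, 20 ** 10):.3e}")  # 2.352e+05 per sequence
```

Every additional fully randomized position divides the per-sequence copy number by 20, which is why library size, rather than synthesis scale, quickly becomes limiting.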
- The present invention allows the generation of a practical number of peptides that retain function. Natural selection eliminates deleterious mutations and favors function-enhancing changes. In the case of a Conus toxin, for example, a mutation that abolishes the function of a toxin reduces the effectiveness of the snail's venom and results in the capture of less prey (food); such a mutation will be selected against and eventually eliminated from the population. In contrast, a mutation that enhances the effectiveness or function of a toxin increases the fitness of the organism (through increased prey capture) and will be positively selected.
- The invention uses allowed substitutions as a source of variation that is likely to have positive effects and to retain the function of the peptide (e.g., a venom toxin).
- This approach dramatically reduces the number of peptides required and increases the percentage of functional peptides.
- The invention provides the practical ability to synthesize the required peptides and increases the concentration of any one sequence in the set.
- The invention also includes a set (library) of sequences containing new sequences likely to confer an activity not present in the original representatives.
- An individual member approach tests the function of individual members one at a time.
- The individual member approach reduces the number of sequences to one, making the sequence readily obtainable by synthesis.
- The individual member approach is necessarily limited to ascertaining the function and properties of that one member.
- This approach provides no information regarding the function of sequences other than the tested sequence.
- This approach may fail to identify a function where the assay and the selected individual member are incompatible.
- The individual member approach is extremely labor- and time-intensive where multiple sequences must be assayed against different molecular targets.
- The individual member approach cannot address combinations of allowed substitutions that are not present in an identified sequence.
- The invention provides significant advantages over the individual member approach and performs a different function.
- The sequences of the invention are useful in the treatment of diseases and adverse medical conditions.
- Numerous diseases have been proposed for treatment with conopeptides, including neurological disorders such as epilepsy, multiple sclerosis, Parkinson's disease, Huntington's disease and schizophrenia, as well as other conditions such as pain, anxiety, depression and sleep disorders.
- Neuronal receptors, such as the NMDA receptor, exist in multiple subtypes. These subtypes frequently bind different molecules (antagonists or agonists) and produce different responses. Because receptors exist as different subtypes, many diseases are best treated with highly selective agents that affect specific receptor subtypes.
- Peptides of the invention may bind to a specific receptor or receptor subtype and can be used to target specific receptors.
- The peptides may also be used in assays for such receptors.
- The peptides of the invention may be assayed for specific binding to a single receptor subtype and for reduced or no binding to other receptor subtypes.
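Subtype selectivity is commonly expressed as a ratio of affinities. The sketch below screens inhibition constants (Ki, where lower means tighter binding) and keeps peptides that bind the target subtype at least 100-fold (two orders of magnitude) more tightly than every other subtype tested. The peptide names, subtype labels, Ki values and the 100-fold threshold are all made-up illustrations, not data from the patent.

```python
def selectivity_fold(ki_target_nm: float, ki_other_nm: float) -> float:
    """How many times more tightly the target subtype is bound (Ki ratio)."""
    return ki_other_nm / ki_target_nm


def select_subtype_specific(ki_table: dict[str, dict[str, float]],
                            target: str, min_fold: float = 100.0) -> list[str]:
    """Peptides whose Ki for `target` beats every other subtype by >= min_fold."""
    hits = []
    for peptide, kis in ki_table.items():
        others = [ki for subtype, ki in kis.items() if subtype != target]
        if all(selectivity_fold(kis[target], ki) >= min_fold for ki in others):
            hits.append(peptide)
    return sorted(hits)


# Hypothetical Ki values (nM) for two made-up peptides at two made-up subtypes
ki_table = {
    "pep1": {"subtype_A": 1.0, "subtype_B": 500.0},
    "pep2": {"subtype_A": 10.0, "subtype_B": 50.0},
}
print(select_subtype_specific(ki_table, "subtype_A"))  # ['pep1']
```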
- The peptides of the invention relating to conopeptides, and venom peptides in general, have the particularly useful characteristic of high affinity for a particular macromolecular receptor accompanied by narrow receptor-subtype specificity.
- The pharmacological specificity of the conotoxins makes them attractive for drug development in a variety of therapeutic applications, including neurological and cardiovascular disorders.
- The peptides of the invention provide combinations of allowed substitutions that can have specificity for new receptor subtypes, different binding affinities and other different properties (e.g., different off-rates).
- The invention provides nucleic acid sequences that encode a set of polypeptides.
- The skilled artisan may convert between nucleic acid and amino acid sequences; for example, a known nucleic acid sequence may be used to determine the presumed gene product, or a known polypeptide sequence may be used to determine a nucleic acid that will encode the polypeptide.
- The nucleic acid sequences of the invention may be analyzed relative to the encoded polypeptides and/or designed to reflect the degeneracy of the genetic code.
- The invention also provides nucleic acid sequences having a function independent of encoding a polypeptide.
- For example, the nucleic acid sequences may encode a telomerase RNA molecule.
- In that case, telomerase RNA sequences (possibly including pseudogenes) are aligned and the observed members at each position of the alignment are identified.
- A set of nucleic acid sequence combinations is generated, wherein the set is the union of observed members at each position.
- The set of nucleic acid sequence combinations is then assayed for a desired function.
- For example, the set of telomerase RNA sequences may be assayed for decreased telomerase activity, thereby identifying and generating a potential anti-cancer product.
- Thus, the invention provides nucleic acid sequences that encode proteins having a desired property, as well as nucleic acids that themselves provide a desired property.
- As used herein, conotoxin includes conantokin peptides, conantokin peptide derivatives, conotoxin peptides (including contryphans, bromocontryphans, congesakins, conophysins, conopressins and conorfamides) and conotoxin peptide derivatives.
- Conotoxins are typically derived from the venom of Conus snails and may include one or more amino acid substitutions, deletions and/or additions. These peptides may be referred to in the literature as conotoxins, conantokins or conopeptides.
- The conotoxin may be produced by methods such as in vitro translation, in vitro transcription and translation, recombinant expression systems and chemical synthesis.
- "Substantially pure" means a preparation that is at least 60% by weight (dry weight) the compound or set of compounds of interest, for example, a nucleic acid, a polypeptide, or a set of polypeptides or nucleic acids.
- Preferably, the preparation is at least 75%, more preferably at least 90%, and most preferably at least 99% by weight the compound of interest. Purity can be measured by any appropriate method (e.g., column chromatography, polyacrylamide gel electrophoresis or HPLC analysis).
- An "isolated nucleic acid" means a nucleic acid that is not immediately contiguous with both of the coding sequences with which it is immediately contiguous (one on the 5' end and one on the 3' end) in the naturally occurring genome of the organism from which it is derived.
- The term includes a recombinant nucleic acid that is incorporated into a vector, into an autonomously replicating plasmid or virus, or into the genomic DNA of a prokaryote or eukaryote, as well as a recombinant nucleic acid that exists as a separate molecule (for example, a cDNA or a genomic DNA fragment produced by PCR or restriction endonuclease treatment) independent of other sequences. It also includes a recombinant nucleic acid that is part of a hybrid gene encoding additional polypeptide sequence.
- The nucleic acid sequences may be RNA or DNA.
- As used herein, "positioned for expression" means that the nucleic acid molecule is operably linked to a sequence that directs transcription and, where appropriate, translation of the nucleic acid molecule.
- Probes are molecules capable of interacting with a target nucleic acid, typically in a sequence-specific manner, for example through hybridization. The hybridization of nucleic acids is well understood in the art. Typically, a probe can be made from any combination of nucleotides or nucleotide derivatives or analogs available in the art.
- As used herein, "specifically binds" refers to a molecule that binds to a target but does not substantially recognize and bind other molecules in a sample (for example, a biological sample).
- peptide, polypeptide and protein (which, at times may be used interchangeably herein) include polymers of two or more amino acids (whether or not naturally occurring) linked via a peptide bond. No distinction, based on length, is intended between a peptide, a polypeptide or a protein.
- proteins comprising multiple polypeptide subunits (e.g., DNA polymerase III, RNA polymerase II) or other components (e.g., an RNA molecule, as occurs in telomerase) are included within the meaning of "protein” as used herein.
- proteins comprising multiple polypeptide subunits (e.g., DNA polymerase III, RNA polymerase II) or other components (e.g., an RNA molecule, as occurs in telomerase) are included within the meaning of "protein” as used herein.
- fragments of a protein and polypeptide are also within the scope of the invention and may be referred to herein as “peptide,” “polypeptide” or “protein.”
- a particular amino acid sequence of a given protein may be determined by the nucleotide sequence of the coding portion of a mRNA, which is in turn specified by genetic information, typically genomic DNA (including organelle DNA, e.g., mitochondrial or chloroplast DNA).
- a nucleic acid may be derived from the amino acid sequence of a peptide.
- Receptor means a molecule that has an affinity for a given ligand. Receptors may be naturally-occurring or manmade molecules. Also, they can be employed in their unaltered state or as aggregates with other species. Receptors may be attached, covalently or noncovalently, to a binding member, either directly or via a specific binding substance.
- receptors include, but are not limited to, antibodies (e.g., monoclonal antibodies, polyclonal antibodies and antisera reactive with specific antigenic determinants (such as on viruses, cells or other materials)), cell membrane receptors (for example, voltage-gated or ligand-gated receptors, such as nicotinic receptors, gamma-aminobutyric acid (GABA) receptors, glycine receptors, glutamate receptors, serotonin receptors, α-bungarotoxin receptors, muscarinic receptors, N-methyl-D-aspartate (NMDA) receptors, nicotinic acetylcholine (nACh) receptors), voltage-gated ion channels, sodium channels, calcium channels, potassium channels and the like.
- nAChRs are assembled from five subunits arranged around a central cation-conducting pore.
- In muscle, only one subtype has been identified, which is composed of two α, one β, one γ and one δ subunit.
- eight neuronal nAChR α subunits (α2-α7 and α9-α10) and three β subunits (β2-β4) have been identified in mammalian systems.
- Receptor subtypes may be differentially distributed throughout the central and peripheral nervous systems. Different conopeptides are known to selectively target nAChRs. The conopeptides identified to date that specifically bind nAChRs are antagonists that fall into two classes: those that act at the ACh site and those that bind noncompetitively as pore blockers.
- ligands selected for binding to a receptor may act as either an antagonist, blocking an action, or an agonist, eliciting an action.
- a “Ligand Receptor Pair” or “Ligand binding pair” is formed when two macromolecules have combined through molecular recognition to form a complex.
- a ligand binding pair member is one of the two macromolecules forming the ligand binding pair.
- Sequence identity is typically measured using sequence analysis software (e.g., Sequence Analysis Software Package of the Genetics Computer Group, University of Wisconsin Biotechnology Center, 1710 University Avenue, Madison, Wis. 53705, or BLAST software available from the National Library of Medicine).
- useful software includes, but is not limited to, the GCG (Genetics Computer Group, Madison, Wis.) program package (Devereux, J., et al., 1984), BLASTP, BLASTN, FASTA (Altschul et al., 1990; Altschul et al., 1997), PILE-UP and PRETTYBOX.
- the well-known Smith-Waterman algorithm may also be used to determine identity.
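The Smith-Waterman recurrence can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by any package named above: the match and mismatch scores and the linear gap penalty are assumed values, whereas real tools typically use substitution matrices and affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment with a linear gap penalty (scores are illustrative).

    Returns (best local score, aligned substring of a, aligned substring of b).
    """
    def sub(x, y):
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            # Local alignment: a cell never drops below zero.
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("CCAAAGG", "TTAAATT"))  # (6, 'AAA', 'AAA')
```

Because cell values are floored at zero, only the best-matching local segment is reported rather than an end-to-end alignment.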
- Such software matches similar sequences by assigning degrees of homology to various substitutions, deletions, additions, and other modifications. While there exist a number of methods to measure identity between two polynucleotide or polypeptide sequences, the term "identity" is well known to skilled artisans. Preferred methods to determine identity are designed to give the largest match between the two sequences tested. Such methods are codified in computer programs. Conservative substitutions typically include substitutions within the following representative groups: glycine, alanine; valine, isoleucine, leucine; aspartic acid, glutamic acid, asparagine, glutamine; serine, threonine; lysine, arginine; and phenylalanine, tyrosine. It is understood that other groups known in the art may also constitute conservative substitutions.
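As a sketch of how the conservative-substitution groups listed above might be applied, the following Python classifies each aligned residue pair and reports percent identity and percent similarity (identical plus conservative). The helper names are hypothetical, and gap columns are simply skipped, which is only one of several possible conventions.

```python
# Conservative-substitution groups exactly as listed in the text (one-letter codes).
CONSERVATIVE_GROUPS = (
    frozenset("GA"),    # glycine, alanine
    frozenset("VIL"),   # valine, isoleucine, leucine
    frozenset("DENQ"),  # aspartic acid, glutamic acid, asparagine, glutamine
    frozenset("ST"),    # serine, threonine
    frozenset("KR"),    # lysine, arginine
    frozenset("FY"),    # phenylalanine, tyrosine
)


def classify_substitution(x, y):
    """Return 'identical', 'conservative' or 'non-conservative' for one residue pair."""
    x, y = x.upper(), y.upper()
    if x == y:
        return "identical"
    if any(x in group and y in group for group in CONSERVATIVE_GROUPS):
        return "conservative"
    return "non-conservative"


def identity_and_similarity(aln1, aln2):
    """Percent identity and percent similarity over columns without a gap."""
    kinds = [classify_substitution(x, y)
             for x, y in zip(aln1, aln2) if x != "-" and y != "-"]
    if not kinds:
        raise ValueError("no gap-free columns to compare")
    identical = kinds.count("identical")
    similar = identical + kinds.count("conservative")
    return 100.0 * identical / len(kinds), 100.0 * similar / len(kinds)


print(identity_and_similarity("GVDSKF", "AIETRY"))  # (0.0, 100.0)
```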
- “Homologous” or “homologue” or “ortholog” or “paralog” refer to related sequences that share a common ancestor or arise from gene duplication, and are determined based on the degree of sequence identity.
- a related sequence may be a sequence having homology, which has arisen by convergent evolution. These terms describe the relationship between a gene found in one species, subspecies, variety, cultivar or strain and the corresponding or equivalent gene in another species, subspecies, variety, cultivar or strain or, in the case of paralogous genes, two related sequences within a species, subspecies, variety, cultivar or strain.
- homologous includes orthologs and paralogs.
- “Homologous sequences” are thought, believed, or known to be functionally related.
- a functional relationship may be indicated in a number of ways, including, but not limited to: (a) the degree of sequence identity; and/or (b) the same or similar biological function. Preferably, both (a) and (b) are indicated.
- the degree of sequence identity may vary, but is preferably at least 50% over the region defining the relationship (when using standard sequence alignment programs known in the art), preferably between about 60% and about 99%, more preferably between about 75% and about 99%, even more preferably between about 85% and about 99%.
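Percent identity depends on how the region defining the relationship is counted. The sketch below, using a hypothetical `percent_identity` helper, shows two common conventions for a gapped pairwise alignment; which convention a particular program uses is an assumption to check against that program's documentation.

```python
def percent_identity(aln1, aln2, denominator="aligned"):
    """Percent identity between two gapped, equal-length aligned sequences.

    denominator="aligned": count only columns where both sequences have a residue.
    denominator="columns": count every column that is not gap-gap, so gaps lower
    the score.
    """
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aln1, aln2) if not (x == "-" and y == "-")]
    identical = sum(1 for x, y in pairs if x == y)
    if denominator == "aligned":
        total = sum(1 for x, y in pairs if x != "-" and y != "-")
    elif denominator == "columns":
        total = len(pairs)
    else:
        raise ValueError("denominator must be 'aligned' or 'columns'")
    return 100.0 * identical / total if total else 0.0


pid = percent_identity("ACGT-A", "ACGTTC")
print(pid, pid >= 50)  # 80.0 True
```

The same alignment scores about 66.7% under the "columns" convention, so a stated threshold such as "at least 50%" is only meaningful together with the counting rule.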
- Homology can be determined using software programs readily available in the art, such as those discussed in Current Protocols in Molecular Biology (F. M.
- Phylogenetically related sequences are sequences, either nucleic acid or amino acid, which are homologous sequences. Phylogenetically related sequences may be defined based on a specific domain (e.g., kinase domains), signal sequences, structural motifs (e.g., the cysteine motifs of conopeptides), and/or homology in untranslated regions such as the 5' or 3' UTR. Phylogenetically related sequences may be related by any evolutionary distance; preferably the sequences are closely related, and more preferably they are from the same genus. Preferably, phylogenetically related sequences are selected from a clade.
- Evolutionary distance or phylogenetic distance can be calculated using computer algorithms such as PHYLIP (Felsenstein, J. 1989), PAUP (Swofford, D. L., 1993; Swofford, D. L., 1998), MEGA (Kumar et al., 1993), and the like. See Wen-Hsiung Li, 1997.
- conopeptides may be aligned based on sequence conservation in the signal sequence, the 3' UTR, the cysteine architecture and, optionally, the pro-domain. An alignment may also be generated from the nucleic acid sequence coding for the mature toxin, using information generated by silent base changes. Alternatively, an alignment may be generated from the amino acid sequence of the peptide, for example, the amino acid sequence of the mature toxin.
- the invention utilizes the 3' UTR and signal sequence to generate phylogenetic relationships between conopeptides, where the cysteine scaffold serves to verify the alignment.
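For aligned nucleotide sequences, one simple evolutionary distance is the Jukes-Cantor correction of the proportion p of differing sites, d = -3/4 ln(1 - 4p/3). The Python sketch below builds a pairwise distance matrix that way; it illustrates the idea and is not necessarily what PHYLIP, PAUP or MEGA compute by default. The sequence names are hypothetical, and the formula applies to nucleotides only.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of differing sites, ignoring columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(seq1, seq2) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(seq1, seq2):
    """Jukes-Cantor corrected distance for two aligned nucleotide sequences."""
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        return math.inf  # saturated: the correction is undefined
    return -0.75 * math.log(1 - 4 * p / 3)


def distance_matrix(seqs, metric=jukes_cantor):
    """Return (names, matrix) of pairwise distances for {name: aligned sequence}."""
    names = list(seqs)
    return names, [[metric(seqs[a], seqs[b]) for b in names] for a in names]


names, matrix = distance_matrix({
    "seqA": "ACGTACGTAC",  # hypothetical sequences
    "seqB": "ACGTACGTAA",
    "seqC": "ACGAACGTAA",
})
for name, row in zip(names, matrix):
    print(name, [round(d, 4) for d in row])
```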
- one representative embodiment of the invention is the generation of phylogenetic relations between conopeptides.
- hybridization typically means a sequence driven interaction between at least two nucleic acid molecules in a nucleotide specific manner, such as a primer or a probe and a gene. Typically sequence driven interactions occur on the Watson-Crick face or Hoogsteen face of the nucleotide.
- the hybridization of two nucleic acids is affected by a number of conditions and parameters known to those of skill in the art. For example, the salt concentrations, pH, and temperature of the reaction all affect whether two nucleic acid molecules will hybridize. Parameters for selective hybridization between two nucleic acid molecules are well known to those of skill in the art. For example, in some embodiments selective hybridization conditions can be defined as stringent hybridization conditions.
- stringency of hybridization is controlled by both temperature and salt concentration of either or both of the hybridization and washing steps.
- the conditions of hybridization to achieve selective hybridization may involve hybridization in high ionic strength solution (6X SSC or 6X SSPE) at a temperature that is about 12-25°C below the Tm (the melting temperature at which half of the molecules dissociate from their hybridization partners) followed by washing at a combination of temperature and salt concentration chosen so that the washing temperature is about 5°C to 20°C below the Tm.
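The temperature windows above are simple offsets from Tm. The sketch below encodes them and, as an assumption for illustration only, estimates Tm for a short oligonucleotide with the Wallace rule of thumb (2 °C per A or T, 4 °C per G or C). That rule is a rough approximation for short probes and is no substitute for the empirical calibration the text describes; the helper names are hypothetical.

```python
def wallace_tm(oligo):
    """Rough Tm (deg C) of a short DNA oligonucleotide: 2*(A+T) + 4*(G+C)."""
    s = oligo.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


def stringency_windows(tm):
    """Temperature ranges (deg C) stated in the text for selective hybridization."""
    return {
        "hybridization": (tm - 25, tm - 12),  # about 12-25 C below Tm
        "wash": (tm - 20, tm - 5),            # about 5-20 C below Tm
    }


print(stringency_windows(wallace_tm("ATGCATGCATGCATGCAT")))
# {'hybridization': (27, 40), 'wash': (32, 47)}
```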
- the temperature and salt conditions are readily determined empirically in preliminary experiments in which samples of reference DNA immobilized on filters are hybridized to a labeled nucleic acid of interest and then washed under conditions of different stringencies.
- Hybridization temperatures are typically higher for DNA-RNA and RNA-RNA hybridizations.
- a preferable stringent hybridization condition for a DNA:DNA hybridization can be at about 68°C (in aqueous solution) in 6X SSC or 6X SSPE followed by washing at 68°C.
- Stringency of hybridization and washing can be reduced accordingly as the degree of complementarity desired is decreased, and further, depending upon the G-C or A-T richness of any area wherein variability is searched for.
- stringency of hybridization and washing, if desired, can be increased accordingly as the homology desired is increased, and further, depending upon the G-C or A-T richness of any area wherein high homology is desired, all as known in the art.
- a characteristic structural feature of conotoxins is a large number of posttranslational modifications, in particular disulfide bridges. The primary function of disulfide bonds appears to be stabilization of the structure. Conotoxins are grouped into families based upon the number and arrangement of disulfide bonds.
- two-disulfide α-conotoxins contain the cysteine pattern CC-C-C, with disulfides between the 1st and 3rd and between the 2nd and 4th cysteines.
- Three-disulfide ω- and δ-conotoxins share the native cysteine pattern C-C-CC-C-C, whereas μ-conotoxins share the common cysteine pattern CC-C-C-CC.
- the 1st & 4th, 2nd & 5th and 3rd & 6th cysteines are connected; likewise, for native μ-conotoxins, the 1st & 4th, 2nd & 5th and 3rd & 6th cysteines are connected by disulfide bonds.
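The cysteine frameworks and connectivities described above can be turned into a small lookup. In this Python sketch the peptide sequences are hypothetical, the helper names are illustrative, and the connectivity table holds only the three frameworks named in the text; real conotoxin classification also draws on signal sequences and other features.

```python
import re

# Cysteine frameworks and disulfide connectivities as described in the text.
# Pairs are 1-based ordinal positions of cysteines within the peptide.
CONNECTIVITY = {
    "CC-C-C": [(1, 3), (2, 4)],              # two-disulfide framework
    "C-C-CC-C-C": [(1, 4), (2, 5), (3, 6)],  # three-disulfide framework
    "CC-C-C-CC": [(1, 4), (2, 5), (3, 6)],   # three-disulfide framework
}


def cysteine_framework(seq):
    """Collapse a mature peptide to its cysteine pattern, e.g. 'CC-C-C'."""
    seq = seq.upper()
    first, last = seq.find("C"), seq.rfind("C")
    if first == -1:
        return ""
    # Each run of non-cysteine residues between cysteines becomes a single '-'.
    return re.sub(r"[^C]+", "-", seq[first:last + 1])


def disulfide_bonds(seq):
    """Bonded residue positions (1-based) implied by the framework lookup."""
    framework = cysteine_framework(seq)
    if framework not in CONNECTIVITY:
        raise KeyError(f"no connectivity listed for framework {framework!r}")
    cys = [i + 1 for i, aa in enumerate(seq.upper()) if aa == "C"]
    return [(cys[a - 1], cys[b - 1]) for a, b in CONNECTIVITY[framework]]


print(disulfide_bonds("GCCSAPKCRNGYSC"))  # [(2, 8), (3, 14)]
```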
- the correct pairing of disulfides in the native conotoxins has been viewed as a prerequisite for maintaining their biological activity. However, non-native disulfide bonds.
- the disulfide bridges are formed in a process of oxidative pairing of the cysteine residues.
- the conopeptides may be grouped according to a superfamily and family structure, organized by superfamily, family, target and cysteine structure.
- Homologous sequences may have different lengths, which may be viewed as an insertion or deletion in one or the other sequence. Since an insertion in one sequence can always be seen as a deletion in the other, the term "indel" is frequently used to describe this situation. The result of an indel is that a position or a stretch of positions may be paired up with dashes (the gap character) in the other sequence to signify such an insertion or deletion. Indels are assigned "gap penalties," which are known in the art and incorporated into computer programs used in determining homology. Phylogenetically related sequences may be subdivided based on any appropriate criteria, for example, phylogenetic distance, function, motif organization, or the like. Selection of the most appropriate phylogenetically related sequences is within the skill of the art.
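To illustrate how gap penalties enter an alignment score, the sketch below scores an existing pairwise alignment with an affine penalty: a larger cost to open a gap and a smaller cost to extend it. The penalty values and the toy substitution function are assumptions, and programs differ in whether the first gap position is charged the opening penalty alone (as here) or opening plus extension.

```python
def score_alignment(aln1, aln2, substitution, gap_open=-10, gap_extend=-1):
    """Score a pairwise alignment with affine gap penalties.

    The first position of a gap costs gap_open; each following position in
    the same gap costs gap_extend. A gap switching to the other sequence is
    treated as a new gap.
    """
    score = 0
    open_gap_in = None  # which sequence (1 or 2) currently has an open gap
    for x, y in zip(aln1, aln2):
        if x == "-" and y == "-":
            continue  # gap-gap columns carry no information
        if x == "-" or y == "-":
            which = 1 if x == "-" else 2
            score += gap_extend if open_gap_in == which else gap_open
            open_gap_in = which
        else:
            score += substitution(x, y)
            open_gap_in = None
    return score


def simple_sub(x, y):
    return 1 if x == y else -1  # toy substitution score


print(score_alignment("ACG--T", "ACGTTT", simple_sub))  # -7
```

An indel of two positions costs -11 here rather than -20, reflecting the usual assumption that one long indel is more likely than several short ones.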
- a person of skill in the art may select or set the criteria for grouping related sequences as appropriate for the situation.
- robust phylogenetic groupings for related sequences.
- sequences within a superfamily may be further divided into families and/or subfamilies and even further divided into evolutionarily closer clades.
- Sequences having a robust phylogenetic relationship for example, as expressed by relatively short evolutionary distances within the group, will likely perform the same function or affect the same target (for example, the same receptor subtype).
- nucleotide change refers to one or more nucleotide substitution, deletion, and/or insertion, as is well understood in the art.
- the proteins of the invention may be co-translationally, post-translationally or spontaneously modified.
- the peptides of the invention may be synthesized using modified amino acids or be modified subsequent to synthesis.
- proteins of the invention may be synthesized using modified or non-natural amino acids and derivatives. For example, a large number of non-natural or unusual amino acids are available from Chem-Impex International, Inc.
- Phenylglycine Phg
- Propanolol Propanolo
- the source of the polynucleotide from an organism or its ancestor can be any suitable source, for example, genomic sequences or cDNA sequences. Preferably, cDNA sequences are compared.
- the source of the polypeptide from the organism or its ancestor can be any suitable source, for example, defined tissues or cells, intracellular or extracellular material, or recombinant expression systems.
- Polypeptide sequences may be determined by direct sequencing of the polypeptide, for example by Edman degradation or mass spectrometry (MS), or by deriving the sequence from the nucleic acid encoding the polypeptide.
- Nucleic acid or polypeptide sequences can be obtained from available private, public and/or commercial databases. These databases serve as repositories of the molecular sequence data generated by ongoing research efforts.
- Nucleic acid or polypeptide sequences may be obtained from, for example, sequencing of cDNA reverse transcribed from mRNA expressed in cells, or after PCR amplification, according to methods well known in the art (using, for example, GeneAmp PCR System 9700 thermocyclers (Applied Biosystems, Inc.)). Alternatively, genomic sequences may be used for sequence comparison.
- the cDNA is prepared from mRNA obtained from a specific tissue, a tissue at a determined developmental stage or a tissue obtained after the organism has been subjected to certain conditions.
- cDNA libraries used for the sequence comparison of the present invention can be constructed using conventional cDNA library construction techniques that are explained fully in the literature of the art. Total mRNAs may be used as templates to reverse-transcribe cDNAs. Transcribed cDNAs may be subcloned into appropriate vectors to establish a cDNA library. The established cDNA library can be maximized for full-length cDNA contents, although less than full-length cDNAs may be used.
- the sequence frequency can be normalized according to, for example, Bonaldo et al, 1996.
- cDNA clones randomly selected from the constructed cDNA library can be sequenced using standard automated sequencing techniques. Preferably, full-length cDNA clones are used for sequencing.
- cDNA clones to be sequenced can be pre-selected according to their expression specificity.
- the cDNAs can be subject to subtraction hybridization using mRNAs obtained from other organs, tissues or cells of the same animal. Under certain hybridization conditions, with appropriate stringency and concentration, those cDNAs that hybridize with non-tissue specific mRNAs, and thus likely represent "housekeeping" genes, will be excluded from the cDNA pool. Accordingly, remaining cDNAs to be sequenced are more likely to be associated with tissue-specific functions.
- non-tissue-specific mRNAs can be obtained from one organ or, preferably, from a combination of different organs and cells. The amount of non-tissue-specific mRNAs is maximized to saturate the tissue-specific cDNAs.
- sequences can be pre-selected by using PCR primers which are specific to the desired class of sequences.
- primers may be made from one or more organisms' sequences using standard methods in the art, including publicly available primer design programs such as PRIMER® (Whitehead Institute).
- the amplified sequence may then be sequenced using standard methods and equipment in the art, such as automated sequencers (Applied Biosystems, Inc.).
- information from online databases can be used to select or give priority to cDNAs that are more likely to be associated with specific functions.
- the cDNA candidates for sequencing can be selected by PCR using primers designed from representative candidate cDNA sequences.
- Representative candidate cDNA sequences are, for example, those that are only found in a specific tissue, such as the venom duct, or that correspond to genes likely to be important in the specific function.
- tissue-specific cDNA sequences may be obtained by searching online sequence databases in which information with respect to the expression profile and/or biological activity for cDNA sequences may be specified.
- the peptides of the invention may be synthesized by a suitable method, such as by exclusively solid-phase techniques, by partial solid-phase techniques, by fragment condensation or by classical solution couplings.
- recombinant DNA techniques may also be used to prepare these peptides, particularly longer ones.
- the peptide chain can be prepared by a series of coupling reactions in which the constituent amino acids are added to the growing peptide chain in the desired sequence.
- various coupling reagents, e.g., dicyclohexylcarbodiimide or carbonyldiimidazole
- various active esters e.g., esters of N-hydroxyphthalimide or N-hydroxy-succinimide
- the use of such coupling reagents, active esters and various cleavage reagents to carry out reactions in solution, with subsequent isolation and purification of intermediates, is well-known classical peptide methodology.
- As far as the selection of a side chain amino protecting group is concerned, generally one is chosen which is not removed during deprotection of the α-amino groups during the synthesis. However, for some amino acids (e.g., His), protection is not generally necessary.
- the protecting group preferably retains its protecting properties and is not split off under coupling conditions
- the protecting group should be stable under the reaction conditions selected for removing the α-amino protecting group at each step of the synthesis
- the side chain protecting group must be removable, upon the completion of the synthesis containing the desired amino acid sequence, under reaction conditions that will not undesirably alter the peptide chain.
- the C-terminal amino acid, protected by Boc and, if appropriate, by a side-chain protecting group, can first be coupled to a chloromethylated resin according to procedures known in the art (see Hlavacek and Ragnarsson, 2001), for example, using KF in DMF at about 60 °C for 24 hours with stirring when a peptide having a free acid at the C-terminus is to be synthesized.
- the α-amino protecting group is removed, for example by using trifluoroacetic acid (TFA) in methylene chloride or TFA alone. The deprotection is carried out at a temperature between about 0 °C and room temperature.
- Other standard cleaving reagents, such as HCl in dioxane, and conditions for removal of specific α-amino protecting groups may be used as described in Schröder & Lübke.
- To create disulfide bonds between Cys residues, cyclization is preferably carried out on the linear peptide, as opposed to cyclizing the peptide while it is still part of the peptidoresin.
- the fully protected peptide can be cleaved from a hydroxymethylated resin or a chloromethylated resin support by ammonolysis, as is well known in the art, to yield the fully protected amide intermediate, which is thereafter suitably cyclized and deprotected.
- deprotection as well as cleavage of the peptide from the above resins or a benzhydrylamine (BHA) resin or a methyl-benzhydrylamine (MBHA) resin can take place at 0 °C with hydrofluoric acid (HF), followed by air oxidation under high-dilution conditions.
- a method for making disulfide containing peptides includes oxidizing the linear peptide and then fractionating the resulting product, using reverse-phase high performance liquid chromatography (HPLC) or the like, to separate peptides having different disulfide linkage configurations. By comparing these fractions with the elution of the native material or by using a simple assay, the particular fraction having the correct linkage for maximum biological potency may be determined.
- venom duct mRNA may be prepared from specimens from any species (e.g., Conus arenatus, Conus pennaceus, Conus tessulatus, and Conus ventricosus) (Conticello et al. 2001).
- cDNAs may be prepared by oligo dT (with or without a restriction site) priming and/or ligated to adaptors, and cloned into an appropriate vector. Clones from the library may then be sequenced, for example, by the dye terminator method on ABI 373 or ABI 377 automated sequencers.
- Sequences may be edited to discard vector and adaptor regions using computer programs, such as Sequencher 3.0 (GeneCodes Corp., Ann Arbor, Mich.). Contigs (the assembly of individual sequences into a contiguous sequence) may be assembled manually or automatically, with or without subsequent manual editing. Individual transcripts are typically aligned using computer programs, such as CLUSTAL X (Thompson et al. 1997), and the alignments may be refined manually. Phylogenetic trees may be constructed using the neighbor-joining method (Saitou and Nei, 1987) and visualized with computer programs, such as TreeView (Page, 1996). Synonymous versus nonsynonymous substitution rates may be analyzed using MEGA (Kumar, et al. 1993).
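The neighbor-joining step can be sketched from a pairwise distance matrix. This minimal Python version, with a hypothetical function name and three-decimal formatting chosen for readability, uses the commonly cited Q criterion, Q(a, b) = (n - 2) d(a, b) - r(a) - r(b), to pick each pair to join and returns an unrooted tree in Newick format, a format commonly read by tree viewers. It is a sketch, not a replacement for the implementations in MEGA or PHYLIP, and on non-additive data it can yield negative branch lengths.

```python
import itertools


def neighbor_joining(names, matrix):
    """Build an unrooted Newick tree from a symmetric distance matrix."""
    if len(names) < 3:
        raise ValueError("need at least three sequences")
    d = {a: {b: matrix[i][j] for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Join the pair that minimizes Q(a, b).
        a, b = min(itertools.combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        new = f"({a}:{la:.3f},{b}:{lb:.3f})"
        d[new] = {new: 0.0}
        for c in nodes:
            if c not in (a, b):
                # Distance from the new internal node to each remaining node.
                d[new][c] = d[c][new] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    # Resolve the last three nodes as a trifurcation.
    a, b, c = nodes
    la = (d[a][b] + d[a][c] - d[b][c]) / 2
    lb = (d[a][b] + d[b][c] - d[a][c]) / 2
    lc = (d[a][c] + d[b][c] - d[a][b]) / 2
    return f"({a}:{la:.3f},{b}:{lb:.3f},{c}:{lc:.3f});"


# Additive toy distances for the tree ((A:1,B:2):1,(C:3,D:4)).
dist = [[0, 3, 5, 6], [3, 0, 6, 7], [5, 6, 0, 7], [6, 7, 7, 0]]
print(neighbor_joining(["A", "B", "C", "D"], dist))
# (C:3.000,D:4.000,(A:1.000,B:2.000):1.000);
```

For additive distances such as this toy matrix, the original branch lengths are recovered exactly.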
- a one-tailed t-test with infinite degrees of freedom may be used. Tip tests (Templeton, 1996) are performed on the basis of alignments specific to the analyzed region (for example, signal+propeptide, mature domain) in order to reduce the complexity of the cladogram (for example, clades in signal+propeptide-based trees are different from the clades in mature-peptide based trees).
- a Fisher 2 x 2 contingency test, for example as suggested by Castelloe and Templeton, 1994, is performed on silent versus replacement substitutions in external and internal branches of the gene tree.
- RT-PCR is performed using primers that anneal to conserved elements in the 5' and 3' untranslated regions (UTRs) of each conopeptide family.
- Conditions for RT-PCR are determined by a person of skill in the art and may include: 50 °C for 40 min and 94 °C for 2 min, followed by 25 amplification cycles of 94 °C for 30 s, 55- 60 °C for 30 s, and 68 °C for 1 min.
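As a quick arithmetic check on these conditions, the total hold time is 40 + 2 + 25 x (0.5 + 0.5 + 1) = 92 minutes, excluding ramp times between temperatures. The small Python sketch below performs the same calculation; the `Step` type is hypothetical, and 58 °C is one assumed annealing temperature within the stated 55-60 °C range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float  # block temperature in degrees C
    seconds: int   # hold time at that temperature


def hold_time_minutes(pre_steps, cycle_steps, n_cycles):
    """Total hold time in minutes, ignoring ramp times between temperatures."""
    pre = sum(s.seconds for s in pre_steps)
    per_cycle = sum(s.seconds for s in cycle_steps)
    return (pre + n_cycles * per_cycle) / 60


# Reverse transcription, then initial denaturation.
pre_steps = [Step(50, 40 * 60), Step(94, 2 * 60)]
# Denaturation, annealing (58 C assumed), extension.
cycle = [Step(94, 30), Step(58, 30), Step(68, 60)]
print(hold_time_minutes(pre_steps, cycle, 25))  # 92.0
```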
- the resulting PCR fragments may be ligated directly into a T-overhang vector, and clones from each reaction sequenced.
- the ratio of non-synonymous substitutions to synonymous substitutions may be calculated by the methods of Li et al., although other analysis programs that can detect positively selected genes between species can also be used (see Li et al., 1985; Li, 1993; Messier and Stewart, 1997; Nei, 1987).
- the KA/KS method, which compares the rate of non-synonymous substitutions (KA) per non-synonymous site with the rate of synonymous substitutions (KS) per synonymous site between homologous protein-coding regions of genes, expressed as a ratio, is used to identify sequence substitutions that may be driven by adaptive selection as opposed to neutral selection during evolution.
- a synonymous (“silent") substitution is one that, owing to the degeneracy of the genetic code, makes no change to the amino acid sequence encoded.
- a non-synonymous substitution results in an amino acid replacement.
- The extent of each type of change can be estimated as KS, the number of synonymous substitutions per synonymous site, and KA, the number of non-synonymous substitutions per non-synonymous site.
- Calculations of K A /Ks may be performed manually or by using software.
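A hand calculation can be sketched in Python with a Nei-Gojobori-style counting approach and a Jukes-Cantor correction. This is a simplified illustration under stated assumptions (standard genetic code; aligned, in-frame sequences without gaps or stop codons; changes to stop codons counted as non-synonymous when counting sites), not the algorithm implemented by MEGA or Li93, and the function names are hypothetical.

```python
import itertools
import math

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon):
    """Synonymous sites of one codon (assumption: stop changes are non-synonymous)."""
    count = 0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                count += CODE[mutant] == CODE[codon]
    return count / 3


def pathway_differences(c1, c2):
    """Average (synonymous, non-synonymous) differences over all mutational
    pathways from c1 to c2, skipping pathways through intermediate stop codons."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    syn_total = non_total = 0.0
    valid = 0
    for order in itertools.permutations(positions):
        current, syn, non, ok = c1, 0, 0, True
        for step, pos in enumerate(order):
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*" and step < len(order) - 1:
                ok = False
                break
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = nxt
        if ok:
            syn_total, non_total, valid = syn_total + syn, non_total + non, valid + 1
    if valid == 0:
        raise ValueError(f"no valid pathway between {c1} and {c2}")
    return syn_total / valid, non_total / valid


def jukes_cantor(p):
    if p >= 0.75:
        raise ValueError("too many differences for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(seq1, seq2):
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("need aligned, in-frame coding sequences of equal length")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        S, N = S + s, N + 3 - s
        sd, nd = pathway_differences(c1, c2)
        Sd, Nd = Sd + sd, Nd + nd
    ks, ka = jukes_cantor(Sd / S), jukes_cantor(Nd / N)
    return {"S": S, "N": N, "Sd": Sd, "Nd": Nd, "KS": ks, "KA": ka,
            "KA/KS": ka / ks if ks > 0 else None}


# AAA -> AAG is a silent Lys -> Lys change, so KA/KS is 0 here.
print(ka_ks("ATGAAACCCGGG", "ATGAAGCCCGGG"))
```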
- An example of a suitable program is MEGA (Molecular Genetics Institute, Pennsylvania State University).
- either complete or partial protein-coding sequences are used to calculate total numbers of synonymous and non-synonymous substitutions, as well as non-synonymous and synonymous sites.
- the length of the polynucleotide sequence analyzed can be any appropriate length.
- the entire coding sequence is compared, in order to determine any and all significant changes. Where appropriate and desirable, the comparison may be restricted to specific functional domains or the like.
- Publicly available computer programs, such as Li93 (Li (1993)) or INA, can be used to calculate the KA and Ks values for all pairwise comparisons.
- This analysis can be further adapted to examine sequences in a "sliding window” fashion such that small numbers of important changes are not masked by the whole sequence.
- “Sliding window” refers to examination of consecutive, overlapping subsections of the gene (the subsections can be of any length).
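The sliding-window idea reduces to iterating over consecutive, overlapping, codon-aligned slices. In this sketch the per-window statistic defaults to a simple nucleotide p-distance purely for illustration; the `stat` parameter is a hypothetical interface into which a KA/KS estimator could be plugged instead.

```python
def p_distance(a, b):
    """Proportion of differing nucleotides between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def sliding_windows(seq1, seq2, window_codons=10, step_codons=1, stat=p_distance):
    """Yield (start_nt, end_nt, value) for consecutive, overlapping codon windows.

    Positions are 1-based and inclusive.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned and of equal length")
    n_codons = len(seq1) // 3
    for start in range(0, n_codons - window_codons + 1, step_codons):
        lo, hi = start * 3, (start + window_codons) * 3
        yield lo + 1, hi, stat(seq1[lo:hi], seq2[lo:hi])


for window in sliding_windows("AAA" * 4, "AAA" * 3 + "AAG", window_codons=2):
    print(window)
```

A single change in the last codon shows up only in the final window, which is how local signals avoid being diluted by the whole-sequence average.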
- KA/KS has been shown to reflect the degree to which adaptive evolution has been at work in the sequence under study.
- Nucleic acid or polypeptide sequences are compared to identify homologous sequences. Any appropriate mechanism for completing this comparison is contemplated by this invention. Alignment may be performed manually or by software (examples of suitable alignment programs are known in the art). Nucleic acid or polypeptide sequences may be selected for comparison via database searches (e.g., BLAST searches). The high-scoring “hits,” i.e., sequences that show a significant similarity after BLAST analysis, will be retrieved and analyzed. Sequences showing a significant similarity can be those having at least about 60% to about 99% sequence identity over comparable regions. Preferably, sequences showing greater than about 80% identity are further analyzed. The homologous sequences identified via database searching can be aligned in their entirety using sequence alignment methods and programs that are known and available in the art, such as the commonly used simple alignment program CLUSTAL V by Higgins et al. 1992.
- sequencing and homology comparison of nucleic acid or polypeptide sequences may be performed simultaneously by sequencing chip technology. See, for example, U.S. Pat. 5,545,531.
- the aligned nucleic acid or polypeptide sequences are analyzed to identify the nucleotide(s) or amino acid(s) observed at each position of the alignment. Again, any suitable method for achieving this analysis is contemplated by this invention.
- the detected sequence differences are generally checked for accuracy.
- the initial checking comprises performing one or more of the following steps, any and all of which are known in the art: (a) finding the points where there are changes between two sequences; (b) checking the sequence fluorogram (chromatogram) or data source to determine if the bases or amino acids that appear unique to the sequence in question correspond to strong, clear signals specific for the called base or amino acid; and/or (c) checking additional sequences to see if there is more than one sequence that corresponds to a sequence change.
- Multiple sequence entries for the same gene or peptide that have the same nucleotide or amino acid at a position where there is a different nucleotide or amino acid in a reference sequence provide independent support that the sequence in question is accurate and that the change is significant.
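The independent-support check can be sketched as a column-by-column comparison against a reference. This hypothetical helper assumes all entries are already aligned to the reference, ignores gaps, and reports, for each position, any non-reference residue seen in at least `min_support` entries.

```python
from collections import Counter


def supported_changes(reference, entries, min_support=2):
    """Map 1-based positions to {residue: count} for non-reference residues
    shared by at least `min_support` aligned entries."""
    if any(len(e) != len(reference) for e in entries):
        raise ValueError("all entries must be aligned to the reference length")
    supported = {}
    for pos, ref_residue in enumerate(reference):
        counts = Counter(e[pos] for e in entries if e[pos] not in ("-", ref_residue))
        for residue, n in counts.items():
            if n >= min_support:
                supported.setdefault(pos + 1, {})[residue] = n
    return supported


print(supported_changes("ACDEF", ["ACDEY", "ACKEY", "ACKEF"]))
# {3: {'K': 2}, 5: {'Y': 2}}
```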
- nucleotide change encompasses at least one nucleotide change, a substitution, a deletion or an insertion, in a protein-coding polynucleotide sequence as compared to a corresponding sequence.
- Newly identified significant changes within a nucleotide or polypeptide sequence, particularly in sequences subject to a high degree of selection pressure, may suggest a potential association with unique, enhanced or altered functional capabilities.
- Nucleic acids encoding the peptides of the invention may be fused to reporter constructs, such as any of the Two-hybrid reporter systems or a display system, such as phage display.
- the nucleic acids may be fused to signal sequences or the like.
- the nucleic acids may also be inserted into a vector, including an expression vector, and, where appropriate, introduced into a prokaryotic or eukaryotic cell.
- nucleic acids which are included within the scope of the invention include antisense molecules, aptamers, probes, ribozymes, triplex forming molecules, and external guide sequences.
- Aptamers are molecules that interact with a target molecule, preferably in a specific way (for a review see Gold et al., 1995, Annu. Rev. Biochem., 64, 763; and Szostak & Ellington, 1993, in The RNA World, ed. Gesteland and Atkins, pp 511, CSH Laboratory Press).
- aptamers are small nucleic acids ranging from 15-50 bases in length that fold into defined secondary and tertiary structures, such as stem-loops or G-quartets.
- Aptamers can bind small molecules, such as ATP (United States patent 5,631,146) and theophylline (United States patent 5,580,737), as well as large molecules, such as reverse transcriptase (United States patent 5,786,462) and thrombin (United States patent 5,543,293). Aptamers can bind very tightly, with dissociation constants (Kd) from the target molecule of less than 10^-12 M. It is preferred that the aptamers bind the target molecule with a Kd less than 10^-6 M. A representative example of how to make and use aptamers can be found in United States Patent 6,458,559.
- Peptide libraries constructed by the method of the invention may be screened by iterative library analysis and resynthesis.
- the peptides of the library are pooled and screened for the desired activity, for example, for binding activity to a specific receptor.
- the library is resynthesized as one or more pools of peptides having one variable amino acid held constant.
- the one or more pools of peptides are iteratively rescreened for the identified or desired activity. Screening or selecting may be performed by methods known in the art, such as phage-display, selectively infective phage, polysome technology to screen for binding, assay systems for enzymatic activity or protein stability.
- Polypeptides having the desired property can be identified by sequencing of the corresponding nucleic acid sequence or by amino acid sequencing.
- the peptide libraries may also be screened for binding to one or more substance, with unbound peptide being removed (e.g., by washing the unbound peptides clear) and the bound peptides then eluted. The eluted peptides may then be rescreened using the same or more stringent conditions. This process may be repeated to achieve the desired binding specificity or activity.
- the peptide(s) eluted from the final round of selection may be sequenced, for example, using ionizing mass spectrometry or other methods known in the art.
- a set of sequences may also be constructed using the methods disclosed herein, coupled with a location identification approach (e.g., an array or chip approach).
- The peptides or nucleic acids of the library are synthesized, and individual species are attached by known methods, for example covalently, to microcarrier beads, as described in U.S. Patent 5,143,854.
- The peptides or nucleic acids may also be attached to multiwell supports or other known structures and physical supports.
- This method may be further modified according to the method of Jayawickreme et al., 1994.
- The peptides and nucleic acids of the invention may be linked to fluorescent compounds, such as fluorescein, rhodamine, Texas Red, UV-light-excitable dyes, quenching moieties and combinations thereof. Linking of fluorescent compounds may be accomplished by methods known in the art, such as through the use of amine-reactive probes.
- The peptides and nucleic acids may be labeled by other means known in the art, such as with radioactive isotopes, biotin, haptens, an antibody or fragment thereof, nonfluorescent dyes, enzymes (e.g., peroxidase and topoisomerase), peptides or chemicals.
- Additional amino acids and/or nucleotides may be added to the library members at either end (the amino or carboxy terminus, or the 5' or 3' end) to facilitate attachment of a label. In addition, the labels may be attached via a linker, such as a carbon chain of about 2 to 50 carbon atoms or the like.
- The invention also provides a kit, including one or more of the following: nucleic acid(s) of the invention; peptide(s) of the invention; recombinant vectors; suitable host cell(s), which may or may not contain nucleic acids of the invention, or express peptides of the invention; antibodies; receptors; computer programs; and methods for producing the peptides or nucleic acids of the invention.
cDNA Library Construction
- A cDNA library is constructed using appropriate cells, tissues or organisms.
- The cells or organisms may be staged temporally (e.g., in G2 of the cell cycle) or developmentally (e.g., insects in the third larval stage, organisms undergoing gastrulation, or the like) and then harvested.
- A person of ordinary skill in the art may select the appropriate cells, tissue or organism to analyze according to the trait of interest.
- Venom-producing cells are known to produce conopeptides useful for the treatment of disease.
- Isolation of venom-producing cells enriches for cells containing and expressing conopeptides.
- RNA is extracted from Conus venom duct cells (RNeasy kit, Qiagen; RNase-free Rapid Total RNA kit, 5 Prime→3 Prime, Inc.), and the integrity and purity of the RNA are determined according to conventional molecular cloning methods.
- Poly(A)+ RNA is isolated (Mini-Oligo(dT) Cellulose Spin Columns, 5 Prime→3 Prime, Inc.) and used as a template for reverse transcription of cDNA, with oligo(dT) as the primer.
- The synthesized cDNA is treated and modified for cloning using commercially available kits. Recombinants are then packaged and propagated in a host cell line. Portions of the packaging mixes are amplified, and the remainder is retained prior to amplification.
- The library can be normalized and the number of independent recombinants in the library determined.
EXAMPLE II Sequence Comparison
- Suitable primers based on a gene of a candidate organism are prepared and used for PCR amplification of cDNA, either from a cDNA library in a host cell line or from cDNA prepared directly from mRNA. Selected cDNA clones from the cDNA library are sequenced using an automated sequencer, such as an ABI 377. Primers such as the M13 Universal and Reverse primers may be used to carry out the sequencing. Alternatively, the primers may be designed to anneal to one or more sites found in the cDNA. For inserts that are not completely sequenced by end sequencing, dye-labeled terminators or custom primers can be used to fill in the remaining gaps. The sequences can also be examined by direct sequencing of the encoded protein.
- DNA coding for conopeptides was isolated and cloned using conventional techniques and procedures known in the art, such as those described in Olivera et al., 1996.
- Primers may be based on the DNA sequences of known conopeptides.
- DNA from single clones was amplified by conventional techniques using primers that correspond approximately to the M13 universal priming site and the M13 reverse universal priming site. Clones having a size of approximately 300-500 nucleotides were sequenced and screened for similarity to the sequences of known conopeptides.
- Example III shows the alignment of phylogenetically related conopeptides. As illustrated in Table 1, positions 3-5, 7-9, 13-15 and 17 are conserved, whereas positions 1-2, 6, 10-12, 16 and 18 exhibit different allowed substitutions (shown in bold).
- The variable positions contain either 2 or 3 allowed substitutions.
- The sequences of the present example represent toxins used by snails of the predatory genus Conus to, among other things, immobilize their prey.
- The sequences are subject to strong natural selection.
- The peptides must bind to the receptors of the various prey species with high affinity. Therefore, the allowed substitutions may reflect differences in the target receptors of the various prey. Alternatively, the allowed substitutions may reflect other selective advantages, such as a slow off-rate or the like.
- The invention provides for the identification of advantageous sequences not present in the original population without having to sample all possible combinations, including detrimental or non-allowed substitutions.
- Using a library of all allowed amino acids provides a library with the greatest possible range of useful peptides without introducing random, and possibly deleterious, amino acids.
- Xaa8 is Gly, Arg or Asp is prepared.
- The phylogenetic relation of the sequences shown in FIG. 1 was used to select phylogenetically related sequences.
- The selection of the phylogenetically related sequences may include any desired number of sequences, depending on the purpose of the library. In this example, the phylogenetically related sequences were selected on the basis of their close phylogenetic relationship and their membership in the α-conotoxin family, which interacts with nicotinic acetylcholine receptors.
- Six initial sequences (FIG. 2) were selected. As shown in Table 1, eight amino acid positions contained variable amino acids. Each of these variations has the potential to add specificity or increase activity.
- Positions 3 to 5 are conserved. Each of these three positions therefore has one observed member, and together they are fixed as Cys-Cys-Ser. Likewise, positions 7 to 9, 13 to 15, and 17 are conserved, non-variable (fixed) amino acids. Positions 1-2, 6, 10-12, 16 and 18 contain allowed substitutions.
- Position 6 exhibits two allowed amino acids, His (H) and Tyr (Y).
- The invention includes a library representing a combination of all allowed substitutions.
- Each of the three allowed amino acids at position 1 is combined with the three allowed amino acids at position 2, the two allowed amino acids observed at position 6, the two at position 10, the two at position 11, the three at position 12, the two at position 16, and the three at position 18.
- The set of sequences representing the union of observed amino acids at each position therefore comprises 1296 (3 × 3 × 2 × 2 × 2 × 3 × 2 × 3) possible combinations of amino acids.
- A set of 1296 sequences having the sequence of SEQ ID NO:7 is generated.
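The counting rule above multiplies the number of allowed members at each position. Enumerating the full set is a Cartesian product. The sketch below reproduces the 1296 count for the 18-position alignment. Conserved residues are read off the codons of the degenerate sequence SEQ ID NO:8 given later in this example, and His/Tyr at position 6 comes from the text. The residues at every other variable position are hypothetical placeholders chosen only to match the stated counts.

```python
from itertools import product
from math import prod
from typing import Iterator, List

# One string per alignment position (1-18) listing the residues allowed there.
ALLOWED: List[str] = [
    "GRE",          # 1  three allowed (placeholder residues)
    "GRE",          # 2  three allowed (placeholder residues)
    "C", "C", "S",  # 3-5 conserved Cys-Cys-Ser
    "HY",           # 6  His or Tyr
    "P", "A", "C",  # 7-9 conserved
    "NS",           # 10 two allowed (placeholder)
    "LI",           # 11 two allowed (placeholder)
    "NSD",          # 12 three allowed (placeholder)
    "H", "P", "E",  # 13-15 conserved
    "LV",           # 16 two allowed (placeholder)
    "C",            # 17 conserved
    "GRD",          # 18 three allowed (placeholder)
]


def library_size(allowed: List[str]) -> int:
    """Number of sequences = product of the allowed-set sizes."""
    return prod(len(residues) for residues in allowed)


def enumerate_library(allowed: List[str]) -> Iterator[str]:
    """Yield every combination of allowed residues as a one-letter string."""
    for combo in product(*allowed):
        yield "".join(combo)


if __name__ == "__main__":
    print(library_size(ALLOWED))  # 1296
```

Conserved positions contribute a factor of 1, so only the eight variable positions determine the size of the library.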
- The library of the present Example may be treated so as to form disulfide bonds (see U.S. Application No. 10/377,332, filed 2/28/03).
- The peptides may be treated with a protein disulfide isomerase to form the disulfide bonds.
- A protein disulfide isomerase, for example bovine protein disulfide isomerase (PDI), is added to the set of peptides in 0.1 M Tris/HCl, pH 7.5, containing 1 mM EDTA, 0.1 mM GSSG and 2 µM PDI, at 0 °C.
- Oxidative folding reactions may be performed in 0.1 M Tris/HCl, pH 8.7, containing 1 mM EDTA, 0.5 mM GSSG and 5 mM GSH, at 22 °C. After an appropriate time, the reaction is quenched by adding formic acid to a final concentration of 8%.
- The library containing the 1296 representative sequences allows for the selection of evolutionarily unrepresented sequences drawn from combinations of all allowed amino acids at each position.
- The function of a set of sequences may be determined by assaying the set (for example, the set of 1296 sequences disclosed in Example III).
- A receptor assay is created using an adaptation of the method described in McIntosh, 2000. Physiological, morphological and/or biochemical examination of the receptor will permit association of the library, and subsequently each representative sequence, with a particular phenotype.
Conotoxin Library Binding
- The conotoxin library of Example III is iodinated by the methods described in Cruz and Olivera, 1986. The binding protocol is a modification of that described in Hillyard et al., 1992.
- Nonspecific binding is measured by preincubating the membrane preparation with 1 mM unlabeled conotoxin library for 30 min on ice before the addition of [125I]conotoxin library.
- The library of Example III is assessed for activity by preincubating the library for 30 min on ice.
- The final assay mix is then incubated at room temperature for 30 min and diluted with 1.5 ml of wash buffer containing 160 mM NaCl, 1.5 mM CaCl2, 2 mg/ml bovine serum albumin and 5 mM HEPES/Tris (pH 7.4).
- Membranes are collected on glass fiber filters (Whatman GF/C soaked in 0.1% polyethyleneimine), for example using a Brandel apparatus model M-24, and washed four times with 1.5 ml of wash buffer. The amount of radioactivity on the filters is then measured.
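The filter counts from this protocol are conventionally reduced to specific binding. Specific binding is total binding minus the nonspecific binding measured after preincubation with excess unlabeled ligand. The sketch assumes replicate filter counts in cpm and that convention. The counts in the example are hypothetical, not data from the text.

```python
from statistics import mean
from typing import Sequence


def specific_binding(total_cpm: Sequence[float], nonspecific_cpm: Sequence[float]) -> float:
    """Mean total binding minus mean nonspecific binding (excess unlabeled ligand)."""
    return mean(total_cpm) - mean(nonspecific_cpm)


def percent_inhibition(specific_control: float, specific_test: float) -> float:
    """Reduction in specific binding caused by a test sample, in percent."""
    return 100.0 * (1.0 - specific_test / specific_control)


if __name__ == "__main__":
    control = specific_binding([12000, 12400], [2000, 2400])  # hypothetical counts
    print(control)                            # 10000
    print(percent_inhibition(control, 2500))  # 75.0
```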
- The labeled library is screened against one or more targets.
- The library is screened for α4β2 nicotinic acetylcholine receptor (nAChR) binding.
- The procedure of Pabreza et al., 1991 is used.
- [3H]Cytisine (15-40 Ci/mmol) is obtained from a commercial supplier such as PerkinElmer Life Sciences.
- Rat forebrain membranes are incubated for 75 min at 4 °C in 50 mM Tris-HCl (pH 7.0 at room temperature) containing 120 mM NaCl, 5 mM KCl, 1 mM MgCl2 and 2.5 mM CaCl2.
- Nonspecific binding is defined with 10 mM nicotine.
- Rat forebrain membranes are incubated with 0.3 nM [3H]prazosin (70-87 Ci/mmol). Reactions are carried out in 50 mM Tris-HCl (pH 7.7) at 25 °C for about 60 min. Prazosin (1.0 mM) is used to define nonspecific binding (19, 20).
- Rat cortical membranes are incubated with 1.0 nM [3H]RX821002 (40-67 Ci/mmol). Reactions are carried out in 50 mM Tris-HCl (pH 7.4) at 25 °C for 75 min. RX821002 (0.1 mM) is used to define nonspecific binding (20, 21).
- The library is screened for adrenergic β1 binding.
- Rat cortical membranes are incubated with 0.2 nM (−)-[125I]iodopindolol (2200 Ci/mmol) and 120 nM ICI-118,551 (to block adrenergic β2 receptors). Reactions are carried out in 50 mM Tris-HCl (pH 7.5) containing 150 mM NaCl, 2.5 mM MgCl2 and 0.5 mM ascorbate at 37 °C for about 60 min.
- Alprenolol HCl (10 mM) is used to define nonspecific binding (22, 23).
- Each member of the library has the potential to bind to a target.
- One or more targets for one or more members of the library are identified.
- The assay identifies ligand-binding members. Because the library contains combinations of the allowed substitutions at two or more non-conserved positions in the phylogenetically related sequences, the library is particularly suited to identifying one or more functional targets. Further, the library is particularly well suited to identifying binding, or increased binding, to a subfamily of receptors.
- The library member with the most desirable binding property is identified by binding assays performed under stringent conditions.
- The individual library member (ligand-binding member) may be identified by iterative screening or, where possible, by elution of the bound polypeptides followed by peptide sequencing or mass spectrometry analysis.
- One method of screening or selecting an amino acid or nucleic acid sequence is to introduce a nucleic acid or set of nucleic acids, which may include one or more degenerate sites, into a host cell.
- The host cell may then be screened or selected to identify one or more nucleic acid or amino acid sequences having desired properties.
- Amino acid sequences may be identified by phage display or by use of an expression vector.
- A library, such as a library encoding the polypeptides of Example III
- Synthetic genes encoding the representative polypeptides of the library are cloned into a filamentous phage vector, such as fdSN.
- The library polypeptides may be tethered at either their C termini or N termini to a carrier protein (e.g., the gene 3 protein of phage).
- Cultures containing fdSN and fusion phage constructs are tested to confirm that fusion of the library sequences to the carrier protein has no significant detrimental effect on phage infectivity or packaging.
- The amino acid sequence is back-translated into a corresponding coding nucleic acid sequence, either individually or as a degenerate sequence.
- For example, the peptides of Example III may be generated using the following degenerate nucleic acid: 5'-NNA NNA UGC UGU UCC NAC CCC GCC UGC NNC NUC NNC CAC CCC GAG NUA UGC NNN-3' (SEQ ID NO:8).
- A person of skill in the art will recognize that a number of other sequences may also be used. The appropriate choice may be influenced by such factors as possible secondary structure in the RNA, melting temperature, codon preference in the host cell, and other factors known in the art.
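When choosing a degenerate back-translation, it helps to check which residues each degenerate codon can actually encode. Each N stands for any of the four bases, so a codon may cover more than the allowed set. For example, NAC encodes Asn and Asp as well as His and Tyr. The sketch below expands IUPAC ambiguity codes and translates with the standard genetic code ('*' marks a stop codon). It is an illustrative checking aid, not part of the described method.

```python
from itertools import product
from typing import List, Set

# IUPAC nucleotide ambiguity codes, written for RNA (T is treated as U).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U", "T": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU", "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}

# Standard genetic code with codon bases ordered U, C, A, G; '*' = stop.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

SEQ_ID_NO_8 = "NNA NNA UGC UGU UCC NAC CCC GCC UGC NNC NUC NNC CAC CCC GAG NUA UGC NNN"


def expand(codon: str) -> List[str]:
    """All concrete codons represented by a degenerate codon."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in codon.upper()))]


def encoded_residues(codon: str) -> Set[str]:
    """One-letter residues (and '*' for stop) that a degenerate codon can encode."""
    return {CODON_TABLE[c] for c in expand(codon)}


def positional_residues(seq: str) -> List[Set[str]]:
    """Encoded residue set for each codon of a space-separated degenerate sequence."""
    return [encoded_residues(codon) for codon in seq.split()]


if __name__ == "__main__":
    for pos, residues in enumerate(positional_residues(SEQ_ID_NO_8), start=1):
        print(pos, "".join(sorted(residues)))
```

Comparing each printed set with the allowed amino acids at that position shows how far a given degenerate design over-samples the allowed library.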
- Expression of the library on phage is determined by comparing the Western blot band size of the carrier protein from helper phage, which contains the wild-type protein, with that from fd-library phage.
- The binding activity of fd-library phage to a target such as the α4β2 nicotinic acetylcholine receptor is tested. Because reducing the disulfide bonds of the library may abolish binding, the requirement of the disulfide scaffold for the functional activity of the library is tested. Phage incubated with 1% β-mercaptoethanol, or a similar agent, to reduce the disulfide bonds of the fd-library are tested for a reduction in binding affinity compared with untreated phage.
- The phage may be treated so as to increase disulfide bond formation.
- For example, phage may be treated with a protein disulfide isomerase, or a protein disulfide isomerase may be expressed in the phage host cell.
- The phage library is tested for receptor binding. Positive clones are selected and re-assayed for binding, with the positive clones isolated at each successive round being re-screened. Individual colonies, for example from round 4, are assayed for receptor binding. The DNA from these clones is then sequenced.
- An identified clone is tested to determine whether the selected sequence retains function in the absence of phage.
- The identified sequence is isolated and expressed in Escherichia coli, or synthesized.
- The protein is isolated and retested for receptor binding.
- The phage library may be constructed with a cleavage site between the carrier protein and the peptide.
- For example, one or more proteolytic cleavage sites may be encoded by the nucleic acid (e.g., for Endoproteinase Pro-Pro-Y-Pro, Factor X or Thrombin, available from Invitrogen).
- Alternatively, the peptides may be cleavable from the carrier protein by chemical cleavage.
EXAMPLE VI Identification of Targets
- The present invention is used to identify targets, including insect-family-specific targets. Insect-specific neurotoxins isolated from Australian funnel-web spiders have been reported. The ω-ACTX-1 family of peptides (U.S. Patent
- VGCCs: voltage-gated calcium channels.
- The molecular target of ω-ACTX-1 is determined.
- The ω-ACTX-1 family of peptides is aligned and the representative sequences of the library are determined, for example, as shown in Tables 2 and 2.1.
- A polypeptide library as previously described is constructed.
- The peptides of the library are then arrayed individually on a solid support.
- The array is then screened with receptor preparations, including N- and L-type voltage-gated calcium channels, and/or voltage-gated sodium channels and/or GABA receptors isolated from Heliothis armigera. Receptor binding is identified.
- Table 2
- The phylogenetically related sequences may also be selected on the basis of evolutionary distance. For example, the sequence with Accession Number P81803 may be removed from the selected phylogenetically related sequences. In this case, Tables 3 and 3.1 illustrate the sequences that represent the combination of allowed amino acid substitutions. When Accession Number P81803 is removed from the phylogenetically related sequences, the set of sequence combinations having the union of observed amino acids at each position decreases from 2,304 to 24 sequences. Thus, the selection of the phylogenetically related sequences influences the complexity of the library.
- Table 3
- The library array of ω-ACTX-1 peptides previously described is screened for specific binding to an Anopheles stephensi receptor.
- The library array is also tested for specific binding to an M. longisetus receptor.
- One or more peptides showing specific binding to an A. stephensi receptor and demonstrating reduced binding, or no binding, to an M. longisetus receptor are selected.
- Peptides isolated according to this method may be used in mosquito control without a substantial adverse effect on natural predators such as M. longisetus.
- Using a library prepared according to the invention facilitates identification of the target receptor, as each sequence represents one or more amino acid possibilities that have been subject to natural selection for binding to receptors in various insects.
- Although mosquito larvae may not express a receptor to which the aligned peptides have been selected to bind, all possible allowed substitutions are represented in the library, increasing the likelihood of identifying the desired binding.
- Table 4 illustrates the alignment of phylogenetically related sequences from the ⁇ -conotoxin family.
- Conotoxins Cn3.4 and Im3.1 represent novel members of the ⁇ -conotoxin family, the function and attributes of which are described in U.S. Patent Application 09/910,009, filed July 23, 2001.
- Table 4.1 illustrates the observed amino acids at each position (e.g., conserved amino acids and allowed substitutions). This Example illustrates an alignment having a deletion or insertion (indel) at a non-conserved position, wherein the indel is an allowed substitution.
- Position 1 in Tables 4 and 4.1 can be viewed as an insertion in A3.3, Nb3.2, A3.5, Sm3.1 and Cn3.4, or as a deletion in Im3.1.
- Two states are observed at position 1: an indel (used herein to describe the absence of an amino acid or nucleotide) and pyroglutamic acid (Z).
- The deleted position is treated individually.
- A first subset of sequences is generated in which the first position is considered to be absent from all sequences (a non-position).
- A second subset is generated in which the first position is pyroglutamic acid. This subset has 16 peptide sequences of 23 amino acids.
- The two subsets are then combined to generate the final set.
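Treating the indel position individually amounts to splitting the enumeration by which aligned positions are absent, then taking the union. The sketch below marks an absent residue with a hypothetical "-" symbol and groups sequences by their set of absent positions. The toy alignment is hypothetical and shorter than the real 22/23-residue peptides. It keeps the text's structure: position 1 is either absent or Z, and there are 16 sequences per subset.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

INDEL = "-"  # hypothetical marker for the absence of a residue at an aligned position


def indel_subsets(allowed: List[str]) -> Dict[Tuple[int, ...], List[str]]:
    """Enumerate every combination of allowed states and group the resulting
    sequences by which aligned positions are absent. Each group is one subset;
    the library is the union of all groups."""
    subsets: Dict[Tuple[int, ...], Set[str]] = {}
    for combo in product(*allowed):
        absent = tuple(i for i, r in enumerate(combo) if r == INDEL)
        seq = "".join(r for r in combo if r != INDEL)
        subsets.setdefault(absent, set()).add(seq)
    return {key: sorted(seqs) for key, seqs in subsets.items()}


# Hypothetical alignment: position 1 is either absent or pyroglutamate (Z).
TOY_ALLOWED = ["-Z", "C", "C", "KR", "AS", "HY", "LM"]

if __name__ == "__main__":
    for absent, seqs in indel_subsets(TOY_ALLOWED).items():
        print(absent, len(seqs))  # () 16 and (0,) 16
```

Keying by absent positions, rather than by length alone, keeps subsets apart when an alignment has several indel positions, as in Table 6.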
- Table 5 illustrates the observed amino acids at each position (e.g., conserved and non-conserved amino acids) of phylogenetically related conotoxin sequences.
- Positions 8 and 14 illustrated in Table 5 may be considered to have conservative substitutions. Generating a library having a combination of all allowed substitutions, without considering conservative substitutions, would require the production of 864 peptides, as shown in SEQ ID NO:24. As the number of peptides needed to produce a library increases, practical limitations (particularly for in vitro protein synthesis) become increasingly important. Therefore, where the number of peptides is prohibitively high, or other considerations warrant a reduction in the complexity of the library, conservative substitutions may be used to reduce the complexity further. Conservative substitutions, such as amino acids with hydrophobic side groups, may be treated as equivalent.
- The PAM 250 matrix above has been arranged so that similar amino acids are close to each other. As illustrated in the matrix, Ala and Ser, or Leu and Met, are similar. One of Ala and Ser may thus be removed from the determination of allowed substitutions by assuming that Ala and Ser are equivalent, as in SEQ ID NO:27.
- A is assumed to be equivalent to S; L is assumed to be equivalent to M.
- Although the number of peptides required in SEQ ID NO:24 is probably not large enough to pose a problem, it serves as an example of how the number of representative peptides may be reduced.
- The ultimate number of peptides required is thereby reduced. For example, utilizing the conservative amino acids observed at positions 2 and 7, the number of peptides required is reduced from 864 to 288.
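The reduction comes from mapping each residue to a representative of its equivalence class before counting. The sketch uses the two equivalences stated above (A treated as S, L treated as M). The four allowed sets are hypothetical and were chosen so that the reduction factor of 3 mirrors 864 → 288.

```python
from math import prod
from typing import Dict, List

# Equivalences stated in the text: A is treated as S, and L is treated as M.
EQUIVALENT: Dict[str, str] = {"A": "S", "L": "M"}


def collapse(allowed: List[str], equivalent: Dict[str, str] = EQUIVALENT) -> List[str]:
    """Replace each residue by its representative and drop duplicates."""
    return ["".join(sorted({equivalent.get(r, r) for r in residues})) for residues in allowed]


def library_size(allowed: List[str]) -> int:
    return prod(len(residues) for residues in allowed)


# Hypothetical allowed sets at four variable positions.
TOY_ALLOWED = ["ASG", "LM", "KR", "HY"]

if __name__ == "__main__":
    print(library_size(TOY_ALLOWED), "->", library_size(collapse(TOY_ALLOWED)))  # 24 -> 8
```

Which residue serves as the representative does not change the count. It only decides which member of each equivalent pair is synthesized.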
- Table 6 illustrates an alignment of phylogenetically related sequences. The observed amino acids at each position are shown in Table 6.1. This Example illustrates an alignment having multiple indels and conservative substitutions.
- The indel position is treated individually for the purposes of chemical synthesis of the library.
- The peptide subsets are then combined to generate the final set or library, as illustrated in Table 6.2.
- Subset #1: 2 × 2 × 3 × 3 × 2 × 3 × 2 × 2 = 864 peptides
- Subsets 1 to 4 shown in Table 6.2 are combined to produce a final set or library having 2,592 unique peptides.
- Conservative substitutions can be taken into account to reduce the number of peptides in the library.
- Table 6.3 illustrates the positions having conservative substitutions (the representative amino acid is shown in outlined text) and the effect of treating conservative substitutions as equivalent amino acids.
- Subset #1A: 3 × 3 × 2 × 3 × 2 × 2 = 216 peptides
- Subset #3A: 3 × 3 × 2 × 3 × 2 × 2 = 216 peptides
- Subset #4A: 3 × 3 × 2 × 3 × 2 = 108 peptides
- Subsets 1A to 4A shown in Table 6.3 are combined to produce a final set or library having 648 peptides. Treating conservative substitutions as functionally equivalent reduces the number of required peptides from 2,592 to 648.
- Sequences, for example Mn3.1, Ac3.3, A3.1, M3.7, M3.3, Cn3.1, A3.3, Nb3.2, A3.5, Sm3.1 and Cn3.4, are entered into a computer program that compares the relationship of the sequences and, preferably, generates a visual output such as a phylogenetic tree.
- The operator may determine the desired phylogenetically related sequences and input the sequence identifications to the computer.
- Alternatively, the desired degree of phylogenetic relation may be set at the onset of analysis, and the selected phylogenetically related sequences may be automatically routed for further analysis of allowed substitutions and, optionally, conservative substitutions.
- The computer program subsequently identifies the observed members at each position, as described herein.
- The sequences required to generate a set of sequences composed of the union of observed members at each position are then output to the end user.
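The program flow above can be sketched end to end: select sequences within a preset degree of relation, identify the observed members at each aligned position, and output their union. This sketch assumes pre-aligned, equal-length sequences. It substitutes a simple p-distance to a reference sequence for the phylogenetic tree, so the threshold parameter is a hypothetical stand-in. The toy sequences are invented.

```python
from itertools import product
from typing import Dict, List


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def select_related(aligned: Dict[str, str], reference: str, max_distance: float) -> Dict[str, str]:
    """Keep the sequences within `max_distance` of the reference sequence."""
    ref = aligned[reference]
    return {name: seq for name, seq in aligned.items() if p_distance(ref, seq) <= max_distance}


def observed_members(seqs: List[str]) -> List[str]:
    """Sorted residues observed at each aligned position."""
    return ["".join(sorted({s[i] for s in seqs})) for i in range(len(seqs[0]))]


def output_library(selected: Dict[str, str]) -> List[str]:
    """Every sequence in the union of observed members at each position."""
    members = observed_members(list(selected.values()))
    return ["".join(combo) for combo in product(*members)]


# Hypothetical pre-aligned toy sequences; s4 is the most divergent.
TOY = {"s1": "GCCSHPAC", "s2": "GCCSYPAC", "s3": "ACCSHPVC", "s4": "KCCTWPLC"}

if __name__ == "__main__":
    for cutoff in (0.5, 0.25):
        print(cutoff, len(output_library(select_related(TOY, "s1", cutoff))))  # 54, then 8
```

Tightening the cutoff drops the divergent s4 and shrinks the output from 54 to 8 sequences. This is the same effect described for removing P81803, which reduced that library from 2,304 to 24.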
- The invention may be implemented in computer programs executing on computers having at least one processor, at least one data storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
- Program code is applied to input data to perform the functions described herein and generate output information.
- The output information is applied to one or more output devices in known fashion.
- The program may be implemented in a high-level procedural or object-oriented programming language to communicate with a computer system.
- The program may also be implemented in assembly or machine language, if desired.
- The language may be any language capable of being compiled and/or interpreted by a computer.
- A computer program may be stored on any storage medium or device (e.g., ROM or magnetic diskette) readable by a general or special purpose computer, for configuring and operating the computer when the storage medium or device is read by the computer to perform the procedures described herein.
- The invention may also be implemented as a computer-readable storage medium configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner to perform the functions described herein.
- The output to the end user may, where appropriate and desirable, be used as the input for an automated peptide or nucleic acid synthesizer.
- For example, the coupling reactions can be performed automatically, as on a Beckman 990 automatic synthesizer, using a program such as that reported in Rivier et al., 1978.
- The input may also, where appropriate and desirable, constitute the output of an automated sequencer.
- Table 7 shows the alignment of phylogenetically related nucleic acid sequences. The sequences shown are reported TATA-box sites in Saccharomyces cerevisiae. As illustrated in Table 7, positions five to nine exhibit different allowed substitutions (shown in bold).
- Table 7
- Positions one to four are conserved, and positions five to nine are non-conserved.
- The allowed substitutions at positions five and six are each A or T.
- The allowed substitutions at positions seven and eight are A, T or an indel, and A, G or an indel, respectively.
- The allowed substitution at position nine is A or an indel.
- Thus, the first four positions are invariant and the last five positions contain allowed substitutions.
- A library of nucleic acid sequences may be generated by limiting positions five through nine to the allowed substitutions. For example, a first subset of sequences is generated having a length of six nucleotides. The sequences are: tatata, tatatt, tataaa, tataat. A second subset of sequences is generated having a length of seven nucleotides. The sequences are: tatatta, tatattt, tatataa, tatatat, tataata, tataatt, tataaaa, tataaat. A third subset of sequences is generated having a length of eight nucleotides.
- The sequences are: tatataaa, tatataag, tatatata, tatatatg, tatattaa, tatattag, tatattta, tatatttg, tataaaaa, tataaaag, tataaata, tataaatg, tataataa, tataatag, tataatta, tataattg.
- A fourth subset of sequences is generated having a length of nine nucleotides.
- The sequences are: tatataaaa, tatataaga, tatatataa, tatatatga, tatattaaa, tatattaga, tatatttaa, tatatttga, tataaaaaa, tataaaaga, tataaataa, tataaatga, tataataaa, tataataga, tataattaa, tataattga.
- The four subsets of sequences are combined to form the set of sequences composed of the union of observed nucleotides or indels at each position.
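The four subsets listed above contain only prefixes of the nine-position site: an indel at position seven, eight or nine ends the site at that point. The sketch below assumes that reading, which is inferred from the listed sequences. It regenerates the subsets (4 + 8 + 16 + 16 = 44 sequences) from the conserved core and the allowed nucleotides at positions five to nine.

```python
from itertools import product
from typing import Dict, List

CORE = "tata"                             # positions 1-4, conserved
VARIABLE = ["at", "at", "at", "ag", "a"]  # positions 5-9; 7-9 may also be indels


def tata_library() -> Dict[int, List[str]]:
    """One subset per site length (6 to 9 nt). A length-n subset uses positions
    1..n, i.e. an indel is read as ending the site at that point."""
    library: Dict[int, List[str]] = {}
    for length in range(len(CORE) + 2, len(CORE) + len(VARIABLE) + 1):
        k = length - len(CORE)
        library[length] = sorted(CORE + "".join(p) for p in product(*VARIABLE[:k]))
    return library


if __name__ == "__main__":
    lib = tata_library()
    print({n: len(seqs) for n, seqs in lib.items()})  # {6: 4, 7: 8, 8: 16, 9: 16}
```

Treating indels at positions seven to nine independently, instead of as truncations, would produce a larger set than the one listed.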
- The library may be screened to identify a TATA-box sequence having a desired property.
- For example, a TATA-box binding protein may be synthesized and attached to a column.
- The library may then be passed over the column under conditions favorable for binding of the TATA-box binding protein to members of the library.
- Non-binding members may be removed in the flow-through and subsequent washing steps.
- Bound members may be eluted and reapplied to the column. These steps may be repeated as often as appropriate and desirable.
- The bound sequences are eluted and directly sequenced or cloned into vectors known in the art. By this procedure, optimal binding sites for the TATA-box binding protein are isolated.
- the conotoxin Cn3.4 is a novel member of the ⁇ -conotoxin family, having the following sequence: XXCCXGXXGXCXGXACXXXCCX (SEQ ID NO:39).
- the conotoxin Im3.1 is a novel member of the ⁇ -conotoxin family, having the following sequence: XCCXGXXGXCXGXACXNXXCCA (SEQ ID NO:40).
- The venom ducts of the Conus genus express an active toxin, for example Cn3.4, whose amino acid sequence may contain either pyroglutamate or glutamine while retaining biological activity.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA002527317A CA2527317A1 (en) | 2003-06-05 | 2004-06-04 | A library of phylogenetically related sequences |
JP2006515233A JP2007526221A (en) | 2003-06-05 | 2004-06-04 | Library of phylogenetic related sequences |
EP04754500A EP1639080A4 (en) | 2003-06-05 | 2004-06-04 | A library of phylogenetically related sequences |
AU2004246009A AU2004246009A1 (en) | 2003-06-05 | 2004-06-04 | A library of phylogenetically related sequences |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/456,375 US20040248189A1 (en) | 2003-06-05 | 2003-06-05 | Method of making a library of phylogenetically related sequences |
US10/456,375 | 2003-06-05 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2004108901A2 true WO2004108901A2 (en) | 2004-12-16 |
WO2004108901A3 WO2004108901A3 (en) | 2006-03-09 |
Family
ID=33490158
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2004/017903 WO2004108901A2 (en) | 2003-06-05 | 2004-06-04 | A library of phylogenetically related sequences |
Country Status (6)
Country | Link |
---|---|
US (1) | US20040248189A1 (en) |
EP (1) | EP1639080A4 (en) |
JP (1) | JP2007526221A (en) |
AU (1) | AU2004246009A1 (en) |
CA (1) | CA2527317A1 (en) |
WO (1) | WO2004108901A2 (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2006112885A1 (en) * | 2005-04-14 | 2006-10-26 | The Curators Of The University Of Missouri | System and method for sequence variation prediction and genetic engineering detection using documented codon/amino acid mutation and/or substitution patterns |
EP2109054A1 (en) | 2008-04-09 | 2009-10-14 | Biotempt B.V. | Methods for identifying biologically active peptides and predicting their function |
EP2889307B1 (en) * | 2012-08-07 | 2018-05-02 | Hainan University | Alpha-conotoxin peptide, and medical composition and purpose thereof |
US9859394B2 (en) | 2014-12-18 | 2018-01-02 | Agilome, Inc. | Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids |
US9857328B2 (en) | 2014-12-18 | 2018-01-02 | Agilome, Inc. | Chemically-sensitive field effect transistors, systems and methods for manufacturing and using the same |
US10020300B2 (en) | 2014-12-18 | 2018-07-10 | Agilome, Inc. | Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids |
US9618474B2 (en) | 2014-12-18 | 2017-04-11 | Edico Genome, Inc. | Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids |
CA2971589C (en) | 2014-12-18 | 2021-09-28 | Edico Genome Corporation | Chemically-sensitive field effect transistor |
US10006910B2 (en) | 2014-12-18 | 2018-06-26 | Agilome, Inc. | Chemically-sensitive field effect transistors, systems, and methods for manufacturing and using the same |
EP3459115A4 (en) | 2016-05-16 | 2020-04-08 | Agilome, Inc. | Graphene fet devices, systems, and methods of using the same for sequencing nucleic acids |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5264371A (en) * | 1989-11-22 | 1993-11-23 | Neurex Corporation | Screening method for neuroprotective compounds |
US5670113A (en) * | 1991-12-20 | 1997-09-23 | Sibia Neurosciences, Inc. | Automated analysis equipment and assay method for detecting cell surface protein and/or cytoplasmic receptor function using same |
US5763568A (en) * | 1992-01-31 | 1998-06-09 | Zeneca Limited | Insecticidal toxins derived from funnel web (atrax or hadronyche) spiders |
DE69621940T2 (en) * | 1995-08-18 | 2003-01-16 | Morphosys Ag | Protein/(poly)peptide libraries |
US5989814A (en) * | 1997-04-01 | 1999-11-23 | Regents Of The University Of California | Screening methods in eucaryotic cells |
US6274319B1 (en) * | 1999-01-29 | 2001-08-14 | Walter Messier | Methods to identify evolutionarily significant changes in polynucleotide and polypeptide sequences in domesticated plants and animals |
EP1181359A1 (en) * | 1999-05-28 | 2002-02-27 | Sangamo Biosciences Inc. | Gene switches |
WO2001090149A2 (en) * | 2000-05-22 | 2001-11-29 | Pharmacia & Upjohn Company | G protein-coupled receptors |
WO2002031745A1 (en) * | 2000-10-10 | 2002-04-18 | Genencor International, Inc. | Information rich libraries |
US20030022240A1 (en) * | 2001-04-17 | 2003-01-30 | Peizhi Luo | Generation and affinity maturation of antibody library in silico |
CA2456950A1 (en) * | 2001-08-10 | 2003-02-20 | Xencor | Protein design automation for protein libraries |
- 2003-06-05 US US10/456,375: US20040248189A1 (not active, abandoned)
- 2004-06-04 CA CA002527317A: CA2527317A1 (not active, abandoned)
- 2004-06-04 EP EP04754500A: EP1639080A4 (not active, withdrawn)
- 2004-06-04 JP JP2006515233A: JP2007526221A (active, pending)
- 2004-06-04 AU AU2004246009A: AU2004246009A1 (not active, abandoned)
- 2004-06-04 WO PCT/US2004/017903: WO2004108901A2 (active, application filing)
Non-Patent Citations (1)
Title |
---|
See references of EP1639080A4 * |
Also Published As
Publication number | Publication date |
---|---|
US20040248189A1 (en) | 2004-12-09 |
EP1639080A2 (en) | 2006-03-29 |
AU2004246009A1 (en) | 2004-12-16 |
WO2004108901A3 (en) | 2006-03-09 |
CA2527317A1 (en) | 2004-12-16 |
EP1639080A4 (en) | 2008-10-01 |
JP2007526221A (en) | 2007-09-13 |
Similar Documents
Publication | Title |
---|---|
Egan et al. | Applications of next‐generation sequencing in plant biology |
Weile et al. | A framework for exhaustively mapping functional missense variants |
Wang et al. | Large-scale discovery of non-conventional peptides in maize and Arabidopsis through an integrated peptidogenomic pipeline |
Berglund et al. | Next-generation sequencing technologies and applications for human genetic history and forensics |
Thompson et al. | RASCAL: rapid scanning and correction of multiple sequence alignments |
CA2923758C (en) | Structure based predictive modeling |
Kharrat et al. | Structure of the dsRNA binding domain of E. coli RNase III. |
Yamasaki et al. | Solution structure of the major DNA-binding domain of Arabidopsis thaliana ethylene-insensitive3-like3 |
Witherspoon et al. | Human population genetic structure and diversity inferred from polymorphic L1 (LINE-1) and Alu insertions |
US20040248189A1 (en) | Method of making a library of phylogenetically related sequences |
Poluri et al. | Protein engineering techniques: Gateways to synthetic protein universe |
Kraberger et al. | Australian monocot-infecting mastrevirus diversity rivals that in Africa |
US6743580B2 (en) | Methods for producing transgenic plants containing evolutionarily significant polynucleotides |
Goli et al. | Global and local ancestry and its importance: a review |
Roly et al. | A comparative in silico characterization of functional and physicochemical properties of 3FTx (three finger toxin) proteins from four venomous snakes |
Zhang et al. | Genome-wide identification of microsatellites in white clover (Trifolium repens L.) using FIASCO and phpSSRMiner |
Przezdziak et al. | Probing the ligand‐binding specificity and analyzing the folding state of SPOT‐synthesized FBP28 WW domain variants |
Deem et al. | Problems with paralogs: the promise and challenges of gene duplicates in evo-devo research |
Boisbouvier et al. | Simultaneous determination of disulphide bridge topology and three-dimensional structure using ambiguous intersulphur distance restraints: Possibilities and limitations |
Chen et al. | Identification and fine-mapping of Xo2, a novel rice bacterial leaf streak resistance gene |
Habermann | Oh Brother, where art thou? Finding orthologs in the twilight and midnight zones of sequence similarity |
Zakon et al. | Voltage-gated sodium channel gene repertoire of lampreys: gene duplications, tissue-specific expression and discovery of a long-lost gene |
AU784869B2 (en) | Methods to identify evolutionarily significant changes in polynucleotide and polypeptide sequences in domesticated plants and animals |
Caporale et al. | Probing the modelled structure of wheatwin1 by controlled proteolysis and sequence analysis of unfractionated digestion mixtures |
Almeida et al. | Dynamic co-evolution of transposable elements and the piRNA pathway in African cichlid fishes |
Legal Events
Code | Title | Description |
---|---|---|
AK | Designated states | Kind code of ref document: A2. Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
AL | Designated countries for regional patents | Kind code of ref document: A2. Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
WWE | WIPO information: entry into national phase | Ref document number: 2527317; country of ref document: CA |
WWE | WIPO information: entry into national phase | Ref document number: 2004246009; country of ref document: AU |
WWE | WIPO information: entry into national phase | Ref document number: 2006515233; country of ref document: JP |
ENP | Entry into the national phase | Ref document number: 2004246009; country of ref document: AU; date of ref document: 20040604; kind code of ref document: A |
WWP | WIPO information: published in national office | Ref document number: 2004246009; country of ref document: AU |
WWE | WIPO information: entry into national phase | Ref document number: 2004754500; country of ref document: EP |
WWP | WIPO information: published in national office | Ref document number: 2004754500; country of ref document: EP |