EP1141301A2 - Sequence-determined DNA fragments and corresponding polypeptides encoded thereby - Google Patents

Sequence-determined DNA fragments and corresponding polypeptides encoded thereby

Info

Publication number
EP1141301A2
Authority
EP
European Patent Office
Prior art keywords
seq
sequence
alignment
location
polypeptide
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
EP00901405A
Other languages
English (en)
French (fr)
Other versions
EP1141301A4 (de
Inventor
Nickolai Alexandrov
Vyacheslav Brover
Xianfeng Chen
Gopalakrishnan Subramanian
Maxim E. Troukhan
Liansheng Zheng
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ceres Inc
Original Assignee
Ceres Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ceres Inc filed Critical Ceres Inc
Publication of EP1141301A2 publication Critical patent/EP1141301A2/de
Publication of EP1141301A4 publication Critical patent/EP1141301A4/de
Ceased legal-status Critical Current

Classifications

    • CCHEMISTRY; METALLURGY
    • C07ORGANIC CHEMISTRY
    • C07KPEPTIDES
    • C07K14/00Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/415Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from plants

Definitions

  • the present invention relates to isolated polynucleotides that encode all, or a portion of, a gene that is expressed and the corresponding polypeptide.
  • the present invention also relates to isolated polynucleotides that encode regulatory regions of genes.
  • the present invention comprises polynucleotides, such as complete cDNA sequences and/or sequences of genomic DNA encompassing complete genes, portions of genes, and/or intergenic regions, hereinafter collectively referred to as "Sequence-Determined DNA Fragments" (SDFs), from plants, particularly corn and Arabidopsis thaliana and polypeptides derived therefrom.
  • the SDFs span the entirety of a protein-coding segment.
  • the entirety of an mRNA is represented.
  • Other objects of the invention are the control sequences, such as but not limited to promoters, that are also represented by SDFs of the invention. Complements of any sequence of the invention are also considered part of the invention.
  • polynucleotides comprising exon sequences, polynucleotides comprising intron sequences, polynucleotides comprising introns together with exons, intron/exon junction sequences, 5' untranslated sequences, and 3' untranslated sequences of the SDFs of the present invention.
  • Polynucleotides representing the joinder of any exons described herein, in any arrangement, for example, to produce a sequence encoding any desirable amino acid sequence are within the scope of the invention.
  • the present invention also resides in probes useful for isolating and identifying nucleic acids that hybridize to an SDF of the invention.
  • the probes are typically of a length of 12 to 2000 nucleotides long; more typically, 15 to 200 nucleotides long; even more typically, 18 to 100 nucleotides long.
  • Yet another object of the invention is a method of isolating and/or identifying nucleic acids using the following steps:
  • the conditions for hybridization can be from low to moderate to high stringency conditions.
  • the sample can include a polynucleotide having a sequence unique in a plant genome. Probes and methods of the invention are useful, for example, without limitation, for mapping of genetic traits and/or for positional cloning of a desired portion of genomic DNA.
  • Probes and methods of the invention can also be used for detecting alternatively spliced messages within a species. Probes and methods of the invention can further be used to detect or isolate related genes in other plant species using genomic DNA (gDNA) and/or cDNA libraries. In some instances, especially when longer probes and low to moderate stringency hybridization conditions are used, the probe will hybridize to a plurality of cDNA and/or gDNA sequences of a plant. This approach is useful for isolating representatives of gene families which are identifiable by possession of a common functional domain in the gene product or which have common cis-acting regulatory sequences. This approach is also useful for identifying orthologous genes from other organisms, which can be more or less related to corn, Arabidopsis, or another plant.
  • the present invention also resides in constructs for modulating the expression of the genes comprised of all or a portion of an SDF.
  • the constructs comprise all or a portion of the expressed SDF, or of a complementary sequence.
  • Examples of constructs include ribozymes comprising RNA encoded by an SDF or by a sequence complementary thereto, antisense constructs, constructs comprising coding regions or parts thereof, constructs comprising promoters, introns, untranslated regions, etc.
  • When inserted into a host cell, the construct is preferably functionally integrated with, or operatively linked to, a heterologous polynucleotide.
  • a coding region from an SDF might be operably linked to a promoter that is functional in a plant.
  • the present invention also resides in host cells, including bacterial or yeast cells or plant cells, and transgenic plants that harbor constructs such as described above.
  • Another aspect of the invention relates to methods for modulating expression of specific genes in transgenic plants by expression of the structural gene component of the constructs, by regulation of expression of one or more endogenous genes in a transgenic plant or by suppression of expression of the polynucleotides of the invention in a transgenic plant.
  • Methods of modulation of gene expression include without limitation (1) inserting into a host cell additional copies of a polynucleotide comprising a coding sequence; (2) modulating an endogenous promoter in a host cell; (3) inserting antisense or ribozyme constructs into a host cell and (4) inserting into a host cell a polynucleotide comprising a sequence encoding a mutant, fragment, or fusion of the native polypeptides of the instant invention.
  • sequences of exemplary SDFs and polypeptides encoded thereby of the instant invention are listed in SEQ TABLES 1 and 2; annotation relevant to the sequences shown in SEQ TABLES 1 and 2 is presented in REF TABLES 1 and 2.
  • Each sequence corresponds to a Maximum Length cDNA Polynucleotide Sequence.
  • Each Maximum Length cDNA Polynucleotide Sequence corresponds to at least one sequence in SEQ TABLE 1 and 2.
  • REF TABLE 1 corresponds with SEQ TABLE 1;
  • REF TABLE 2 corresponds with SEQ TABLE 2.
  • REF TABLES 1 and 2 are Reference Tables which correlate each of the sequences and SEQ ID NOS in SEQ TABLES 1 and 2 with a corresponding Maximum Length cDNA Sequence (Ac) , Ceres
  • SEQ TABLES 1 and 2 are Sequence Tables containing the sequence of each nucleic acid and amino acid sequence.
  • each section begins by identifying the Maximum Length cDNA Polynucleotide Sequence, indicating a "Clone ID" that is a number used for identification purposes by the applicant and in some instances a "Public Genomic DNA" sequence, indicated by a "gi No".
  • Where a public sequence is recited, there follows information about gene annotations such as predicted exons.
  • INIT denotes an initial exon.
  • INTR denotes an internal exon.
  • TERM denotes a terminal exon.
  • the cDNA MLS is identified by its SEQ ID NO ("Pat. Appln. SEQ ID NO") and the Ceres sequence identifier ("Ceres seq_id"), which is also merely an identifier useful for the applicant.
  • the designation of "Alternative transcription start sites" can include both positive and negative numbers. Positive numbers refer to the referenced SEQ ID NO directly. The positions indicated by negative numbers, if any, refer to positions in the public genomic sequences. In instances where there is a "Public Genomic DNA", the relevant genomic sequence can be found by direct reference to the nucleotide sequence indicated by the "gi" number shown for the Public Genomic DNA.
  • the relevant nucleotide sequence for alignment is the nucleotide sequence associated with the amino acid sequence designated by a "gi" number in the section (Dp).
  • the nucleotide sequence is found in GENBANK by clicking on the link in the National Center for Biotechnology Information Entrez database. The numbering is relative to position 1 as determined by aligning the first residue of the MLS cDNA sequence (SEQ ID NO *) with the genomic sequence corresponding to the relevant "gi" number.
  • Subsection (B) lists SEQ ID NOS and Ceres seq_ids for polypeptide sequences encoded by the cDNA sequence and the location of the start codon within the cDNA sequence that codes for the polypeptide. Subsection (B) also describes additional features within the polypeptide sequence. Subsection (C) provides information regarding identified domains (where present) within the polypeptide and (where present) a name for the polypeptide. Subsection (Dp) provides
  • "related" sequences are identified by a "gi" number and are amino acid sequences in the publicly accessible BLAST databases on the NCBI FTP web site (accessible at ncbi.nlm.gov/blast).
  • the database at the NCBI FTP site uses "gi" numbers, assigned by NCBI, as a unique identifier for each sequence in the databases, thereby providing a non-redundant database for sequences from various databases, including GenBank, EMBL, DDBJ (DNA Data Bank of Japan) and PDB (Brookhaven Protein Data Bank).
  • (Ba) when present, describes a sequence as being considered plant-specific (i.e. a gene found only in a plant) or describes a biochemical activity for the protein encoded by the exemplary SDF.
  • Subsection (Dn) provides polynucleotide sequences (where present) related to the Maximum Length cDNA sequence.
  • the invention relates to (I) polynucleotides and methods of use thereof, such as IA. Probes, Primers and Substrates;
  • IB Methods of Detection and Isolation; B.1. Hybridization; B.2. Methods of Mapping; B.3. Southern Blotting; B.4. Isolating cDNA from Related Organisms;
  • polypeptides including, without limitation, native proteins, mutants, fragments, and fusions. Antibodies to said polypeptides are also disclosed.
  • the specification also discloses (III) methods of modulating polypeptide production or activity. Examples of such methods include (i) suppressed, (ii) enhanced, and (iii) directed expression.
  • the specification also discloses (IV) gene constructs and construction of expression vectors, including (IVA) coding sequences, (IVB) promoters, and (IVC) signal peptides, and (V) transformation procedures, to illustrate the invention by way of examples.
  • a number of the nucleotide sequences disclosed in SEQ TABLES 1 AND 2 herein as representative of the SDFs of the invention can be obtained by sequencing genomic DNA (gDNA) and/or cDNA from corn plants grown from HYBRID SEED # 35A19, purchased from Pioneer Hi-Bred International, Inc., Supply Management, P.O. Box 256, Johnston, Iowa 50131-0256.
  • Exemplified SDFs of the invention represent portions of the genome of corn or Arabidopsis and/or represent mRNA expressed from that genome.
  • the isolated nucleic acid of the invention also encompasses corresponding portions of the genome and/or cDNA complement of other organisms as described in detail below.
  • Male inflorescences and female (pre- and post-fertilization) inflorescences were isolated at various stages of development. Selection for poly(A)-containing polysomal RNA was done using oligo d(T) cellulose columns, as described by Cox and Goldberg, "Plant Molecular Biology: A Practical Approach", pp. 1-35, Shaw ed., c. 1988 by IRL, Oxford.
  • The tissues, or each organ individually, were pulverized and frozen in liquid nitrogen. Next, the samples were homogenized in the presence of detergents and then centrifuged. The debris and nuclei were removed from the sample and more detergents were added to the sample. The sample was centrifuged and the debris was removed. Then the sample was applied to a 2M sucrose cushion to isolate polysomes. The RNA was isolated by treatment with detergents and proteinase K followed by ethanol precipitation and centrifugation. The polysomal RNA from the different tissues was pooled according to the following mass ratios: 15/15/1 for male inflorescences, female inflorescences and root, respectively. The pooled material was then used for cDNA synthesis by the methods described below.
  • a number of the nucleotide sequences disclosed in SEQ TABLES 1 AND 2 herein as representative of the SDFs of the invention can also be obtained by sequencing genomic DNA from Arabidopsis thaliana, Wassilewskija ecotype, or by sequencing cDNA obtained from mRNA from such plants as described below. This is a true-breeding strain. Seeds of the plant are available from the Arabidopsis Biological Resource Center at the Ohio State University, under the accession number CS2360.
  • The starting material for the Arabidopsis cDNA clones having sequences presented in SEQ TABLES 1 AND 2 was polysomal RNA isolated from the top-most inflorescence tissues and roots of Arabidopsis thaliana Landsberg erecta (L. er.), also obtained from the Arabidopsis Biological Resource Center. Nine parts inflorescence to every part root were used, as measured by mass. Tissue was pulverized and exposed to liquid nitrogen. Next, the sample was homogenized in the presence of detergents and then centrifuged. The debris and nuclei were removed from the sample and more detergents were added to the sample. The sample was centrifuged and the debris was removed, and the sample was applied to a 2M sucrose cushion to isolate polysomal RNA.
  • Following preparation of the mRNAs from various tissues as described above, selection of mRNA with intact 5' ends and specific attachment of an oligonucleotide tag to the 5' end of such mRNA was performed using either a chemical or enzymatic approach. Both techniques take advantage of the presence of the "cap" structure, which characterizes the 5' end of most intact mRNAs and which comprises a guanosine generally methylated once, at the 7 position.
  • the chemical modification approach involves the optional elimination of the 2', 3'-cis diol of the 3' terminal ribose, the oxidation of the 2', 3'-cis diol of the ribose linked to the cap of the 5' ends of the mRNAs into a dialdehyde, and the coupling of the dialdehyde thus obtained to a derivatized oligonucleotide tag. Further detail regarding the chemical approaches for obtaining mRNAs having intact 5' ends is disclosed in International Application No. WO96/34981 published
  • the oligonucleotide tag has a restriction enzyme site (e.g. an EcoRI site) therein to facilitate later cloning procedures.
  • the integrity of the mRNA is examined by performing a Northern blot using a probe complementary to the oligonucleotide tag.
  • first strand cDNA synthesis is performed using an oligo-dT primer with reverse transcriptase.
  • This oligo-dT primer can contain an internal tag of at least 4 nucleotides, which can be different from one mRNA preparation to another.
  • Methylated dCTP is used for cDNA first strand synthesis to protect the internal EcoRI sites from digestion during subsequent steps.
  • the first strand cDNA is precipitated using isopropanol after removal of RNA by alkaline hydrolysis to eliminate residual primers.
  • Second strand cDNA synthesis is conducted using a DNA polymerase, such as Klenow fragment, and a primer corresponding to the 5' end of the ligated oligonucleotide.
  • the primer is typically 20-25 bases in length.
  • Methylated dCTP is used for second strand synthesis in order to protect internal EcoRI sites in the cDNA from digestion during the cloning process.
  • the full-length cDNAs are cloned into a phagemid vector, such as pBlueScript™ (Stratagene).
  • the ends of the full-length cDNAs are blunted with T4 DNA polymerase (Biolabs) and the cDNA is digested with EcoRI. Since methylated dCTP is used during cDNA synthesis, the EcoRI site present in the tag is the only hemi-methylated site; hence the only site susceptible to EcoRI digestion.
  • a Hind III adapter is added to the 3' end of full-length cDNAs.
  • the full-length cDNAs are then size fractionated using either exclusion chromatography (AcA, Biosepra) or electrophoretic separation which yields 3 to 6 different fractions.
  • the full-length cDNAs are then directionally cloned into pBlueScript™ using either the EcoRI and SmaI restriction sites or, when the Hind III adapter is present in the full-length cDNAs, the EcoRI and Hind III restriction sites.
  • the ligation mixture is transformed, preferably by electroporation, into bacteria, which are then propagated under appropriate antibiotic selection. Clones containing the oligonucleotide tag attached to full-length cDNAs are selected as follows.
  • the plasmid cDNA libraries made as described above are purified (e.g. by a column available from Qiagen).
  • a positive selection of the tagged clones is performed as follows. Briefly, in this selection procedure, the plasmid DNA is converted to single stranded DNA using phage F1 gene II endonuclease in combination with an exonuclease (Chang et al.,
  • the single stranded DNA is hybridized with a biotinylated oligonucleotide having a sequence corresponding to the 3' end of the oligonucleotide tag.
  • the primer has a length of 20-25 bases.
  • Clones including a sequence complementary to the biotinylated oligonucleotide are selected by incubation with streptavidin coated magnetic beads followed by magnetic capture. After capture of the positive clones, the plasmid DNA is released from the magnetic beads and converted into double stranded DNA using a DNA polymerase such as ThermoSequenase™ (obtained from Amersham Pharmacia Biotech).
  • Protocols such as the Gene Trapper™ kit (Gibco BRL) can be used.
  • the double stranded DNA is then transformed, preferably by electroporation, into bacteria.
  • the percentage of positive clones having the 5' tag oligonucleotide is typically estimated to be between 90 and 98% from dot blot analysis.
  • the Arabidopsis library was deposited at the American Type Culture Collection on January 7, 2000 as "E-coli liba 010600" under the accession number
  • Sequence errors may arise in the normal course of determination of nucleotide sequences. Sequence errors can be corrected by obtaining seeds deposited under the accession numbers cited above, propagating them, isolating genomic DNA or appropriate mRNA from the resulting plants or seeds thereof, amplifying the relevant portion of the genomic DNA or mRNA using primers having a sequence that flanks the erroneous sequence, and sequencing the amplification product.
  • SDFs of the invention can be applied to substrates for use in array applications such as, but not limited to, assays of global gene expression, for example under varying developmental and growth conditions.
  • the arrays can also be used in diagnostic or forensic methods.
  • Probes and primers of the instant invention will hybridize to a polynucleotide comprising a sequence in SEQ TABLES 1 AND 2. Though many different nucleotide sequences can encode an amino acid sequence, in some instances, the sequences of SEQ TABLES 1 AND 2 are preferred for encoding polypeptides of the invention. However, the sequence of the probes and/or primers of the instant invention need not be identical to those in SEQ TABLES 1 AND 2 or the complements thereof. For example, some variation in probe or primer sequence and/or length can allow additional family members to be detected, as well as orthologous genes and more taxonomically distant related sequences. Similarly probes and/or primers of the invention can include additional nucleotides that serve as a label for detecting the formed duplex or for subsequent cloning purposes.
  • Probe length will vary depending on the application. For use as PCR primers, probes should be 12-40 nucleotides, preferably 18-30 nucleotides long. For use in mapping, probes should be 50 to 500 nucleotides, preferably 100-250 nucleotides long. For Southern hybridizations, probes as long as several kilobases can be used as explained below.
  • the probes and/or primers can be produced by synthetic procedures such as the triester method of Matteucci et al. J.
  • Probes and/or primers can be used for detection and/or isolation of polynucleotide sequences. Such polynucleotides are included in the subject matter of the invention. Depending on the stringency of the conditions under which these probes and/or primers are used, polynucleotides exhibiting a wide range of similarity to those in SEQ TABLES 1 AND 2 can be detected or isolated.
  • N is the length of the probe. This equation works well for probes 14 to 70 nucleotides in length that are identical to the target sequence.
  • the equation below for Tm of DNA-DNA hybrids is useful for probes in the range of 50 to greater than 500 nucleotides, and for conditions that include an organic solvent (formamide).
  • $$T_m = 81.5 + 16.6\,\log_{10}\left\{\frac{[\mathrm{Na}^+]}{1 + 0.7\,[\mathrm{Na}^+]}\right\} + 0.41\,(\%\,\mathrm{G{+}C}) - \frac{500}{L} - 0.63\,(\%\,\mathrm{formamide}) \qquad (2)$$
  • Tm of equation (2) is affected by the nature of the hybrid; for DNA-RNA hybrids Tm is 10-15°C higher than calculated, for RNA-RNA hybrids Tm is 20-25°C higher. Because the Tm decreases about 1°C for each 1% decrease in homology when a long probe is used (Bonner et al., J. Mol. Biol.
  • Equation (2) is derived assuming equilibrium and therefore, hybridizations according to the present invention are most preferably performed under conditions of probe excess and for sufficient time to achieve equilibrium. The time required to reach equilibrium can be shortened by inclusion of a "hybridization accelerator" such as dextran sulfate or another high volume polymer in the hybridization buffer.
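As a rough numerical illustration of equation (2) and of the hybrid-type adjustments described above, the following Python sketch evaluates the formula. The function names are hypothetical, L is assumed to be the hybrid length in base pairs, the logarithm is taken as base 10, and the use of the midpoints of the stated 10-15°C and 20-25°C ranges is an assumption made only for this example.

```python
import math


def tm_dna_dna(na_molar, gc_percent, hybrid_length, formamide_percent=0.0):
    """Tm in deg C for a DNA-DNA hybrid, following equation (2).

    na_molar: monovalent cation concentration [Na+] in mol/L
    gc_percent: %G+C of the hybrid (0-100)
    hybrid_length: L, assumed here to be the hybrid length in base pairs
    formamide_percent: % formamide in the hybridization buffer
    """
    if na_molar <= 0 or hybrid_length <= 0:
        raise ValueError("na_molar and hybrid_length must be positive")
    salt_term = 16.6 * math.log10(na_molar / (1.0 + 0.7 * na_molar))
    return (81.5 + salt_term + 0.41 * gc_percent
            - 500.0 / hybrid_length - 0.63 * formamide_percent)


# Assumption: midpoints of the 10-15 and 20-25 deg C ranges quoted in the text.
HYBRID_OFFSET_C = {"DNA-DNA": 0.0, "DNA-RNA": 12.5, "RNA-RNA": 22.5}


def tm_adjusted(tm, hybrid="DNA-DNA", percent_mismatch=0.0):
    """Shift Tm for hybrid type and subtract about 1 deg C per 1% mismatch."""
    return tm + HYBRID_OFFSET_C[hybrid] - percent_mismatch
```

For instance, `tm_dna_dna(1.0, 50.0, 500, formamide_percent=50.0)` shows how strongly formamide lowers the calculated Tm relative to the same call without formamide, which is why it appears as a separate term in the equation.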
  • When using SDFs to identify orthologous genes in other species, the practitioner will preferably adjust the amount of target DNA of each species so that, as nearly as is practical, the same number of genome equivalents are present for each species examined. This prevents faint signals from species having large genomes, and thus small numbers of genome equivalents per mass of DNA, from erroneously being interpreted as absence of the corresponding gene in the genome.
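The genome-equivalent adjustment is simple proportional arithmetic: the mass of DNA to load scales with genome size. The sketch below is illustrative only; the function name and the genome sizes are hypothetical placeholders, not values taken from the specification.

```python
def dna_mass_for_equal_genomes(genome_sizes_mb, reference, reference_mass_ug):
    """Return the DNA mass (micrograms) per species that carries the same
    number of genome equivalents as `reference_mass_ug` of the reference.

    genome_sizes_mb: mapping of species name -> haploid genome size in Mb
    reference: key in genome_sizes_mb used as the baseline species
    reference_mass_ug: mass of reference DNA loaded, in micrograms
    """
    ref_size = genome_sizes_mb[reference]
    if ref_size <= 0:
        raise ValueError("genome sizes must be positive")
    # Mass needed grows linearly with genome size for a fixed copy number.
    return {species: reference_mass_ug * size / ref_size
            for species, size in genome_sizes_mb.items()}


# Hypothetical genome sizes, for illustration only.
example_sizes = {"species_A": 100.0, "species_B": 2500.0}
example_loads = dna_mass_for_equal_genomes(example_sizes, "species_A", 1.0)
```

With these placeholder sizes, a species whose genome is 25 times larger needs 25 times the DNA mass to present the same number of gene copies on the blot.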
  • Hybridization of one nucleic acid to another constitutes a physical property that defines the subject SDF of the invention. Also, such hybridization imposes structural limitations on the pair. For example, for a probe molecule, given that the sequence of the probe nucleic acid is known and fixed, equation (2) indicates that the combined variation in GC content of the target DNA and mismatch between the probe and the hybridizing DNA is determined for any given hybridization buffer composition and Tm.
  • the probes and/or primers of the instant invention can be used to detect or isolate nucleotides that are “identical” to the probes or primers.
  • Two nucleic acid sequences or polypeptides are said to be “identical” if the sequence of nucleotides or amino acid residues, respectively, in the two sequences is the same when aligned for maximum correspondence as described below.
  • the term "complementary to" is used herein to mean that the sequence can form a Watson-Crick base pair with a reference polynucleotide sequence.
  • Complementary sequences can include nucleotides, such as inosine, that neither disrupt Watson-Crick base pairing nor contribute to the pairing.
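A minimal sketch of the Watson-Crick complementarity test described above, for plain A/C/G/T sequences. The function names are hypothetical, and neutral nucleotides such as inosine are deliberately not handled in this simplified version.

```python
# Translation table for Watson-Crick pairing (A-T, C-G), both cases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_complementary(probe, target):
    """True if probe pairs with target along its whole length.

    Both sequences are written 5' to 3', so the probe must equal the
    reverse complement of the target.
    """
    return probe.upper() == reverse_complement(target).upper()
```

The EcoRI recognition site mentioned elsewhere in the text is a convenient check: `reverse_complement("GAATTC")` returns the same string, because the site is palindromic.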
  • Optimal alignment of sequences for comparison may be conducted by the local homology algorithm of Smith and Waterman, Adv. Appl. Math. 2:482 (1981), by the homology alignment algorithm of Needleman and Wunsch, J. Mol. Biol. 48:443 (1970), by the search for similarity method of Pearson and Lipman, Proc. Natl. Acad. Sci. (USA) 85:2444 (1988), by computerized implementations of these algorithms (GAP, BESTFIT, BLAST, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group (GCG), 575 Science Dr., Madison, WI), or by inspection. Given that two sequences have been identified for comparison, GAP and BESTFIT are preferably employed to determine their optimal alignment. Typically, the default values of 5.00 for gap weight and 0.30 for gap weight length are used.
  • the probes and/or primers of the invention can also be used to detect and/or isolate polynucleotides exhibiting at least 80% sequence identity with the sequences of SEQ TABLES 1 AND 2 or fragments thereof.
  • Percentage of sequence identity is determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the polynucleotide sequence in the comparison window may comprise additions or deletions
  • the percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison and multiplying the result by 100 to yield the percentage of sequence identity. "Percentage of sequence identity" can be determined by the algorithms described above.
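The calculation just described can be sketched in a few lines of Python, assuming the two sequences have already been optimally aligned by one of the programs named above and are supplied as equal-length strings with `-` marking gaps. The function names and the gap symbol are assumptions for illustration; the 80% default mirrors the identity threshold stated for the probes and primers.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of sequence identity over the comparison window.

    Matched positions are those where both sequences carry the same base
    or residue; the denominator is the total number of positions in the
    window, gapped positions included.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b)
                  if x == y and x != gap)
    return 100.0 * matches / len(aligned_a)


def meets_identity_threshold(aligned_a, aligned_b, threshold=80.0):
    """True if the pair reaches the given percent identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For the alignment `ACGT-ACGT` against `ACGTTACCT`, seven of nine window positions match, giving roughly 77.8%, which falls just short of an 80% threshold.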
  • The term "substantially identical", as applied to polynucleotide or polypeptide sequences, refers to a polynucleotide or polypeptide comprising a sequence that has at least 80% sequence identity, preferably at least 85%, more preferably at least 90% and most preferably at least 95%, even more preferably, at least 96%, 97%, 98% or 99% sequence identity compared to a reference sequence using the programs.
  • Isolated polynucleotides within the scope of the invention also include allelic variants of the specific sequences presented in SEQ TABLES 1 AND 2.
  • An "allelic variant" is a sequence that is a variant from that of the SDF, but represents the same chromosomal locus in the organism. Allelic variants can arise by normal genetic variation in a population. Allelic variants can also be produced by genetic engineering methods. An allelic variant can be one that is found in a naturally occurring plant, including a cultivar or ecotype. A silent allele can give rise to phenotypic and expression profiles. An allelic variant may or may not give rise to a phenotypic change, and may or may not be expressed. An expressed allele can result in a detectable change in the phenotype of the trait represented by the locus. Allelic variations can occur in any portion of the gene sequence, including regulatory regions as well as structural regions.
  • degeneracy of the genetic code provides the possibility to substitute at least one base of the base sequence of a gene with a different base without causing the amino acid sequence of the polypeptide produced from the gene to be changed.
  • the DNA of the present invention may also have any base sequence that has been changed from a sequence in SEQ TABLES 1 AND 2 by substitution in accordance with degeneracy of genetic code.
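To make the degeneracy argument concrete, the sketch below translates two different DNA sequences with the standard genetic code and checks that they encode the same polypeptide. The function names and the example sequences are hypothetical; the codon table is the standard nuclear code built from the conventional TCAG ordering.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate a coding sequence in frame 1; trailing partial codons are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def is_silent_variant(reference, variant):
    """True if the two sequences differ in bases but encode the same amino acids."""
    return (reference.upper() != variant.upper()
            and translate(reference) == translate(variant))
```

For example, `ATGGCTTTA` and `ATGGCCCTG` differ at three positions yet both translate to Met-Ala-Leu, whereas changing `GCT` to `GAT` replaces alanine with aspartate and is therefore not silent.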
  • the isolated SDF DNA of the invention can be used to create various types of genetic and physical maps of the genome of corn, Arabidopsis or other plants. Some SDFs may be absolutely associated with particular phenotypic traits, allowing construction of gross genetic maps. While not all SDFs will immediately be associated with a phenotype, all SDFs can be used as probes for identifying polymorphisms associated with phenotypes of interest. Briefly, total DNA is isolated from individuals and is subsequently cleaved with one or more restriction enzymes, separated according to mass, transferred to a solid support, hybridized with SDF DNA and the pattern of fragments compared.
  • Polymorphisms associated with a particular SDF are visualized as differences in the size of fragments produced between individual DNA samples after digestion with a particular restriction enzyme and hybridization with the SDF.
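The fragment-size comparison can be imitated in silico. The sketch below digests two hypothetical sequences with EcoRI (recognition site GAATTC, cut after the first G) and compares the resulting fragment lengths. It is a simplification: a real blot shows only the fragments that hybridize to the SDF probe, and all names and sequences here are invented for illustration.

```python
def digest(seq, site="GAATTC", cut_offset=1):
    """Cut `seq` at every occurrence of `site`, `cut_offset` bases into the site.

    The defaults model EcoRI, which cuts G^AATTC.
    """
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


def fragment_sizes(seq, site="GAATTC", cut_offset=1):
    """Sorted fragment lengths, the in-silico analogue of a banding pattern."""
    return sorted(len(f) for f in digest(seq, site, cut_offset))


# Hypothetical alleles: the second has lost one EcoRI site (GAATTC -> GAGTTC).
individual_1 = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TT"
individual_2 = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAGTTC" + "TT"
is_polymorphic = fragment_sizes(individual_1) != fragment_sizes(individual_2)
```

Losing a single restriction site merges two fragments into one longer fragment, which is the kind of size difference the hybridization pattern reveals.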
  • After identification of many polymorphisms using SDF sequences, linkage studies can be conducted by using the individuals showing polymorphisms as parents in crossing programs. F2 progeny recombinants or recombinant inbreds, for example, are then analyzed using the same restriction enzyme/hybridization procedure.
  • the order of DNA polymorphisms along the chromosomes can be inferred based on the frequency with which they are inherited together versus independently. The closer two polymorphisms are together in a chromosome the higher the probability that they are inherited together. Integration of the relative positions of all the polymorphisms and associated marker SDFs produces a genetic map of the species, where the distances between markers reflect the recombination frequencies in that chromosome segment.
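The inference of marker order from co-inheritance can be sketched as follows. Each marker is scored per progeny individual as coming from parent "A" or parent "B"; the recombination frequency between two markers is taken simply as the fraction of individuals in which the parental origin differs. This is a deliberately simplified illustration (no mapping function, no correction for double crossovers), and the function names and genotype strings are hypothetical.

```python
from itertools import combinations


def recombination_frequency(marker_x, marker_y):
    """Fraction of progeny in which two markers show different parental origin."""
    if len(marker_x) != len(marker_y) or not marker_x:
        raise ValueError("markers must be scored on the same, non-empty progeny set")
    recombinants = sum(1 for x, y in zip(marker_x, marker_y) if x != y)
    return recombinants / len(marker_x)


def infer_middle_marker(markers):
    """Given exactly three markers, return the name of the one lying between
    the other two: the pair with the highest recombination frequency is
    assumed to be the flanking pair."""
    if len(markers) != 3:
        raise ValueError("exactly three markers are required")
    outer = max(combinations(markers, 2),
                key=lambda pair: recombination_frequency(markers[pair[0]],
                                                         markers[pair[1]]))
    (middle,) = set(markers) - set(outer)
    return middle


# Hypothetical scores for ten progeny individuals.
example_markers = {
    "m1": "AABBABABAB",
    "m2": "AABBABABBB",
    "m3": "ABBBABABBA",
}
```

In this invented data set m1 and m2 are separated in one of ten progeny, m2 and m3 in two, and m1 and m3 in three, so the sketch places m2 between the other two markers.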
  • this procedure is not limited to plants and can be used for other organisms (such as yeast) or for individual cells.
  • SDFs of the present invention can also be used for simple sequence repeat (SSR) mapping.
  • SSR mapping can be achieved using various methods.
  • polymorphisms are identified when sequence specific probes flanking an SSR contained within an SDF are made and used in polymerase chain reaction (PCR) assays with template DNA from two or more individuals of interest.
  • a change in the number of tandem repeats between the SSR-flanking sequence produces differently sized fragments (U.S. Patent 5,766,847).
  • polymorphisms can be identified by using the PCR fragment produced from the SSR-flanking sequence specific primer reaction as a probe against Southern blots representing different individuals (U.H. Refseth et al., (1997) Electrophoresis 18: 1519).
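The relation between tandem repeat number and PCR fragment size can be shown with a short sketch. It assumes a perfect repeat flanked by two known sequences and measures the product from the start of the left flank to the end of the right flank; the function names, flanking sequences and repeat unit are hypothetical.

```python
import re


def count_tandem_repeats(template, left_flank, right_flank, unit):
    """Number of perfect tandem copies of `unit` between the two flanks,
    or None if the flank-repeat-flank arrangement is not found."""
    pattern = (re.escape(left_flank)
               + "((?:" + re.escape(unit) + ")+)"
               + re.escape(right_flank))
    match = re.search(pattern, template)
    if match is None:
        return None
    return len(match.group(1)) // len(unit)


def ssr_amplicon_length(left_flank, right_flank, unit, repeat_count):
    """Expected product size when priming on the flanking sequences."""
    return len(left_flank) + len(unit) * repeat_count + len(right_flank)


# Hypothetical alleles of one SSR locus with 5 and 8 CA repeats.
LEFT, RIGHT, UNIT = "GGATCG", "TTGACT", "CA"
allele_1 = "TT" + LEFT + UNIT * 5 + RIGHT + "AA"
allele_2 = "TT" + LEFT + UNIT * 8 + RIGHT + "AA"
```

Here the two alleles differ by three dinucleotide repeats, so their predicted products differ by six base pairs, a difference that can be resolved by electrophoresis.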
  • QTLs (Quantitative Trait Loci)
  • Many important crop traits such as the solids content of tomatoes, are quantitative traits and result from the combined interactions of several genes. These genes reside at different loci in the genome, oftentimes on different chromosomes, and generally exhibit multiple alleles at each locus.
  • the SDFs of the invention can be used to identify QTLs and isolate specific alleles as described by de Vicente and Tanksley (Genetics 134:585 (1993)). In addition to isolating QTL alleles present in crop species, the SDFs of the invention can also be used to isolate alleles from the corresponding QTL of wild relatives.
  • Transgenic plants having various combinations of QTL alleles can then be created and the effects of the combinations measured. Once an ideal allele combination has been identified, crop improvement can be accomplished either through biotechnological means or by directed conventional breeding programs (for review see Tanksley and McCouch, Science
  • the SDFs can be used to help create physical maps of the genome of corn, Arabidopsis and related species. Where SDFs have been ordered on a genetic map, as described above, then SDFs can be used as probes to discover which clones in large libraries of plant DNA fragments in YACs, BACs, etc. contain the same SDF or similar sequences, thereby facilitating the assignment of the large DNA fragments to chromosomal positions. Subsequently, the large BACs, YACs, etc. can be ordered unambiguously by more detailed studies of their sequence composition (e.g. Marra et al.
  • any individual can be genotyped. These individual genotypes can be used for the identification of particular cultivars, varieties, lines, ecotypes and genetically modified plants or can serve as tools for subsequent genetic studies involving multiple phenotypic traits.
  • sequences from SEQ TABLES 1 AND 2 can be used as probes for various hybridization techniques. These techniques are useful for detecting target polynucleotides in a sample or for determining whether transgenic plants, seeds or host cells harbor a gene or sequence of interest and thus might be expected to exhibit a particular trait or phenotype.
  • the hybridization of the SDFs of the invention to nucleic acids obtained from other organisms can be used to identify orthologous genes from other species and/or additional members of gene families either in the same or different species.
  • a Southern blot of genomic DNA provides a description of isolated DNA fragments that comprise the orthologous genes or additional members of the gene families. That is, given such data, one of ordinary skill in the art could distinguish the isolated DNA fragments by their size together with the restriction sites at each end and by the property of hybridizing with the SDF probe under the stated conditions.
  • the SDFs from the invention can be used to isolate additional members of gene families from the same species and/or orthologous genes from different species.
  • transgenic plants having various combinations of alleles can be created and the effects of the combinations measured. Once a more favorable allele combination has been identified, crop improvement can be accomplished either through biotechnological means or by directed conventional breeding programs (Tanksley et al., Science 277:1063 (1997)).
  • results from hybridizations of the SDFs of the invention to Southern blots containing DNA from another species can also be used to generate restriction fragment maps for the corresponding genomic regions. These maps provide additional information about the relative positions of restriction sites within fragments, further distinguishing mapped DNA from the remainder of the genome. Physical maps can be made by digesting genomic DNA with different combinations of restriction enzymes.
  • Probes for Southern blotting to distinguish individual restriction fragments can range in size from 15 to 20 nucleotides to several thousand nucleotides. More preferably, the probe is 100 to 1000 nucleotides long for identifying members of a gene family when it is found that repetitive sequences would complicate the hybridization. For identifying an entire corresponding gene in another species, the probe is more preferably the length of the gene, typically 2000 to 10,000 nucleotides, but probes 50-1,000 nucleotides long might be used. Some genes, however, might require probes up to 15,000 nucleotides long or overlapping probes constituting the full-length sequence to span their lengths.
  • although it is preferred that the probe be homogeneous with respect to its sequence, that is not necessary.
  • a probe representing members of a gene family having diverse sequences can be generated using PCR to amplify genomic DNA or RNA templates using primers derived from SDFs that include sequences that define the gene family.
  • the probe for Southern blotting most preferably would be the genomic copy of the probe gene. This allows all elements of the gene to be identified in the other species.
  • the next most preferable probe is a cDNA spanning the entire coding sequence, which allows all of the mRNA-coding portion of the gene to be identified; in this case it is possible that some introns in the gene might be missed.
  • Probes for Southern blotting can easily be generated from SDFs by making primers having the sequence at the ends of the SDF and using corn or Arabidopsis genomic DNA as a template. In instances where the SDF includes sequence conserved among species, primers including the conserved sequence can be used for PCR with genomic DNA from a species of interest to obtain a probe.
  • the SDF includes a domain of interest
  • that portion of the SDF can be used to make primers and, with appropriate template DNA, used to make a probe to identify genes containing the domain.
  • the PCR products can be resolved, for example by gel electrophoresis, and cloned and/or sequenced. In this manner, the variants of the domain among members of a gene family, both within and across species, can be examined.
  • B.4.1 Isolating DNA from Related Organisms The SDFs of the invention can be used to isolate the corresponding DNA from other organisms. Either cDNA or genomic DNA can be isolated.
  • a lambda, cosmid, BAC or YAC, or other large insert genomic library from the plant of interest can be constructed using standard molecular biology techniques as described in detail by Sambrook et al. 1989 (Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory Press, New York) and by Ausubel et al. 1992 (Current Protocols in Molecular Biology, Greene Publishing, New York).
  • recombinant lambda clones are plated out on appropriate bacterial medium using an appropriate E. coli host strain.
  • the resulting plaques are lifted from the plates using nylon or nitrocellulose filters.
  • the plaque lifts are processed through denaturation, neutralization, and washing treatments following the standard protocols outlined by Ausubel et al. (1992).
  • the plaque lifts are hybridized to either radioactively labeled or non-radioactively labeled SDF DNA at room temperature for about 16 hours, usually in the presence of 50% formamide and 5X SSC (sodium chloride and sodium citrate) buffer and blocking reagents.
  • the plaque lifts are then washed at 42°C with 1% Sodium Dodecyl Sulfate (SDS) and at a particular concentration of SSC.
  • SSC concentration used is dependent upon the stringency at which hybridization occurred in the initial Southern blot analysis performed. For example, if a fragment hybridized under medium stringency (e.g., Tm - 20°C), then this condition is maintained or preferably adjusted to a less stringent condition (e.g., Tm- 30°C) to wash the plaque lifts.
  • Positive clones show detectable hybridization e.g., by exposure to X-ray films or chromogen formation.
  • the positive clones are then subsequently isolated for purification using the same general protocol outlined above. Once the clone is purified, restriction analysis can be conducted to narrow the region corresponding to the gene of interest. The restriction analysis and succeeding subcloning steps can be done using procedures described by, for example Sambrook et al. (1989) cited above.
  • the procedures outlined for the lambda library are essentially similar except the YAC clones are harbored in bacterial colonies.
  • the YAC clones are plated out at reasonable density on nitrocellulose or nylon filters supported by appropriate bacterial medium in petri plates.
  • the filters are processed through the denaturation, neutralization, and washing steps following the procedures of Ausubel et al. 1992.
  • the same hybridization procedures for lambda library screening are followed.
  • the library can be constructed in a lambda vector appropriate for cloning cDNA such as λgt11.
  • the cDNA library can be made in a plasmid vector.
  • cDNA for cloning can be prepared by any of the methods known in the art, but is preferably prepared as described above.
  • a cDNA library will include a high proportion of full-length clones.
  • Probes and primers of the invention can be used to identify and/or isolate polynucleotides related to those in SEQ
  • orthologous genes a gene that has a high degree of sequence similarity, often along the entire length of the coding portion of the gene, and also encodes a gene product that performs a similar function in the organism.
  • orthologous genes may be distinguished from homologous genes in that homologous genes share sequence similarity but often only in a portion of the sequence, which often represents a functional domain such as a tyrosine kinase activity, a DNA binding domain, or the like.
  • the functional activities of homologous genes are not necessarily the same, but are the same for orthologous genes.
  • the degree of identity is a function of evolutionary separation and, in closely related species, the degree of identity can be 98 to 100%.
  • the amino acid sequence of a protein encoded by an orthologous gene can be less than 75% identical, but tends to be at least 75% or at least 80% identical, more preferably at least 90%, most preferably at least 95% identical to the amino acid sequence of the reference protein.
  • the probes are hybridized to nucleic acids from a species of interest under low stringency conditions and blots are then washed under conditions of increasing stringency. It is preferable that the wash stringency be such that sequences that are 85 to 100% identical will hybridize. More preferably, sequences 90 to 100% identical will hybridize and most preferably only sequences greater than 95% identical will hybridize.
  • the low stringency condition is preferably one where sequences containing as much as 40-45% mismatches will be able to hybridize. This condition is established by Tm - 40°C to Tm - 48°C (see below).
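  The temperature offsets quoted in this section (Tm - 20°C for medium stringency, Tm - 30°C for a less stringent wash, and Tm - 40°C to Tm - 48°C for low stringency) can be turned into working temperatures once a Tm is known. The sketch below assumes the Tm is supplied by the user, since its calculation is not given in this passage; the level names and the example Tm of 85°C are hypothetical.

```python
from typing import Tuple

# Offsets below Tm, in degrees C, as quoted in the text:
# medium (Tm - 20), less stringent (Tm - 30), low (Tm - 40 to Tm - 48).
STRINGENCY_OFFSETS_C = {
    "medium": (20.0, 20.0),
    "less_stringent": (30.0, 30.0),
    "low": (40.0, 48.0),
}


def temperature_window(tm_c: float, level: str) -> Tuple[float, float]:
    """Return (warmest, coolest) temperature in degrees C for a stringency level."""
    smallest_offset, largest_offset = STRINGENCY_OFFSETS_C[level]
    return tm_c - smallest_offset, tm_c - largest_offset


# Hypothetical duplex with Tm = 85 C; the Tm itself must be determined separately.
for level in STRINGENCY_OFFSETS_C:
    print(level, temperature_window(85.0, level))
```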
  • amino acid sequences that are identical can be encoded by DNA sequences as little as 67% identical.
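  The 67% figure follows from codon degeneracy: if every codon of two coding sequences differs only at its third position, two of every three nucleotides match. The sketch below shows this with the standard genetic code and two hypothetical 15-nucleotide sequences built from four-fold degenerate codons; the sequences and helper names are illustrative assumptions, not sequences from the SEQ TABLES.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions between two equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical sequences: same codons except for the third position of each.
seq_a = "GCTGGTCCTACTGTT"
seq_b = "GCAGGACCAACAGTA"

print(translate(seq_a), translate(seq_b))
print(round(percent_identity(seq_a, seq_b), 1))
```

  Both sequences translate to the same peptide while sharing only two thirds of their nucleotides, which is why low stringency conditions may be needed to detect a conserved coding sequence at the DNA level.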
  • Identification of the relationship of nucleotide or amino acid sequences among plant species can be done by comparison of the subject nucleotide or amino acid sequence to the sequences of SDFs of the present application presented in SEQ TABLES 1 and 2.
  • the SDFs of the invention can also be used as probes to search for genes that are related to the SDF within a species.
  • Such related genes are typically considered to be members of a "gene family."
  • the sequence similarity will often be concentrated into one or a few portions of the sequence.
  • the portions of similar sequence that define the gene family typically encode a portion of a protein or RNA that has an enzymatic or structural function.
  • the degree of identity in the amino acid sequence of the domain that defines the gene family is preferably at least 70%, more preferably 80 to 95%, most preferably 85 to 99%.
  • a "low stringency" hybridization is usually performed, but this will depend upon the size, distribution and degree of sequence divergence of domains that define the gene family.
  • SDFs encompassing regulatory regions can be used to identify "coordinately expressed" genes by using the regulatory region portion of the SDF as a probe.
  • the SDFs are identified as being expressed from genes that confer a particular phenotype, then the SDFs can also be used as probes to assay plants of different species for those phenotypes.
  • a well-known instance is the FLAVOR-SAVOR™ tomato, in which the gene encoding ACC synthase is inactivated by an antisense approach, thus delaying softening of the fruit after ripening. See, for example, U.S. Patent No. 5,859,330; U.S. Patent No. 5,723,766; Oeller et al., Science 254:437-439 (1991); and Hamilton et al., Nature 346:284-287 (1990).
  • timing of flowering can be controlled by suppression of the FLOWERING LOCUS C; high levels of this transcript are associated with late flowering, while absence of FLC is associated with early flowering (S.D. Michaels et al., Plant Cell 11:949 (1999)).
  • the transition of apical meristem from production of leaves with associated shoots to flowering is regulated by TERMINAL FLOWER1, APETALA1 and LEAFY.
  • TFL1 expression S.J. Liljegren, Plant Cell 11:1007 (1999)
  • dominant negative mutant proteins are useful tools for research, for example when a dominant negative mutation of a receptor is used to constitutively activate or suppress a signal transduction cascade, allowing examination of the phenotype and thus the trait(s) controlled by that receptor and pathway.
  • the introduced sequence need not be perfectly identical to a sequence of the target endogenous gene.
  • the introduced polynucleotide sequence will typically be at least substantially identical (as determined above) to the target endogenous sequence.
  • SDFs in SEQ TABLES 1 AND 2 represent sequences that are expressed in corn and/or Arabidopsis.
  • the invention includes using these sequences to generate antisense constructs to inhibit transcription and/or translation of said SDFs, typically in a plant cell.
  • a polynucleotide segment from the desired gene that can hybridize to the mRNA expressed from the desired gene (the "antisense segment") is operably linked to a promoter such that the antisense strand of RNA will be transcribed when the construct is present in a host cell.
  • a regulated promoter can be used in the construct to control transcription of the antisense segment so that transcription occurs only under desired circumstances.
  • the antisense segment to be introduced generally will be substantially identical to at least a portion of the endogenous gene or genes to be repressed. The sequence, however, need not be perfectly identical to inhibit expression. Further, the antisense product may hybridize to the untranslated region instead of or in addition to the coding portion of the gene.
  • the vectors of the present invention can be designed such that the inhibitory effect applies to other proteins within a family of genes exhibiting homology or substantial homology to the target gene.
  • the introduced antisense segment sequence also need not be full length relative to either the primary transcription product or fully processed mRNA. Generally, higher sequence identity can be used to compensate for the use of a shorter sequence. Furthermore, the introduced sequence need not have the same intron or exon pattern, and homology of non-coding segments may be equally effective. Normally, a sequence of between about 30 or 40 nucleotides and the full length of the transcript should be used, though a sequence of at least about 100 nucleotides is preferred, a sequence of at least about 200 nucleotides is more preferred, and a sequence of at least about 500 nucleotides is especially preferred.
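  As a rough illustration of the antisense design considerations above, the sketch below derives the antisense strand of a chosen segment as its reverse complement and reports which of the stated length preferences the segment meets. The function names are hypothetical, and the lower bound of 30 nucleotides is one reading of "about 30 or 40 nucleotides"; sequence identity to the target, which the text says can compensate for shorter length, is not modeled here.

```python
# Complement map for unambiguous DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement, i.e. the antisense strand read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def antisense_length_tier(length: int) -> str:
    """Map a segment length to the preference tiers stated in the text."""
    if length >= 500:
        return "especially preferred"
    if length >= 200:
        return "more preferred"
    if length >= 100:
        return "preferred"
    if length >= 30:  # assumption: lower end of "about 30 or 40 nucleotides"
        return "usable"
    return "shorter than the stated range"


# Hypothetical sense-strand fragment of a target gene.
sense_segment = "ATGGCCAAGCTTGGATCC"
print(reverse_complement(sense_segment))
print(antisense_length_tier(250))
```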
  • Ribozymes can also be used to inhibit expression of genes by suppressing the translation of the mRNA into a polypeptide. It is possible to design ribozymes that specifically pair with virtually any target RNA and cleave the phosphodiester backbone at a specific location, thereby functionally inactivating the target RNA. In carrying out this cleavage, the ribozyme is not itself altered, and is thus capable of recycling and cleaving other molecules, making it a true enzyme. The inclusion of ribozyme sequences within antisense RNAs confers RNA-cleaving activity upon them, thereby increasing the activity of the constructs.
  • ribozymes are derived from a number of small circular RNAs, which are capable of self-cleavage and replication in plants. The RNAs replicate either alone (viroid RNAs) or with a helper virus (satellite RNAs).
  • examples include RNAs from avocado sunblotch viroid and the satellite RNAs from tobacco ringspot virus, lucerne transient streak virus, velvet tobacco mottle virus, solanum nodiflorum mottle virus and subterranean clover mottle virus.
  • the design and use of target RNA-specific ribozymes is described in Haseloff et al., Nature 334:585 (1988).
  • the ribozyme sequence portion necessary for pairing need not be identical to the target nucleotides to be cleaved, nor identical to the sequences in SEQ TABLES 1 AND 2.
  • the sequence in the ribozyme capable of binding to the target sequence exhibits substantial sequence identity to a sequence in SEQ TABLES 1 AND 2 or the complement thereof, or to a portion of said sequence or complement.
  • the ribozyme sequence also need not be full length relative to either the primary transcription product or fully processed mRNA.
  • the ribozyme can be equally effective in inhibiting mRNA translation by cleaving either in the untranslated or coding regions. Generally, higher sequence identity can be used to compensate for the use of a shorter sequence.
  • the introduced sequence need not have the same intron or exon pattern, and homology of non-coding segments may be equally effective.
  • Another method of suppression is by introducing an exogenous copy of the gene to be suppressed.
  • Introduction of expression cassettes in which a nucleic acid is configured in the sense orientation with respect to the promoter into the chromosome of a plant or by a self-replicating virus has been shown to be an effective means by which to induce degradation of mRNAs of target genes.
  • this method to modulate expression of endogenous genes see, Napoli et al., The Plant Cell 2:279 (1990), and U.S. Patents Nos.
  • the introduced sequence generally will be substantially identical to the endogenous sequence intended to be inactivated.
  • the minimal identity will typically be greater than about 65%, but a higher identity might exert a more effective repression of expression of the endogenous sequences.
  • Sequence identity of more than about 80% is preferred, though about 95% to absolute identity would be most preferred. As with antisense regulation, the effect would likely apply to any other proteins within a similar family of genes exhibiting homology or substantial homology to the suppressing sequence.
  • Another means of suppressing gene expression is to insert a polynucleotide into the gene of interest to disrupt transcription or translation of the gene.
  • Low frequency homologous recombination can be used to target a polynucleotide insert to a gene by flanking the polynucleotide insert with sequences that are substantially similar to the gene to be disrupted. Sequences from SEQ TABLES 1 AND 2, fragments thereof, and substantially similar sequence thereto can be used for homologous recombination.
  • random insertion of polynucleotides into a host cell genome can also be used to disrupt the gene of interest. Azpiroz-Leehan et al., Trends in Genetics 13: 152 (1997).
  • screening for clones from a library containing random insertions is preferred for identifying those that have polynucleotides inserted into the gene of interest. Such screening can be performed using probes and/or primers described above based on sequences from SEQ TABLES 1 AND 2, fragments thereof, and substantially similar sequence thereto. The screening can also be performed by selecting clones or Ri plants having a desired phenotype.
  • constructs described in the methods under I.C. above can be used to determine the function of the polypeptide encoded by the gene that is targeted by the constructs.
  • the host cell or organisms such as a plant, may produce phenotypic changes as compared to a wild-type cell or organism.
  • in vitro assays can be used to determine whether any biological activity, such as calcium flux, DNA transcription, nucleotide incorporation, etc., is being modulated by the down-regulation of the targeted gene.
  • SDFs of the invention representing transcription activation and DNA binding domains can be assembled into hybrid transcriptional activators. These hybrid transcriptional activators can be used with their corresponding DNA elements (i.e., those bound by the DNA-binding SDFs) to effect coordinated expression of desired genes (J.J. Schwarz et al., Mol. Cell. Biol. 12:266 (1992); A. Martinez et al., Mol. Gen. Genet. 261:546 (1999)).
  • the SDFs of the invention can also be used in the two-hybrid genetic systems to identify networks of protein-protein interactions (L.
  • the SDFs of the invention can also be used in various expression display methods to identify important protein-DNA interactions (e.g. B. Luo et al., J. Mol. Biol. 266:479 (1997)).
  • the SDFs of the invention are also useful as structural or regulatory sequences in a construct for modulating the expression of the corresponding gene in a plant or other organism, e.g. a symbiotic bacterium.
  • promoter sequences represented in SEQ TABLES 1 AND 2 can be useful in directing expression of coding sequences either as constitutive promoters or to direct expression in particular cell types, tissues, or organs or in response to environmental stimuli.
  • promoter refers to a region of sequence determinants located upstream or downstream from the start of transcription and which are involved in recognition and binding of RNA polymerase and other proteins to initiate transcription.
  • a "plant promoter" is a promoter capable of initiating transcription in plant cells and can be used to drive expression of a translated portion of an SDF. Such promoters need not be of plant origin.
  • promoters derived from plant viruses such as the CaMV35S promoter or from Agrobacterium tumefaciens such as the T-DNA promoters, can be plant promoters.
  • a typical example of a constitutive promoter of plant origin is the promoter of the cowpea trypsin inhibitor gene.
  • Typical examples of temporal and/or tissue specific promoters of plant origin that can be used with the polynucleotides of the present invention, are: PTA29, a promoter which is capable of driving gene expression specifically in tapetum and only during anther development
  • promoters that have a high preference of driving gene expression in the specified tissue and/or at the specified time during the concerned tissue or organ development.
  • by "high preference" is meant at least 3-fold, preferably 5-fold, more preferably at least 10-fold, still more preferably at least 20-fold, 50-fold or 100-fold increase in expression in the desired tissue over the expression in any undesired tissue.
  • a typical example of an inducible promoter which can be utilized with the polynucleotides of the present invention is PARSK1, the promoter from the Arabidopsis gene encoding a serine-threonine kinase enzyme, which is induced by dehydration, abscisic acid and sodium chloride (Wang and Goodman, Plant J. 8:37 (1995)).
  • a promoter is likely to be a relatively small portion of a genomic DNA (gDNA) sequence located in the first 2000 nucleotides upstream from an initial exon identified in a gDNA sequence or initial "ATG" or methionine codon in a corresponding cDNA or mRNA sequence.
  • gDNA genomic DNA
  • Such promoters are more likely to be found in the first 1000 nucleotides upstream of an initial ATG or methionine codon of a cDNA sequence corresponding to a gDNA sequence.
  • the promoter is usually located upstream of the transcription start site.
  • Such a start site is located at the first exon predicted in the OCKHAM-cDNA predictions.
  • the transcription start site is the first nucleotide of the 5' most exon, if the predictions are in the plus (+) strand, or the 3' most if the predictions are in the minus (-) strand.
  • Alternative transcription start sites may be located between the first nucleotide of the 5' most exon (or the 3' most exon in the minus (-) strand) and the initial ATG or methionine codon in the cDNA sequence.
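  The strand-dependent rule for locating the predicted start site, and the 2000-nucleotide window upstream of it, can be sketched as follows. The sketch assumes exon predictions are given as 0-based, half-open (start, end) intervals in plus-strand coordinates; the function names and the toy sequence are hypothetical, and the default window of 2000 nucleotides is taken from the text.

```python
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def transcription_start(exons: List[Tuple[int, int]], strand: str) -> int:
    """0-based plus-strand index of the predicted transcription start nucleotide."""
    if strand == "+":
        # First nucleotide of the 5'-most exon.
        return min(start for start, _ in exons)
    if strand == "-":
        # Last plus-strand nucleotide of the 3'-most exon (plus-strand view).
        return max(end for _, end in exons) - 1
    raise ValueError("strand must be '+' or '-'")


def upstream_region(genome: str, exons: List[Tuple[int, int]],
                    strand: str, length: int = 2000) -> str:
    """Candidate promoter window, returned 5' to 3' on the gene's own strand."""
    tss = transcription_start(exons, strand)
    if strand == "+":
        return genome[max(0, tss - length):tss]
    # Minus strand: upstream lies at higher plus-strand coordinates.
    window = genome[tss + 1:tss + 1 + length]
    return window.translate(COMPLEMENT)[::-1]


# Toy 16-nucleotide "genome" with hypothetical exon predictions.
genome = "AACCGGTTACGTAGCT"
print(upstream_region(genome, [(8, 12), (14, 16)], "+", length=4))
print(upstream_region(genome, [(2, 6), (8, 12)], "-", length=3))
```

  Passing length=1000 would restrict the search to the narrower window that the text describes as more likely to contain the promoter when counting from the initial ATG.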
  • the portions of a particular gDNA sequence that function as a promoter in a plant cell will preferably be found to hybridize at medium or high stringency to gDNA sequences presented in SEQ TABLES 1 AND 2.
  • Promoters are generally modular in nature. Short DNA sequences representing binding sites for proteins can be separated from each other by intervening sequences of varying length. For example, within a particular functional module protein binding sites may be constituted by regions of 5 to 60, preferably 10 to 30, more preferably 10 to 20 nucleotides. Within such binding sites, there are typically 2 to 6 nucleotides that specifically contact amino acids of the nucleic acid binding protein. The protein binding sites are usually separated from each other by 10 to several hundred nucleotides, typically by 15 to 150 nucleotides, often by 20 to 50 nucleotides. DNA binding sites in promoter elements often display dyad symmetry in their sequence. Often elements binding several different proteins, and/or a plurality of sites that bind the same protein, will be combined in a region of 100 to 1000 basepairs.
  • Elements that have transcription regulatory function can be isolated from their corresponding endogenous gene, or the desired sequence can be synthesized, and recombined in constructs to direct expression of a structural gene in a desired tissue-specific, temporal-specific or other desired manner of inducibility or suppression.
  • when hybridizations are performed to identify or isolate elements of a promoter by hybridization to the long sequences presented in SEQ TABLES 1 AND 2, conditions should be adjusted to account for the above-described nature of promoters. For example, short probes, constituting the element sought, should be used under low temperature and/or high salt conditions. When long probes, which might include several promoter elements, are used, low to medium stringency conditions are preferred when hybridizing to promoters across species.
  • Promoters can consist of a "basal promoter" that functions as a site for assembly of a transcription complex comprising an RNA polymerase, for example RNA polymerase II.
  • a typical transcription complex will include additional factors such as TFIIB, TFIID, and TFIIE. Of these, TFIID appears to be the only one to bind DNA directly.
  • Basal promoters frequently include a "TATA box" element usually located between 20 and 35 nucleotides upstream from the site of initiation of transcription.
  • Basal promoters also sometimes include a "CCAAT box" element (typically a sequence CCAAT) and/or a GGGCG sequence, usually located between 40 and 200 nucleotides, preferably 60 to 120 nucleotides, upstream from the start site of transcription.
  • the promoter might also contain one or more "enhancers" and/or "suppressors" that function as binding sites for additional transcription factors that have the function of modulating the level of transcription with respect to tissue specificity of transcription, transcriptional responses to particular environmental or nutritional factors, and the like.
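  The positional windows given for basal promoter elements lend themselves to a simple scan of the sequence upstream of a start site. The sketch below is an assumption-laden illustration: it uses exact string matching, takes "TATA" as a stand-in core motif for the TATA box (the text does not specify a consensus), and measures distance from the start site to the first base of the motif. The synthetic test sequence is hypothetical.

```python
from typing import List


def find_element(upstream: str, motif: str, min_dist: int, max_dist: int) -> List[int]:
    """Distances (nt upstream of the start site, measured to the motif's first base)
    at which `motif` occurs within [min_dist, max_dist].

    `upstream` must end immediately before the transcription start site.
    """
    hits = []
    n = len(upstream)
    start = upstream.find(motif)
    while start != -1:
        dist = n - start
        if min_dist <= dist <= max_dist:
            hits.append(dist)
        start = upstream.find(motif, start + 1)
    return hits


# Synthetic upstream sequence: CCAAT placed 100 nt and TATAAA 30 nt before the start site.
upstream = "G" * 170 + "CCAAT" + "G" * 65 + "TATAAA" + "G" * 24

print(find_element(upstream, "TATA", 20, 35))    # window stated for the TATA box
print(find_element(upstream, "CCAAT", 40, 200))  # window stated for the CCAAT box
print(find_element(upstream, "CCAAT", 60, 120))  # preferred window
```

  Real promoter elements are degenerate, so a practical search would more likely use position weight matrices than exact matches; the exact-match scan is used here only to make the distance windows concrete.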
  • nucleotide sequence of an SDF functions as a promoter or portion of a promoter
  • nucleotide substitutions, insertions or deletions that do not substantially affect the binding of relevant DNA binding proteins would be considered equivalent to the exemplified nucleotide sequence. It is envisioned that there are instances where it is desirable to decrease the binding of relevant DNA binding proteins to "silence" or "down-regulate" a promoter, or conversely to increase the binding of relevant DNA binding proteins to "enhance" or "up-regulate" a promoter.
  • polynucleotides representing changes to the nucleotide sequence of the DNA-protein contact region by insertion of additional nucleotides, changes to identity of relevant nucleotides, including use of chemically-modified bases, or deletion of one or more nucleotides are considered encompassed by the present invention.
  • Promoter function can be assayed by methods known in the art, preferably by measuring activity of a reporter gene operatively linked to the sequence being tested for promoter function. Examples of reporter genes include those encoding luciferase, green fluorescent protein, GUS, neo, cat and bar.
  • UTR sequences include introns and 5' or 3' untranslated regions (5' UTRs or 3' UTRs).
  • Portions of the sequences shown in SEQ TABLES 1 AND 2 can comprise UTRs and introns or intron/exon junctions.
  • SDFs can have regulatory functions related to, for example, translation rate and mRNA stability.
  • these portions of SDFs can be isolated for use as elements of gene constructs for expression of polynucleotides encoding desired polypeptides.
  • Introns of genomic DNA segments might also have regulatory functions. Sometimes promoter elements, especially transcription enhancer or suppressor elements, are found within introns. Also, elements related to stability of heteronuclear RNA and efficiency of transport to the cytoplasm for translation can be found in intron elements. Thus, these segments can also find use as elements of expression vectors intended for use to transform plants.
  • introns and UTR sequences and intron/exon junctions can vary from those shown in SEQ TABLES
  • Isolated polynucleotides of the invention can include coding sequences that encode polypeptides comprising an amino acid sequence encoded by a sequence in SEQ TABLES 1 AND 2 or an amino acid sequence presented in SEQ TABLES 1 AND 2.
  • a nucleotide sequence "encodes" a polypeptide if a cell
  • an isolated nucleic acid that "encodes" a particular amino acid sequence can be a genomic sequence comprising exons and introns or a cDNA sequence that represents the product of splicing thereof.
  • An isolated nucleic acid "encoding an amino acid sequence" also encompasses heteronuclear RNA, which contains sequences that are spliced out during expression, and mRNA, which lacks those sequences. Coding sequences can be constructed using chemical synthesis techniques or by isolating coding sequences or by modifying such synthesized or isolated coding sequences as described above.
  • the isolated polynucleotides can be variant polynucleotides that encode mutants, fragments, and fusions of those native proteins. Such polypeptides are described below in part II.
  • the number of substitutions, deletions or insertions is preferably less than 20%, more preferably less than 15%; even more preferably less than 10%, 5%, 3% or 1% of the number of nucleotides comprising a particularly exemplified sequence.
  • nucleotide sequence changes that result in 1 to 10, more preferably 1 to 5 and most preferably 1 to 3 amino acid insertions, deletions or substitutions will not greatly affect the function of an encoded polypeptide.
  • the most preferred embodiments are those wherein 1 to 20, preferably 1 to 10, most preferably 1 to 5 nucleotides are added to, deleted from and/or substituted in the sequences specifically disclosed in SEQ TABLES 1 AND 2.
  • Insertions or deletions in polynucleotides intended to be used for encoding a polypeptide should preserve the reading frame. This consideration is not so important in instances when the polynucleotide is intended to be used as a hybridization probe.
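  Whether a set of insertions and deletions preserves the reading frame downstream of the changes comes down to whether the net change in length is a multiple of three. A minimal sketch follows; the function names are hypothetical, and the second helper simply expresses the number of changes as a percentage of the exemplified sequence length for comparison with the thresholds stated above.

```python
def preserves_reading_frame(inserted: int, deleted: int) -> bool:
    """True if the net length change leaves downstream codons in the original frame."""
    return (inserted - deleted) % 3 == 0


def percent_changed(changes: int, sequence_length: int) -> float:
    """Number of substitutions, insertions or deletions as a percentage of length."""
    return 100.0 * changes / sequence_length


print(preserves_reading_frame(inserted=3, deleted=0))   # one whole codon added
print(preserves_reading_frame(inserted=1, deleted=0))   # frameshift
print(percent_changed(changes=12, sequence_length=600)) # compare with the 20%, 15%, 10% ... limits
```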
  • Polypeptides within the scope of the invention include both native proteins as well as mutants, fragments, and fusions thereof.
  • Polypeptides of the invention are those encoded by any of the six reading frames of sequences shown in SEQ TABLES 1 AND 2, preferably encoded by the three frames reading in the 5' to 3' direction of the sequences as shown.
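  The six reading frames referred to above are the three frames of the sequence as shown plus the three frames of its reverse complement. The sketch below translates all six with the standard genetic code; it assumes unambiguous upper- or lower-case A/C/G/T input, and the frame labels and example sequence are hypothetical.

```python
from itertools import product
from typing import Dict

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate from the first base, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def six_frame_translations(dna: str) -> Dict[str, str]:
    """Translate the three forward frames and the three reverse-complement frames."""
    forward = dna.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


for name, peptide in six_frame_translations("ATGGCCTAA").items():
    print(name, peptide)
```

  The three "+" frames correspond to the preferred case in the text, reading 5' to 3' on the sequence as shown; "*" marks a stop codon.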
  • Native polypeptides include the proteins encoded by the sequences shown in SEQ TABLES 1 AND 2. Such native polypeptides include those encoded by allelic variants.
  • Variants including mutants, will exhibit at least 80% sequence identity to those native polypeptides of SEQ TABLES 1
  • sequence identity is used for polypeptides as defined above for polynucleotides. More preferably, the variants will exhibit at least 85% sequence identity; even more preferably, at least 90% sequence identity; more preferably at least 95%, 96%, 97%, 98%, or 99% sequence identity. "Fragments" of polypeptide or "portions" of polypeptides will exhibit similar degrees of identity to the relevant portions of the native polypeptide. Fusions will exhibit similar degrees of identity in that portion of the fusion represented by the variant of the native peptide.
  • variants will exhibit at least one of the functional properties of the native protein.
  • properties include, without limitation, protein interaction, DNA interaction, biological activity, immunological activity, receptor binding, signal transduction, transcription activity, growth factor activity, secondary structure, three-dimensional structure, etc.
  • the variants preferably exhibit at least 60% of the activity of the native protein; more preferably at least 70%, even more preferably at least 80%, 85%, 90% or 95% of at least one activity of the native protein.
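  • The identity thresholds listed above can be checked with a sketch such as the following. The definition of sequence identity given earlier in the description is not reproduced in this section, so the code assumes one common convention: the two sequences are already aligned to equal length with "-" as the gap character, and identity is the number of matching columns divided by the number of alignment columns. All names are hypothetical.

```python
IDENTITY_THRESHOLDS = (80, 85, 90, 95, 96, 97, 98, 99)


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the columns of a pairwise alignment.

    Columns where both sequences have a gap are ignored; a gap opposite a
    residue counts as a mismatch (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def highest_threshold_met(identity: float):
    """Return the largest listed threshold reached, or None if below 80%."""
    met = [t for t in IDENTITY_THRESHOLDS if identity >= t]
    return max(met) if met else None


identity = percent_identity("MKTAYIAKQR", "MKTAYLAKQR")
print(identity, highest_threshold_met(identity))
```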
  • a type of mutant of the native polypeptides comprises amino acid substitutions.
  • Conservative substitutions are preferred to maintain the function or activity of the polypeptide. Such substitutions include conservation of charge, polarity, hydrophobicity, size, etc.
  • one or more amino acid residues within the sequence can be substituted with another amino acid of similar polarity that acts as a functional equivalent, for example providing a hydrogen bond in an enzymatic catalysis.
  • Substitutes for an amino acid within an exemplified sequence are preferably made among the members of the class to which the amino acid belongs.
  • the nonpolar (hydrophobic) amino acids include alanine, leucine, isoleucine, valine, proline, phenylalanine, tryptophan and methionine.
  • the polar neutral amino acids include glycine, serine, threonine, cysteine, tyrosine, asparagine, and glutamine.
  • the positively charged (basic) amino acids include arginine, lysine and histidine.
  • the negatively charged (acidic) amino acids include aspartic acid and glutamic acid.
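  • The four classes listed above can be encoded directly to test whether a substitution stays within a class. This is only a sketch using one-letter amino acid codes; the dictionary and function names are hypothetical.

```python
# One-letter codes for the four classes named in the text.
AMINO_ACID_CLASSES = {
    "nonpolar": set("ALIVPFWM"),      # Ala, Leu, Ile, Val, Pro, Phe, Trp, Met
    "polar_neutral": set("GSTCYNQ"),  # Gly, Ser, Thr, Cys, Tyr, Asn, Gln
    "basic": set("RKH"),              # Arg, Lys, His
    "acidic": set("DE"),              # Asp, Glu
}


def residue_class(residue: str) -> str:
    """Return the class name for a one-letter amino acid code."""
    for name, members in AMINO_ACID_CLASSES.items():
        if residue.upper() in members:
            return name
    raise ValueError(f"unknown amino acid: {residue!r}")


def is_same_class_substitution(original: str, replacement: str) -> bool:
    """True if both residues belong to the same class listed above."""
    return residue_class(original) == residue_class(replacement)


print(is_same_class_substitution("L", "I"))  # both nonpolar
print(is_same_class_substitution("K", "E"))  # basic versus acidic
```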
  • a polypeptide of the invention may have additional individual amino acids or amino acid sequences inserted into the polypeptide in the middle thereof and/or at the N-terminal and/or C-terminal ends thereof. Likewise, some of the amino acids or amino acid sequences may be deleted from the polypeptide.
  • Isolated polypeptides can be utilized to produce antibodies.
  • Polypeptides of the invention can generally be used, for example, as antigens for raising antibodies by known techniques.
  • the resulting antibodies are useful as reagents for determining the distribution of the antigen protein within the tissues of a plant or within a cell of a plant.
  • the antibodies are also useful for examining the expression level of proteins in various tissues, for example in a wild-type plant or following genetic manipulation of a plant, by methods such as Western blotting.
  • Antibodies of the present invention may be prepared by conventional methods.
  • the polypeptides of the invention are first used to immunize a suitable animal, such as a mouse, rat, rabbit, or goat. Rabbits and goats are preferred for the preparation of polyclonal sera due to the volume of serum obtainable, and the availability of labeled anti-rabbit and anti-goat antibodies as detection reagents.
  • Immunization is generally performed by mixing or emulsifying the protein in saline, preferably in an adjuvant such as Freund's complete adjuvant, and injecting the mixture or emulsion parenterally (generally subcutaneously or intramuscularly).
  • Immunization is generally boosted 2-6 weeks later with one or more injections of the protein in saline, preferably using Freund's incomplete adjuvant.
  • Polyclonal antiserum is obtained by bleeding the immunized animal into a glass or plastic container, incubating the blood at 25°C for one hour, and then incubating it at 4°C for 2-18 hours.
  • the serum is recovered by centrifugation (e.g., 1,000 x g for 10 minutes). About 20-50 ml per bleed may be obtained from rabbits.
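  • The centrifugation step above is given as a relative centrifugal force (x g). As a hedged aid for setting a rotor speed, the sketch below uses the standard relation RCF = 1.118e-5 x r x rpm^2, with r the rotor radius in centimeters. The 10 cm radius in the example is an assumption for illustration only and does not come from the text.

```python
import math

# Standard constant relating RCF (x g), rotor radius (cm) and speed (rpm).
RCF_CONSTANT = 1.118e-5


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) for a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed (rpm) needed to reach a given RCF at a given radius."""
    if rcf < 0 or radius_cm <= 0:
        raise ValueError("rcf must be >= 0 and radius_cm must be > 0")
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


# Hypothetical rotor with a 10 cm radius, for the 1,000 x g example.
print(round(rpm_from_rcf(1000, 10.0)))
```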
  • Monoclonal antibodies are prepared using the method of Kohler and Milstein, Nature 256: 495 (1975), or modification thereof.
  • a mouse or rat is immunized as described above.
  • the spleen (and optionally several large lymph nodes) is removed and dissociated into single cells.
  • the spleen cells can be screened (after removal of nonspecifically adherent cells) by applying a cell suspension to a plate, or well, coated with the protein antigen.
  • B-cells expressing membrane-bound immunoglobulin specific for the antigen bind to the plate, and are not rinsed away with the rest of the suspension.
  • Resulting B-cells, or all dissociated spleen cells are then induced to fuse with myeloma cells to form hybridomas, and are cultured in a selective medium (e.g., hypoxanthine, aminopterin, thymidine medium, "HAT").
  • the resulting hybridomas are plated by limiting dilution, and are assayed for the production of antibodies which bind specifically to the immunizing antigen (and which do not bind to unrelated antigens).
  • the selected Mab-secreting hybridomas are then cultured either in vitro (e.g., in tissue culture bottles or hollow fiber reactors), or in vivo (as ascites in mice).
  • the antibodies may be labeled using conventional techniques. Suitable labels include fluorophores, chromophores, radioactive atoms (particularly 32P and 125I), electron-dense reagents, enzymes, and ligands having specific binding partners. Enzymes are typically detected by their activity. For example, horseradish peroxidase (HRP) is usually detected by its ability to convert 3,3',5,5'-tetramethylbenzidine (TMB) to a blue pigment, quantifiable with a spectrophotometer.
  • Specific binding partner refers to a protein capable of binding a ligand molecule with high specificity, as for example in the case of an antigen and a monoclonal antibody specific therefor.
  • Other specific binding partners include biotin and avidin or streptavidin, IgG and protein A, and the numerous receptor-ligand couples known in the art. It should be understood that the above description is not meant to categorize the various labels into distinct modes. For example, 125I may serve as a radioactive label or as an electron-dense reagent. HRP may serve as an enzyme or as an antigen for a Mab. Further, one may combine various labels for desired effect.
  • Mabs and avidin also require labels in the practice of this invention: thus, one might label a Mab with biotin, and detect its presence with avidin labeled with 125I, or with an anti-biotin Mab labeled with HRP.
  • soybean trypsin inhibitor (Kunitz) family is one of the numerous families of proteinase inhibitors. It comprises plant proteins which have inhibitory activity against serine proteinases from the trypsin and subtilisin families, thiol proteinases and aspartic proteinases.
  • these peptides find in vitro use in protein purification protocols and perhaps in therapeutic settings requiring topical application of protease inhibitors.
  • Delta-aminolevulinic acid dehydratase (EC 4.2.1.24) (ALAD) catalyzes the second step in the biosynthesis of heme, the condensation of two molecules of 5-aminolevulinate to form porphobilinogen.
  • ALAD proteins can be used as catalysts in synthesis of heme derivatives.
  • Enzymes of biosynthetic pathways generally can be used as catalysts for in vitro synthesis of the compounds representing products of the pathway.
  • Polypeptides encoded by SDFs of the invention can be engineered to provide purification reagents to identify and purify additional polypeptides that bind to them. This allows one to identify proteins that function as multimers or elucidate signal transduction or metabolic pathways.
  • In the case of DNA binding proteins, the polypeptide can be used in a similar manner to identify the DNA determinants of specific binding (S. Pierrou et al., Anal. Biochem. 229:99 (1995), S. Chusacultanachai et al., J. Biol. Chem. 274:23591 (1999), Q.
  • mutants, fragments, or fusions of the polypeptides encoded by the maximum length sequence (MLS) can exhibit at least one of the activities of the identified domains and/or related polypeptides described in Sections (C) and (D) of REF TABLES 1 and 2 corresponding to the MLS of interest.
  • a type of mutant of the native polypeptides comprises amino acid substitutions.
  • Conservative substitutions, described above (see II.), are preferred to maintain the function or activity of the polypeptide. Such substitutions include conservation of charge, polarity, hydrophobicity, size, etc.
  • one or more amino acid residues within the sequence can be substituted with another amino acid of similar polarity that acts as a functional equivalent, for example providing a hydrogen bond in an enzymatic catalysis.
  • Substitutes for an amino acid within an exemplified sequence are preferably made among the members of the class to which the amino acid belongs.
  • the nonpolar (hydrophobic) amino acids include alanine, leucine, isoleucine, valine, proline, phenylalanine, tryptophan and methionine.
  • the polar neutral amino acids include glycine, serine, threonine, cysteine, tyrosine, asparagine, and glutamine.
  • the positively charged (basic) amino acids include arginine, lysine and histidine.
  • the negatively charged (acidic) amino acids include aspartic acid and glutamic acid.
  • a polypeptide of the invention may have additional individual amino acids or amino acid sequences inserted into the polypeptide in the middle thereof and/or at the N-terminal and/or C-terminal ends thereof. Likewise, some of the amino acids or amino acid sequences may be deleted from the polypeptide. Amino acid substitutions may also be made in the sequences; conservative substitutions being preferred.
  • mutants are those that comprise (1) the domain of a MLS encoded polypeptide and/or
  • Another class of mutants includes those that comprise a MLS encoded polypeptide sequence that is changed in the domain or conserved residues by a conservative substitution.
  • mutants include those that lack one of the in vitro activities or structural features of the MLS encoded polypeptides.
  • One example is dominant negative mutants.
  • Such a mutant may comprise an MLS encoded polypeptide sequence with non-conservative changes in a particular domain or group of conserved residues.
  • Fragments of particular interest are those that comprise a domain identified for a polypeptide encoded by an MLS of the instant invention and mutants thereof. Also, fragments that comprise at least one region of residues conserved between an MLS encoded polypeptide and its related polypeptides are of great interest. Fragments are sometimes useful as dominant negative mutations.
  • FUSIONS: Of interest are chimeras comprising (1) a fragment of the MLS encoded polypeptide or mutants thereof of interest and (2) a fragment of a polypeptide comprising the same domain.
  • the present invention also encompasses fusions of MLS encoded polypeptides, mutants, or fragments thereof fused with related proteins or fragments thereof.
  • DEFINITION OF DOMAINS
  • the polypeptides of the invention may possess identifying domains as shown in REF TABLES 1 and 2. Domains are fingerprints or signatures that can be used to characterize protein families and/or motifs. Such fingerprints or signatures can comprise conserved (1) primary sequence, (2) secondary structure, and/or (3) three-dimensional conformation. Generally, each domain has been associated with either a family of proteins or a motif. Typically, these families and/or motifs have been correlated with specific in vitro and/or in vivo activities. A domain can be any length, including the entirety of the sequence of a protein. Detailed descriptions of the domains, associated families and motifs, and correlated activities of the polypeptides of the instant invention are described below. Usually, the polypeptides with designated domain(s) can exhibit at least one activity that is exhibited by any polypeptide that comprises the same domain(s).
  • domains within the MLS encoded polypeptides are indicated by the reference REF TABLES 1 and 2.
  • the domains within the MLS encoded polypeptide can be defined by the region that exhibits at least 70% sequence identity with the consensus sequences listed in the detailed description below of each of the domains. The majority of the protein domain descriptions given below are obtained from Prosite,
  • AAA 220 amino acids that contains an ATP-binding site.
  • This family is now called AAA, for 'A'TPases 'A'ssociated with diverse cellular 'A'ctivities.
  • the proteins that belong to this family either contain one or two AAA domains. Proteins containing two AAA domains:
  • NSF N-ethylmaleimide-sensitive fusion protein
  • SEC18 the fungal homolog
  • Mammalian transitional endoplasmic reticulum ATPase (previously known as p97 or VCP), which is involved in the transfer of membranes from the endoplasmic reticulum to the Golgi apparatus.
  • This protein forms a ring-shaped homooligomer composed of six subunits.
  • the yeast homolog is CDC48 and it may play a role in spindle pole proliferation.
  • FtsH is an ATP-dependent zinc metallopeptidase that seems to degrade the heat-shock sigma-32 factor. It is an integral membrane protein with a large cytoplasmic C-terminal domain that contains both the AAA and the protease domains.
  • YME1, a protein important for maintaining the integrity of the mitochondrial compartment. YME1 is also a zinc-dependent protease.
  • Yeast protein AFG3 (or YTA10). This protein also seems to contain an AAA domain followed by a zinc-dependent protease domain.
  • Subunits from the regulatory complex of the 26S proteasome [6], which is involved in the ATP-dependent degradation of ubiquitinated proteins:
  • a) Mammalian subunit 4 and homologs in other higher eukaryotes, in yeast (gene YTA5) and fission yeast (gene mts2).
  • b) Mammalian subunit 6 (TBP7) and homologs in other higher eukaryotes and in yeast (gene YTA2).
  • c) Mammalian subunit 7 (MSS1) and homologs in other higher eukaryotes and in yeast (gene CIM5 or YTA3).
  • d) Mammalian subunit 8 (P45) and homologs in other higher eukaryotes, in yeast (SUG1 or CIM3 or TBY1) and in fission yeast (gene let1).
  • Other probable subunits such as human TBP1, which seems to influence HIV gene expression by interacting with the virus tat transactivator protein, and yeast YTA1 and YTA6.
  • Yeast protein BCS1, a mitochondrial protein essential for the expression of the Rieske iron-sulfur protein.
  • Yeast protein MSP1, a protein involved in intramitochondrial sorting of proteins.
  • Mouse protein SKD1 and its fission yeast homolog SpAC2G11.06.
  • Caenorhabditis elegans meiotic spindle formation protein mei-1.
  • Yeast protein SAP1.
  • Yeast protein YTA7.
  • AAA domains in these proteins act as ATP-dependent protein clamps [5].
  • In addition to the ATP-binding 'A' and 'B' motifs, which are located in the N-terminal half of this domain, there is a highly conserved region located in the central part of the domain which was used to develop a signature pattern.
  • Aminotransferases share certain mechanistic features with other pyridoxal-phosphate dependent enzymes, such as the covalent binding of the pyridoxal-phosphate group to a lysine residue. On the basis of sequence similarity, these various enzymes can be grouped [1,2] into subfamilies.
  • class-IV currently consists of the following enzymes: Branched-chain amino-acid aminotransferase (EC 2.6.1.42) (transaminase B), a bacterial (gene ilvE) and eukaryotic enzyme which catalyzes the reversible transfer of an amino group from glutamate to 4-methyl-2-oxopentanoate, to form leucine and 2-oxoglutarate.
  • D-alanine aminotransferase (EC 2.6.1.21), a bacterial enzyme which catalyzes the transfer of the amino group from D-alanine (and other D-amino acids) to 2-oxoglutarate, to form pyruvate and D-glutamate.
  • ADC 4-amino-4-deoxychorismate
  • gene pabC
  • the above enzymes are proteins of about 270 to 415 amino-acid residues that share a few regions of sequence similarity. Surprisingly, the best-conserved region does not include the lysine residue to which the pyridoxal-phosphate group is known to be attached, in ilvE. The region that has been selected as a signature pattern is located some
  • the bacterial mutT protein is involved in the GO system [1] responsible for removing an oxidatively damaged form of guanine (8-hydroxyguanine or 7,8-dihydro-8-oxoguanine) from DNA and the nucleotide pool.
  • 8-oxo-dGTP is inserted opposite to dA and dC residues of template DNA with almost equal efficiency, thus leading to A.T to G.C transversions.
  • MutT specifically degrades 8-oxo-dGTP to the monophosphate with the concomitant release of pyrophosphate.
  • MutT is a small protein of about 12 to 15 Kd. It has been shown [2,3] that a region of about 40 amino acid residues, which is found in the N-terminal part of mutT, can also be found in a variety of other prokaryotic, viral, and eukaryotic proteins. These proteins are:
  • Streptococcus pneumoniae mutX.
  • Bartonella bacilliformis invasion protein A (gene invA).
  • Escherichia coli dATP pyrophosphohydrolase.
  • Protein D250 from African swine fever viruses.
  • Proteins D9 and D10 from a variety of poxviruses.
  • Mammalian diadenosine 5',5'''-P1,P4-tetraphosphate asymmetrical hydrolase (Ap4Aase) (EC 3.6.1.17) [5], which cleaves A-5'-PPPP-5'A to yield AMP and ATP.
  • Yeast protein YSA1.
  • Escherichia coli hypothetical protein yfaO.
  • Escherichia coli hypothetical protein ygdU and HI0901, the corresponding Haemophilus influenzae protein.
  • Escherichia coli hypothetical protein yjaD and HI0432, the corresponding Haemophilus influenzae protein.
  • Escherichia coli hypothetical protein yrfE.
  • Bacillus subtilis hypothetical protein yqkG.
  • the conserved domain could be involved in the active center of a family of pyrophosphate-releasing NTPases. As a signature pattern the core region of the domain was selected; it contains four conserved glutamate residues.
  • Type 1 cystatins (or stefins), molecules of about 100 amino acid residues with neither disulfide bonds nor carbohydrate groups.
  • Type 2 cystatins molecules of about 115 amino acid residues which contain one or two disulfide loops near their C-terminus.
  • Kininogens, which are multifunctional plasma glycoproteins. They are the precursor of the active peptide bradykinin and play a role in blood coagulation by helping to position optimally prekallikrein and factor XI next to factor XII.
  • kininogens are made of three contiguous type-2 cystatin domains, followed by an additional domain (of variable length) which contains the sequence of bradykinin. The first of the three cystatin domains seems to have lost its inhibitory activity.
  • a number of proteins are produced by plants that experience water-stress. Water-stress takes place when the water available to a plant falls below a critical level. The plant hormone abscisic acid (ABA) appears to modulate the response of plant to water-stress. Proteins that are expressed during water-stress are called dehydrins [1,2] or LEA group 2 proteins [3] . The proteins that belong to this family are listed below. - Arabidopsis thaliana XERO 1, XERO 2 (LTI30), RAB18,
  • Dehydrins share a number of structural features. One of the most notable features is the presence, in their central region, of a continuous run of five to nine serines followed by a cluster of charged residues. Such a region has been found in all known dehydrins so far with the exception of pea dehydrins.
  • a second conserved feature is the presence of two copies of a lysine-rich octapeptide; the first copy is located just after the cluster of charged residues that follows the poly-serine region and the second copy is found at the C-terminal extremity. Signature patterns for both regions were derived.
  • Consensus pattern: S(5)-[DE]-x-[DE]-G-x(1,2)-G-x(0,1)-[KR](4
  • Consensus pattern: [KR]-[LIM]-K-[DE]-K-[LIM]-P-G-
  • This Pfam covers the Formate dehydrogenase, D-glycerate dehydrogenase and D-lactate dehydrogenase families in SCOP.
  • a number of NAD-dependent 2-hydroxyacid dehydrogenases which seem to be specific for the D-isomer of their substrate have been shown [1,2,3,4] to be functionally and structurally related. These enzymes are listed below.
  • D-lactate dehydrogenase (EC 1.1.1.28), a bacterial enzyme which catalyzes the oxidation of D-lactate to pyruvate.
  • 3-phosphoglycerate dehydrogenase (EC 1.1.1.95), a bacterial enzyme that catalyzes the oxidation of D-3-phosphoglycerate to 3-phosphohydroxypyruvate.
  • This reaction is the first committed step in the 'phosphorylated' pathway of serine biosynthesis.
  • Erythronate-4-phosphate dehydrogenase (EC 1.1.1.-) (gene pdxB), a bacterial enzyme involved in the biosynthesis of pyridoxine (vitamin B6).
  • D-2-hydroxyisocaproate dehydrogenase (EC 1.1.1.-) (D-hicDH), a bacterial enzyme that catalyzes the reversible and stereospecific interconversion between 2-ketocarboxylic acids and D-2-hydroxy-carboxylic acids.
  • Formate dehydrogenase (EC 1.2.1.2) (FDH) from the bacteria Pseudomonas sp. 101 and various fungi [5].
  • Escherichia coli hypothetical protein ycdW (EC 1.2.1.2)
  • Escherichia coli hypothetical protein yiaE.
  • Haemophilus influenzae hypothetical protein HI1556.
  • Yeast hypothetical protein YIL074w.
  • All these enzymes have similar enzymatic activities and are structurally related. Three of the most conserved regions of these proteins have been selected to develop patterns. The first pattern is based on a glycine-rich region located in the central section of these enzymes; this region probably corresponds to the NAD-binding domain. The two other patterns contain a number of conserved charged residues, some of which may play a role in the catalytic mechanism.
  • the prokaryotic heat shock protein dnaJ interacts with the chaperone hsp70-like dnaK protein [1].
  • the dnaJ protein consists of an N-terminal conserved domain (called 'J' domain) of about 70 amino acids, a glycine-rich region ('G' domain) of about 30 residues, a central domain containing four repeats of a CXXCXGXG motif ('CRR' domain) and a C-terminal region of 120 to 170 residues.
  • Yeast protein MDJ1 involved in mitochondrial biogenesis and protein folding.
  • Yeast protein SCJ1 involved in protein sorting.
  • Yeast protein XDJ1, involved in protein sorting.
  • Plant dnaJ homologs (from leek and cucumber).
  • Yeast hypothetical protein YNL077w.
  • b) Proteins containing a 'J' domain without a 'CRR' domain:
  • Rhizobium fredii nolC, a protein involved in cultivar-specific nodulation of soybean.
  • Escherichia coli cbpA [3], a protein that binds curved DNA.
  • Yeast protein SEC63/NPL1, important for protein assembly into the endoplasmic reticulum and the nucleus.
  • Yeast protein SIS1, required for nuclear migration during mitosis.
  • Plasmodium falciparum ring-infected erythrocyte surface antigen (RESA).
  • Human HSJ1, a neuronal protein.
  • Drosophila cysteine-string protein (csp).
  • a signature pattern for the 'J' domain was developed, based on conserved positions in the C-terminal half of this domain.
  • a pattern for the 'CRR' domain based on the first two copies of that motif was also developed.
  • a profile for the 'J' domain was also developed.
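  • The CXXCXGXG motif of the 'CRR' domain described above lends itself to a simple regular-expression scan. The sketch below is illustrative only: it is not the signature pattern referred to in the text (which is based on the first two copies of the motif and is not reproduced here), and the test sequence is synthetic, invented for this example.

```python
import re

# C-x-x-C-x-G-x-G, wrapped in a lookahead so overlapping hits are reported.
CXXCXGXG = re.compile(r"(?=(C..C.G.G))")


def find_cxxcxgxg(protein: str) -> list:
    """Return 0-based start positions of every CXXCXGXG occurrence."""
    return [m.start() for m in CXXCXGXG.finditer(protein.upper())]


def has_four_repeats(protein: str, min_repeats: int = 4) -> bool:
    """True if the motif occurs at least min_repeats times (four in dnaJ)."""
    return len(find_cxxcxgxg(protein)) >= min_repeats


# Synthetic sequence containing four copies of the motif (not a real protein).
example = "MKCAACAGAGLLLCDTCSGTGPPCEKCHGHGQQQCPHCQGRGEND"
print(find_cxxcxgxg(example))
print(has_four_repeats(example))
```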
  • Gamma-thionins from wheat endosperm (gamma-purothionins) and barley (gamma-hordothionins).
  • Flower-specific thionin (FST).
  • Antifungal proteins (AFP) from the seeds of Brassicaceae species such as radish, mustard, turnip and Arabidopsis thaliana [3].
  • SF18 is a protein that contains a gamma-thionin domain at its N-terminus and a proline-rich C-terminal domain.
  • Vicia faba antibacterial peptides fabatin-1 and -2.
  • In their mature form, these proteins generally consist of about 45 to 50 amino-acid residues. As shown in the following schematic representation, these peptides contain eight conserved cysteines involved in disulfide bonds.
  • This family is structurally different from the alpha/beta hydrolase family (abhydrolase).
  • This family includes L-2-haloacid dehalogenase, epoxide hydrolases and phosphatases.
  • the structure of the family consists of two domains. One is an inserted four helix bundle, which is the least well conserved region of the alignment, between residues 16 and 96 of Swiss:P24069. The rest of the fold is composed of the core alpha/beta domain.
  • Helix-turn-helix: This large family of DNA-binding helix-turn-helix proteins includes Cro (Swiss:P03036) and CI (Swiss:P03034).
  • Cytochrome b5 is a membrane-bound hemoprotein which acts as an electron carrier for several membrane-bound oxygenases [1]. There are two homologous forms of b5, one found in microsomes and one found in the outer membrane of mitochondria. Two conserved histidine residues serve as axial ligands for the heme group.
  • the structure of a number of oxidoreductases consists of the juxtaposition of a heme-binding domain homologous to that of b5 and either a flavodehydrogenase or a molybdopterin domain. These enzymes are:
  • Lactate dehydrogenase (EC 1.1.2.3) [2], an enzyme that consists of a flavodehydrogenase domain and a heme-binding domain called cytochrome b2.
  • Nitrate reductase (EC 1.6.6.1), a key enzyme involved in the first step of nitrate assimilation in plants, fungi and bacteria [3,4]. Consists of a molybdopterin domain (see <PDOC00484>), a heme-binding domain called cytochrome b557, as well as a cytochrome reductase domain.
  • Fission yeast hypothetical protein SpAC1F12.10c.
  • Yeast hypothetical protein YMR073c.
  • Yeast hypothetical protein YMR272c.
  • a segment which includes the first of the two histidine heme ligands was used as a signature pattern for the heme-binding domain of the cytochrome b5 family.
  • KH domain KH motifs probably bind RNA directly. Auto antibodies to
  • MAPEG family (aka: FLAP/GST2/LTC4S family signature)
  • mammalian proteins are evolutionarily related [1]:
  • Leukotriene C4 synthase (EC 2.5.1.37) (gene LTC4S), an enzyme that catalyzes the production of LTC4 from LTA4.
  • Microsomal glutathione S-transferase II (EC 2.5.1.18) (GST-II) (gene GST2).
  • 5-lipoxygenase activating protein (gene FLAP), a protein that seems to be required for the activation of 5-lipoxygenase.
  • proteins of 150 to 160 residues that contain three transmembrane segments.
  • a conserved region between the first and second transmembrane domains was selected.
  • Bet v I the major pollen allergen from white birch.
  • Bet v I is the main cause of type I allergic reactions in Europe, North America and USSR.
  • Aln g I the major pollen allergen from alder.
  • Api g I, the major allergen from celery.
  • Car b I, the major pollen allergen from hornbeam.
  • Mal d I, the major allergen from apple.
  • Pea disease resistance response proteins pI49, pI176 and DRRG49-C.
  • Soybean stress-induced protein SAM22.
  • proteins are thought to be intracellularly located. They contain from 155 to 160 amino acid residues. As a signature pattern, a conserved region located in the third quarter of these proteins has been selected.
  • Photosystem I psaG / psaK (PSI PSAK) proteins signature: Photosystem I (PSI) [1] is an integral membrane protein complex that uses light energy to mediate electron transfer from plastocyanin to ferredoxin. It is found in the chloroplasts of plants and in cyanobacteria. PSI is composed of at least 14 different subunits, two of which, PSI-G (gene psaG) and PSI-K (gene psaK), are small hydrophobic proteins of about 7 to 9 Kd and are evolutionarily related [2]. Both seem to contain two transmembrane regions. Cyanobacteria seem to encode only PSI-K.
  • the best-conserved region was selected which seems to correspond to the second transmembrane region.
  • Plant cells contain proteins, called lipid transfer proteins (LTP) [1,2,3], which are able to facilitate the transfer of phospholipids and other lipids across membranes. These proteins, whose subcellular location is not yet known, could play a major role in membrane biogenesis by conveying phospholipids such as waxes or cutin from their site of biosynthesis to membranes unable to form these lipids.
  • Plant LTP's are proteins of about 9 Kd (90 amino acids) which contain eight conserved cysteine residues all involved in disulfide bridges, as shown in the following schematic representation.
  • 'C': conserved cysteine involved in a disulfide bond.
  • '*': position of the pattern.
  • Ribosomal protein S7e signature: A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities [1]. One of these families consists of:
  • Ribosomal protein L34 is one of the proteins from the large subunit of the prokaryotic ribosome. It is a small basic protein of 44 to 51 amino-acid residues [1]. L34 belongs to a family of ribosomal proteins which, on the basis of sequence similarities, groups:
  • Eubacterial L34.
  • Red algal chloroplast L34.
  • Cyanelle L34.
  • A conserved region that corresponds to the N-terminal half of L34 has been selected as a signature pattern.
  • Consensus pattern: K-[RG]-T-[FYWL]-[EQS]-x(5)-[KRHS]-x(4,5)-G-F-x(2)-R
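  • Consensus patterns such as the L34 pattern above follow the PROSITE notation, which can be converted mechanically into a regular expression. The sketch below handles the elements that appear in this document (single residues, bracketed alternatives, 'x' wildcards and repeat counts) plus braces for excluded residues; it does not cover every PROSITE feature, and the function name and test string are hypothetical.

```python
import re

# One pattern element: x, a residue, [alternatives] or {exclusions},
# optionally followed by a repeat count such as (5) or (4,5).
_ELEMENT = re.compile(
    r"^(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?$"
)


def prosite_to_regex(pattern: str) -> str:
    """Convert a dash-separated PROSITE-style pattern to a Python regex."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = _ELEMENT.match(element.strip())
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                          # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"      # any residue except these
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


L34_PATTERN = "K-[RG]-T-[FYWL]-[EQS]-x(5)-[KRHS]-x(4,5)-G-F-x(2)-R"
regex = prosite_to_regex(L34_PATTERN)
print(regex)
# Synthetic test string built to satisfy the pattern (not a real L34 sequence).
print(bool(re.search(regex, "MKRTFEAAAAAKAAAAGFAAR")))
```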
  • Ribosomal protein L6 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L6 is known to bind directly to the 23S rRNA and is located at the aminoacyl-tRNA binding site of the peptidyltransferase center. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2,3,4], groups:
  • Algal chloroplast L6.
  • Cyanelle L6.
  • Marchantia polymorpha mitochondrial L6.
  • Yeast mitochondrial YmL6 (gene MRPL6).
  • Mammalian L9.
  • Drosophila L9.
  • Ribosomal protein S14 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S14 is known to be required for the assembly of 30S particles and may also be responsible for determining the conformation of 16S rRNA at the A site. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups:
  • Archaebacterial Methanococcus vannielii S14.
  • Plant mitochondrial S14.
  • Yeast mitochondrial MRP2.
  • Mammalian S29.
  • S14 is a protein of 53 to 115 amino-acid residues. Our signature pattern is based on the few conserved positions located in the center of these proteins.
  • Ribosomal protein S16 signature: Ribosomal protein S16 is one of the proteins from the small ribosomal subunit. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], groups:
  • Eubacterial S16.
  • Algal and plant chloroplast S16.
  • Neurospora crassa mitochondrial S24 (cyt-21).
  • S16 is a protein of about 100 amino-acid residues. A conserved region located in the N-terminal extremity of these proteins has been selected as a signature pattern.
  • Ribosomal protein S21 is one of the proteins from the small ribosomal subunit. So far S21 has only been found in eubacteria. It is a protein of 55 to 70 amino-acid residues. A conserved region in the N-terminal section of the protein has been selected as a signature pattern.
  • heterologous sequences are those that are not operatively linked or are not contiguous to each other in nature.
  • a promoter from corn is considered heterologous to an Arabidopsis coding region sequence.
  • a promoter from a gene encoding a growth factor from corn is considered heterologous to a sequence encoding the corn receptor for the growth factor.
  • Regulatory element sequences such as UTRs or 3' end termination sequences that do not originate in nature from the same gene as the coding sequence originates from, are considered heterologous to said coding sequence.
  • chimeric polynucleotides are of particular interest for modulating gene expression in a host cell upon transformation of said cell with said chimeric polynucleotide.
  • DNA molecules are useful for transforming the genome of a host cell or an organism regenerated from said host cell.
  • Such polynucleotides are "exogenous to" the genome of an individual host cell or the organism regenerated from said host cell, such as a plant cell, respectively for a plant, when initially or subsequently introduced into said host cell or organism, by any means other than by a sexual cross. Examples of means by which this can be accomplished are described below, and include Agrobacterium-mediated transformation (of dicots - e.g. Salomon et al. EMBO
  • Transgenic plants which arise from a sexual cross with another parent line or by selfing are "descendants or the progeny" of a R_ plant and are generally called F n plants or S n plants, respectively, n meaning the number of generations.
  • the SDFs prepared as described herein can be used to prepare expression cassettes useful in a number of techniques for suppressing or enhancing expression.
  • Expression cassettes of the invention can be used to suppress expression of endogenous genes which comprise the SDF sequence. Inhibiting expression can be useful, for instance, to tailor the ripening characteristics of a fruit (Oeller et al., Science 254:437 (1991)), to influence seed size (WO98/07842), or to provoke cell ablation (Mariani et al., Nature 357:384-387 (1992)).
  • An expression cassette as described above can be transformed into a host cell or plant to produce an antisense strand of RNA.
  • antisense RNA inhibits gene expression by preventing the accumulation of mRNA which encodes the enzyme of interest, see, e.g., Sheehy et al., Proc. Natl. Acad. Sci. USA, 85:8805 (1988), and Hiatt et al., U.S. Patent No. 4,801,340.
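  • As a small illustration of what "antisense" means at the sequence level, the sketch below computes the reverse complement of a sense-strand fragment and the corresponding antisense RNA. The fragment is a toy sequence, not one of the SDFs, and the helper names are assumptions made for this example.

```python
# Complement map for unambiguous, uppercase DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string written 5'->3'."""
    seq = dna.upper()
    unknown = set(seq) - set("ACGT")
    if unknown:
        raise ValueError(f"unsupported bases: {sorted(unknown)}")
    return seq.translate(_COMPLEMENT)[::-1]

def antisense_rna(sense_dna: str) -> str:
    """RNA (5'->3') complementary to the transcript of the sense strand."""
    return reverse_complement(sense_dna).replace("T", "U")

if __name__ == "__main__":
    toy_fragment = "ATGGCCAAG"  # hypothetical fragment, illustration only
    print(reverse_complement(toy_fragment))
    print(antisense_rna(toy_fragment))
```

  • In a cassette, the same effect is obtained by cloning the fragment in reverse orientation relative to the promoter, so that the transcribed strand is the reverse complement of the coding strand.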
  • Ribozymes: Similarly, ribozyme constructs can be transformed into a plant to cleave mRNA and down-regulate translation.
  • Another method of suppression is by introducing an exogenous copy of the gene to be suppressed.
  • Introduction of expression cassettes in which a nucleic acid is configured in the sense orientation with respect to the promoter has been shown to be an effective means by which to block the transcription of target genes. A detailed description of this method is described above.
  • Such screening can be performed using probes and/or primers described above based on sequences from SEQ TABLES 1 AND 2, fragments thereof, and substantially similar sequence thereto.
  • the screening can also be performed by selecting clones or Ri plants having a desired phenotype.
  • triple helices can be formed using oligonucleotides based on sequences from SEQ TABLES 1 AND 2, fragments thereof, and substantially similar sequence thereto.
  • the oligonucleotide can be delivered to the host cell, where it can bind to the promoter in the genome to form a triple helix and prevent transcription.
  • a vector capable of producing the oligonucleotide can be inserted into the host cell to deliver the oligonucleotide.
  • A.6 Expression of Mutants
  • Dominant negative mutations produce a mutant polypeptide which is capable of competing with the native polypeptide, but which does not produce the native result. Consequently, over expression of these mutations can titrate out an undesired activity of the native protein.
  • the inactive dominant-negative mutant may bind to the same receptor as the native protein, preventing the native protein from activating a signal transduction pathway.
  • the dominant-negative mutant can be an inactive enzyme still capable of binding to the same substrate as the native protein.
  • Dominant-negative mutants also can act upon the native protein itself to prevent activity.
  • the native protein may be active only as a homo-multimer or as one subunit of a hetero-multimer. Incorporation of an inactive subunit into the multimer with native subunit(s) can inhibit activity.
  • gene function can be modulated by insertion of an expression construct encoding a dominant-negative mutant into a host cell of interest.
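  • The multimer case can be made quantitative under simplifying assumptions that are not stated in the text: subunits assemble at random, and a single mutant subunit is enough to inactivate the whole multimer. Under those assumptions the fraction of fully native multimers is (1 - m)^n for a mutant subunit fraction m and n subunits per multimer. The sketch below just evaluates that expression.

```python
def active_multimer_fraction(mutant_fraction: float, subunits: int) -> float:
    """Fraction of multimers made only of native subunits.

    Assumes random assembly and that one mutant subunit inactivates
    the whole multimer (both are modelling assumptions).
    """
    if not 0.0 <= mutant_fraction <= 1.0:
        raise ValueError("mutant_fraction must lie between 0 and 1")
    if subunits < 1:
        raise ValueError("subunits must be at least 1")
    return (1.0 - mutant_fraction) ** subunits

if __name__ == "__main__":
    # Equal amounts of native and mutant subunits, dimer versus tetramer.
    for n in (2, 4):
        print(n, active_multimer_fraction(0.5, n))
```

  • The point of the model is that the inhibitory effect grows with the number of subunits, which is why over-expression of a dominant-negative subunit can titrate out most of the native activity.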
  • III.B Enhanced Expression
  • Enhanced expression of a gene of interest in a host cell can be accomplished by either (1) insertion of an exogenous gene; or (2) promoter modulation.
  • Insertion of an expression construct encoding an exogenous gene can boost the number of gene copies expressed in a host cell.
  • Such expression constructs can comprise genes that either encode the native protein that is of interest or that encode a variant that exhibits enhanced activity as compared to the native protein.
  • genes encoding proteins of interest can be constructed from the sequences from SEQ TABLES 1 AND 2, fragments thereof, and substantially similar sequence thereto.
  • Such an exogenous gene can include either a constitutive promoter permitting expression in any cell in a host organism or a promoter that directs expression only in particular cells or times during a host cell life cycle or in response to environmental stimuli.
  • promoters require binding of a regulatory protein to be activated.
  • Other promoters may need a protein that signals a promoter binding protein to expose a polymerase binding site. In either case, over-expression of such proteins can be used to enhance expression of a gene of interest by increasing the activation time of the promoter.
  • Such regulatory proteins are encoded by some of the sequences in SEQ TABLES 1 AND 2, fragments thereof, and substantially similar sequences thereto.
  • Coding sequences for these proteins can be constructed as described above.
  • the useful enhancer elements can be portions of one or more of the SDFs of SEQ TABLES 1 AND 2.
  • recombinant DNA vectors which comprise said SDFs and are suitable for transformation of cells, such as plant cells, are usually prepared.
  • the vector backbone can be any of those typical in the art such as plasmids, viruses, artificial chromosomes, BACs, YACs and PACs and vectors of the sort described by **.
  • a vector will comprise the exogenous gene, which in its turn comprises an SDF of the present invention to be introduced into the genome of a host cell, and which gene may be an antisense construct, a ribozyme construct, or a structural coding sequence with any desired transcriptional and/or translational regulatory sequences, such as promoters and 3' end termination sequences.
  • Vectors of the invention can also include origins of replication, markers, homologous sequences, introns, etc.
  • a DNA sequence coding for the desired polypeptide will preferably be combined with transcriptional and translational initiation regulatory sequences which will direct the transcription of the sequence from the gene in the intended tissues of the transformed plant.
  • a plant promoter fragment may be employed that will direct expression of the gene in all tissues of a regenerated plant. Such promoters are referred to herein as "constitutive" promoters and are active under most environmental conditions and states of development or cell differentiation.
  • constitutive promoters examples include the cauliflower mosaic virus (CaMV) 35S transcription initiation region, the 1' or 2' promoter derived from T-DNA of Agrobacterium tumefaciens , and other transcription initiation regions from various plant genes known to those of skill.
  • the plant promoter may direct expression of an SDF of the invention in a specific tissue (tissue-specific promoters) or may be otherwise under more precise environmental control (inducible promoters) .
  • tissue-specific promoters under developmental control include promoters that initiate transcription only in certain tissues, such as root, ovule, fruit, seeds, or flowers. The promoter from a LEC1 gene, described in copending application U.S. Ser. No. 09/103,4708, is particularly useful for directing gene expression so that a desired gene product is located in embryos or seeds.
  • suitable promoters include those from genes encoding storage proteins or the lipid body membrane protein, oleosin. A few root-specific promoters are noted above. Examples of environmental conditions that may affect transcription by inducible promoters include anaerobic conditions, elevated temperature, or the presence of light.
  • a polyadenylation region at the 3'-end of the coding region should be included.
  • the polyadenylation region can be derived from the natural gene, from a variety of other plant genes, or from T-DNA.
  • the vector comprising the sequences (e.g., promoters or coding regions) from genes of the invention will typically comprise a marker gene that confers a selectable phenotype on plant cells.
  • the marker may encode biocide resistance, particularly antibiotic resistance, such as resistance to kanamycin, G418, bleomycin, hygromycin, or herbicide resistance, such as resistance to chlorosulfuron or phosphinothricin.
  • sequence in the transformation vector and to be introduced into the genome of the host cell does not need to be absolutely identical to an SDF of the present invention. Also, it is not necessary for it to be full length, relative to either the primary transcription product or fully processed mRNA. Use of sequences shorter than full-length may be preferred to avoid concurrent production of some plants that are overexpressors. Furthermore, the introduced sequence need not have the same intron or exon pattern as a native gene. Also, heterologous non-coding segments can be incorporated into the coding sequence without changing the desired amino acid sequence of the polypeptide to be produced.
  • introducing an exogenous SDF from the same species or an orthologous SDF from another species can modulate the expression of a native gene corresponding to that SDF of interest.
  • Such an SDF construct can be under the control of either a constitutive promoter (e.g., the promoter of the 35S gene of the cauliflower mosaic virus or the promoter of the gene encoding the cowpea trypsin inhibitor) or a highly regulated inducible promoter (e.g., a copper inducible promoter).
  • the promoter of interest can initially be either endogenous or heterologous to the species in question. When re-introduced into the genome of said species, such promoter becomes "exogenous" to said species.
  • the promoter-SDF construct can be made using standard recombinant DNA techniques (Sambrook et al. 1989) and can be introduced to the species of interest by Agrobacterium-mediated transformation or by other means of transformation (e.g., particle gun bombardment) as referenced above.
  • Over-expression of an SDF transgene can lead to co-suppression of the homologous gene thereby creating some alterations in the phenotypes of the transformed species as demonstrated by similar analysis of the chalcone synthase gene (Napoli et al., Plant Cell 2:279 (1990) and van der Krol et al., Plant Cell 2:291 (1990)).
  • an SDF is found to encode a protein with desirable characteristics, its over-expression can be controlled so that its accumulation can be manipulated in an organ- or tissue-specific manner utilizing a promoter having such specificity.
  • where a promoter of an SDF, or an SDF that includes a promoter, is tissue-specific or developmentally regulated, such a promoter can be utilized to drive the expression of a specific gene of interest (e.g., seed storage protein or root-specific protein).
  • the protein encoded by an introduced exogenous or orthologous SDF may be targeted (1) to a particular organelle, (2) to interact with a particular molecule or (3) for secretion outside of the cell harboring the introduced SDF. This will be accomplished using a signal peptide.
  • Signal peptides direct protein targeting, are involved in ligand-receptor interactions and act in cell-to-cell communication. Many proteins, especially soluble proteins, contain a signal peptide that targets the protein to one of several different intracellular compartments. In plants, these compartments include, but are not limited to, the endoplasmic reticulum (ER), mitochondria, plastids (such as chloroplasts), the vacuole, the Golgi apparatus, protein storage vesicles (PSV) and, in general, membranes.
  • signal peptide sequences are conserved, such as the Asn-Pro-Ile-Arg amino acid motif found in the N-terminal propeptide signal that targets proteins to the vacuole (Marty (1999) The Plant Cell 11: 587-599).
  • Other signal peptides do not have a consensus sequence per se, but are largely composed of hydrophobic amino acids, such as those signal peptides targeting proteins to the ER (Vitale and Denecke (1999) The Plant Cell 11: 615-628). Still others do not appear to contain either a consensus sequence or an identified common secondary sequence, for instance the chloroplast stromal targeting signal peptides (Keegstra and Cline (1999) The Plant Cell 11: 557-570).
  • targeting peptides are bipartite, directing proteins first to an organelle and then to a membrane within the organelle (e.g. within the thylakoid lumen of the chloroplast; see Keegstra and Cline (1999) The Plant Cell 11: 557-570).
  • placement of the signal peptide is also varied. Proteins destined for the vacuole, for example, have targeting signal peptides found at the N-terminus, at the C-terminus and at a surface location in mature, folded proteins.
  • Signal peptides also serve as ligands for some receptors. Perhaps the best known example of this is the interaction of the ER targeting signal peptide with the signal recognition particle (SRP).
  • the SRP binds to the signal peptide, halting translation, and the resulting signal peptide-SRP complex then binds to docking proteins located on the surface of the ER, prompting the transfer of the protein into the ER.
  • signal proteins can be used to more tightly control the expression of introduced SDFs.
  • associating the appropriate signal sequence with a specific SDF can allow sequestering of the protein in specific organelles (plastids, as an example) , secretion outside of the cell, targeting interaction with particular receptors, etc.
  • the inclusion of signal proteins in constructs involving the SDFs of the invention increases the range of manipulation of SDF expression.
  • constructs are made with the nucleotide sequence of a known signal peptide immediately 5' to the initiation of the coding region of an SDF so that the signal peptide is translated in frame with the coding region and immediately precedes it.
  • the nucleotide sequence of the signal peptide can be isolated from characterized genes using common molecular biological techniques or can be synthesized in vitro.
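  • The in-frame requirement described above can be checked mechanically before a construct is built. The following is a minimal sketch, assuming the signal peptide sequence starts with ATG, carries no stop codon, and has a length divisible by three so that the SDF coding region stays in frame. Both sequences in the example are short toy sequences, and the function name `fuse_in_frame` is an assumption for this illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def codons(seq: str) -> list[str]:
    """Split a sequence into complete codons, ignoring a trailing remainder."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]

def fuse_in_frame(signal_cds: str, sdf_cds: str) -> str:
    """Place a signal peptide sequence immediately 5' of an SDF coding region."""
    signal, coding = signal_cds.upper(), sdf_cds.upper()
    if not signal.startswith("ATG"):
        raise ValueError("signal peptide sequence must begin with ATG")
    if len(signal) % 3 != 0:
        raise ValueError("signal peptide length would shift the reading frame")
    if any(c in STOP_CODONS for c in codons(signal)):
        raise ValueError("signal peptide sequence contains a stop codon")
    if len(coding) % 3 != 0:
        raise ValueError("coding region is not a whole number of codons")
    return signal + coding

if __name__ == "__main__":
    toy_signal = "ATGAAGGCT"  # hypothetical signal peptide codons
    toy_coding = "ATGGGTTAA"  # hypothetical SDF coding region
    print(fuse_in_frame(toy_signal, toy_coding))
```

  • A fuller check would also translate the fusion and confirm that the only stop codon is the terminal one; that is left out here to keep the sketch short.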
  • DNA constructs of the invention may be introduced into the genome of the desired plant host by a variety of conventional techniques.
  • the DNA construct may be introduced directly into the genomic DNA of the plant cell using techniques such as electroporation and microinjection of plant cell protoplasts, or the DNA constructs can be introduced directly to plant tissue using ballistic methods, such as DNA particle bombardment.
  • the DNA constructs may be combined with suitable T-DNA flanking regions and introduced into a conventional Agrobacterium tumefaciens host vector. The virulence functions of the Agrobacterium tumefaciens host will direct the insertion of the construct and adjacent marker into the plant cell DNA when the cell is infected by the bacteria
  • Electroporation techniques are described in Fromm et al., Proc. Natl. Acad. Sci. USA 82:5824 (1985). Ballistic transformation techniques are described in Klein et al., Nature
  • Transformed plant cells which are derived by any of the above transformation techniques can be cultured to regenerate a whole plant that possesses the transformed genotype and thus the desired phenotype such as seedlessness.
  • Such regeneration techniques rely on manipulation of certain phytohormones in a tissue culture growth medium, typically relying on a biocide and/or herbicide marker which has been introduced together with the desired nucleotide sequences.
  • Plant regeneration from cultured protoplasts is described in Evans et al., Protoplasts Isolation and Culture in "Handbook of Plant Cell Culture," pp. 124-176, MacMillan Publishing Company, New York, 1983; and Binding, Regeneration of Plants, Plant Protoplasts, pp.
  • Regeneration can also be obtained from plant callus, explants, organs, or parts thereof. Such regeneration techniques are described generally in Klee et al., Ann. Rev. of Plant Phys. 38:467 (1987). Regeneration of monocots (rice) is described by Hosoyama et al. (Biosci. Biotechnol. Biochem. 58:1500 (1994)) and by Ghosh et al. (J. Biotechnol. 32:1 (1994)).
  • the nucleic acids of the invention can be used to confer desired traits on essentially any plant.
  • the invention has use over a broad range of plants, including species from the genera Asparagus, Atropa, Avena,
  • One of skill will recognize that after the expression cassette is stably incorporated in transgenic plants and confirmed to be operable, it can be introduced into other plants by sexual crossing. Any of a number of standard breeding techniques can be used, depending upon the species to be crossed.
  • the SDFs of the invention can be used in Southern hybridizations as described above.
  • the following describes extraction of DNA from nuclei of plant cells, digestion of the nuclear DNA and separation by length, transfer of the separated fragments to membranes, preparation of probes for hybridization, hybridization and detection of the hybridized probe.
  • Moderate stringency hybridization conditions as defined above, are described in the present example. These conditions result in detection of hybridization between sequences having at least 70% sequence identity. As described above, the hybridization and wash conditions can be changed to reflect the desired degree of sequence identity between probe and target sequences that can be detected.
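  • The 70% figure refers to sequence identity between probe and target. As a rough illustration, the sketch below computes percent identity for two sequences that are already aligned without gaps and compares it with that threshold. The two sequences are invented for the example, and real hybridization behaviour also depends on length, base composition, salt and temperature, none of which this calculation models.

```python
MODERATE_STRINGENCY_MIN_IDENTITY = 70.0  # percent, as stated in the text

def percent_identity(probe: str, target: str) -> float:
    """Percent of identical positions in two gap-free, equal-length sequences."""
    if len(probe) != len(target):
        raise ValueError("sequences must be aligned to the same length")
    if not probe:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(probe.upper(), target.upper()))
    return 100.0 * matches / len(probe)

def expected_to_hybridize(probe: str, target: str) -> bool:
    """True when identity meets the moderate-stringency threshold."""
    return percent_identity(probe, target) >= MODERATE_STRINGENCY_MIN_IDENTITY

if __name__ == "__main__":
    toy_probe = "ACGTACGTAC"   # hypothetical sequences, illustration only
    toy_target = "ACGTTCGTAA"
    print(percent_identity(toy_probe, toy_target))
    print(expected_to_hybridize(toy_probe, toy_target))
```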
  • a probe for the hybridization is produced from two PCR reactions using two primers from genomic sequence of Arabidopsis thaliana.
  • the particular template for generating the probe can be any desired template.
  • the first PCR product is assessed to confirm that it is of the expected size. Then the product of the first PCR is used as a template, with the same pair of primers used in the first PCR, in a second PCR that produces a labeled product used as the probe.
  • Fragments detected by hybridization, or other bands of interest can be isolated from gels used to separate genomic DNA fragments by known methods for further purification and/or characterization.
  • 1x HB per blender. Be sure that you use 5-10 ml of HB buffer per gram of tissue. Blenders generate heat, so be sure to keep the homogenate cold. It is necessary to put the blenders in ice periodically.
  • the first filtration is through a 250-micron membrane; the second is through an 85-micron membrane; the third is through a 50-micron membrane; and the fourth is through a 20-micron membrane.
  • the pellet Discard the dark green supernatant.
  • the pellet will have several layers to it.
  • the nuclei are gray and soft. In the early steps, there may be a dark green and somewhat viscous layer of chloroplasts.
  • the DNA band should be visible in room light; otherwise, use a long wave UV light to locate the band.
  • Precipitate the DNA by adding 3 volumes of 100% ethanol. Incubate at -20°C for at least two hours. Yeast DNA can be purchased and made up at the necessary concentration; therefore no precipitation is necessary for yeast DNA.
  • the digested DNA samples are electrophoresed in 1% agarose gels in 1x TPE buffer. Low-voltage, overnight separations are preferred. The gels are stained with EtBr and photographed.
  • a nylon membrane is prepared by soaking it in water for at least 5 min, then in 6X SSC for at least 15 min. before use. (20x SSC is 175.3 g NaCl, 88.2 g sodium citrate per liter, adjusted to pH 7.0.)
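  • The 6x SSC used for soaking is a dilution of the 20x stock given in parentheses. The sketch below works through that arithmetic with the usual C1·V1 = C2·V2 relation and also shows the grams of each salt present at a given strength, scaled from the 20x recipe. The 500 ml batch size in the example is an arbitrary assumption.

```python
# Grams per liter at 20x strength, as given in the protocol.
NACL_G_PER_L_20X = 175.3
SODIUM_CITRATE_G_PER_L_20X = 88.2

def dilute_from_stock(stock_x: float, target_x: float, final_ml: float):
    """Return (ml of stock, ml of water) for a dilution, using C1*V1 = C2*V2."""
    if target_x > stock_x:
        raise ValueError("target strength cannot exceed stock strength")
    stock_ml = final_ml * target_x / stock_x
    return stock_ml, final_ml - stock_ml

def salts_at_strength(target_x: float, final_ml: float):
    """Return (g NaCl, g sodium citrate) contained in the diluted solution."""
    scale = (target_x / 20.0) * (final_ml / 1000.0)
    return NACL_G_PER_L_20X * scale, SODIUM_CITRATE_G_PER_L_20X * scale

if __name__ == "__main__":
    stock_ml, water_ml = dilute_from_stock(20, 6, 500)  # 500 ml of 6x SSC
    print(f"{stock_ml:.0f} ml 20x SSC + {water_ml:.0f} ml water")
    nacl_g, citrate_g = salts_at_strength(6, 500)
    print(f"{nacl_g:.2f} g NaCl, {citrate_g:.2f} g sodium citrate")
```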
  • the nylon membrane is placed on top of the gel and all bubbles in between are removed.
  • the DNA is blotted from the gel to the membrane using an absorbent medium, such as paper toweling and 6x SSC buffer. After the transfer, the membrane may be lightly brushed with a gloved hand to remove any agarose sticking to the surface.
  • the DNA is then fixed to the membrane by UV crosslinking and baking at 80°C.
  • the membrane is stored at 4°C until use.
  • the template DNA is amplified using a Perkin Elmer 9700 PCR machine:
  • Arabidopsis DNA is used in the present experiment, but the procedure is a general one. The procedure can be adapted to a multi-well format if necessary.
  • the product of the PCR is analyzed by electrophoresis in a 1% agarose gel.
  • a linearized plasmid DNA can be used as a quantification standard (usually at 50, 100, 200, and 400 ng) . These will be used as references to approximate the amount of PCR products.
  • HindIII-digested Lambda DNA is useful as a molecular weight marker.
  • the gel can be run fairly quickly; e.g., at 100 volts.
  • the standard gel is examined to determine that the size of the PCR products is consistent with the expected size and if there are significant extra bands or smeary products in the PCR reactions.
  • the amounts of PCR products can be estimated on the basis of the plasmid standard.
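  • One simple way to turn the plasmid standards into an estimate is linear interpolation between the two standards that bracket the band. The sketch below assumes that band intensities have already been measured from the gel image in arbitrary units; the intensity values are invented for the example, and only the 50, 100, 200 and 400 ng loads come from the text.

```python
def estimate_ng(band_intensity: float, standards: list[tuple[float, float]]) -> float:
    """Estimate DNA mass by interpolating between (ng, intensity) standards."""
    points = sorted(standards, key=lambda p: p[1])
    for (ng_low, i_low), (ng_high, i_high) in zip(points, points[1:]):
        if i_low <= band_intensity <= i_high:
            if i_high == i_low:          # identical readings: no slope to use
                return ng_low
            fraction = (band_intensity - i_low) / (i_high - i_low)
            return ng_low + fraction * (ng_high - ng_low)
    raise ValueError("band intensity lies outside the range of the standards")

if __name__ == "__main__":
    # (ng loaded, measured intensity); intensities are hypothetical readings.
    plasmid_standards = [(50, 1000.0), (100, 2000.0), (200, 4000.0), (400, 8000.0)]
    print(estimate_ng(3000.0, plasmid_standards))
```

  • Bands outside the range of the standards are rejected rather than extrapolated, since the protocol only calls for an approximate amount.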
  • a small amount of DNA from bands with the correct size can be isolated by dipping a sterile 10-µl tip into the band while viewing through a UV Transilluminator.
  • the small amount of agarose gel (with the DNA fragment) is used in the labeling reaction.
  • 10X dNTP + DIG-11-dUTP [1:5]: 2 mM dATP, 2 mM dCTP, 2 mM dGTP, 1.65 mM dTTP, 0.35 mM DIG-11-dUTP
  • 10X dNTP + DIG-11-dUTP [1:10]: 2 mM dATP, 2 mM dCTP, 2 mM dGTP, 1.81 mM dTTP, 0.19 mM DIG-11-dUTP
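  • The bracketed labels on the two mixes are nominal ratios of DIG-11-dUTP to dTTP. A quick check of the listed concentrations, sketched below, shows that dTTP plus DIG-11-dUTP adds up to 2 mM in both mixes, matching the other nucleotides, and that the actual ratios are close to the nominal ones. The function name is an assumption for this example.

```python
def dig_labeling_ratio(dttp_mM: float, dig_dutp_mM: float) -> dict:
    """Summarize the dTTP / DIG-11-dUTP balance of a labeling mix."""
    total = dttp_mM + dig_dutp_mM
    return {
        "total_mM": total,                       # should match the other dNTPs
        "dttp_per_dig": dttp_mM / dig_dutp_mM,   # parts dTTP per part DIG-dUTP
        "dig_fraction": dig_dutp_mM / total,     # share of labeled nucleotide
    }

if __name__ == "__main__":
    for label, dttp, dig in (("1:5", 1.65, 0.35), ("1:10", 1.81, 0.19)):
        summary = dig_labeling_ratio(dttp, dig)
        print(label,
              f"total {summary['total_mM']:.2f} mM,",
              f"1:{summary['dttp_per_dig']:.1f},",
              f"{100 * summary['dig_fraction']:.1f}% labeled")
```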
  • TE buffer (10 mM Tris, 1 mM EDTA, pH 8)
  • Maleate buffer In 700 ml of deionized distilled water, dissolve 11.61 g maleic acid and 8.77 g NaCl. Add NaOH to adjust the pH to 7.5. Bring the volume to 1 L. Stir for 15 min. and sterilize.
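  • The weights in the maleate buffer recipe can be cross-checked against molar concentrations. The sketch below uses molar masses of about 116.07 g/mol for maleic acid and 58.44 g/mol for NaCl (standard values, not given in the text) and arrives at roughly 0.1 M maleic acid and 0.15 M NaCl in the final 1 L.

```python
# Standard molar masses in g/mol (not stated in the protocol itself).
MALEIC_ACID_G_PER_MOL = 116.07
NACL_G_PER_MOL = 58.44

def molarity(grams: float, molar_mass: float, volume_l: float) -> float:
    """Molar concentration from a weighed mass dissolved to a final volume."""
    if volume_l <= 0:
        raise ValueError("volume must be positive")
    return grams / molar_mass / volume_l

if __name__ == "__main__":
    maleic = molarity(11.61, MALEIC_ACID_G_PER_MOL, 1.0)
    nacl = molarity(8.77, NACL_G_PER_MOL, 1.0)
    print(f"maleic acid: {maleic:.3f} M, NaCl: {nacl:.3f} M")
```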
  • 10% blocking solution In 80 ml deionized distilled water, dissolve 1.16 g maleic acid. Next, add NaOH to adjust the pH to 7.5. Add 10 g of the blocking reagent powder (Boehringer Mannheim, Indianapolis, IN, Cat. no. 1096176). Heat to 60°C while stirring to dissolve the powder. Adjust the volume to 100 ml with water. Stir and sterilize.
  • 1% blocking solution Dilute the 10% stock to 1% using the maleate buffer.
  • Buffer 3 (100 mM Tris, 100 mM NaCl, 50 mM MgCl2, pH 9.5). Prepared from autoclaved solutions of 1 M Tris pH 9.5, 5 M NaCl, and 1 M MgCl2 in autoclaved distilled water. Procedure:
  • PCR reactions are performed in 25 µl volumes containing:
  • the PCR reaction uses the following amplification cycles:
  • the products are analyzed by electrophoresis in a 1% agarose gel, comparing to an aliquot of the unlabelled probe starting material.
  • the amount of DIG-labeled probe is determined as follows:
  • Serial dilutions e.g., 1:50, 1:2500, 1:10,000
  • the membrane is fixed by UV crosslinking.
  • d. The membrane is wetted with a small amount of maleate buffer and then incubated in 1% blocking solution for 15 min at room temp.
  • Blocking Reagent In 80 ml deionized distilled water, dissolve 1.16 g maleic acid. Next, add NaOH to adjust the pH to 7.5. Add 10 g of the blocking reagent powder. Heat to 60°C while stirring to dissolve the powder. Adjust the volume to 100 ml with water. Stir and sterilize.
  • wash solutions must be prewarmed to 60°C. Use about 100 ml of wash solution per membrane.
  • Buffer 1 Maleic acid buffer (0.1 M maleic acid, 0.15 M NaCl; adjusted to pH 7.5 with NaOH)
  • Washing buffer Maleic acid buffer with 0.3% (v/v) Tween 20.
  • Buffer 2 (1X blocking solution): Dilute the stock solution 1:10 in
  • Detection buffer 0.1 M Tris, 0.1 M NaCl, pH 9.5
  • Buffer 2 (1:10,000) in Buffer 2 is used for detection. 75 ml of solution can be used for 3 blots.
  • the membranes are washed twice in washing buffer with gentle shaking. About 250 ml is used per wash for 3 blots.
  • the blots are equilibrated for 2-5 min in 60 ml detection buffer.
  • Bags (one for detection and one for exposure) should be cut and ready before doing the following steps.
  • the blot is carefully removed from the detection buffer and excess liquid removed without drying the membrane.
  • the blot is immediately placed in a bag and 1.5 ml of CSPD solution is added.
  • the CSPD solution can be spread over the membrane. Bubbles present at the edge and on the surface of the blot should be removed by gentle rubbing.
  • the membrane is incubated for 5 min. in CSPD solution.
  • Example 2 Transformation of Carrot Cells Transformation of plant cells can be accomplished by a number of methods, as described above. Similarly, a number of plant genera can be regenerated from tissue culture following transformation. Transformation and regeneration of carrot cells as described herein is illustrative.
  • Single cell suspension cultures of carrot (Daucus carota) cells are established from hypocotyls of cultivar Early France in B5 growth medium (O.L. Gamborg et al., Plant Physiol. 45:372 (1970)) plus 2,4-D and 15 mM CaCl2 (B5-44 medium) by methods known in the art. The suspension cultures are subcultured by adding 10 ml of the suspension culture to 40 ml of B5-44 medium in 250 ml flasks every 7 days and are maintained in a shaker at 150 rpm at 27°C in the dark.
  • the suspension culture cells are transformed with exogenous DNA as described by Z. Chen et al., Plant Mol. Bio. 36:163 (1998). Briefly, 4-days post-subculture cells are incubated with cell wall digestion solution containing 0.4 M sorbitol, 2% driselase, 5 mM MES (2-[N-Morpholino]ethanesulfonic acid) pH 5.0 for 5 hours. The digested cells are pelleted gently at 60 x g for 5 min. and washed twice in W5 solution containing 154 mM NaCl, 5 mM KCl, 125 mM CaCl2 and 5 mM glucose, pH 6.0.
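  • The W5 wash solution is specified in millimolar terms, so preparing it means converting each concentration into a weighed mass. The sketch below does that conversion for one liter. The molar masses are standard values that are not given in the text, and the CaCl2 figure assumes the anhydrous salt; a hydrated form would need a larger mass.

```python
# Standard molar masses in g/mol; CaCl2 is assumed to be anhydrous.
MOLAR_MASS_G_PER_MOL = {
    "NaCl": 58.44,
    "KCl": 74.55,
    "CaCl2": 110.98,
    "glucose": 180.16,
}

# W5 composition in mM, as given in the protocol.
W5_MILLIMOLAR = {"NaCl": 154, "KCl": 5, "CaCl2": 125, "glucose": 5}

def grams_needed(recipe_mM: dict, volume_l: float) -> dict:
    """Grams of each component needed for the given final volume."""
    return {
        name: mM / 1000.0 * MOLAR_MASS_G_PER_MOL[name] * volume_l
        for name, mM in recipe_mM.items()
    }

if __name__ == "__main__":
    for name, grams in grams_needed(W5_MILLIMOLAR, 1.0).items():
        print(f"{name}: {grams:.2f} g per liter")
```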
  • the protoplasts are suspended in MC solution containing 5 mM MES, 20 mM CaCl2, 0.5 M mannitol, pH 5.7 and the protoplast density is adjusted to about 4 x 10^6 protoplasts per ml. 15-60 µg of plasmid DNA is mixed with 0.9 ml of protoplasts. The resulting suspension is mixed with 40% polyethylene glycol (MW 8000, PEG 8000), by gentle inversion a few times at room temperature for 5 to 25 min. Protoplast culture medium known in the art is added into the PEG-DNA-protoplast mixture. Protoplasts are incubated in the culture medium for 24 hours to 5 days and cell extracts can be used for assay of transient expression of the introduced gene.
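  • The quantities in this step imply a DNA dose per protoplast that can be useful when scaling the transformation up or down. The sketch below reads the stated density as about 4 x 10^6 protoplasts per ml (a power of ten) and computes the number of protoplasts in 0.9 ml and the DNA load per million protoplasts at both ends of the 15-60 µg range. The function name is an assumption for this example.

```python
def protoplast_dose(density_per_ml: float, volume_ml: float, dna_ug: float):
    """Return (number of protoplasts, micrograms of DNA per million protoplasts)."""
    protoplasts = density_per_ml * volume_ml
    if protoplasts <= 0:
        raise ValueError("density and volume must be positive")
    return protoplasts, dna_ug / (protoplasts / 1e6)

if __name__ == "__main__":
    for dna_ug in (15, 60):
        count, ug_per_million = protoplast_dose(4e6, 0.9, dna_ug)
        print(f"{dna_ug} ug DNA: {count:.2e} protoplasts, "
              f"{ug_per_million:.1f} ug per 10^6 protoplasts")
```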
  • transformed cells can be used to produce transgenic callus, which in turn can be used to produce transgenic plants, by methods known in the art. See, for example, Nomura and Komamine, Plant Physiol. 79:988-991 (1985), Identification and Isolation of Single Cells that Produce Somatic Embryos in Carrot Suspension Cultures.

EP00901405A 1999-01-08 2000-01-07 Sequenzbestimmte dna-fragmente und entsprechende, durch diese kodierte polypeptide Ceased EP1141301A4 (de)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US11529399P 1999-01-08 1999-01-08
US115293P 1999-01-08
PCT/US2000/000466 WO2000040695A2 (en) 1999-01-08 2000-01-07 Sequence-determined dna fragments and corresponding polypeptides encoded thereby

Publications (2)

Publication Number Publication Date
EP1141301A2 true EP1141301A2 (de) 2001-10-10
EP1141301A4 EP1141301A4 (de) 2003-04-02

Family

ID=22360425

Family Applications (1)

Application Number Title Priority Date Filing Date
EP00901405A Ceased EP1141301A4 (de) 1999-01-08 2000-01-07 Sequenzbestimmte dna-fragmente und entsprechende, durch diese kodierte polypeptide

Country Status (4)

Country Link
EP (1) EP1141301A4 (de)
AU (1) AU2225000A (de)
CA (1) CA2360407A1 (de)
WO (1) WO2000040695A2 (de)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002102820A1 (en) 2001-06-20 2002-12-27 Nuevolution A/S Nucleoside derivatives for library preparation
ATE414769T1 (de) 2002-03-15 2008-12-15 Nuevolution As Eine verbesserte methode zur synthese von matritzenabhängigen molekülen
AU2003247266A1 (en) 2002-08-01 2004-02-23 Nuevolution A/S Multi-step synthesis of templated molecules
AU2003266932B2 (en) 2002-09-27 2009-04-09 Mpm-Holding Aps Spatially encoded polymer matrix
ES2640279T3 (es) 2002-10-30 2017-11-02 Nuevolution A/S Procedimiento para la síntesis de un complejo bifuncional
EP1756277B1 (de) 2002-12-19 2009-12-02 Nuevolution A/S Durch quasizufallsstrukturen und funktionen geführte synthesemethode
US20070026397A1 (en) 2003-02-21 2007-02-01 Nuevolution A/S Method for producing second-generation library
FR2857022B1 (fr) * 2003-07-04 2007-10-05 Commissariat Energie Atomique Applications d'une nouvelle classe d'enzymes: les sulfiredoxines
EP1670939B1 (de) 2003-09-18 2009-11-04 Nuevolution A/S Methode zur Gewinnung struktureller Informationen kodierter Moleküle und zur Selektion von Verbindungen
EP1957644B1 (de) 2005-12-01 2010-12-01 Nuevolution A/S Enzymvermittelnde kodierungsmethoden für eine effiziente synthese von grossen bibliotheken
JP4942082B2 (ja) * 2006-06-28 2012-05-30 独立行政法人日本原子力研究開発機構 植物のオーキシンおよびオーキシン系除草剤の感受性に関わる新規遺伝子
WO2011127933A1 (en) 2010-04-16 2011-10-20 Nuevolution A/S Bi-functional complexes and methods for making and using such complexes
AU2011248754A1 (en) * 2010-04-29 2012-11-15 Pioneer Hi-Bred International, Inc. Detection of Johnsongrass and its hybrid seed
EP4421178A2 (de) 2016-05-27 2024-08-28 The Board of Trustees of the University of Illinois Transgenic plants with increased photosynthesis efficiency and increased growth
CN109232724B (zh) * 2018-09-28 2021-07-27 Northeast Agricultural University Gene CsMAPEG for reducing residues of the pesticide propamocarb, and its application
US12054727B2 (en) 2019-03-05 2024-08-06 The Regents Of The University Of California Plants with increased water use efficiency

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DATABASE EMBL [Online] STERKY ET AL: retrieved from EBI Database accession no. AI164473 XP002217677 & STERKY ET AL: PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES USA, vol. 95, no. 22, 27 October 1998 (1998-10-27), pages 13330-13335, XP002153880 *
NEWMAN T ET AL: "GENES GALORE: A SUMMARY OF METHODS FOR ACCESSING RESULTS FROM LARGE-SCALE PARTIAL SEQUENCING OF ANONYMOUS ARABIDOPSIS CDNA CLONES" PLANT PHYSIOLOGY, AMERICAN SOCIETY OF PLANT PHYSIOLOGISTS, ROCKVILLE, MD, US, vol. 106, 1994, pages 1241-1255, XP000571449 ISSN: 0032-0889 *
See also references of WO0040695A2 *

Also Published As

Publication number Publication date
WO2000040695A2 (en) 2000-07-13
CA2360407A1 (en) 2000-07-13
EP1141301A4 (de) 2003-04-02
AU2225000A (en) 2000-07-24
WO2000040695A3 (en) 2000-11-30
WO2000040695A9 (en) 2001-10-04

Similar Documents

Publication Publication Date Title
US7361749B2 (en) Sequence-determined DNA encoding methyltransferases
US20100037352A1 (en) Sequence-determined DNA fragments and corresponding polypeptides encoded thereby
EP1141301A2 (de) Sequence-determined DNA fragments and corresponding polypeptides encoded thereby
US20180237794A1 (en) Sequence determined DNA fragments and corresponding polypeptides encoded thereby
US8710201B2 (en) Nucleic acid sequences encoding strictosidine synthase proteins
US7659386B2 (en) Nucleic acid sequences encoding transcription factor proteins
AU2022202318A1 (en) Methods of increasing specific plants traits by over-expressing polypeptides in a plant
US20060194959A1 (en) Sequence-determined DNA fragments encoding SRF-type transcription factors
US8710204B2 (en) Nucleic acid sequences encoding secE/sec61-gamma subunits of protein translocation complexes
US7485715B2 (en) Sequence-determined DNA encoding AP2 domain polypeptides
US7390893B2 (en) Sequence-determined DNA fragments encoding peptide transport proteins
US7368555B2 (en) Sequence-determined DNA fragments encoding EF-hand calcium-binding proteins
US7691991B2 (en) Sequence-determined DNA fragments encoding cytochrome P450 proteins
US7355026B2 (en) Sequence-determined DNA fragments encoding SRF-type transcription factor proteins
US10106586B2 (en) Sequence-determined DNA fragments encoding peptide transport proteins
US9085771B2 (en) Sequence-determined DNA fragments with regulatory functions
US7479555B2 (en) Polynucleotides having a nucleotide sequence that encodes a polypeptide having MOV34 family activity
US20060211853A1 (en) Sequence-determined DNA fragments encoding prothymosin/parathymosin proteins
US7932370B2 (en) Sequence-determined DNA fragments encoding cyclopropyl isomerase proteins
US7399850B2 (en) Sequence-determined DNA fragments encoding AP2 domain proteins
US7604971B2 (en) Sequence-determined DNA fragments encoding UBIE/COQ5 methyltransferase family proteins
US7608441B2 (en) Sequence-determined DNA fragments encoding sterol desaturase proteins
US20060281909A1 (en) Sequence-determined DNA fragments encoding SRF-type transcription factor proteins
US20060211852A1 (en) Sequence-determined DNA fragments encoding EF hand proteins
US20060217542A1 (en) Sequence-determined DNA fragments encoding homeobox domain proteins

Legal Events

Date Code Title Description
PUAI Public reference made under Article 153(3) EPC to a published international application that has entered the European phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20010807

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

AX Request for extension of the european patent

Free format text: AL;LT;LV;MK;RO;SI

A4 Supplementary search report drawn up and despatched

Effective date: 20030217

17Q First examination report despatched

Effective date: 20041214

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN REFUSED

18R Application refused

Effective date: 20080220