WO2002079487A2 - Methods, platforms and kits useful for identifying, isolating and utilizing nucleotide sequences which regulate gene expression in an organism - Google Patents
- Publication number
- WO2002079487A2 (PCT/IL2002/000267)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sequences
- nucleic acid
- organism
- contigs
- acid sequences
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/1034—Isolating an individual clone by screening libraries
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6897—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids involving reporter genes operably linked to promoters
Definitions
- the present invention relates to methods, platforms and kits for identifying and isolating non-coding genomic sequences which regulate gene expression in an organism.
- Embodiments of the present invention relate to methods of isolating and utilizing non-transcribed genomic sequences for generating genotypic and possibly phenotypic variation in the organism and for identifying and characterizing regulatory sequences participating in biological pathways.
- sequence information can be used to enhance the capacity to generate plant varieties having desired characteristics. This is clearly of tremendous potential impact in the fields of agriculture, pharmacology, textiles, horticulture and all other industries involving the use of plants and plant derived products.
- Non-coding sequences encode DNA regulatory elements (DREs) which play a critical role in determining the phenotype of organisms, including plants, since such sequences function as the master switches of gene expression.
- DREs comprise regulatory elements such as promoters, enhancers, suppressors, silencers and locus control regions.
- Promoters are generally located adjacent to transcriptional start sites and function in an orientation-dependent manner while enhancer and suppressor elements, which modulate the activity of promoters, are flexible with respect to their orientation and distance from transcriptional start sites and, as such, can also be located within introns.
- Gene regulatory sequences are specifically bound by gene specific transcription factors (TFs) and, in many cases, a complex of other gene specific accessory proteins, the sum of which determine the rate, cell type specificity and developmental stage specificity of gene transcription as well as transcriptional responses to given physiological conditions.
- the phenotype of plants is also determined by the nature of regulatory elements controlling gene expression. For example, it has been shown that the genomes of different species of cereal plants encode similar genes while varying in numbers of non-transcribed, repetitive DNA sequences (Messing and Llaca, Proc Natl Acad Sci U S A. 1998, 5:2017).
- a method of generating genotypic and possibly phenotypic variation in an organism comprising: (a) isolating at least one non-coding nucleic acid sequence from a genome of the organism; and (b) genetically transforming the organism with the at least one non-coding nucleic acid sequence to thereby generate genotypic and possibly phenotypic variation in the organism.
- a method of identifying novel gene expression regulatory sequences comprising: (a) isolating at least one non-coding nucleic acid sequence from a genome of an organism; (b) transforming the organism with an expression cassette including the at least one non-coding nucleic acid sequence covalently linked to a reporter nucleic acid sequence; and (c) monitoring reporter activity, the reporter activity being indicative of a presence of a regulatory sequence in the at least one non-coding nucleic acid sequence.
- a method of generating a database of putative regulatory sequences of a genome of an organism comprising: (a) computationally clustering transcribed nucleic acid sequences of the organism to thereby obtain a plurality of clusters; (b) computationally generating contigs from at least a subset of the plurality of clusters; (c) computationally aligning the contigs with the genomic nucleic acid sequences of the organism to thereby obtain inter-contig region sequences of the genome of the organism; and (d) storing the inter-contig region sequences of the genome of the organism in a database.
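Step (c) of the method above can be illustrated with a minimal sketch. It assumes the contig-to-genome alignments are already available as half-open coordinate intervals on a single genomic sequence; the function name `inter_contig_regions` and the interval representation are hypothetical and are not taken from the application.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) genomic coordinates


def inter_contig_regions(contig_hits: List[Interval]) -> List[Interval]:
    """Return the gaps lying between consecutive contig alignments."""
    merged: List[Interval] = []
    for start, end in sorted(contig_hits):
        if merged and start <= merged[-1][1]:
            # Overlapping or directly adjacent contigs are treated as one block.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    # Keep only gaps flanked by a contig on both sides.
    return [(left_end, right_start)
            for (_, left_end), (right_start, _) in zip(merged, merged[1:])]
```

The resulting intervals could then be written to any record store to serve as the database of step (d).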
- a computer readable media comprising as retrievable records data pertaining to a plurality of nucleic acid sequences, each of the plurality of nucleic acid sequences representing an inter-contig region sequence of a genome of a single organism.
- a nucleic acid construct library comprising a plurality of nucleic acid constructs each including a specific non-coding nucleic acid sequence of an organism and devoid of coding sequences of the organism.
- kits comprising a plurality of primer pairs, each of the primer pairs being complementary with nucleic acid sequences flanking a specific inter-contig region sequence of a genome of an organism, such that the kit is useful for amplifying a plurality of inter-contig region sequences of the genome of the organism.
- a method of identifying putative regulatory sequences comprising: (a) computationally identifying inter-contig region sequences of at least two distinct organisms; and (b) computationally comparing the inter-contig region sequences of the at least two distinct organisms to thereby identify non-redundant sequences, the non-redundant sequences being putative regulatory sequences.
- a method of generating genotypic and possibly phenotypic variation in an organism comprising: (a) isolating at least one non-coding nucleic acid sequence from a genome of the organism; (b) covalently linking the at least one non-coding nucleic acid sequence to a known coding sequence to thereby generate an expression cassette; and (c) genetically transforming the organism with the expression cassette to thereby generate genotypic and possibly phenotypic variation in the organism.
- a method of uncovering regulatory sequences functional in a biological pathway of an organism comprising: (a) isolating non-coding nucleic acid sequences from a genome of the organism; (b) covalently linking each of the non-coding nucleic acid sequences to a reporter coding sequence to thereby generate a plurality of expression cassettes; (c) genetically transforming a plurality of organisms with the plurality of the expression cassettes; (d) inducing activation of the biological pathway in the plurality of organisms; and (e) monitoring reporter activity in the plurality of organisms prior to, and following, step (d), to thereby determine the presence or absence of a regulatory sequence functional in the biological pathway in each of the non-coding nucleic acid sequences.
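The comparison in step (e) can be sketched as a fold-change test on reporter readings taken before and after induction. The two-fold threshold, the function name `pathway_responsive` and the dictionary layout are illustrative assumptions; the application does not specify a decision rule.

```python
from typing import Dict


def pathway_responsive(before: Dict[str, float],
                       after: Dict[str, float],
                       min_fold: float = 2.0,
                       floor: float = 1e-9) -> Dict[str, bool]:
    """Flag constructs whose reporter activity changes on pathway induction."""
    calls = {}
    for name, baseline in before.items():
        # The small floor avoids division by zero for silent constructs.
        fold = (after[name] + floor) / (baseline + floor)
        # Count both induction and repression as a response (assumed rule).
        calls[name] = fold >= min_fold or fold <= 1.0 / min_fold
    return calls
```

A construct flagged `True` would be a candidate carrying a regulatory sequence functional in the induced pathway, pending replicate measurements.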
- a method of generating phenotypic variation in an organism comprising: (a) isolating non-coding nucleic acid sequences from a genome of the organism; (b) generating a plurality of organisms genetically transformed with the non-coding nucleic acid sequences; and (c) isolating an organism of the plurality of organisms which exhibits phenotypic variation.
- a method of generating phenotypic variation in an organism comprising: (a) isolating non-coding nucleic acid sequences from a genome of the organism; (b) combinatorially shuffling regions derived from the non-coding nucleic acid sequences, to thereby generate combinatorial non-coding nucleic acid sequences; (c) generating a plurality of organisms genetically transformed with the combinatorial non-coding nucleic acid sequences; and (d) isolating an organism of the plurality of organisms which exhibits phenotypic variation.
- a computing platform for identifying inter-contig region sequences of an organism and for generating primer sequences for amplifying the inter-contig region sequences
- the computing platform comprising a processing unit being for: (a) computationally comparing data pertaining to transcribed nucleic acid sequences of an organism with data pertaining to genomic sequences of the organism to thereby generate data pertaining to inter-contig sequences of the organism; and (b) automatically generating primer sequences suitable for amplifying the inter-contig sequences of the organism.
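As a rough illustration of step (b), a primer pair can be read directly from the sequences flanking an inter-contig region. This sketch assumes a known genomic sequence and half-open region coordinates, and it ignores melting temperature, GC content and specificity checks that practical primer design requires; `flanking_primers` and its parameters are hypothetical names.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def flanking_primers(genome: str, start: int, end: int,
                     primer_len: int = 20) -> Tuple[str, str]:
    """Pick primers from the flanks of the region genome[start:end]."""
    if start < primer_len or end + primer_len > len(genome):
        raise ValueError("not enough flanking sequence for primers")
    forward = genome[start - primer_len:start]                  # left flank, sense strand
    reverse = reverse_complement(genome[end:end + primer_len])  # right flank, antisense
    return forward, reverse
```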
- the at least one non-coding nucleic acid sequence is isolated from an inter-contig region of the genome.
- the organism is a plant.
- isolating the at least one non-coding nucleic acid sequence is effected by: (i) computationally clustering transcribed nucleic acid sequences of the organism to thereby obtain a plurality of clusters; (ii) computationally generating contigs from at least a subset of the plurality of clusters; (iii) computationally aligning the contigs with the genomic nucleic acid sequences of the organism to thereby identify inter-contig region sequences of the genome of the organism; and (iv) amplifying at least one of the inter-contig region sequences to thereby obtain the at least one isolated non-coding nucleic acid sequence.
- the transcribed sequences are selected from the group consisting of EST sequences, cDNA sequences, mRNA sequences and preanalyzed genomic sequences.
- the method of generating genotypic and possibly phenotypic variation in an organism further comprising assigning to the contigs a score according to at least one parameter selected from the group consisting of: (a) the number of the transcribed nucleic acid sequences clustered; (b) the percent homology of nucleotide sequences of the contigs to nucleotide sequences of known transcription factors; (c) the percent homology of nucleotide sequences of the contigs to nucleotide sequences of selected genes of interest; (d) the number of expression libraries from which the contigs were generated; (e) the number of types of expression libraries from which the contigs were generated; (f) the number of RNAs comprised in the plurality of clusters; (g) the length of the contig; (h) a user-defined quality score; (i) the type of tissues from which the transcribed nucleic acid sequences were derived; (j) the developmental stage of the tissues from which said
- the expression cassette further includes a promoter sequence upstream of the reporter nucleic acid sequence.
- the transcribed nucleic acid sequences are selected from the group consisting of EST sequences, mRNA sequences and preanalyzed genomic sequences.
- the method of identifying novel gene expression regulatory sequences further comprising assigning to the contigs a score according to at least one parameter selected from the group consisting of: (a) the number of the transcribed nucleic acid sequences clustered; (b) the percent homology of nucleotide sequences of the contigs to nucleotide sequences of known transcription factors; (c) the percent homology of nucleotide sequences of the contigs to nucleotide sequences of selected genes; (d) the number of expression libraries from which the contigs were generated; (e) the number of types of expression libraries from which the contigs were generated; (f) the number of RNAs comprised in the plurality of clusters; (g) the length of the contig; (h) the types of methods whereby the transcribed nucleic acid sequences were derived; (i) the type of tissues from which the transcribed nucleic acid sequences were derived; (j) the developmental stage of the group consisting of: (a) the number of the
- the method of generating a database of putative regulatory sequences of a genome of an organism further comprising: (e) computationally clustering the inter-contig region sequences of the genome of the organism to thereby identify and group non-redundant sequences.
- the method of generating a database of putative regulatory sequences of a genome of an organism further comprising assigning to the contigs a score according to at least one parameter selected from the group consisting of: (a) the number of the transcribed nucleic acid sequences clustered; (b) the percent homology of nucleotide sequences of the contigs to nucleotide sequences of known transcription factors; (c) the percent homology of nucleotide sequences of the contigs to nucleotide sequences of selected genes; (d) the number of expression libraries from which the contigs were generated; (e) the number of types of expression libraries from which the contigs were generated; (f) the number of RNAs comprised in the plurality of clusters; (g) the length of the contig; (h) the types of methods whereby the transcribed nucleic acid sequences were derived; (i) the type of tissues from which the transcribed nucleic acid sequences were derived; (i) the type of tissues from which the
- each of the plurality of the nucleic acid constructs further includes a coding nucleic acid sequence of a known protein covalently linked to the specific non-coding nucleic acid sequence.
- the at least two distinct organisms represent closely related species.
- isolating the non-coding nucleic acid sequences is effected by: (i) computationally clustering transcribed nucleic acid sequences of the organism to thereby obtain a plurality of clusters; (ii) computationally generating contigs from at least a subset of the plurality of clusters; (iii) computationally aligning the contigs with the genomic nucleic acid sequences of the organism to thereby identify inter-contig region sequences of the genome of the organism; and (iv) amplifying the inter-contig region sequences to thereby obtain the isolated non-coding nucleic acid sequences.
- the method of uncovering regulatory sequences functional in a biological pathway of an organism further comprising assigning to the contigs a score according to at least one parameter selected from the group consisting of: (a) the number of the transcribed nucleic acid sequences clustered; (b) the percent homology of nucleotide sequences of the contigs to nucleotide sequences of known transcription factors; (c) the percent homology of nucleotide sequences of the contigs to nucleotide sequences of selected genes; (d) the number of expression libraries from which the contigs were generated; (e) the number of types of expression libraries from which the contigs were generated; (f) the number of RNAs comprised in the plurality of clusters; (g) the length of the contig; (h) the types of methods whereby the transcribed nucleic acid sequences were derived; (i) the type of tissues from which the transcribed nucleic acid sequences were derived; (j
- the method of generating phenotypic variation in an organism further comprising the step of culturing the plurality of organisms genetically transformed with the non-coding nucleic acid sequences under conditions suitable for identifying the phenotypic variation.
- the method of generating phenotypic variation in an organism further comprising assigning to the contigs a score according to at least one parameter selected from the group consisting of: (a) the number of the transcribed nucleic acid sequences clustered; (b) the percent homology of nucleotide sequences of the contigs to nucleotide sequences of known transcription factors; (c) the percent homology of nucleotide sequences of the contigs to nucleotide sequences of selected genes; (d) the number of expression libraries from which the contigs were generated; (e) the number of types of expression libraries from which the contigs were generated; (f) the number of RNAs comprised in the plurality of clusters; (g) the length of the contig; (h) the types of methods whereby the transcribed nucleic acid sequences were derived; (i) the type of tissues from which the transcribed nucleic acid sequences were derived; (j) the developmental parameters; and the developmental factors.
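One simple way to combine such parameters into a single contig score is a weighted sum. The sketch below is only an assumed scoring scheme; the feature names and weights are hypothetical, and the application does not state how the parameters are combined.

```python
from typing import Dict


def contig_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of the scored parameters; missing features count as zero."""
    return sum(weight * features.get(name, 0.0) for name, weight in weights.items())


# Hypothetical weights: favor well-supported contigs with TF homology.
example_weights = {"n_transcribed": 1.0, "n_libraries": 2.0, "tf_homology_pct": 0.5}
example_features = {"n_transcribed": 10, "n_libraries": 3, "tf_homology_pct": 80}
example_score = contig_score(example_features, example_weights)
```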
- the method of generating phenotypic variation in an organism further comprising covalently linking a coding sequence of a known protein to each of the non-coding nucleic acid sequences prior to step (b).
- the method of generating phenotypic variation in an organism further comprising generating a plurality of organisms genetically transformed with the non-coding nucleic acid sequences, isolating a non-coding nucleic acid sequence from each organism which exhibits phenotypic variation and using isolated non-coding nucleic acid sequences for the combinatorial shuffling of step (b).
- the method of generating phenotypic variation in an organism further comprising characterizing the non-coding nucleic acid sequences prior to step (b).
- the present invention successfully addresses the shortcomings of the presently known configurations by providing methods, platforms and kits for identifying and isolating regulatory sequences of an organism and for using such regulatory sequences to generate genotypic and optionally phenotypic variation in the organism.
- the present invention can also be utilized to identify non-coding sequences which regulate the expression of genes functional in biological pathways.
- FIG. 1a is a flow-chart depicting steps in the process of bioinformatic identification of candidate DREs from DNA sequence databases.
- FIGs. 1b-c are schematic diagrams depicting cloning of candidate DREs via PCR-amplification from fully sequenced genomic DNA (Figure 1b) and from ESTs (Figure 1c) into expression vectors. Red arrows indicate external and nested primers for PCR amplification.
- FIGs. 2a-c are schematic diagrams depicting strategies of the present invention for identification and PCR cloning of unidirectional candidate DREs located within inter-contig region sequences 0.2-6 kb and > 6 kb in length, and of bidirectional DREs located within inter-contig region sequences 0.2-6 kb in length ( Figures 2a, 2b and 2c, respectively).
- Contig-defined sequence (CDS) transcription putatively regulated by amplified DREs, and the directionality thereof, are indicated by bent arrows.
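The three cases of FIGs. 2a-c can be expressed as a small lookup on region length and directionality. The sketch assumes the directionality of each candidate region has already been determined elsewhere and treats the 0.2 kb and 6 kb limits as inclusive; the function name `cloning_strategy` is hypothetical.

```python
from typing import Optional


def cloning_strategy(length_bp: int, bidirectional: bool) -> Optional[str]:
    """Map an inter-contig region to the figure describing its cloning strategy."""
    if length_bp < 200:
        return None  # shorter than the 0.2 kb lower limit
    if length_bp <= 6000:
        return "FIG. 2c" if bidirectional else "FIG. 2a"
    # Regions above 6 kb are only described for unidirectional candidates.
    return None if bidirectional else "FIG. 2b"
```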
- FIG. 3 is a data plot depicting numbers of unidirectional (navy lozenges) and bidirectional (pink squares) candidate DREs (≤ 6 kb in length) versus length of candidate DREs.
- the y-axis numbers are only directly applicable for the unidirectional candidate DREs.
- the y-axis represents the proportion of these DREs in the entire candidate DRE population.
- FIG. 4 is a data plot depicting numbers of unidirectional (navy lozenges; left-hand side y-axis units) and bidirectional (pink squares; right-hand side y-axis units) candidate DRE regions (100-1,500 bp) versus candidate DRE length.
- FIG. 5 is a composite agarose gel photograph/schematic diagram depicting a scheme for cloning of vectors for expression of luciferase reporter genes under the control of candidate DREs.
- FIG. 6 is a schematic diagram depicting generation of novel genotypes via combinatorial shuffling of heterologous DRE/expressed sequence pairs within a given genome.
- FIG. 8 is a data histogram depicting percentages of "True", “False” and
- the present invention can be used to (i) isolate non-coding sequences from a genome of an organism, (ii) uncover non-coding sequences participating in biological pathways and (iii) generate genotypic, and optionally phenotypic, variation in an organism.
- the phenotype of an organism is dictated by a combination of the amino acid sequences of its gene products and the spatial/temporal expression patterns thereof.
- non-coding region is a non-transcribed polynucleotide.
- isolation and characterization of such non-coding regions can provide insight into the regulatory mechanisms underlying phenotypic variation.
- DREs are well known to those of ordinary skill in the art to include, for example, promoters, enhancers, suppressors, silencers, locus control regions and the like.
- isolation of DNA regions or sequences can be effectively achieved by cloning of such regions in, for example, plasmid vectors. Since non-coding regions comprise the majority of the DNA sequences of genomes of organisms, isolation and characterization thereof can prove to be a time consuming task.
- a computing platform which can be utilized to identify non-coding sequences, such as, for example, inter-contig region sequences of an organism and to generate primer sequences for amplifying such sequences, thus enabling the rapid and efficient cloning of such sequences from an organism.
- the computing platform includes a processing unit which operates a software application designed for: (a) generating data pertaining to inter-contig sequences of the organism; (b) processing data and classifying sequences according to various biological parameters (see Example 2 of the Examples section) and (c) automatically generating primer sequences suitable for amplifying the inter-contig sequences of the organism.
- the data pertaining to inter-contig sequences of the organism is generated by clustering transcribed nucleic acid sequences of an organism to thereby generate contigs (for further description of clustering and contig generation, see the Examples section which follows).
- transcribed nucleic acid sequences refers to nucleic acid sequences being, or corresponding to, gDNA sequences, or gDNA sequences having the capacity to be transcribed, from gDNA into RNA. It will be appreciated that genomic sequences may comprise intronic sequences being absent from corresponding transcribed nucleic acid sequences as a result of, for example, intron splicing during RNA processing.
- transcribed nucleic acid sequences include unspliced RNA sequences, such as, for example, primary RNA transcript sequences; spliced RNA sequences, such as, for example, mRNA sequences; poly-adenylated (polyA) RNA sequences; expressed sequence tags (ESTs); cDNA sequences, computationally identified nucleic acid sequences, and the like.
- nucleic acid sequences having the capacity to be transcribed, from gDNA into RNA can be identified by computational analysis using techniques well known to one of ordinary skill in the art.
- a "contig" is defined as a polynucleotide having a sequence being, or essentially corresponding to, a gDNA sequence represented by a cluster of transcribed nucleic acid sequences, as described in the Examples section which follows. It will be appreciated that gDNA sequences corresponding to a contig may comprise intronic sequences being absent from one or more transcribed sequences clustered to generate the contig, in cases, for example, where such transcribed nucleic acid sequences are obtained, either directly or indirectly, from RNA sequences from which introns have been spliced out, such as, for example, cDNA sequences, as described in the Examples section which follows.
- inter-contig region refers to a nucleic acid sequence being, or corresponding to, a gDNA sequence located between consecutive contigs.
- expressed sequences such as ESTs
- ESTs are obtained from as many different types of libraries as possible, such as, for example, libraries from different laboratories, libraries from different tissue types, libraries from tissues growing under different growth conditions and libraries in which expressed sequences were synthesized from both transcriptional orientations.
- Such variation in libraries can be used to optimize correlation of number of expressed sequences with the size of the clusters generated therewith as well as representation of the 5' region.
- libraries comprising at least 100 expressed sequences are employed.
- the contigs generated are aligned with genomic sequences of the organism, thus defining the inter-contig regions of the genome.
- primer sequences suitable for amplifying the inter-contig sequences of the organism are then automatically provided by the computing platform according to the sequences of the inter-contig regions defined thereby.
- the 5' and 3' sequences of the generated contigs can be utilized to generate primers suitable for amplifying the inter-contig sequences of the organism.
- the 5' portion of a contig can be used to define a primer sequence useful for directing amplification of upstream sequences, while the 3' portion can be utilized to define primer sequences useful for directing amplification of downstream sequences.
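The orientation of the two primers follows from the direction of extension: a primer that extends into upstream sequence must be the reverse complement of the contig's 5' end, while a primer that extends downstream is the 3' end read as is. A minimal sketch, assuming the contig is given 5' to 3' on the transcribed strand; `outward_primers` is a hypothetical name and no primer quality checks are made.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def outward_primers(contig: str, primer_len: int = 20) -> Tuple[str, str]:
    """Return (upstream-directed, downstream-directed) primers from a contig."""
    if len(contig) < primer_len:
        raise ValueError("contig shorter than requested primer length")
    # Reverse complement of the 5' end: anneals to the sense strand and extends upstream.
    upstream = contig[:primer_len].translate(_COMPLEMENT)[::-1]
    # The 3' end itself: anneals to the antisense strand and extends downstream.
    downstream = contig[-primer_len:]
    return upstream, downstream
```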
- the latter approach can be substantially improved by using primers derived from contigs representing syntenic, or homologous, chromosomal sequences of a related organism.
- extension of clusters of an Arabidopsis-related plant, such as tomato, can be facilitated using primers derived from syntenic or homologous Arabidopsis clusters.
- Methods of assigning synteny, or homology, to chromosomal sequences of a pair of related organisms are well known to those versed in the art.
- assignment of chromosomal synteny or homology between chromosomes from different genomes can be derived computationally by comparing batches of mRNA-derived sequences from two species, such as Arabidopsis and tomato, using the TBLASTX method, as described in the Examples section which follows. Further descriptions of the above approaches and additional approaches for high throughput identification and isolation of inter-contig regions are provided in the Examples section which follows.
- inter-contig regions of an organism identified according to the teachings of the present invention can be sequenced and the sequence information stored as a database on a computer readable media (see the Examples section below for further detail).
- database information can be analyzed to uncover consensus sequences, non-redundant sequences or any other sequence characteristic which can provide information as to the function of the inter-contig region. Comparison of two or more databases of related or non-related species can also be effected in efforts to uncover yet additional information.
- inter-contig regions define non-transcribed regions of a genome
- inter-contig regions according to the method of the present invention comprise DREs.
- the present invention provides for a wide variety of means whereby contigs having desired characteristics can be identified or selected from databases.
- contigs generated by clustering the largest number of transcribed nucleic acid sequences are selected.
- contigs assembled from at least 4 transcribed nucleic acid sequences are selected.
- contigs generated from clusters comprising the largest possible number of RNA sequences are selected.
- contigs generated from clusters comprising at least two RNA sequences are selected.
- contigs from libraries expressing the largest possible number of contigs, preferably no less than 100, are selected so as to maximize the probability of selecting contigs representing with maximal accuracy and/or completeness expressed sequences.
- selection of such contigs is preferably effected by selecting contigs defined by clustered transcribed nucleic acid sequences generated by the largest possible number of different laboratories, techniques, etc. For example, contigs generated from libraries of expressed sequences transcribed from both coding strands are selected.
- Selection of contigs having maximal probability of representing a complete transcript is preferably effected by selecting contigs whose length is between 0.5 and 6 kb, more preferably between 1 and 3 kb.
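The numeric selection criteria stated above (at least 4 transcribed sequences, at least two RNA sequences, length between 0.5 and 6 kb) can be combined into a single filter. The `Contig` record and its field names are hypothetical, and applying all criteria together is an assumption made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    name: str
    length_bp: int
    n_transcribed: int  # transcribed sequences assembled into the contig
    n_rna: int          # RNA sequences in the underlying cluster


def passes_filters(contig: Contig,
                   min_transcribed: int = 4,
                   min_rna: int = 2,
                   min_len: int = 500,
                   max_len: int = 6000) -> bool:
    """Apply the support and length thresholds to one contig."""
    return (contig.n_transcribed >= min_transcribed
            and contig.n_rna >= min_rna
            and min_len <= contig.length_bp <= max_len)
```

Setting `min_len=1000` and `max_len=3000` would apply the more preferred 1 to 3 kb window instead.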
- contigs representing a given type of protein such as a TF
- contigs having the highest possible level of nucleic acid sequence homology to genes encoding such proteins are selected.
- homology searches for contigs representing proteins, such as TFs, using the GO database are performed with a cut-off e-score no larger than 10⁻⁴.
- homology searches using the Pfam database for identifying contigs comprising domains, such as TF domains, are effected using a SCORE cut-off threshold of at least 30.
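The two cut-offs can be applied to pre-computed search results as in the following sketch, with the e-score cut-off taken as 1e-4 and the Pfam score threshold as 30. Accepting a contig when either criterion is met is an assumption of this example, as is the function name `is_tf_candidate`; the searches themselves are not shown.

```python
from typing import Optional


def is_tf_candidate(go_evalue: Optional[float] = None,
                    pfam_score: Optional[float] = None,
                    max_evalue: float = 1e-4,
                    min_pfam_score: float = 30.0) -> bool:
    """Decide whether a contig's best hits suggest it represents a TF."""
    go_ok = go_evalue is not None and go_evalue <= max_evalue
    pfam_ok = pfam_score is not None and pfam_score >= min_pfam_score
    # Either line of evidence is accepted here (assumed combination rule).
    return go_ok or pfam_ok
```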
- selection of contigs representing constitutively expressed sequences is effected by selecting contigs which are represented by the largest possible number of expressed sequence libraries, by selecting contigs which are represented by the largest possible number of different types of expressed sequence, or preferably both.
- contigs generated using expressed nucleic acid sequence libraries derived from tissues of such a type, developmental stage and/or from tissues growing under such growth conditions are selected.
- selection or identification of DREs or non-coding sequences having desired regulatory properties is effected by selecting DREs defined by contigs displaying expression patterns characteristic of such regulatory properties, as demonstrated in the Examples section, below.
- non-coding sequences identified and/or cloned according to the teachings of the present invention can be utilized in several ways.
- a method of generating genotypic, and possibly phenotypic, variation in an organism is effected by isolating at least one non-coding nucleic acid sequence from a genome of the organism and genetically transforming the organism with the isolated sequence to thereby generate genotypic, and possibly phenotypic, variation in the organism.
- the method according to this aspect of the present invention alters gene expression by "repositioning" non-coding nucleic acid sequence(s) within a genome of the organism.
- the above described method is particularly useful for generating genotypic, and possibly phenotypic, variation in plants, in which such variations can provide commercially important traits.
- non-coding nucleic acid sequences functional in altering a phenotype or phenotypes
- regions derived from non-coding nucleic acid sequences functional in altering a phenotype or phenotypes can be used to engineer "Mixed" non-coding nucleic acid sequences including regions of several related or unrelated non-coding nucleic acid sequences.
- Such combinatorial shuffling of portions of non-coding nucleic acid sequences can be used to further increase their effect on a phenotype or phenotypes while also being useful in characterizing sequence "modules" which participate or contribute to the overall phenotypic effect of a particular non-coding nucleic acid sequence(s).
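In silico, the space of "mixed" sequences can be enumerated by joining sequence modules in every possible order. This is a minimal sketch of that enumeration only; it assumes the modules have already been delimited, uses each module at most once per arrangement, and says nothing about how the shuffled sequences are physically assembled. `shuffle_modules` is a hypothetical name.

```python
from itertools import permutations
from typing import List, Sequence


def shuffle_modules(modules: Sequence[str], k: int) -> List[str]:
    """Return every ordered arrangement of k distinct modules, concatenated."""
    return ["".join(arrangement) for arrangement in permutations(modules, k)]
```

The number of arrangements grows factorially with the number of modules, so large module sets would call for sampling rather than full enumeration.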
- the method of the present invention is preferably effected on a large scale using a plurality of plants and a plurality of isolated non-coding sequences.
- the method of generating phenotypic variation can employ high throughput approaches.
- the isolated sequences (e.g., DREs) of the present invention can be generated, cloned and introduced into an organism (e.g., a plant) using a "one tube" approach.
- a single reaction tube can be used to PCR amplify the DRE(s) of interest using specific primers or primers sets; to enzymatically digest PCR products, if necessary; to clone such digested PCR products into a suitable propagation/transformation vector (as described hereinbelow); and to directly transform an organism therewith.
- a “one-tube” approach enables the automation of the method of the present invention, thus enabling large scale screening of numerous DREs.
- Such a nucleic acid construct preferably further includes additional polynucleotide regions which provide a broad host range prokaryote replication origin and a prokaryote selectable marker.
- the construct will preferably also have a selectable marker gene suitable for determining if a plant cell has been transformed.
- Suitable prokaryote selectable markers include genes conferring resistance to antibiotics such as ampicillin, kanamycin or tetracycline.
- Other polynucleotide sequences providing additional functions may also be present in the nucleic acid construct, as is known in the art.
- Sequences suitable for permitting or enhancing integration of the polynucleotide sequence of the present invention into the plant genome are also recommended. These might include transposon sequences as well as Ti sequences which permit random insertion of a heterologous expression cassette into a plant genome.
- nucleic acid constructs suitable for use by the present invention are provided in the Examples section which follows.
- the Agrobacterium system includes the use of plasmid vectors that contain defined DNA segments that integrate into the plant genomic DNA. Methods of inoculation of the plant tissue vary depending upon the plant species and the Agrobacterium delivery system. A widely used approach is the leaf disc procedure which can be performed with any tissue explant that provides a good source for initiation of whole plant differentiation. Horsch et al.
- a supplementary approach employs the Agrobacterium delivery system in combination with vacuum infiltration.
- the Agrobacterium system is especially viable in the creation of transgenic dicotyledonous plants.
- in electroporation, the protoplasts are briefly exposed to a strong electric field.
- in microinjection, the DNA is mechanically injected directly into the cells using very small micropipettes.
- in microparticle bombardment, the DNA is adsorbed on microprojectiles such as magnesium sulfate crystals or tungsten particles, and the microprojectiles are physically accelerated into cells or plant tissues.
- in glass fiber or silicon carbide whisker mediated transformation, glass fibers or silicon carbide needle-like structures are mixed with DNA and cells in a suspension to thereby induce fiber/whisker-cell collisions, which lead to cell impalement (by the fibers/whiskers) and DNA injection into the cell.
- the transformation methods described hereinabove are typically followed by propagation of transformed tissues.
- the most common method of plant propagation is by seed. Regeneration by seed propagation, however, has the deficiency that due to heterozygosity there is a lack of uniformity in the crop, since seeds are produced by plants according to the genetic variances governed by Mendelian rules. Basically, each seed is genetically different and each will grow with its own specific traits. Therefore, it is preferred that the transformed plant be produced such that the regenerated plant has the identical traits and characteristics of the parent transgenic plant. Therefore, it is preferred that the transformed plant be regenerated by micropropagation which provides a rapid, consistent reproduction of the transformed plants.
- Micropropagation is a process of growing new generation plants from a single piece of tissue that has been excised from a selected parent plant or cultivar. This process permits the mass reproduction of plants having the preferred tissue expressing the fusion protein.
- the new generation plants which are produced are genetically identical to, and have all of the characteristics of, the original plant.
- Micropropagation allows mass production of quality plant material in a short period of time and offers a rapid multiplication of selected cultivars in the preservation of the characteristics of the original transgenic or transformed plant.
- the advantages of cloning plants are the speed of plant multiplication and the quality and uniformity of plants produced.
- Micropropagation is a multi-stage procedure that requires alteration of culture medium or growth conditions between stages.
- the micropropagation process involves four basic stages: Stage one, initial tissue culturing; stage two, tissue culture multiplication; stage three, differentiation and plant formation; and stage four, greenhouse culturing and hardening.
- during stage one, initial tissue culturing, the tissue culture is established and certified contaminant-free.
- during stage two, the initial tissue culture is multiplied until a sufficient number of tissue samples are produced to meet production goals.
- during stage three, the tissue samples grown in stage two are divided and grown into individual plantlets.
- at stage four, the transformed plantlets are transferred to a greenhouse for hardening where the plants' tolerance to light is gradually increased so that they can be grown in the natural environment.
- the method according to this aspect of the present invention is preferably effected on a large scale using a plurality of plants, each transformed with a specific nucleic acid construct harboring a specific non-coding sequence.
- the resultant transformants can then be tested for general phenotypic variation detected by visible morphological alterations.
- specific variations such as increased or acquired stress tolerance, and the like can be detected by cultivating the transformed plants under conditions suitable for detecting such phenotypes.
- non-coding sequences of plants exhibiting phenotypic variation can be isolated using, for example, construct specific primers and the isolated sequence can be further characterized.
- although transformed plants which do not exhibit phenotypic variation are not readily utilizable, such plants can be genetically crossed with either wild type (w.t.) plants or closely related plant species in order to generate progeny exhibiting phenotypic variation.
- nucleic acid constructs of the present invention can also include a coding sequence of a characterized gene positioned under the transcriptional control of the non-coding sequence.
- Such a characterized gene can encode, for example, diacylglycerol acyltransferase (Jako C. et al. Plant Physiol. 2001, 126(2):861-74), ATHB-8 HD-zip protein (Baima S. et al. Plant Physiol. 2001, 126(2):643-55), Leafy or Apetala1 (Pena L. et al. Nat Biotechnol. 2001, 19(3):263-7), bacterio-opsin (Rizhsky L. and Mittler R. Plant Mol Biol. 2001, 46(3):313-23), AtMYB23 (Kirik V. et al. Dev Biol.
- a coding sequence under the regulatory control of a non-coding sequence can also be utilized to uncover novel gene expression regulatory sequences and to identify regulatory sequences which regulate the expression of genes participating in biological pathways.
- according to another aspect, the present invention provides a method of identifying novel gene expression regulatory sequences, including regulatory sequences which regulate the expression of genes participating in biological pathways.
- the method is effected by transforming an organism with an expression cassette including a non-coding nucleic acid sequence covalently linked to a reporter nucleic acid sequence, encoding, for example, a fluorophore, such as green fluorescent protein (GFP) or a derivative thereof, or an enzyme capable of catalyzing reporter activity, such as, for example, β-galactosidase.
- the method is further effected by monitoring reporter activity, the reporter activity indicating a presence of a regulatory sequence in the non-coding nucleic acid sequence.
- the expression cassette can further include a constitutive promoter sequence which can be positioned, for example, between the non-coding sequence and the reporter nucleic acid sequence.
- although the method according to this aspect of the present invention preferably employs a stable transformation approach, transient transformation of specific plant tissues such as, for example, flower tissue, leaf tissue, seeds or tubers, which can be utilized for identifying tissue specific regulatory sequences, or transient transformation of the whole plant can also be utilized.
- in stable transformation (described hereinabove), the expression cassette of the present invention is integrated into the plant genome and as such it represents a stable and inherited trait.
- in transient transformation, the expression cassette of the present invention is expressed by the transformed cell but it is not integrated into the genome and as such it represents a transient trait.
- Transient transformation can be effected by any of the direct DNA transfer methods described above or by viral infection using modified plant DNA viruses.
- Transformation of an organism such as a plant with the expression cassette described above can also be utilized to identify regulatory sequences which regulate the expression of genes participating in specific biological pathways.
- plants transformed with the expression cassette of the present invention are grown under conditions suitable for the induction of an uncharacterized biological pathway or are specifically stimulated by an agent capable of triggering the biological pathway.
- if the non-coding region of the expression cassette includes a regulatory sequence (e.g., a promoter) which participates in the biological pathway, then reporter activity is generated.
- suitable controls such as identical transformants grown under non-inducing conditions must be employed.
- the present invention provides methods of identifying and isolating non-coding sequences present in, for example, inter-contig regions.
- the present invention also provides methods of utilizing isolated non-coding sequences for generating genotypic and possibly phenotypic variation as well as for uncovering novel regulatory sequences and regulatory sequences participating in specific biological pathways.
- databases of the nucleotide sequences of the transcribed and non-transcribed regions of the genomes of organisms would be of great value in industries and fields employing plants or plant products. Such databases would constitute a powerful bioinformatics tool for the processing of information regarding Arabidopsis transcribed nucleic acid sequences and for applied uses thereof. Furthermore, databases of sequences of transcribed regions of the genome would enable creation of databases of sequences of non-transcribed regions of the genome comprising DREs. The latter type of database could be used to efficiently isolate and clone DREs which could be used to produce genetically modified plants exhibiting desired characteristics. Such databases, however, are currently lacking. Thus, the present inventors have created databases of transcribed nucleic acid sequences of the Arabidopsis genome, as described below. Materials and Methods:
- Transcribed nucleic acid sequences were computationally clustered and assembled to create contigs defining the maximal contiguous stretches of transcribed nucleic acid sequences using LEADS™ software (Compugen). This software also recognizes vector sequences used for producing the ESTs, sequencing quality, frequency of low-complexity regions, repetitive sequences within transcribed regions, cases and frequencies of alternative splicing and intron retention and cases and frequencies of antisense RNAs.
- the minimal Arabidopsis EST library containing > 1 EST/gene should contain between 267,596 and 535,192 EST entries.
- Results Computational identification of transcribed nucleic acid sequences: Processing of Arabidopsis sequence databases using LEADS™ software generated 19,311 contigs based on clustering and assembly of EST and RNA sequences. This number is less than the 25,498 genes identified by the Arabidopsis Genome Initiative (The Arabidopsis Genome Initiative, Nature 2000, 408:796).
- sequence databases of the present invention have the capacity to effectively and accurately provide information regarding such sequences on a genome-wide scale.
- these databases constitute a potent bioinformatics tool applicable in industries or fields employing plants or plant products and which can furthermore be employed to create genome-scale databases of sequences of DRE comprising regions of the Arabidopsis genome.
- EST library selection for generation of an EST database annotated according to relevance with respect to biological parameters Out of 458 available EST libraries, 48 containing > 50 ESTs were selected for generation of an EST database annotated according to sets of prioritizable biological parameters.
- the libraries selected were derived from sources representing various combinations of anatomical locations and sublocations, developmental stages and growth treatments, as shown in Table 3.
- Contig quality Reflects a user-defined priority list.
- Constitutive expression Prioritizes contigs with constitutive expression. The number of clustered ESTs comprised in a contig and/or the number of different libraries from which the same contig is derived is correlated with the probability that the contig is constitutively expressed.
- Assignment of TF status to contigs: High priority is assigned to TF genes on the assumption that they are responsible for a large fraction of genetic and phenotypic variation. Contigs were classified as TFs if their sequences were found to be homologous to sequences listed in the Arabidopsis Gene Ontology (GO), GenBank non-redundant (nr) or Pfam databases. Contigs were classified as ozone responsive TFs on the basis of information provided to the present inventors by Nina Fedoroff.
- GO analysis Contigs were assigned TF status by performing homology comparison with the GO annotated database with TBLASTX using an E-score cutoff threshold of 10⁻⁴. Contigs scoring very high homologies were manually confirmed as being TFs.
- Pfam analysis Homology searches using the Pfam protein domains database were used to identify contigs comprising TF domains. Homology searches using Pfam were performed using a SCORE cut-off threshold of > 30.
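The E-score cutoff described for the GO analysis can be illustrated with a minimal sketch. It assumes the TBLASTX hits were saved in the standard 12-column BLAST tabular format, with the E-value in the 11th column, and that a hit at or below the threshold counts as passing. The function name and the sample identifiers are hypothetical and do not come from the source.

```python
import csv
import io

EVALUE_CUTOFF = 1e-4  # threshold stated in the text


def contigs_passing_cutoff(tabular_text, cutoff=EVALUE_CUTOFF):
    """Return the set of query (contig) IDs with at least one hit at or below the cutoff.

    Assumes standard 12-column BLAST tabular output:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    passing = set()
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if float(row[10]) <= cutoff:
            passing.add(row[0])
    return passing


# Hypothetical hits; identifiers and values are made up for illustration.
example_hits = (
    "contig_1\tGO_TF_a\t62.0\t120\t40\t2\t1\t360\t5\t124\t3e-20\t95.1\n"
    "contig_2\tGO_TF_b\t35.0\t60\t39\t0\t10\t190\t2\t61\t0.5\t30.2\n"
)
print(sorted(contigs_passing_cutoff(example_hits)))
```

Contigs returned by such a filter would still be subject to the manual confirmation step mentioned in the text.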
- ESTs found in roots and seeds could be due to the fact that the plant organs used for the preparation of ESTs of these categories were absent from the biological material used to prepare the above-ground/whole plant EST libraries.
- ESTs belonging to the above-ground category include most of the other EST subgroups.
- Dry seed- and developing seed-derived ESTs were both found to express a high proportion of unique, developmental stage-specific sequences.
- Above-ground/whole plant-derived ESTs comprised 174 contigs containing 861 unique sequences, indicating that this subgroup included additional organs/tissues from other subgroups.
- Contig quality classification 345 contigs were found to have a quality score > 1 and these comprised a high proportion of ESTs from subgroups (flower tips, cell culture, etiolated seedlings, seed embryos, NaCl-treated seedlings, nematode-infected roots and nitrate-treated roots) given an interest score > 90 out of 100.
- Constitutively expressed genes 159 contigs were classified as being highly constitutively expressed (score > 500). For example, one of the highest scoring contigs was assembled from 736 ESTs derived from 14 different EST library subgroups. This sequence was found to have significant homology to that of the hsc 70 gene of Lycopersicon esculentum and has been found to be highly expressed in the vegetative tissues of this plant (Sun SW. et al. 1996. Gene 170:237).
- Transcription Factors Homology searches using the GO database were used to assign putative TF status to 367 contigs, however this number was reduced to 59 contigs satisfying selection criteria. Homology searches using the GenBank nr database were used to assign putative TF status to 472 contigs, with 285 of these meeting selection criteria. Homology searches using the Pfam database were used to assign putative TF status to 798 contigs, however this number was reduced to 421 fulfilling selection criteria. On the basis of information provided to the present inventors by Nina Fedoroff, 4 contigs were assigned ozone induced TF status. In sum, 586 contigs out of a total of 1,460 satisfied filtration criteria.
- Contigs defined as TFs according to selection criteria were composed of no more than 7 clustered ESTs and were derived, on average, from no more than 3 EST libraries. Contigs defined as TFs which did not satisfy selection criteria were composed, on average, of 1.8 clustered ESTs and were derived, on average, from 1.1 EST libraries. The average TF gene of both groups is composed of 4 clustered ESTs derived from at least 2 EST libraries.
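The idea that a contig supported by many ESTs from many libraries is more likely to be constitutively expressed can be sketched as follows. The source does not give the scoring formula behind the "score > 500" threshold, so this sketch only counts ESTs and distinct libraries per contig and ranks on those counts. The ordering rule, the `Est` record and all sample data are assumptions for illustration.

```python
from collections import namedtuple

# Hypothetical record: one EST, the contig it clusters into, and its source library.
Est = namedtuple("Est", ["est_id", "contig_id", "library"])


def expression_breadth(ests):
    """Map contig_id -> (number of clustered ESTs, number of distinct libraries)."""
    est_counts = {}
    libraries = {}
    for est in ests:
        est_counts[est.contig_id] = est_counts.get(est.contig_id, 0) + 1
        libraries.setdefault(est.contig_id, set()).add(est.library)
    return {cid: (est_counts[cid], len(libraries[cid])) for cid in est_counts}


def rank_constitutive(ests):
    """Rank contigs by library count, then EST count, both descending (assumed ordering)."""
    breadth = expression_breadth(ests)
    return sorted(breadth, key=lambda cid: (breadth[cid][1], breadth[cid][0]), reverse=True)


sample_ests = [
    Est("e1", "A", "root"), Est("e2", "A", "flower"), Est("e3", "A", "seed"),
    Est("e4", "B", "root"), Est("e5", "B", "root"),
]
print(expression_breadth(sample_ests))
print(rank_constitutive(sample_ests))
```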
- a subgroup comprising 35 % of the initial set of 335 contigs classified as TFs was classified as being organ specific in various organs (Table 6).
- Genome-scale databases of DRE nucleotide sequences of organisms are highly desirable since these would constitute a valuable bioinformatics tool enabling analytical processing of DREs related information on a genomic scale.
- databases could enable the efficient cloning of such DREs and these could be used in a multitude of applications, such as, for example, to generate plants genetically modified to possess novel and/or selected characteristics.
- the capacities afforded by such databases could clearly be exploited to great benefit in industries employing plants and plant products.
- very little information regarding DREs of the model plant Arabidopsis is available.
- the present inventors have generated genome-scale databases of Arabidopsis DREs, as follows. Materials and Methods:
- Inter-contig regions located within regions 6 kb upstream of CDSs were defined as candidate DREs for such regions, based on the fact that promoters are generally located within the region 6 kb upstream of the coding sequences which they regulate. It was demonstrated that the minimal length of DREs is > 200 bp (see below), therefore no sequences < 200 bp in length were classified as candidate DREs.
- Strategies for PCR cloning of candidate DREs from fully sequenced gDNA and from ESTs are shown in Figures 1b and 1c, respectively.
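The definition of candidate DREs as inter-contig regions within 6 kb upstream of a CDS, with regions shorter than 200 bp discarded, can be sketched as a simple interval computation. This is a simplified illustration under stated assumptions: plus-strand, 0-based half-open coordinates, and a region that is clipped at the 6 kb window edge still counts as a candidate. The function name and the sample coordinates are hypothetical.

```python
UPSTREAM_WINDOW = 6000  # bp upstream of the CDS, as stated in the text
MIN_DRE_LENGTH = 200    # shorter sequences are not classified as candidate DREs


def candidate_dres(contigs, cds_start, window=UPSTREAM_WINDOW, min_len=MIN_DRE_LENGTH):
    """Return inter-contig gaps (start, end) within `window` bp upstream of cds_start.

    contigs: list of (start, end) half-open intervals of transcribed sequence.
    """
    window_start = max(0, cds_start - window)
    gaps = []
    cursor = window_start
    for start, end in sorted(contigs):
        if end <= window_start:
            continue  # contig lies entirely upstream of the window
        if start >= cds_start:
            break  # contig lies entirely downstream of the CDS start
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < cds_start:
        gaps.append((cursor, cds_start))
    return [(s, e) for s, e in gaps if e - s >= min_len]


# Hypothetical layout: the last contig contains the CDS that starts at 8000.
sample_contigs = [(1000, 2500), (4000, 4100), (4250, 5000), (7900, 9000)]
print(candidate_dres(sample_contigs, cds_start=8000))
```

In this example the 150 bp gap between positions 4100 and 4250 is dropped by the length filter, while the two longer gaps are kept.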
- Size and orientation profile of computationally-identified Arabidopsis candidate DREs and their genomic frequency The total number of identified inter-contig regions was 16,176. The number of inter-contig regions comprising candidate DREs located upstream of contigs assembled from clusters composed of > 3 ESTs or > 1 RNA was found to be 4588.
- Database design The database of computationally identified plant candidate DRE sequences described above was annotated with the following data for each candidate DRE entry: DRE name, DRE origin, nucleotide sequence and orientation with respect to putatively regulated downstream sequences, biological parameter annotations corresponding to those of downstream CDSs (i.e., genes) and references to related patents.
- Biological annotations included corresponding gene name, gene products, functions of gene products, mutant phenotype, homologous genes, gene expression patterns, alternative splice variants and antisense RNA transcripts. This annotated database was employed to analyze the genomic distribution of DREs with respect to various biological characteristics and to identify candidate DREs having selected biological characteristics.
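The annotated database entry described above can be pictured as a simple record type. The sketch below is only one possible layout: the field names paraphrase the annotations listed in the text, and the class name, the selection helper and the sample entries are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DreRecord:
    """One candidate DRE entry; field names are illustrative, not from the source."""
    name: str
    origin: str
    sequence: str
    orientation: str  # orientation relative to the putatively regulated downstream sequence
    gene_name: str = ""
    gene_products: List[str] = field(default_factory=list)
    expression_patterns: List[str] = field(default_factory=list)
    patent_refs: List[str] = field(default_factory=list)


def select_by_expression(records, pattern):
    """Return names of DREs whose downstream gene is annotated with `pattern`."""
    return [r.name for r in records if pattern in r.expression_patterns]


sample_records = [
    DreRecord("dre_a", "gDNA", "ACGT" * 60, "+", expression_patterns=["constitutive"]),
    DreRecord("dre_b", "gDNA", "TTGA" * 60, "-", expression_patterns=["root"]),
]
print(select_by_expression(sample_records, "root"))
```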
- Results Computational identification of Arabidopsis candidate DREs according to selected biological parameters. A total of 1,060 DREs were computationally selected and scored according to their probability of matching biological criteria including capacity to drive constitutive gene expression (i.e., strong expression in many parts of the plants), capacity to drive TF gene expression and capacity to drive organ specific expression.
- the databases of the present invention represent a novel and superior means with which to rapidly and efficiently identify novel DREs, such as plant DREs, having desired gene regulatory characteristics. Since the sequences of such DREs are provided by the databases of the present invention, these can be cloned and used to generate transgenic plants having desired characteristics which could be exploited to great benefit by industries employing plants or plant products having novel or desired characteristics.
- the databases of the present invention can be utilized to provide the nucleic acid sequences of plant DREs having desired biological characteristics.
- the ability to efficiently isolate and clone such DREs is highly desirable since this enables construction of nucleic acid constructs capable of being used to generate transgenic plants possessing desired characteristics.
- Such a capacity is of enormous potential impact in industries employing plants or plant products.
- the present inventors have computationally identified primer sequences suitable for the PCR amplification and cloning thereof and have annotated the DRE databases of the present invention with such primer sequences, as follows.
- PCR amplification of candidate DREs Candidate DREs listed in Table 7 were PCR amplified via nested PCR amplification of gDNA using two sets of primers computationally selected using PRIMER3® software (EMBL) with modifications. A first, "external" pair of primers was designed to amplify a secondary template PCR product corresponding to a sequence extending 300-2000 bp upstream and downstream beyond the ends of candidate DRE sequences.
- a second, "internal" pair of primers was designed to amplify from the secondary template PCR product a clonable PCR product comprising a sequence starting within 75 nucleotides upstream of the candidate DRE sequence and extending to within the first 100 nucleotides of the downstream-flanking transcribed contig sequence.
- Internal primers were each designed to contain a unique restriction site so as to enable cloning of the final PCR product in the proper orientation in an expression vector for driving reporter gene expression.
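The nested PCR layout described above fixes where each primer may be placed relative to the candidate DRE. The following sketch computes those placement windows from the distances given in the text (300-2000 bp for the external pair, 75 and 100 nucleotides for the internal pair). It assumes plus-strand, 0-based half-open coordinates and that the downstream contig starts immediately at the DRE end. The function name and example coordinates are hypothetical, and actual primer selection (melting temperature, specificity) is left to the primer design software.

```python
def primer_windows(dre_start, dre_end, seq_len):
    """Return allowed placement windows (start, end) for the four nested-PCR primers."""

    def clamp(position):
        # Keep coordinates inside the available genomic sequence.
        return max(0, min(seq_len, position))

    return {
        # External pair: 300-2000 bp beyond each end of the candidate DRE.
        "external_forward": (clamp(dre_start - 2000), clamp(dre_start - 300)),
        "external_reverse": (clamp(dre_end + 300), clamp(dre_end + 2000)),
        # Internal pair: within 75 nt upstream of the DRE, and within the
        # first 100 nt of the downstream flanking contig.
        "internal_forward": (clamp(dre_start - 75), dre_start),
        "internal_reverse": (dre_end, clamp(dre_end + 100)),
    }


windows = primer_windows(dre_start=5000, dre_end=6200, seq_len=10000)
for name, span in windows.items():
    print(name, span)
```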
- Table 7 Examples of DREs computationally selected and prioritized according to biological parameters.
- Computational score a relative score enabling prioritization of contigs from most to least relevant with respect to specific parameters.
- RNA molecules comprised in set of clustered sequences used to define contig putatively regulated by candidate DRE.
- Tissue- and inductive condition-specific expression of a reporter gene under the control of a computationally selected DRE in Arabidopsis plants The ability to genetically modify plants with DREs is highly desirable since these can be used to generate novel and/or selected gene expression patterns, thereby greatly facilitating the production of plants having novel and/or selected characteristics.
- novel patterns of gene expression can be obtained by combinatorial shuffling of heterologous DRE-structural gene pairs within a genome ( Figure 6).
- the present inventors have created DRE databases annotated with computationally selected primers capable of enabling the identification and cloning of DREs having selected regulatory properties and have generated transgenic plants expressing transgenes with a selected pattern of gene expression therewith, as described hereinbelow. Materials and Methods:
- Candidate DREs computationally selected for driving high and constitutive, organ specific, or TF gene expression (listed in Table 7, above) were PCR amplified and cloned into a luciferase reporter gene expression vector, as depicted in Figure 5, using the binary vector pBI101 (Clontech, USA).
- Plant growth and transformation Arabidopsis plants were grown and transformed using the constructs described above via a high throughput dipping protocol, as previously described (Clough SJ. and Bent AF. (1998) The Plant J. 16(6):735; Desfeux C. Plant Physiology 2000, 123:895), with minor modifications. Briefly, soil mixtures were mixed and irrigated immediately prior to sowing single plants in 250 ml pots. After sowing, pots covered with aluminum foil and plastic covers were incubated at 4 °C for 3-4 days prior to being transferred to a growth chamber at 18-24 °C with a 16 h/8 h on/off light cycle.
- Transformations were performed using transformation medium at pH 5.7 containing 0.5 MS (2.15 g/l), 0.044 µM BAP, 112 µg B5 Gamborg vitamins, 5 % sucrose, 200 µl/L Silwet L-77, 18.2 EC double-distilled water.
- Luciferase imaging Transformed Arabidopsis plantlets at a development stage of 2-3 true leaves were subjected to luminescence assays for detection of luciferase activity, as previously described (Meissner R., Plant J. 2000, 22:265) in a darkroom using an ultra-low light detection camera (Princeton Instruments Inc., USA). Experimental Results:
- Luciferase assays of transformed Arabidopsis plantlets identified DRE #10179 as driving high and constitutive gene expression in all parts of the plantlets, DRE #3714 as driving high and constitutive gene expression in flower buds, DRE #1927 as driving strong gene expression in all leaves, and DRE #24584 as driving strong TF gene expression mainly in the cotyledons. These results demonstrated the capability of the bioinformatics procedures of the present invention to correctly identify DREs, to create a database from which these can be selected according to desired biological criteria and to design primers capable of amplifying these DREs. These results further demonstrated the capacity afforded by the methods described herein to rationally design and generate plants having novel and desired phenotypes resulting from modifications in gene regulation.
- One-tube method of cloning DREs in binary vectors DREs are cloned as follows. Arabidopsis thaliana (var coll) leaf gDNA is extracted using DNeasy Plant Mini Kit (Qiagen, Germany). Primers for PCR amplification of DREs are designed using PRIMER3 software and modified to contain restriction sites absent from the DRE sequence, for PCR product insertion into pVERl binary plasmid.
- Plasmid vector pVerl, derived from binary vector pBI101 (Clontech), is double-digested using the same restriction endonucleases used to excise cloned DREs from vector, purified using PCR Purification Kit (Qiagen, Germany), treated with alkaline phosphatase (Roche) according to the manufacturer's instructions and re-purified using PCR Purification Kit (Qiagen, Germany). Insertion of the DRE into vector pVerl is performed by adding to DRE digests: 500 ng of double-digested pVerl plasmid, 1 µl of T4 DNA ligase (40 U/µl; Roche) and 6 µl of T4 buffer (Roche).
- Agrobacterium tumefaciens GV303 competent cells are transformed using 1-2 µl of ligation reaction mixture by electroporation, using a MicroPulser electroporator (Biorad), 0.2 cm cuvettes (Biorad) and EC-2 electroporation program (Biorad).
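The requirement that cloning primers carry restriction sites absent from the DRE sequence can be checked with a short scan. The source does not name the enzymes used, so the panel below (common enzymes with palindromic six-base recognition sites) is an assumption for illustration, as are the function names and the sample sequences.

```python
# Assumed enzyme panel; the source does not state which enzymes were used.
RECOGNITION_SITES = {
    "BamHI": "GGATCC",
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
    "SalI": "GTCGAC",
    "XbaI": "TCTAGA",
}


def usable_enzymes(dre_sequence, sites=RECOGNITION_SITES):
    """Return enzymes whose recognition site does not occur in the DRE.

    The listed sites are palindromic, so scanning one strand is sufficient.
    """
    seq = dre_sequence.upper()
    return sorted(name for name, site in sites.items() if site not in seq)


def tailed_primer(primer, enzyme, sites=RECOGNITION_SITES):
    """Prepend the recognition site of `enzyme` to the 5' end of a primer."""
    return sites[enzyme] + primer


sample_dre = "ATGGATCCTTAAGCTTGG"  # contains BamHI and HindIII sites
print(usable_enzymes(sample_dre))
print(tailed_primer("ACGTACGTACGTACGTAC", "EcoRI"))
```

In practice a few extra bases are often added 5' of the site to help the enzyme cut near the end of a PCR product; that detail is omitted here.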
- Agrobacterium cells are grown on LB at 28 °C for 3 h and plated on LB-agar plates supplemented with the antibiotics gentamycin 50 mg/L (Sigma) and kanamycin 50 mg/L (Sigma). Plates are then incubated at 28 °C for 48 h.
- the present invention enables computational identification of candidate DREs and assignment of regulatory capacities thereto.
- the present inventors utilize a broad range of verification methods, as follows.
- Promine analysis Computationally identified candidate DREs selected for biological validation are PCR amplified using computationally selected primers, cloned into binary vectors having luciferase reporter genes, and the resultant vectors are transformed into Arabidopsis plants (5 plants are transformed with each construct). Seeds from transformed plants are harvested and sown on plates using growth medium containing kanamycin as a selection marker. Antibiotic resistant transformants are grown and 10 T1 plants are kept per construct. T2 seeds from each plant are collected and grown in the presence of kanamycin and mature plants are analysed with luciferin.
- Manual data annotation Accurate and exhaustive manual annotation of data is used to optimize annotation of expressed sequence databases and candidate DRE sequence databases. For example, accurate classification of tissue, developmental stage and/or growth condition specificity of libraries from which clustered ESTs are derived ensures accurate annotation of expressed contig and candidate DRE sequences with respect to such biological characteristics.
- Biological specificity of contigs, CDSs and candidate DREs The percentage of clustered ESTs which define a contig or a CDS, and are uniquely specific to a given type of EST library is correlated with the probability that such contigs or CDSs are specifically expressed in cells whose tissue-, developmental stage-, and/or growth condition-specificity correspond to those of the cells from which such a library is derived. Conversely, the percentage of clustered ESTs which define a contig or a CDS, and are specific to multiple types of EST libraries is correlated with the probability that such contigs or CDSs are constitutively expressed.
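The specificity measure described above, the percentage of a contig's clustered ESTs that come from a given type of library, can be computed directly from the library annotations. The sketch below assumes each EST has already been labeled with a library type; the labels and the function name are hypothetical.

```python
from collections import Counter


def library_type_fractions(est_library_types):
    """Return the fraction of a contig's ESTs derived from each library type."""
    counts = Counter(est_library_types)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {library_type: n / total for library_type, n in counts.items()}


# Hypothetical contig built from 10 ESTs: 8 from root libraries, 2 from leaf libraries.
sample_types = ["root"] * 8 + ["leaf"] * 2
fractions = library_type_fractions(sample_types)
print(fractions)
```

A contig dominated by one library type suggests specific expression, whereas ESTs spread over many library types point toward constitutive expression, as the text describes.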
- Clustering quality The number of ESTs assigned high interest scores, and used to define contigs, CDSs or candidate DREs is correlated with the probability that the nucleotide sequences of such contigs, CDSs or candidate DREs are accurate and that these contigs, CDSs or candidate DREs are indeed specific to the tissue of interest.
- DRY analysis was performed via TBLASTX homology analysis of candidate DRE internal PCR product sequences (see Example 5, above).
- Candidate DRE quality assurance Searches of GenBank nr identified 3836 transcribed nucleic acid sequences comprising regions being homologous to those of inter-contig regions which could be sorted into 3 categories ("True", "Mixed" and "False"), as follows.
- "True" DREs Inter-contig regions comprise non-transcribed nucleic acid sequences as well as, at their upstream ends, portions of expressed sequences actually belonging to the flanking contig. This can result from such expressed sequences not being listed in external databases as a result of incomplete or unsuccessful sequencing. Hence, such candidate DREs (comprising regions of > 200 bp in length not homologous to transcribed nucleic acid sequences) retained their candidate DRE status and are used without modification to regulate heterologous gene expression.
- Inter-contig regions comprise, at their downstream ends, sequences found to be homologous to transcribed nucleic acid sequences listed in external databases. Such inter-contig regions retained their status as comprising candidate DREs but transcribed portions thereof are removed to regulate expression of heterologous genes fused downstream.
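Removal of transcribed portions from the downstream end of an inter-contig region can be sketched as an interval trim. This sketch assumes homology hits are given as half-open (start, end) intervals in region coordinates, that only hits reaching the downstream end (directly or through overlapping hits) are trimmed, and that the remainder must still meet the 200 bp minimum mentioned elsewhere in the text. The function name and sample values are hypothetical.

```python
def trim_downstream_transcribed(region_len, hits, min_len=200):
    """Trim hits that reach the downstream end; return (0, new_end) or None.

    region_len: length of the inter-contig region in bp.
    hits: (start, end) intervals homologous to transcribed sequences.
    """
    cursor = region_len
    # Walk hits from the downstream end inward; a hit is trimmed only if it
    # touches or overlaps the part that has already been trimmed.
    for start, end in sorted(hits, key=lambda hit: hit[1], reverse=True):
        if end >= cursor:
            cursor = min(cursor, start)
    return (0, cursor) if cursor >= min_len else None


# Two overlapping hits at the downstream end are removed; the interior hit is ignored.
print(trim_downstream_transcribed(1500, [(1300, 1500), (1250, 1320), (100, 150)]))
print(trim_downstream_transcribed(300, [(50, 300)]))
```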
- Homology of contig sequences flanking candidate DREs with known RNA sequences was found to increase the percentages of inter-contig regions confirmed as comprising candidate DREs to 96 and 97 %, respectively, out of all inter-contig regions identified, and out of the inter-contig regions remaining after discarding inter-contig regions upstream of contigs assembled from clusters comprising < 4 ESTs or no mRNAs (Figure 7).
- DREs were selected according to the quality of downstream contigs, as described above; strong expression, constitutive pattern of expression, organ specificity, interest score of downstream contigs, bidirectionality and TF function.
- Homology of candidate DREs to known promoters: Database homology searches identified candidate DREs having homology to known promoter sequences.
- Database searches identified 55 candidate DREs having homology to known promoter sequences, such candidate DREs putatively regulating: 5 TFs (identified via Pfam), 7 tissue-specific contigs, 4 constitutively expressed contigs and 4 quality contigs (satisfying clustering criteria, as described above, and having high subjective interest scores).
- The quality of the computational DRE databases of the present invention is verified via positive control homology searches against biologically validated Arabidopsis promoters, to determine whether these promoters are present in the DRE database of the present invention.
- Positive control homology searches of the computational DRE databases of the present invention are performed against all known sequences annotated as plant promoters in GenBank.
- Homology searches of the computational DRE databases of the present invention against GenBank nr are performed so as to detect inter-contig regions which are actually coding regions.
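The triage of inter-contig regions described above can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the procedure of the application: the function names, the representation of homology hits as half-open coordinate intervals, and the exact rule separating the three categories are hypothetical. Only the > 200 bp threshold and the idea of removing transcribed portions before use are taken from the text.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates within a region

# Threshold quoted in the text: more than 200 bp not homologous to transcripts.
MIN_UNTRANSCRIBED_BP = 200


def merge_intervals(hits: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching homology hits into disjoint intervals."""
    merged: List[Interval] = []
    for start, end in sorted(hits):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def longest_untranscribed(length: int, hits: List[Interval]) -> Interval:
    """Return the longest stretch of the region not covered by any hit."""
    best: Interval = (0, 0)
    cursor = 0
    for start, end in merge_intervals(hits):
        start, end = max(0, start), min(length, end)
        if start - cursor > best[1] - best[0]:
            best = (cursor, start)
        cursor = max(cursor, end)
    if length - cursor > best[1] - best[0]:
        best = (cursor, length)
    return best


def classify_region(
    length: int, hits: List[Interval]
) -> Tuple[str, Optional[Interval]]:
    """Assumed rule: no hits -> "True" (used unmodified); hits but a
    remaining stretch > 200 bp -> "Mixed" (used after trimming);
    otherwise -> "False" (discarded)."""
    if not hits:
        return "True", (0, length)
    start, end = longest_untranscribed(length, hits)
    if end - start > MIN_UNTRANSCRIBED_BP:
        return "Mixed", (start, end)
    return "False", None


# A 1000 bp region whose downstream 200 bp match a known transcript.
print(classify_region(1000, [(800, 1000)]))  # ('Mixed', (0, 800))
```

The sketch keeps the longest untranscribed stretch rather than only trimming the downstream end, which is a simplification chosen for brevity. A real pipeline would derive the hit intervals from the homology search output.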
Landscapes
- Genetics & Genomics (AREA)
- Engineering & Computer Science (AREA)
- Chemical & Material Sciences (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Wood Science & Technology (AREA)
- Organic Chemistry (AREA)
- Biotechnology (AREA)
- General Engineering & Computer Science (AREA)
- Zoology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Plant Pathology (AREA)
- Molecular Biology (AREA)
- Microbiology (AREA)
- Crystallography & Structural Chemistry (AREA)
- Physics & Mathematics (AREA)
- Biochemistry (AREA)
- General Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA002442024A CA2442024A1 (en) | 2001-03-29 | 2002-03-31 | Methods, platforms and kits useful for identifying, isolating and utilizing nucleotide sequences which regulate gene expression in an organism |
EP02717022A EP1373885A4 (en) | 2001-03-29 | 2002-03-31 | Methods, platforms and kits useful for identifying, isolating and utilizing nucleotide sequences which regulate gene expression in an organism |
US10/471,606 US20040121360A1 (en) | 2002-03-31 | 2002-03-31 | Methods, platforms and kits useful for identifying, isolating and utilizing nucleotide sequences which regulate gene exoression in an organism |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US27942501P | 2001-03-29 | 2001-03-29 | |
US60/279,425 | 2001-03-29 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2002079487A2 (en) | 2002-10-10 |
WO2002079487A3 (en) | 2003-03-20 |
Family
ID=23068910
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/IL2002/000267 WO2002079487A2 (en) | 2001-03-29 | 2002-03-31 | Methods, platforms and kits useful for identifying, isolating and utilizing nucleotide sequences which regulate gene expression in an organism |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP1373885A4 (en) |
CA (1) | CA2442024A1 (en) |
WO (1) | WO2002079487A2 (en) |
ZA (1) | ZA200307503B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7695968B2 (en) | 2003-03-12 | 2010-04-13 | Evogene Ltd. | Nucleotide sequences regulating gene expression and constructs and methods utilizing same |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020029394A1 (en) * | 1999-12-22 | 2002-03-07 | Allen Stephen M. | Homologs of starch synthase DU1 |
US6363399B1 (en) * | 1996-10-10 | 2002-03-26 | Incyte Genomics, Inc. | Project-based full-length biomolecular sequence database with expression categories |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4761367A (en) * | 1984-11-07 | 1988-08-02 | The University Of North Carolina At Chapel Hill | Vectors suitable for detection of eukaryotic DNA regulatory sequences |
WO1995004830A1 (en) * | 1993-08-06 | 1995-02-16 | Biotechnology Research And Development Corporation | Mycoplasma expression system |
WO2000029594A1 (en) * | 1998-11-16 | 2000-05-25 | Advanta Technology Limited | Root-specific promoter |
DE19941606A1 (en) * | 1999-09-01 | 2001-03-08 | Merck Patent Gmbh | Method for determining nucleic acid and / or amino acid sequences |
CA2398790A1 (en) * | 2000-01-28 | 2001-08-02 | The Scripps Research Institute | Methods of identifying synthetic transcriptional and translational regulatory elements, and compositions relating to same |
2002
- 2002-03-31 EP EP02717022A patent/EP1373885A4/en not_active Withdrawn
- 2002-03-31 WO PCT/IL2002/000267 patent/WO2002079487A2/en not_active Application Discontinuation
- 2002-03-31 CA CA002442024A patent/CA2442024A1/en not_active Abandoned
2003
- 2003-09-26 ZA ZA200307503A patent/ZA200307503B/en unknown
Non-Patent Citations (1)
Title |
---|
See also references of EP1373885A2 * |
Also Published As
Publication number | Publication date |
---|---|
ZA200307503B (en) | 2004-09-06 |
EP1373885A4 (en) | 2004-06-23 |
CA2442024A1 (en) | 2002-10-10 |
EP1373885A2 (en) | 2004-01-02 |
WO2002079487A3 (en) | 2003-03-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Curtin et al. | Validating genome-wide association candidates controlling quantitative variation in nodulation | |
Zeng et al. | Isolation and characterization of genes associated to cotton somatic embryogenesis by suppression subtractive hybridization and macroarray | |
Jung et al. | Towards a better bowl of rice: assigning function to tens of thousands of rice genes | |
Cheng et al. | An efficient reverse genetics platform in the model legume Medicago truncatula | |
Biswal et al. | CRISPR mediated genome engineering to develop climate smart rice: Challenges and opportunities | |
Brutnell | Transposon tagging in maize | |
Thole et al. | Distribution and characterization of more than 1000 T‐DNA tags in the genome of Brachypodium distachyon community standard line Bd21 | |
CN110218810B (en) | Promoter for regulating and controlling maize tassel configuration, molecular marker and application thereof | |
CN110862993B (en) | Gene ZKM89 for controlling plant height and ear position height of corn and application thereof | |
CN111741969A (en) | Corn gene KRN2 and application thereof | |
Gunadi et al. | Characterization of 40 soybean (Glycine max) promoters, isolated from across 5 thematic gene groups | |
Wang et al. | Identification and characterization of long noncoding RNA in Paulownia tomentosa treated with methyl methane sulfonate | |
Cui et al. | Advances in cis-element-and natural variation-mediated transcriptional regulation and applications in gene editing of major crops | |
Campos-de Quiroz | Plant genomics: an overview | |
US20040121360A1 (en) | Methods, platforms and kits useful for identifying, isolating and utilizing nucleotide sequences which regulate gene exoression in an organism | |
US20220251589A1 (en) | RHIZOBIAL tRNA-DERIVED SMALL RNAs AND USES THEREOF FOR REGULATING PLANT NODULATION | |
WO2002079487A2 (en) | Methods, platforms and kits useful for identifying, isolating and utilizing nucleotide sequences which regulate gene expression in an organism | |
US20220145312A1 (en) | Compositions and methods for driving t1 event diversity | |
Wang et al. | Genome variation and LTR-RT analyses of an ancient peach landrace reveal mechanism of blood-flesh fruit color formation and fruit maturity date advancement | |
CN111534536B (en) | Method for improving rice blast resistance and related biological material thereof | |
AU2002247942A1 (en) | Methods, platforms and kits useful for indentifying, isolating and utilizing nucleotide sequences which regulate gene expression in an organism | |
Boopathi et al. | Comparative miRNAome analysis revealed numerous conserved and novel drought responsive miRNAs in cotton (Gossypium spp.) | |
Zhang et al. | Construction and characterization of normalized cDNA library of maize inbred MO17 from multiple tissues and developmental stages | |
Zhang et al. | Tissue-specific transcriptomic profiling of sorghum propinquum using a rice genome array | |
CN110129359B (en) | Method for detecting gene editing event and determining gene editing efficiency and application thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | AK | Designated states | Kind code of ref document: A2; Designated state(s): AE AG AL AM AT AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ CZ DE DE DK DK DM DZ EC EE EE ES FI FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SK SL TJ TM TN TR TT TZ UA UG US UZ VN YU ZA ZM ZW |
| | AL | Designated countries for regional patents | Kind code of ref document: A2; Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
| | DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | |
| | WWE | Wipo information: entry into national phase | Ref document number: 10471606; Country of ref document: US; Ref document number: 2442024; Country of ref document: CA |
| | WWE | Wipo information: entry into national phase | Ref document number: 2002717022; Country of ref document: EP |
| | WWE | Wipo information: entry into national phase | Ref document number: 200307503; Country of ref document: ZA; Ref document number: 2003/07503; Country of ref document: ZA |
| | WWE | Wipo information: entry into national phase | Ref document number: 2002247942; Country of ref document: AU |
| | WWP | Wipo information: published in national office | Ref document number: 2002717022; Country of ref document: EP |
| | REG | Reference to national code | Ref country code: DE; Ref legal event code: 8642 |
| | NENP | Non-entry into the national phase in: | Ref country code: JP |
| | WWW | Wipo information: withdrawn in national office | Country of ref document: JP |
| | WWW | Wipo information: withdrawn in national office | Ref document number: 2002717022; Country of ref document: EP |