WO2011053864A2 - Bacterial metastructure and methods of use - Google Patents
- Publication number
- WO2011053864A2 (PCT/US2010/054857)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- genome
- transcription
- rna
- organism
- sequence
Classifications
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/1034—Isolating an individual clone by screening libraries
- C12N15/1086—Preparation or screening of expression libraries, e.g. reporter assays
Definitions
- the invention relates generally to determining the organizational structure of bacterial genomes, and more specifically to methods for iteratively integrating multiple genome-scale measurements on the basis of genetic information flow to identify the organizational elements and map them onto the genome sequence.
- a transcription unit is defined as having one or more ORFs that are transcribed from one promoter into a single mRNA.
- the present invention is based on the finding that multiple genome-scale measurements may be used to determine the organizational structure of bacterial genomes.
- the invention provides a method that iteratively integrates multiple genome-scale measurements on the basis of genetic information flow to identify the organizational elements and map them onto the genome sequence.
- the method includes data generation steps and data integration steps to determine the metastructure of the organism under consideration.
- A flowchart of the systematic iterative integration process is given in Figure 1. Genome-wide data generated by multiple high-throughput (HT) technology platforms, including RNA polymerase binding regions, transcripts, transcription start sites (TSSs) and peptides, are integrated based on the work flow depicted.
- the invention provides a method to determine the metastructure of a microbial genome.
- the method includes (a) the generation of multiple different omics data types, (b) systematic integration in a biochemically structured setting, and (c) determining the metastructure by finding transcription start sites, translation start sites, and binding sites for RNA polymerase and key regulatory proteins.
- the metastructure includes many genetic elements and genomic features, including operons, sub-operons, alternative RNA polymerase binding sites, small RNAs and non-coding regions. Importantly, the metastructure leads to important corrections of sequence-based annotation approaches.
- the metastructure is foundational to understanding the makeup, function and engineering of a microorganism.
- Engineered bacterial strains can produce chemical entities of commercial value, which are chemicals, antibiotics, therapeutic proteins, nucleotides and peptides.
- the systematically designed bacterial strains guided by the metastructure can be optimized by the use of adaptive evolution approach and/or computational optimization procedures.
- the method includes the steps of (a) obtaining the full genome sequence of a target organism; (b) obtaining the genome-wide binding of RNA polymerase from the organism; (c) obtaining the transcription of RNA from the organism; (d) obtaining the 5' end sequence of the RNA molecules from the organism; (e) obtaining proteomic data from the total protein isolated from the organism; (f) obtaining the data described in (b) through (e) under a series of culture conditions for the organism; and (g) iteratively mapping the data sets described in (f) onto the DNA sequence in (a) to build the metastructure for the target organism.
- the method further includes obtaining transcription boundaries from the genome-wide binding of RNA polymerase and transcription of RNA; assigning the 5' end sequence of the RNA molecules to each transcription boundary; and assigning the open reading frames to each transcription boundary, thereby identifying modular units on a genome-scale for said target organism.
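- As a rough illustration of the assignment step just described, the following Python sketch attaches TSS positions and ORF intervals to the transcription boundary that contains them. The `ModularUnit` class, the function name, the containment rule and all coordinates are hypothetical choices made for this example; the text does not specify them.

```python
from dataclasses import dataclass, field


@dataclass
class ModularUnit:
    """Hypothetical container: one transcription boundary plus what maps into it."""
    start: int
    end: int
    tss: list = field(default_factory=list)
    orfs: list = field(default_factory=list)


def build_modular_units(boundaries, tss_positions, orfs):
    """Assign each TSS and each ORF to the boundary that fully contains it.

    Assumption: coordinates are on one strand and inclusive.
    """
    units = []
    for start, end in boundaries:
        unit = ModularUnit(start, end)
        unit.tss = [p for p in tss_positions if start <= p <= end]
        unit.orfs = [(s, e) for s, e in orfs if start <= s and e <= end]
        units.append(unit)
    return units


# Made-up coordinates, for illustration only.
units = build_modular_units(
    boundaries=[(100, 900), (1200, 2000)],
    tss_positions=[110, 450, 1210],
    orfs=[(150, 400), (500, 880), (1250, 1900), (2100, 2300)],
)
for u in units:
    print(u)
```

- An ORF that falls outside every boundary, such as the last one in the example, is simply left unassigned in this sketch.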
- the method further includes determining change points in RNA expression levels along the genomic DNA sequence, and combining the modular units into TUs based on the change points.
- the target organism may be any bacterial or archaeal organism.
- Exemplary methods of obtaining the genome-wide binding of RNA polymerase include, but are not limited to, chromatin immunoprecipitation coupled with a microarray, and deep sequencing of immunoprecipitated DNA.
- Exemplary methods of obtaining the transcription of RNA include, but are not limited to, use of tiled expression arrays and/or use of deep sequencing of the isolated RNA.
- the 5' end sequence of the RNA molecules is obtained by deep sequencing of RNA.
- the proteomic data from the total protein is obtained by mass spectrometry.
- a list of open reading frames is obtained from said proteomic data.
- the culture conditions are selected from the group consisting of oxygen levels, nutrient levels, temperature, pressure, light, metal, other chemicals, and other environmental stimuli.
- the invention provides a method for designing tunable promoters that function in the context of the entire organism to produce a protein in a culture condition specific manner.
- the method includes identifying a plurality of TUs that contain the same genes but different starting sites; selecting one of said TUs based on start site properties that are used in a culture condition specific manner; choosing said start site properties based on the start site itself and the UTR sequence and its associated regulatory function, thereby expressing the target gene to produce the specified protein under the chosen culture condition.
- the protein is a heterologous protein introduced into the modular unit(s) of the TU desired to be produced under the chosen cell culture condition.
- the UTR of specified properties is introduced upstream from the gene in a modular unit of interest such that the encoded protein is produced under the chosen cell culture condition.
- the invention provides a library of reporter vectors to specify the expression level of a protein in a TU.
- the library includes a plurality of different plasmids defined by a TSS and 5'UTR derived from the metastructure of said target organism; and a reporter gene that produces a detectable protein product.
- a selectable marker gene is introduced to enable the isolating and cloning of a strain that harbors a particular plasmid in the library.
- Figure 1 shows a flowchart of the systematic iterative integration process.
- Figure 2 shows that integration of RNAP-binding maps and transcripts results in RNAP-binding regions (RBRs).
- Figure 3 shows that transcriptomic signals were transformed to binary calls and integrated with RBRs, resulting in RNAP-guided transcript segments (RTSs).
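- The integration described for Figure 3 might be sketched as follows. This is a minimal Python illustration, assuming a per-probe expression signal, a fixed threshold for the binary calls, and a simple "overlaps an RBR" rule for keeping a segment. None of these details are given in the text, and the function names are hypothetical.

```python
def binary_calls(signal, threshold):
    """Turn a per-probe transcript signal into 0/1 calls (threshold is an assumption)."""
    return [1 if s >= threshold else 0 for s in signal]


def runs_of_ones(calls):
    """Return inclusive (start, end) index pairs of contiguous runs of 1s."""
    runs, start = [], None
    for i, c in enumerate(calls):
        if c and start is None:
            start = i
        elif not c and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(calls) - 1))
    return runs


def rnap_guided_segments(signal, threshold, rbrs):
    """Keep transcribed runs that overlap at least one RNAP-binding region."""
    segments = runs_of_ones(binary_calls(signal, threshold))
    return [
        (s, e)
        for s, e in segments
        if any(s <= rb_end and rb_start <= e for rb_start, rb_end in rbrs)
    ]


# Made-up probe signal and RBR index ranges, for illustration only.
signal = [0.1, 2.0, 2.5, 0.2, 3.0, 3.1, 0.0, 1.9]
rts = rnap_guided_segments(signal, threshold=1.0, rbrs=[(0, 1), (7, 8)])
print(rts)
```

- In this toy input the middle transcribed run has no RNAP-binding support and is therefore dropped.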
- Figure 4 shows determination of TSS by mapping TSS reads to RTS, using a window size of 200 bp and cutoff of 60%.
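- One possible reading of the 200 bp window and 60% cutoff is sketched below in Python. It assumes that the window is centred on the 5' boundary of the RTS and that a position is called a TSS when it carries at least 60% of the TSS reads inside that window. Both interpretations are assumptions made for illustration, and `call_tss` is a hypothetical name.

```python
from collections import Counter


def call_tss(read_starts, rts_start, window=200, cutoff=0.6):
    """Call a TSS near an RTS boundary from 5'-end read positions.

    Assumptions: the window is centred on rts_start, and the cutoff is the
    minimum fraction of in-window reads that must start at one position.
    Returns the called position, or None if no position passes the cutoff.
    """
    half = window // 2
    in_window = [p for p in read_starts if rts_start - half <= p <= rts_start + half]
    if not in_window:
        return None
    position, count = Counter(in_window).most_common(1)[0]
    return position if count / len(in_window) >= cutoff else None


# Made-up read positions: 7 of the 10 in-window reads start at 1000.
reads = [1000] * 7 + [1003] * 2 + [1050] + [1500]
print(call_tss(reads, rts_start=1010))
```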
- Figure 5 shows that, to address how many ORFs are within one RTS, peptide reads were mapped onto pORFs, which were determined independently of the current genome annotation.
- RTS can contain multiple pORFs.
- Figure 6 shows the genome-scale regulatory network of sigma factors.
- Figure 7 shows the determination of TUs and use of alternative TSSs.
- FWD-1 containing thrA
- FWD-2 containing thrBC
- Figure 8 shows the stpA gene and the livKHMGF operon have multiple
- FIG. 9 shows the typical upstream region of a gene, which includes UP element, -35 and -10 region, +1 (TSS), ribosome-binding site (RBS), and translation start site codon (ATG).
- Figure 10 shows the plasmid map for the library.
- Figure 11 shows the overall scheme to construct the engineered strain.
- Figure 12 shows the path for wild-type strain to obtain the optimality.
- Figure 13 shows static and dynamic maps of RNA polymerase binding.
- binding locations i.e., promoter regions
- Examples of RNA polymerase (RNAP) binding under different growth conditions (log phase, red; heat-shocked, grey; stationary phase, orange). Binding of RNAP was determined by the static map, although regions of log phase cells or log phase and heat-shocked cells did not show RNAP binding under the dynamic map. Regions of differential binding are highlighted. (c) Static RNAP-binding maps of log phase and leucine condition. Differential RNAP-binding levels were observed; however, the binding locations of RNAP were nearly identical.
- FIG 14 shows a comparison of RNAP-guided transcript segment (RTS) to change point algorithm and running-window approach.
- RTS, based on integration of two experimentally derived genome-wide data sets, yielded the best results when compared to the change point algorithm (CP) and the running window approach (RW).
- Figure 15 shows an increase of genomic coverage and accuracy by iterative integration. Iterative integration of transcripts, derived from various growth conditions, with RNA polymerase binding regions (RBRs) resulted in increased genomic coverage and accuracy (a, b, c); genes of interest are highlighted in red. Iteration of data from various growth conditions (log phase; heat-shocked; stationary phase shown) also allowed for determination of condition-specific transcripts, such as yjcC (b) and ybaE (c) from stationary growth phase, and soxR (b) from heat-shocked cells.
- FIG 16 shows the discovery of new transcripts. New transcripts were determined by systematic and iterative integration of RNA polymerase binding regions (RBRs) with binary transcript calls (BT), resulting in RNAP-guided transcript segments (RTSs). New transcripts (highlighted in red) were discovered on opposite strands (a, b), as well as in intergenic regions (c, d).
- Figure 17 shows flowcharts of the molecular biology tool box for the elucidation of the organizational components. Various genome-scale methods were deployed and developed to determine the metastructure. Methods depicted here include (a) transcription profiling, (b) transcription start site (TSS) profiling, (c) chromatin
- Figure 18 shows overlapping pORFs.
- Figure 19 shows the number of unique peptides from pORFs with accurate and inaccurate boundaries.
- Of the 803 pORFs that mapped to the validated ORFs (from EcoGene), a total of 507 pORFs showed accurate translation start/stop positions (filled circle).
- pORFs with non-matching translation start positions (296 pORFs) exhibited poor peptide coverage (open circle). Due to this coverage limitation, additional methods (e.g., proteomics with N-terminal modification) have to be applied to obtain a more comprehensive and accurate ORF map at a genome-scale.
- FIG. 20 shows use of alternative TSSs.
- the serA gene, serC-aroA operon, and gltBDF operon have multiple experimentally verified TSSs.
- Figure 21 shows 5'UTR length of various functional categories. (a) Distribution of 5'UTR length shows a median length maximum of ~36 bp. (b) Comparison of 5'UTR length (in base pairs) showed no difference between different functional categories.
- the present invention provides the novel metastructure of bacterial genomes by integrating multiple genome-scale information yielded by high-throughput technologies.
- the metastructure of a bacterial genome is comprised of promoters, transcription start (TSSs) and termination sites, open reading frames (ORFs), regulatory noncoding regions (RNRs), untranslated regions (UTRs) and transcription units (TUs). All these elements measured at the genome scale and properly integrated comprise the metastructure of a genome.
- the term “genome” refers to the entirety of an organism's hereditary information. It is encoded either in DNA or, for many types of virus, in RNA. The genome includes both the genes and the non-coding sequences of the DNA. Thus, a “gene” refers to a stretch of DNA that encodes a functional polypeptide chain or RNA molecule. A gene is limited by a start codon and a stop codon. A codon is a sequence of three adjacent nucleotides in a nucleic acid that codes for a specific amino acid. As used herein, the term “genetic” refers to the heritable information encoded in the sequence of DNA nucleotides.
- the term "genetic characterization” is intended to mean the sequencing, genotyping, comparison, mapping or other assay of the information encoded in DNA.
- the scope (e.g., extent, scale, etc.) of the genetic characterization is substantially genomic in scale so that a comprehensive assessment of all the genetic elements (known or unknown) can be simultaneously assessed.
- Substantially comprehensive evaluation ideally includes a full genome-scale re-sequencing of the organism's genome. In cases where full genomic sequencing is not possible, such as due to extensive sequence repeat regions, a
- genetic basis refers to the underlying genetic or genomic cause of a particular observation. Also included in the term is the most important reason for the occurrence of the observation.
- a "discrete genomic region” as used herein, is intended to mean a contiguous region or portion of a genome.
- a genome, or portion thereof, may be fractionated into any number of different discrete genomic regions to be analyzed.
- a discrete genomic region may be defined as a region of the genome including one or more probe sequences.
- a discrete genomic region may be defined as a region of the genome that includes two or more probe sequences separated by less than about 10,000, 5,000, 4,000, 3,000, 2,000 or 1,000 base pairs.
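- The probe-spacing definition above can be illustrated with a short Python sketch that groups probe positions into discrete genomic regions whenever neighbouring probes lie less than a chosen distance apart. The function name, the default of 1,000 base pairs (one of the listed values) and the example coordinates are illustrative assumptions.

```python
def discrete_regions(probe_positions, max_gap=1000):
    """Group probes into regions where consecutive probes are < max_gap bp apart.

    Only groups with two or more probes are reported, matching the
    "two or more probe sequences" wording. Returns (first, last) positions.
    """
    groups = []
    for pos in sorted(probe_positions):
        if groups and pos - groups[-1][-1] < max_gap:
            groups[-1].append(pos)
        else:
            groups.append([pos])
    return [(g[0], g[-1]) for g in groups if len(g) >= 2]


# Made-up probe start coordinates, for illustration only.
probes = [100, 600, 1500, 4000, 4200, 9000]
print(discrete_regions(probes))
```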
- “Tiling” refers to a process involving analyzing a particular discrete genomic region by moving along the genomic sequence in a frame-wise fashion to determine appropriate probe sequences used to generate probes that are used to manufacture the array.
- a genomic region may be tiled with different sizes of oligonucleotide sequences.
- oligonucleotide sequences may be about 15-20, 20-25, 25-30, 30-35, 35-40, 40-45, 45-50, 50-55, 55-60, 60-65, 65-70, 70-75, 75-80, 80-85, 85-90, 90-95 or 95-100 base pairs in length.
- the size of each frame may be determined by the length of the oligonucleotide used to tile the region and the frame of the frame-wise shift may overlap or skip regions of the genomic region by a specific number of base pairs.
- tiling of the genomic region is performed using oligonucleotide sequences of about 50 base pairs and about 35 base pairs apart.
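- A frame-wise tiling of this kind can be sketched in a few lines of Python. The sketch assumes that "35 base pairs apart" means the distance between consecutive probe start positions, so that neighbouring 50-mers overlap by 15 bases. That reading, the function name and the toy sequence are assumptions.

```python
def tile_region(sequence, probe_len=50, step=35):
    """Return (start, probe) pairs by sliding a probe_len frame in steps of `step`.

    Assumption: `step` is the start-to-start spacing, so step < probe_len
    gives overlapping probes and step > probe_len would skip bases.
    """
    last_start = len(sequence) - probe_len
    return [(i, sequence[i:i + probe_len]) for i in range(0, last_start + 1, step)]


# Toy 120-base region, for illustration only.
region = "ACGT" * 30
probes = tile_region(region)
for start, probe in probes:
    print(start, probe)
```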
- DNA or "deoxyribonucleic acid” refers to a nucleic acid that contains the genetic instructions used in the development and functioning of all known living organisms. The main role of DNA molecules is the long-term storage of information.
- RNA refers to a molecule that consists of a long chain of nucleotide units.
- RNA is very similar to DNA, but differs in a few important structural details: in the cell, RNA is usually single-stranded, while DNA is usually double-stranded; RNA nucleotides contain ribose while DNA contains deoxyribose (a type of ribose that lacks one oxygen atom); and RNA has the base uracil rather than thymine that is present in DNA.
- RNA is transcribed from DNA by enzymes called RNA polymerases and is generally further processed by other enzymes.
- RNA polymerase refers to an enzyme that produces RNA. In cells, RNAP is needed for constructing RNA chains from DNA genes as templates, a process called transcription.
- the term "5'-end” designates the end of the DNA or RNA strand that has the fifth carbon in the sugar-ring of the deoxyribose or ribose at its terminus.
- the genomes of complex organisms are known to vary in GC content along their length. That is, they vary in the local proportion of the nucleotides G and C, as opposed to the nucleotides A and T. Changes in GC content are often abrupt, producing well-defined regions. Such abrupt changes are referred to herein as "change points.”
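- To make the notion of an abrupt change in GC content concrete, the following Python sketch computes the GC fraction in consecutive windows and reports the boundaries where it jumps. This is a deliberately simple stand-in and is not the change point algorithm referred to elsewhere in this document. The window size and jump threshold are arbitrary assumptions.

```python
def gc_fraction(segment):
    """Fraction of G and C bases in a sequence segment."""
    return (segment.count("G") + segment.count("C")) / len(segment) if segment else 0.0


def gc_change_points(sequence, window=50, min_jump=0.3):
    """Report window boundaries where GC content changes by at least min_jump.

    Assumptions: non-overlapping windows; a trailing partial window is ignored.
    """
    sequence = sequence.upper()
    fractions = [
        gc_fraction(sequence[i:i + window])
        for i in range(0, len(sequence) - window + 1, window)
    ]
    return [
        (k + 1) * window
        for k in range(len(fractions) - 1)
        if abs(fractions[k + 1] - fractions[k]) >= min_jump
    ]


# Synthetic sequence: 100 bases of AT followed by 100 bases of GC.
toy = "AT" * 50 + "GC" * 50
print(gc_change_points(toy))
```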
- the term "metastructure” refers to the components of a genome, such as, but not limited to, promoters, transcription start (TSSs) and termination sites, open reading frames (ORFs), regulatory noncoding regions (RNRs), untranslated regions (UTRs) and transcription units (TUs) of an organism of interest.
- an "open reading frame” refers to a portion of an organism's genome which contains a sequence of bases that could potentially encode a protein.
- the start and stop ends of the ORF are not equivalent to the ends of the mRNA, but they are usually contained within the mRNA.
- ORFs are located between the start-code sequence (initiation codon) and the stop-code sequence (termination codon).
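- A minimal Python sketch of locating ORFs between an initiation codon and a termination codon is given below. It assumes ATG as the only start codon, scans the forward strand only, and reports the first start in each frame up to the next in-frame stop. These simplifications, and the `min_codons` parameter, are assumptions and are not taken from the text.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence, min_codons=2):
    """Return (start, end) pairs (0-based, end exclusive) of forward-strand ORFs.

    Each ORF runs from an ATG to the first in-frame stop codon, stop included.
    min_codons counts codons before the stop, including the ATG.
    """
    sequence = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(sequence) - 2, 3):
            codon = sequence[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


# Toy sequence containing one short ORF.
toy = "CCATGAAATTTTAGGG"
for start, end in find_orfs(toy):
    print(start, end, toy[start:end])
```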
- a "transcription unit” refers to a stretch of DNA, which consists of a promoter site, 5' untranslated (5'-UTR) sequence, a transcription terminator, 3' untranslated (3'-UTR) sequence, and the stretch of DNA, which can be transcribed into an RNA molecule (can be mRNA, tRNA, rRNA, miscellaneous RNA).
- a gene or operon can be controlled by different promoters, hence resulting in different TUs. Also, the operon length may vary depending on the transcriptional termination signal, yielding different TUs.
- a "transcription start site” refers to the genomic position where transcription begins.
- Primer extension can be used to determine the start site of RNA transcription for a known gene.
- This technique requires a radiolabeled primer (usually 20-50 nucleotides in length) which is complementary to a region near the 5' end of the gene.
- the primer is allowed to anneal to the RNA and reverse transcriptase is used to synthesize complementary cDNA to the RNA until it reaches the 5' end of the RNA.
- By running the product on a polyacrylamide gel it is possible to determine the TSS, as the length of the sequence on the gel represents the distance from the start site to the radiolabeled primer.
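- The arithmetic behind reading a TSS from a primer extension product can be sketched as follows. The Python example assumes the extension product spans from the 5' end of the primer to the 5' end of the RNA, inclusive, and that the genomic coordinate of the primer's 5' end is known. The function name and coordinates are hypothetical.

```python
def tss_from_primer_extension(primer_5p_coord, product_length, strand="+"):
    """Infer the TSS coordinate from the cDNA product length.

    Assumption: the product covers primer_5p_coord through the TSS inclusive,
    so a product of length L ends L - 1 bases upstream of the primer's 5' end.
    """
    if product_length < 1:
        raise ValueError("product_length must be at least 1")
    offset = product_length - 1
    return primer_5p_coord - offset if strand == "+" else primer_5p_coord + offset


# Made-up example: primer 5' end at coordinate 1100, product of 101 nt.
print(tss_from_primer_extension(1100, 101, "+"))
print(tss_from_primer_extension(1100, 101, "-"))
```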
- re-sequencing refers to a technique that determines the sequence of a genome of an organism using a reference sequence that has already been completely determined. It should be understood that resequencing may be performed on both the entire genome of an organism or a portion of the genome large enough to include the genetic change of the organism as a result of selection.
- genetic material refers to the DNA within an organism that is passed along from one generation to the next. Normally, genetic material refers to the genome of an organism. Extra-chromosomal DNA, such as organelle or plasmid DNA, can also be a part of the 'genetic material' that determines organism properties. As used herein,
- regulatory region when used in reference to a gene or genome, refers to a DNA sequence that controls gene expression.
- a “gene product” refers to biochemical material, either RNA or protein, resulting from expression of a gene. Thus, a measurement of the amount of gene product is sometimes used to infer how active a gene is.
- the term “genetic change” or “genetic adaptation” refers to one or more mutations within the genome of an organism.
- mutation refers to a difference in the sequence of DNA nucleotides of two related organisms, including substitutions, deletions, insertions and rearrangements, or motion of mobile genetic elements, for example.
- introduction refers to the putting of something such as a genetic change into something else, such as an organism. As such, the term
- mutagenesis is intended to mean the introduction of genetic change(s) into an organism.
- polypeptide refers to two or more amino acid residues joined to each other by peptide bonds or modified peptide bonds.
- the terms apply to amino acid polymers in which one or more amino acid residues are an artificial chemical mimetic of a corresponding naturally occurring amino acid, as well as to naturally occurring amino acid polymers, those containing modified residues, and non-naturally occurring amino acid polymers.
- Polypeptide refers to both short chains, commonly referred to as peptides, oligopeptides or oligomers, and to longer chains, generally referred to as proteins. Polypeptides may contain amino acids other than the 20 gene-encoded amino acids.
- protein refers to at least two covalently attached amino acids, which includes proteins, polypeptides, oligopeptides and peptides.
- a protein may be made up of naturally occurring amino acids and peptide bonds, or synthetic peptidomimetic structures.
- amino acid or “peptide residue”, as used herein, means both naturally occurring and synthetic amino acids. For example, homo-phenylalanine, citrulline and norleucine are considered amino acids for the purposes of the invention.
- Amino acid also includes imino acid residues such as proline and hydroxyproline. The side chains may be in either the (R) or the (S) configuration.
- proteomics refers to the large-scale study of proteins, particularly their structures and functions.
- the term "mass spectrometry” refers to an analytical technique that measures the mass-to-charge ratio of charged particles.
- Exemplary uses for the technique include, but are not limited to, determining masses of particles, determining the elemental composition of a sample or molecule, and elucidating the chemical structures of molecules, such as peptides and other chemical compounds.
- the technique consists of ionizing chemical compounds to generate charged molecules or molecule fragments and measurement of their mass-to-charge ratios.
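- As a small worked example of a mass-to-charge ratio, the Python sketch below computes m/z for a molecule ionized by protonation. Positive-mode protonation is an assumption made here, since the text does not specify an ionization mode, and the neutral mass used is a made-up value.

```python
PROTON_MASS = 1.007276  # mass of a proton in daltons


def mass_to_charge(neutral_mass, charge):
    """m/z of a species carrying `charge` extra protons (assumed ionization mode)."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


# A hypothetical peptide of neutral mass 1000.0 Da at charge states 1 to 3.
for z in (1, 2, 3):
    print(z, round(mass_to_charge(1000.0, z), 4))
```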
- ChIP-on-chip or “ChIP-chip” refer to a technique that combines chromatin immunoprecipitation ("ChIP") with microarray technology (“chip”). Like regular ChIP, ChIP-on-chip is used to investigate interactions between proteins and DNA in vivo. Specifically, it allows the identification of the cistrome, the sum of binding sites, for DNA-binding proteins on a genome-wide basis. Whole-genome analysis can be performed to determine the locations of binding sites for almost any protein of interest.
- the term "tiling array” refers to a subtype of a microarray wherein probes are short fragments that are designed to cover the entire genome or contiguous regions of the genome. Depending on the probe lengths and spacing, different degrees of resolution can be achieved. The number of features on a single array can range from 10,000 to greater than 6,000,000, with each feature containing millions of copies of one probe.
- Traditional DNA microarrays designed to look at gene expression use a few probes for each known or predicted gene. In contrast, tiling arrays can produce an unbiased look at gene expression because previously unidentified genes can still be incorporated.
- the term "deep sequencing” refers to the next-generation of sequencing technologies that generate huge numbers of sequencing reads per experiment or instrument run.
- These sequencing-based approaches have some distinct advantages over microarray-based approaches for genome-wide transcriptomics (the study of gene expression) and epigenomics (the study of chromatin organization and dynamics), such as avoiding complex intermediate cloning and microarray construction steps and the ability to generate a massive amount of sequence quickly.
- gene expression is assayed by directly sequencing cDNA molecules obtained from an mRNA sample and simply counting the number of molecules corresponding to each gene to assess transcript abundance.
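- The counting idea described above can be sketched in Python as follows. The sketch assumes each sequenced cDNA has already been mapped to a single genomic position and that genes are given as inclusive coordinate ranges. The gene names and coordinates are placeholders.

```python
def count_reads_per_gene(read_positions, genes):
    """Count mapped reads falling inside each gene's inclusive (start, end) range."""
    counts = {name: 0 for name in genes}
    for pos in read_positions:
        for name, (start, end) in genes.items():
            if start <= pos <= end:
                counts[name] += 1
    return counts


# Placeholder genes and read positions, for illustration only.
genes = {"geneA": (100, 500), "geneB": (600, 900)}
reads = [120, 300, 450, 650, 950]
print(count_reads_per_gene(reads, genes))
```

- Raw counts like these are usually normalized for gene length and sequencing depth before comparison; that step is omitted here.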
- Deep sequencing technologies include, but are not limited to, massively parallel signature sequencing (MPSS), sequencing by synthesis (SBS), 454 Life Sciences' SBS pyrosequencing method, Applied Biosystems' SOLiD sequencing by ligation system, and Helicos Biosciences' single-molecule synthesis platform.
- condition refers to any external property that causes an organism to genetically adapt, evolve, change or mutate for survival.
- exemplary “conditions” or “environments” include, but are not limited to, a particular medium, volume, vessel, temperature, mixing, aeration, gravity,
- condition or “environments” are substances that are toxic to the organism, such as heavy metals, antibiotics and chlorinated compounds. It should be understood that time may also be considered a "condition” since organisms are not static entities. Thus, a culture grown over an extended period of time (e.g., days, weeks, months, years) may produce different strains over the course of its genetic adaptation. An exemplary period of time is 4 to 180 days.
- clone refers to a single cell or population of cells that originated from a single cell.
- a clone is known to consist of cells with only one genotype or to have had a single genotype previously.
- population is intended to mean a group of individuals or cells.
- a “mixed population” therefore refers to a group of cells from multiple species or to the collective genomes of naturally occurring organisms.
- the term “medium” or “media” refers to the chemical environment to which an organism is subjected or is provided access.
- the organism may either be immersed within the media or be within physical proximity thereto.
- Media are typically composed of water with other additional nutrients and/or chemicals that may contribute to the growth or maintenance of an organism.
- the ingredients may be purified chemicals (i.e., "defined” media) or complex, uncharacterized mixtures of chemicals such as extracts made from milk or blood. Standardized media are widely used in laboratories. Examples of media for the growth of bacteria include, but are not limited to, LB and M9 minimal medium.
- minimal when used in reference to media refers to media that support the growth of an organism, but are composed of only the simplest possible chemical compounds.
- M9 minimal medium is composed of the following ingredients dissolved in water and sterilized: 48 mM Na2HPO4, 22 mM KH2PO4, 9 mM NaCl, 19 mM NH4Cl, 2 mM MgSO4, 0.1 mM CaCl2, 0.2% carbon and energy source (e.g., glucose).
- the term "culture” refers to medium in a container or enclosure with at least one cell or individual of a viable organism, usually a medium in which that organism can grow.
- continuous culture is intended to mean a liquid culture into which new medium is added at some rate equal to the rate at which medium is removed.
- a batch culture is intended to mean a culture of a fixed size or volume to which new media is not added or removed.
- organism refers both to naturally occurring organisms and to non- naturally occurring organisms, such as genetically modified organisms.
- An organism can be a virus, a unicellular organism, or a multicellular organism, and can be either a eukaryote or a prokaryote. Further, an organism can be an animal, plant, protist, fungus or bacteria.
- Exemplary organisms include, but are not limited to, bacterial organisms, which include a large group of single-celled, prokaryote microorganisms, and archaeal organisms, which include a group of single-celled microorganisms.
- Archaea and bacteria are quite similar in size and shape.
- archaea possess genes and several metabolic pathways that are more closely related to those of eukaryotes: notably the enzymes involved in transcription and translation.
- the metastructure for a target bacterial organism is a universal metabolic engineering platform enabling a rational design through optimization of gene and protein expression.
- the engineered bacterial strains can produce chemical entities of commercial value, which are chemicals, antibiotics, therapeutic proteins, nucleotides and peptides.
- the systematically designed bacterial strains guided by the metastructure can be optimized by the use of adaptive evolution approach and/or computational optimization procedures (see U.S. Patent No. 7,127,379, incorporated herein by reference).
- a reporter DNA vector library comprising promoter and reporter gene, wherein each promoter comprises a nucleic acid, whose sequence represents a condition- specific alternative transcription start site and other promoter elements.
- the reporter system provides a "library kit" to screen novel bacterial strains as the producer of commercially valuable chemical entities.
- the present invention provides a method of building a metastructure for a target organism.
- the method includes iterative integration of multiple genome-scale measurements of RNA polymerase binding locations, mRNA transcript abundance, 5' sequences and translation into proteins on the basis of genetic information flow to determine the metastructure of a bacterial genome as a universal metabolic engineering platform.
- the invention includes obtaining the full genome sequence of a target organism, obtaining the genome-wide binding of RNA polymerase from the organism, obtaining the transcription of RNA from the organism, obtaining the 5' end sequence of the RNA molecules from the organism, obtaining proteomic data from the total protein isolated from the organism, obtaining the data obtained above under a series of culture conditions for the organism, and iteratively mapping the data from the series of culture conditions onto the DNA sequence of the target organism to build the metastructure for the target organism.
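- The effect of iterating over culture conditions can be illustrated with a small Python sketch that merges transcript segments one condition at a time and tracks the fraction of the genome covered. The interval-merging rule, the function names, the condition labels and all coordinates are assumptions for illustration and do not reproduce the integration workflow itself.

```python
def merge_intervals(intervals):
    """Merge overlapping or directly adjacent inclusive (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def iterative_coverage(segments_by_condition, genome_length):
    """Add one condition at a time; return the final map and coverage history."""
    combined, history = [], []
    for condition, segments in segments_by_condition.items():
        combined = merge_intervals(combined + list(segments))
        covered = sum(end - start + 1 for start, end in combined)
        history.append((condition, covered / genome_length))
    return combined, history


# Made-up segments for three conditions on a 1,000 bp toy genome.
data = {
    "log phase": [(1, 100), (300, 400)],
    "heat-shocked": [(90, 150)],
    "stationary phase": [(500, 600)],
}
final_map, history = iterative_coverage(data, genome_length=1000)
print(final_map)
print(history)
```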
- the metastructure provides experimentally verified genome-scale transcription units along with alternative TSSs and 5' UTR and methods to engineer biochemical reaction network of a bacterial cell using them.
- the level of gene expression is tightly connected to the use of alternative TSSs and the sequence of the 5'UTR in the promoter under specific growth conditions. Therefore, the method provided by this invention is to produce tunable (on/off) promoters regulating the level of targeted gene expression to engineer a biochemical reaction network using deletion and/or alteration of the selected alternative TSSs and/or 5'UTR of transcription units.
- the tunable effect cannot be produced by the conventional deletion and/or overexpression of the genes in the transcription unit.
- the modification of the alternative TSSs and/or 5'UTR produces regulatable or tunable promoters of interest.
- conventional regulatable promoters require expensive, toxic or difficult-to-use inducers such as galactose, doxycycline or heat under the targeted growth conditions to produce compounds. Since this invention provides the use of altered native promoters (i.e., deletion or alteration of selected TSSs in the targeted promoter region), the promoter can be controlled by the growth condition of interest. Therefore, the optimal conditions of gene expression can be achieved without additional exogenous inducers.
- the engineered strains obtained by the conventional gene deletion and/or overexpression method can be physiologically unstable under multiple conditions due to the loss of conditional essential genes.
- the engineered strains achieved by this invention are remarkably stable, since such conditional essential genes can be expressed through the use of alternative TSSs.
- the engineered strains can be optimized to the desired performance by culturing the cells for a sufficient period of time so that the strains evolve to that performance. In this way, physiologically stable bacterial strains expressing the engineered biochemical reaction network can be obtained, which have the regulatable, tunable or controllable promoters.
- no systematic use of alternative TSSs at the genome scale is available for designing novel bacterial strains as producers of commercially valuable chemical entities.
- each vector comprises at least one gene of interest and a promoter operatively linked thereto wherein each promoter comprises a nucleic acid, whose sequence was randomly mutated with respect to that of the wild-type promoter and cells comprising the same.
- Methods utilizing either the vectors or cells of the invention, in optimizing regulation of gene expression, protein expression, or optimized gene or protein delivery were described (WO 2007/079428 A2; Alper et al. (2005) PNAS, 102, 12678-12683).
- the present invention also provides a reporter strain library comprising the vectors.
- Each vector comprises nucleic acids, whose sequences represent one reporter gene (e.g., fluorescence genes or galactosidase gene), antibiotic resistance genes, multiple cloning sites, and a specific promoter.
- the promoter contains a single alternative TSS and 5'UTR.
- Each vector in the library provides a desired level of expression of the reporter gene under the targeted culturing conditions. Therefore, strains with higher expression levels of genes of interest are obtained from the vectors under the specific culturing conditions.
- Another aspect of this invention provides a method to obtain genome-scale TUs.
- the modular unit is different from the classic definition of an operon, since operons do not allow for nested TUs. Consequently, the TU architectures of bacterial genomes that result from condition-dependent combination of the modular units were determined.
- a TU in a bacterial genome is defined as having multiple ORFs that are transcribed from one promoter to synthesize a single mRNA transcript.
- expression levels of multiple modular units within a single TU remain constant without an expression gap between them, assuming an absence of differential mRNA degradation.
- Another aspect of this invention provides a method to engineer tunable/controllable/regulatable promoters. Examples, including tunable (on/off) promoters regulating the level of targeted gene expression, are described herein.
- Conditional use of sigma factors - transcription units can be transcribed in a condition-dependent manner through alternative sigma factor use.
- the genome-scale location map of sigma factors provides basic information to design the tunable/controllable/regulatable promoters. For example, the genome-scale location of all sigma factors in E. coli has been determined in this invention. The numbers of promoters found in this invention are 1,527 (rpoD), 1,364 (rpoS), 539 (rpoH), 161 (rpoN), 64 (rpoE), 78 (fliA), and 2 (fecI) (Figure 6).
- the thrLABC operon is regulated by transcriptional attenuation, which is modulated by the availability of charged isoleucyl- and threonyl-tRNA.
- an additional promoter found by this invention is located in front of thrB and separately regulates thrBC under stationary growth phase.
- the promoter is conditionally activated by σS holoenzyme under stationary growth phase (Figure 7). Based on this finding, the native tunable/controllable/regulatable promoters working under six conditions (log, stationary, mild heat-shocked, extreme heat-shocked, glutamine, and iron conditions) can be designed.
- Conditional use of alternative TSSs - transcription units can be transcribed in a condition-dependent manner through alternative TSS use.
- the use of alternative TSSs can be determined by the novel 5'-RACE-seq method using a unique RNA adapter and massive-scale sequencing. For example, 4,133 TSSs were determined in the E. coli genome. 35% of promoters contain multiple TSSs, indicating the presence of alternative TSSs for a large portion of the E. coli transcription units.
- the stpA gene and the livKHMGF operon, encoding an H-NS-like DNA-binding protein and the leucine ABC transporter complex, respectively, both have multiple experimentally verified TSSs.
- the dominant TSS (2,796,558) was detected, which is highly activated by the transcription factor Lrp.
- the two other TSSs (2,796,578 and 2,796,600) are therefore likely to be less utilized under the growth conditions.
- two confirmed TSSs were observed in the promoter region of the livKHMGF operon. While the TSS (3,595,753) is dominantly utilized to transcribe the operon, the transcription factor Lrp apparently represses the other TSS (3,595,778) (Figure 8).
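The idea of counting promoters that carry alternative TSSs can be illustrated with a minimal sketch. The function name and the single-TSS promoter are hypothetical; the stpA and livKHMGF coordinates are the ones quoted above, with the three 2,796,xxx positions assigned to stpA by context.

```python
from collections import defaultdict

def multi_tss_fraction(tss_records):
    """Return the fraction of promoters that carry more than one TSS.

    tss_records: iterable of (promoter_name, tss_coordinate) pairs.
    """
    by_promoter = defaultdict(set)
    for promoter, tss in tss_records:
        by_promoter[promoter].add(tss)
    multi = sum(1 for sites in by_promoter.values() if len(sites) > 1)
    return multi / len(by_promoter)

# Coordinates for stpA and livKHMGF come from the text (stpA assignment is
# inferred from context); "hypothetical_single" is an invented promoter.
records = [
    ("stpA", 2_796_558), ("stpA", 2_796_578), ("stpA", 2_796_600),
    ("livKHMGF", 3_595_753), ("livKHMGF", 3_595_778),
    ("hypothetical_single", 1_000_000),
]
fraction = multi_tss_fraction(records)
```

Applied to a genome-wide TSS table, the same grouping would give the kind of figure reported above (35% of promoters with multiple TSSs).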
- the native tunable/controllable/regulatable promoters working under three conditions (log, stationary, and mild heat-shocked conditions)
- 5' UTR - 5' UTR regions were defined as DNA sequences between each TSS and the translation start site of the first gene in the transcription unit (Figure 9).
- the native tunable/controllable/regulatable promoters can be designed using deletion and/or alteration of the 5' UTR sequences. For example, the median length of E. coli 5' UTRs was around 36 bp. The majority of TSSs (~93%) fall within 300 bp of the translation start site.
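The 5' UTR definition above (the sequence between a TSS and the translation start site of the first gene) reduces to a strand-aware coordinate difference. The sketch below assumes 1-based coordinates, counts the TSS base but not the start codon, and uses invented example records.

```python
from statistics import median

def utr5_length(tss, translation_start, strand):
    """Length in bp of the 5' UTR between a TSS and the translation start site."""
    if strand == "+":
        return translation_start - tss
    if strand == "-":
        return tss - translation_start
    raise ValueError("strand must be '+' or '-'")

# Hypothetical (tss, translation_start, strand) records for illustration only.
records = [(100, 136, "+"), (5000, 4970, "-"), (900, 1250, "+")]
lengths = [utr5_length(*r) for r in records]
median_length = median(lengths)
share_within_300 = sum(n <= 300 for n in lengths) / len(lengths)
```

On a real TSS table the last two values would correspond to the median 5' UTR length and the share of TSSs within 300 bp of the translation start site.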
- Another aspect of this invention provides the core promoter elements (e.g., -10 (or extended -10), -35, and a spacer region) at the genome-scale, which can be used to design the promoters.
- Another aspect of this invention provides a reporter vector library to obtain optimal uses of alternative sigma factors, alternative TSSs or 5'UTR for the desired levels of expression of the targeted genes.
- Construction of the vectors - Each vector comprises at least one reporter gene (e.g., green fluorescent protein, lacZ, etc.), an antibiotic resistance gene (ampicillin, kanamycin, or chloramphenicol resistance), a replication origin, a T7 priming site and a promoter operatively linked thereto, wherein each promoter comprises nucleic acids whose sequences are amplified from a native promoter (Figure 10).
- the promoter sequence is a DNA sequence which is important for transcription of a gene (or transcription unit) under the appropriate conditions.
- the promoter sequence can be mutated by site-directed mutagenesis to represent a single transcription start site and 5'UTR in each vector.
- the vector library can be derived from information on alternative sigma factors, alternative TSSs or 5'UTR from Escherichia, Salmonella, Bacillus, Pseudomonas, Helicobacter, Streptomyces, Streptococcus, Lactobacillus, Geobacter, Thermotoga, Vibrio, Yersinia or other prokaryotic cells.
- at least 4,661 vectors can be constructed from E. coli sigma factors, transcription start sites and 5'UTR information described here.
- Each vector can be evaluated for its promoter strength and translation efficiency under certain culture conditions, in terms of the resulting levels of messenger RNAs and proteins of the reporter gene.
- the culture conditions can be oxygen levels, nutrient levels, temperature, pressure, light, metals, other chemicals, or other environmental stimuli.
- the levels of messenger RNAs of the reporter gene can be measured by quantitative PCR (qPCR), oligonucleotide microarray platforms, microfluidic platforms, Sanger sequencing platforms, or massive-scale sequencing platforms.
- the translation level of the reporter gene can be measured by fluorescence level or β-galactosidase activity. Based on the evaluation of promoter strength and translation efficiency under certain culture conditions, the tunable/controllable/regulatable conditions can be determined.
- Another aspect of this invention provides a method to engineer the biochemical reaction network using the tunable/controllable/regulatable promoters (i.e., use of the sigma factors, alternative TSSs, or 5'UTR sequences). Examples of use of the sigma factors, alternative TSSs or 5'UTR sequences to engineer the biochemical reaction network of a bacterial cell are described herein (see Figure 11).
- Selection of sigma factors, TSSs or 5'UTR sequences - from the sigma factor interaction network, the house-keeping sigma factor or alternative sigma factors can be selected for obtaining the optimal or suboptimal biochemical reaction network properties.
- the alternative TSSs or 5'UTR sequences can be selected for obtaining the optimal or suboptimal biochemical reaction network properties.
- the native promoters of the selected genes or transcription units in the genome can be genetically manipulated.
- the vectors comprising alternative TSSs and 5'UTR sequences can be used to achieve the optimal or suboptimal biochemical reaction properties.
- Another aspect of this invention provides a method to optimize the engineered strain to the desired performance by growing the cells for a certain period of time (Figure 12). Cultivating the cells for a sufficient period of time under the chosen conditions allows the cells to evolve to the desired performance. Since this adaptive evolution process may itself determine the best set of kinetic parameters to achieve the optimal design, the use of tunable/controllable/regulatable promoters will accelerate the adaptive evolution process.
- the remaining culture was transferred into pre-warmed (50°C) medium and incubated for 10 min.
- ammonium chloride in the minimal medium was replaced by glutamine (2 g/L).
- rifampicin-treated cells - rifampicin dissolved in methanol was added to a final concentration of 150 µg/mL and the culture was subsequently stirred for 20 min. Cultures were monitored by observing cell density at 600 nm to verify inhibitory effects of rifampicin.
- ChIP-chip - Cells at appropriate cell density were cross-linked by 1%
- the cross-linked cells were harvested and washed three times with 50 mL of ice-cold TBS (Tris Buffered Saline).
- the washed cells were re-suspended in 0.5 mL lysis buffer composed of 50 mM Tris-HCl (pH 7.5), 100 mM NaCl, 1 mM EDTA, 1 µg/mL RNase A, protease inhibitor cocktail (Sigma) and 1 kU Ready-Lyse™ lysozyme (Epicentre).
- the remaining ChIP-chip procedures were performed as described previously.
- the high-density oligonucleotide tiling arrays used to perform ChIP-chip analysis consisted of 371,034 oligonucleotide probes spaced 25 bp apart (25 bp overlap between two probes) across the E. coli genome
- the results from this analysis were not the binding positions (i.e., single binding peaks) but binding regions.
- the median position of those regions was then calculated to avoid detecting skewed positions caused by unwanted noise. Since the median positions do not necessarily match the probe positions of the microarray, the nearest probe positions were assigned to the median positions.
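The median-then-snap step described above can be sketched as follows. This is an illustration only: the probe grid (one probe every 25 bp starting at position 1), the tie-breaking rule, and the example positions are assumptions, not values from the source.

```python
from bisect import bisect_left
from statistics import median

def nearest_probe(position, probe_positions):
    """Return the probe coordinate closest to `position`.

    probe_positions must be sorted ascending; ties go to the lower probe.
    """
    i = bisect_left(probe_positions, position)
    candidates = probe_positions[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda p: abs(p - position))

# Hypothetical tiling grid and binding positions from replicate data sets.
probes = list(range(1, 1001, 25))
region_positions = [40, 60, 95]
assigned = nearest_probe(median(region_positions), probes)
```

Using the median rather than the mean keeps a single outlying position from dragging the assigned probe away from the bulk of the signal.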
- the approach to identifying the RNAP-binding regions was to first determine binding locations from each data set and then combine the binding locations from at least five of the six datasets to define a binding region. ChIP-chip experiments are usually performed using multiple replicates, and it is common to average these replicates to produce one enrichment signal that is then analyzed for binding event information.
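A minimal sketch of the "at least five of six datasets" rule follows. It assumes each data set has already been reduced to a set of bound probe coordinates and that adjacent consensus probes (25 bp apart here) are merged into one region; both the representation and the merge gap are assumptions.

```python
from collections import Counter

def consensus_probes(datasets, min_support=5):
    """Probe coordinates called as bound in at least `min_support` data sets."""
    support = Counter()
    for probes in datasets:
        support.update(set(probes))
    return sorted(p for p, n in support.items() if n >= min_support)

def merge_into_regions(probes, max_gap=25):
    """Merge sorted probe coordinates into (start, end) regions."""
    regions = []
    for p in probes:
        if regions and p - regions[-1][1] <= max_gap:
            regions[-1][1] = p
        else:
            regions.append([p, p])
    return [tuple(r) for r in regions]

# Six hypothetical data sets of bound probe positions.
datasets = [
    {101, 126, 151, 501},
    {101, 126, 151},
    {101, 126, 151, 526},
    {126, 151, 501},
    {101, 126, 151},
    {101, 126},
]
regions = merge_into_regions(consensus_probes(datasets))
```

Voting per data set, instead of averaging replicates first, means a location seen in only one or two data sets never reaches the final map.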
- RNA samples were isolated using the RNeasy Plus Mini kit (Qiagen) in accordance with the manufacturer's instructions. Subsequently, 20 µg of the purified total RNA sample was reverse transcribed with 1,500 U SuperScript II reverse transcriptase (Invitrogen), 30 U SUPERase•In (Ambion), 750 ng random primer, 10 mM dNTP mixture containing 4 mM amino-allyl dUTP, 10 mM DTT and 8 µg/mL actinomycin D. Actinomycin D was used to remove antisense transcript artefacts during the cDNA synthesis.
- the amino-allyl labeled cDNAs were purified with QIAquick PCR purification columns (Qiagen). Phosphate wash (5 mM KPO4 and 80% ethanol) and elution buffer (4 mM KPO4) were used to protect amino-allyl residues instead of using PE and PB buffers, respectively. The amino-allyl labeled cDNAs were subsequently incubated with Cy5
- RNAP-guided transcript segments were employed to determine probes expressed above background level.
- negative control probes that represent non-specific background hybridization were selected to evaluate the significance of expression of individual probes (p-value calculation).
- the negative control probes were randomly selected based on the median signal intensity.
- the purpose of negative control probes is to estimate the background, non-binding probe signal. This is because the nucleotide sequence of the negative control probes does not match any region of the genome, and so no hybridization should occur with the negative control probes. Lacking the negative control probes on the array, it was reasoned that there are probes on the array that effectively act as negative control probes since not all of the genome is expressed in any one condition, and by implication there are probes for which no complementary transcript exists in the cell.
- the orphan calls were manually removed based on the presence calls from the opposite strand (i.e., if there were dense calls from the opposite strand, the orphan calls of the strand were removed). Then, genomic coordinates of the first and last presence calls between two RNAP-binding regions were assigned to the start and end genomic coordinates of the RNAP-guided transcript segment. However, in some cases, the RNAP-binding regions did not allow us to select the correct position of the first expressed probes, since the median probe position was assigned to the RNAP-binding region. Therefore, the first probe position was manually assigned to the RNAP-guided transcript segment.
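The coordinate assignment described above (first and last presence calls between two RNAP-binding regions) can be sketched for a single strand. This simplified version handles the forward strand only, treats each binding region as one coordinate, and skips the manual curation steps; the data are invented.

```python
def rnap_guided_segments(presence_calls, binding_sites):
    """Return (start, end) of expressed probes between consecutive binding sites.

    presence_calls: list of (probe_position, call) with call 1 (present) or 0.
    binding_sites: RNAP-binding coordinates on the same (forward) strand.
    """
    bounds = sorted(binding_sites) + [float("inf")]
    segments = []
    for left, right in zip(bounds, bounds[1:]):
        expressed = [pos for pos, call in presence_calls
                     if call == 1 and left <= pos < right]
        if expressed:
            segments.append((min(expressed), max(expressed)))
    return segments

# Hypothetical binary calls on a 25-bp probe grid and two binding sites.
calls = [(100, 0), (125, 1), (150, 1), (175, 1),
         (200, 0), (225, 1), (250, 1), (275, 0)]
segments = rnap_guided_segments(calls, binding_sites=[110, 210])
```

Because each segment is bounded by the next binding site, transcripts starting at different promoters are not merged into one segment.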
- RNAP-binding regions - A minority (less than 2%) of transcribed regions lacked RNAP-binding regions (a total of 98 RNAP-guided transcript segments). Implausibly long RNAP-guided transcript segments and another RNAP-guided transcript segment on the opposite strand were detected. Without being bound by theory, these cases were considered to be due to low gene expression and the failure to detect RNAP-binding regions. Therefore, the RNAP-guided transcript segments were manually divided into two segments. However, it was expected that expression of those regions might increase when different growth conditions are applied.
- RNAP-guided transcript segments - a genome-wide summary of piece-wise constant expression segments (i.e., RNAP-guided transcript segments) was obtained along with their genomic coordinates and potential promoter regions.
- TSSs: transcription start sites
- rRNA: ribosomal RNA
- 5'-RNA adapter (5'-GUUCAGAGAGUUCUACAGUCCGACGAUC) (SEQ ID NO: 1)
- the enriched mRNA samples were incubated with 100 ⁇ of the adapter and 4 U of T4 RNA ligase (NEB).
- cDNAs were then synthesized from the adapter-ligated mRNA samples using random primers extended with the 3'-adapter sequence (5'-CAAGCAGAAGACGGCATACGANNNNNNNNN). The mRNA samples were then reverse transcribed as described above to obtain cDNA samples.
- the cDNA samples were amplified using a mixture of 1 µL of the cDNA, 10 µL of Phusion HF buffer (NEB), 1 µL of dNTPs (10 mM), 1 µL SYBR green (Qiagen), 0.5 of HotStart Phusion (NEB), and 5 pmole of primer mix (5'-CAAGCAGAAGACGGCATACGA (SEQ ID NO: 3) and 5'-AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGA (SEQ ID NO: 4)).
- the PCR mixture was denatured at 98°C for 30 sec and cycled to 98°C for 10 sec, 57°C for 20 sec and 72°C for 20 sec.
- the amplification was monitored on a LightCycler (BioRad) and stopped at the beginning of the saturation point.
- The fraction of the amplified DNA between 100 bp and 200 bp was then extracted from a 6% TBE gel after electrophoresis. Gel slices were dissolved in two volumes of EB buffer (Qiagen) and 1/10 volume of 3 M sodium acetate (pH 5.2).
- the amplified DNA was ethanol-precipitated and resuspended in EB buffer.
- A second PCR amplification was carried out to amplify the DNA libraries to a total final mass of up to 1 µg with as few PCR cycles as possible.
- the final amplified DNA libraries were purified using a QIAquick PCR purification column and eluted in 35 µL EB buffer. The samples were then quantified on a NanoDrop 1000 spectrophotometer.
- ORFs - Predicting potential ORFs (pORFs) and mapping them onto RNAP-guided transcript segments - Proteomics data, from cells grown under log phase, heat-shocked conditions, and stationary phase, were obtained by LC-FTICR mass spectrometry as described before. These proteomics data were analyzed by SEQUEST to match MS/MS spectra against the stop-to-stop peptide database. To generate this database, the E. coli genome sequence (NC_000913) was computationally segmented into stop-to-stop fragments, considering two adjacent stop codons in all six translational frames, and translated into peptides. The peptides were then chunked into 10-mer oligopeptides, retaining genomic position and frame information.
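The construction of the stop-to-stop peptide database can be sketched in a few lines. This is not the authors' code: it assumes the standard genetic code, an input containing only A, C, G and T, non-overlapping 10-mer chunks, and offsets reported relative to the translated strand.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop.
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def stop_to_stop_peptides(genome):
    """Yield (strand, frame, nucleotide_offset, peptide) for all six frames.

    nucleotide_offset is 0-based on the strand being translated.
    """
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            protein = "".join(CODON_TABLE[seq[i:i + 3]]
                              for i in range(frame, len(seq) - 2, 3))
            aa_index = 0
            for peptide in protein.split("*"):
                if peptide:
                    yield strand, frame, frame + 3 * aa_index, peptide
                aa_index += len(peptide) + 1  # +1 skips the stop codon

def chunk_10mers(peptide, k=10):
    """Split a peptide into consecutive, non-overlapping chunks of length k."""
    return [peptide[i:i + k] for i in range(0, len(peptide), k)]

# Toy sequence for illustration; a real run would read NC_000913.
fragments = list(stop_to_stop_peptides("ATGAAATAAGGGCCCTGA"))
```

Keeping strand, frame, and offset with each fragment is what lets a matched peptide be placed back on the genome later.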
- the maximally extendable ORFs containing at least one peptide (in frame) from proteomics data were considered as preliminary pORFs.
- a total of 131 peptides (~0.3%) were removed because they did not map to any maximally extendable ORFs.
- because the 131 peptides were obtained as unique ones from the mass spectrometry analysis, the existence of false positives in the unique peptides should be considered. Therefore, the difference between the filtered observation count of mapped unique peptides and that of unmapped ones was examined.
- mRNA transcript profiles were used to infer the translation directionality (i.e., translated strand) of the overlapped pORFs.
- This stringent analysis removed a total of 790 unique peptides.
- a total of 921 peptides (131 peptides from mORF mapping + 790 peptides from the above stringent test) were considered as false positives, suggesting that the false positive discovery rate (FDR) was ~2%.
- FDR: false positive discovery rate
- This analysis yielded 2,542 pORFs (FDR ~2%).
- each pORF was mapped to an RNAP-guided transcript segment using its genomic position.
- TUs: transcription units
- the modular units were first assembled based on the break point results obtained from the change point detection algorithm.
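The source does not say which change point detection algorithm was used. As a generic illustration of the idea only, the sketch below finds the single break point that best splits a signal into two constant-level pieces by minimizing the within-piece sum of squared errors; the expression values are invented.

```python
def sum_squared_error(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_breakpoint(values):
    """Index k (1 <= k < len(values)) that best splits `values` into two
    piece-wise constant segments values[:k] and values[k:]."""
    return min(range(1, len(values)),
               key=lambda k: sum_squared_error(values[:k])
               + sum_squared_error(values[k:]))

# Hypothetical probe-level expression along one transcript segment.
signal = [5.0, 5.1, 4.9, 5.0, 2.0, 2.1, 1.9]
k = best_breakpoint(signal)
```

Applying such a split recursively, with a stopping rule, would yield multiple break points along a segment.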
- a total of 61 modular units (~2%) obtained from the current annotation lacked any experimentally determined organizational components.
- These modular units indicate that specific growth conditions are required to determine their organizational components.
- one modular unit contains the rha operon, which encodes metabolic enzymes related to rhamnose metabolism and requires rhamnose as an environmental cue.
- This example demonstrates data integration and analysis to determine the metastructure of the E. coli K-12 MG1655 genome.
- RNA polymerase binding regions at a genome-scale - The first step in establishing a description of the flow of genetic information is its transfer into messenger RNA (mRNA) by the transcription process. Although this process is extensively regulated in response to external signals, mRNA is basically synthesized by RNA polymerase (RNAP) that initially binds to the promoter region. Therefore, RNAP-binding regions and mRNA transcript abundance were integrated to determine segments of contiguous transcription originating from promoter regions.
- RNAP-binding regions at a genome scale - a ChIP-chip method was applied to E. coli K-12 MG1655 grown in the presence or absence of rifampicin under multiple growth conditions.
- RNAP-associated DNA fragments were obtained that were then fluorescently labelled and hybridized to a high-density oligonucleotide tiling microarray representing the entire E. coli genome.
- Rifampicin treatment generated a genome-wide static map of RNAP-binding regions compared to a dynamic map of RNAP-binding regions without rifampicin treatment.
- Each value in columns 3-7 indicates binding levels (log2 ratio) of RNA polymerase under log phase (log), heat-shocked (heat), stationary phase (stat), and glutamine (gln) growth conditions.
- log: log phase
- stat: stationary phase
- glutamine glutamine
- RNAP-binding regions and transcriptomic data - In the second step, comprehensive information was obtained about the expression level of mRNA transcripts across the entire E. coli genome using tiling microarrays to profile transcriptomes under multiple growth conditions. These growth conditions included log-phase, heat-shocked, stationary phase, and a different nitrogen source. Negative control probes that represent nonspecific background hybridization were randomly selected based on the median signal intensity (depicted as a dotted line in Figure 3). The microarray signals were subsequently transformed to binary signals, representing presence probes (expressed above background) and absence probes (background). Transcription data obtained from multiple growth conditions were added cumulatively in a step-by-step approach.
- RNAP-binding regions and transcriptomic data were integrated to obtain a map of contiguous transcript segments (i.e., RNAP-guided transcript segments), which is independent of the current genome annotation.
- the binary signals i.e., presence (1) or absence (0) calls
- RNAP-binding regions determined above ( Figure 3).
- Figure 3 the RNAP-guided transcript segmentation method, i.e., integrating the binary transcript signals with the RNAP-binding information, circumvents the assembly of unrelated transcripts and greatly benefits further TU
- R1, log phase
- R2, log phase+heat_shocked condition
- R3, log phase+heat_shocked condition+stationary phase
- R4, log phase+heat_shocked condition+stationary phase+glutamine growth condition
- Len, Length (bp); Den, Density (%).
- a total of 98 segments were determined without RNAP-binding.
- the genomic coverage of the segments was ~81% with an average probe density of ~83% per segment. With each iteration, boundary accuracy and probe density of the segments increased (see, e.g., Table 3 on the world wide web at
- RNAP-guided transcript segments were integrated with genome-wide TSS data (Figure 4). TSSs were determined by a newly developed, modified 5'-RACE method using a unique RNA adapter and massive-scale sequencing. Three cumulative iterations yielded > 4.4 million sequence reads of an average length of 30 bp corresponding to ~30x genome lengths (~133 Mb raw sequence data). Sequence reads were mapped back onto the reference E. coli genome (NC_000913) to determine the numbers of reads matching each genomic position.
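Counting reads per genomic position, and the read-depth arithmetic quoted above, can be sketched as follows. The alignment tuples are invented, and the genome length of 4,639,675 bp is an assumption about the reference version used; the source only gives the accession.

```python
from collections import Counter

def count_read_starts(alignments):
    """Count reads per (strand, 5'-end position).

    alignments: iterable of (strand, start, end) with start <= end; the 5' end
    is `start` on the '+' strand and `end` on the '-' strand.
    """
    counts = Counter()
    for strand, start, end in alignments:
        five_prime = start if strand == "+" else end
        counts[(strand, five_prime)] += 1
    return counts

# Hypothetical 30-bp alignments.
alignments = [("+", 100, 129), ("+", 100, 129), ("+", 105, 134),
              ("-", 471, 500), ("-", 471, 500)]
counts = count_read_starts(alignments)

# Depth arithmetic using the figures in the text (lower bound of 4.4 M reads).
reads = 4_400_000
read_length = 30
genome_length = 4_639_675  # assumed length of the E. coli K-12 MG1655 reference
raw_bases = reads * read_length
fold_coverage = raw_bases / genome_length
```

With exactly 4.4 million reads this gives 132 Mb and roughly 28-fold coverage, in line with the rounded ~133 Mb and ~30x values reported for more than 4.4 million reads.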
- Table 4 provides data from the genome-scale determination of transcription start sites (TSSs), mapping onto RTSs. Each promoter region (2,955 in total) averages 1.6 TSSs. For confirmation, the data were compared to currently validated TSSs, and 87% (1,089 out of 1,252) of the validated TSSs agreed with TSSs obtained from this study (see, e.g., Table 5 on the world wide web at
- Table 5 provides comparison data of previously known TSSs to TSSs obtained from this study.
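A comparison of validated and newly detected TSSs can be sketched as a tolerance-window match. The tolerance, the coordinates, and the decision to ignore strand are all assumptions made for illustration; the source does not state the matching criterion.

```python
def fraction_recovered(validated, detected, tolerance=0):
    """Fraction of validated TSS coordinates with a detected TSS within
    `tolerance` bp (strand is ignored in this simplified sketch)."""
    hits = sum(any(abs(v - d) <= tolerance for d in detected)
               for v in validated)
    return hits / len(validated)

# Hypothetical coordinates.
validated = [100, 250, 400, 900]
detected = [101, 250, 600]
recovered = fraction_recovered(validated, detected, tolerance=2)

# The agreement reported in the text: 1,089 of 1,252 validated TSSs.
reported_percent = round(100 * 1089 / 1252)
```

The last line reproduces the 87% agreement quoted for the comparison with previously validated TSSs.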
- the 13% of the validated TSSs (corresponding to 146 TUs) not detected in this study could be due to low mRNA expression levels as well as condition-specific use of TSSs.
- the validated TSSs for narK, a gene encoding a nitrate/nitrite antiporter expressed under anaerobic growth conditions, were not detected in this study. This could be explained by near-background mRNA levels for this gene under the applied conditions.
- the ilvIH operon encoding acetolactate synthase involved in the amino acid biosynthesis.
- the ilvIH operon has four experimentally verified TSSs. Among those, only one TSS, which is highly regulated by the transcription factor Lrp under the herein described growth conditions, was detected. On the other hand, it was found that ~2% of TSSs (97 out of 4,133) were from weakly transcribed genes and that ~5% of RNAP-guided transcript segments (145 out of 2,685) lacked TSSs. Consequently, integration of the TSSs with the RNAP-guided transcript segments allowed us to determine a total of 4,036 TSS-associated transcriptional segments.
- Table 6 provides genome-scale proteomic data obtained from log phase, heat-shocked, and stationary phase growth conditions (this study), and from publicly available sources.
- Table 7 provides maximally extendable ORFs predicted from all six possible translational frames. This analysis yielded 2,542 pORFs (FDR ~2%) (Figure 5; see, e.g., Table 8 on the world wide web at systemsbiology.ucsd.edu/tables, current as of 10/29/10, herein incorporated by reference in its entirety). Table 8 provides genome-wide determination data of potential ORFs from maximally extendable ORFs and proteomics data sets.
- the proteogenomic mapping approach allows for the genome-scale determination of ORFs; however, due to limitations in peptide coverage, additional methods, e.g., proteomics with N-terminal modification, have to be applied to obtain a more comprehensive and accurate ORF map.
- Table 10 provides mapping data of pORFs to RTSs.
- the current genome annotation still contains 2,087 gene loci that are listed as "predicted", i.e., without any experimental verification. Over 42% (878) of these predicted gene loci were mapped onto pORFs, suggesting they were translated into proteins under the growth conditions applied (see, e.g., Table 9 on the world wide web at systemsbiology.ucsd.edu/tables, current as of
- each modular unit contains information on (i) promoter region, (ii) transcription start sites (TSSs), (iii) transcribed regions, and (iv) ORFs, consisting of pORFs and currently annotated ORFs (see, e.g., Table 11 on the world wide web at systemsbiology.ucsd.edu/tables, current as of 10/29/10, herein incorporated by reference in its entirety).
- Table 11 provides genome-scale determination of modular units (MUs) representing potential transcript unit (MU).
- TU: transcription unit
- Table 13 provides comparison data of TUs to the previously experimentally determined TUs. While 72 TUs (~8%) were not determined in this analysis due to a lack of identified TSSs, a total of 1,786 TUs (~72%) were consistent with computationally predicted TUs (see, e.g., Table 14 on the world wide web at systemsbiology.ucsd.edu/tables, current as of 10/29/10, herein incorporated by reference in its entirety).
- Each of the 4,661 TUs is comprised of an average of 1.1 modular units with the largest TU (TU-0061) containing nine modular units equivalent to 16 ORFs (see, e.g., Table 12 on the world wide web at systemsbiology.ucsd.edu/tables, current as of 10/29/10, herein incorporated by reference in its entirety).
- a total of 3,010 TUs (~65%) are monocistronic, while 1,652 TUs contain more than one ORF (polycistronic).
- 398 TUs (~9%) were comprised of multiple modular units that are nested within each other, defining a convoluted genome structure (Figure 7). This nested TU architecture might therefore increase the flexibility of expression states of bacterial genomes without increasing genome size.
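Nesting of TUs and the mono-/polycistronic distinction can be expressed as simple interval tests. The sketch below uses the thrLABC and thrBC units mentioned earlier as an example of one TU nested inside another; the coordinates, the third TU, and the record layout are hypothetical.

```python
from typing import NamedTuple

class TU(NamedTuple):
    name: str
    start: int
    end: int
    strand: str
    n_orfs: int

def nested_pairs(tus):
    """Return (inner, outer) name pairs where inner lies within outer on the
    same strand and the two intervals are not identical."""
    pairs = []
    for outer in tus:
        for inner in tus:
            if (inner is not outer
                    and inner.strand == outer.strand
                    and outer.start <= inner.start
                    and inner.end <= outer.end
                    and (inner.start, inner.end) != (outer.start, outer.end)):
                pairs.append((inner.name, outer.name))
    return pairs

def cistron_class(tu):
    return "monocistronic" if tu.n_orfs == 1 else "polycistronic"

# Illustrative coordinates only; they are not taken from the source tables.
tus = [
    TU("thrLABC", 190, 5020, "+", 4),
    TU("thrBC", 2801, 5020, "+", 2),
    TU("hypothetical_mono", 6000, 6500, "-", 1),
]
pairs = nested_pairs(tus)
classes = {tu.name: cistron_class(tu) for tu in tus}
```

The pairwise scan is quadratic in the number of TUs, which is acceptable for a few thousand units; a sorted sweep would be the usual choice for larger sets.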
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Genetics & Genomics (AREA)
- Chemical & Material Sciences (AREA)
- Engineering & Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Biomedical Technology (AREA)
- Organic Chemistry (AREA)
- Biotechnology (AREA)
- Zoology (AREA)
- Wood Science & Technology (AREA)
- Crystallography & Structural Chemistry (AREA)
- Plant Pathology (AREA)
- Molecular Biology (AREA)
- Microbiology (AREA)
- Physics & Mathematics (AREA)
- Biochemistry (AREA)
- General Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2012537150A JP2013509198A (en) | 2009-10-30 | 2010-10-29 | Bacterial metastructure and methods of use |
EP10827574.4A EP2494052A4 (en) | 2009-10-30 | 2010-10-29 | Bacterial metastructure and methods of use |
US13/504,386 US20120302450A1 (en) | 2009-10-30 | 2010-10-29 | Bacterial Metastructure and Methods of Use |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US25671009P | 2009-10-30 | 2009-10-30 | |
US61/256,710 | 2009-10-30 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2011053864A2 true WO2011053864A2 (en) | 2011-05-05 |
WO2011053864A3 WO2011053864A3 (en) | 2011-10-06 |
Family
ID=43923030
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2010/054857 WO2011053864A2 (en) | 2009-10-30 | 2010-10-29 | Bacterial metastructure and methods of use |
Country Status (4)
Country | Link |
---|---|
US (1) | US20120302450A1 (en) |
EP (1) | EP2494052A4 (en) |
JP (1) | JP2013509198A (en) |
WO (1) | WO2011053864A2 (en) |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2307674C (en) * | 1997-10-30 | 2013-02-05 | Cold Spring Harbor Laboratory | Probe arrays and methods of using probe arrays for distinguishing dna |
EP1470241A2 (en) * | 2002-01-24 | 2004-10-27 | Ecopia Biosciences Inc. | Method, system and knowledge repository for identifying a secondary metabolite from a microorganism |
EP1428889A1 (en) * | 2002-12-10 | 2004-06-16 | Epigenomics AG | Method for monitoring the transition of a cell from one state into another |
JP3845416B2 (en) * | 2003-12-01 | 2006-11-15 | 株式会社ポストゲノム研究所 | Gene tag acquisition method |
DE602004029284D1 (en) * | 2003-12-24 | 2010-11-04 | Advanomics Corp | DIRECT IDENTIFICATION AND MAPPING OF RNA TRANSCRIPTS |
JP4557609B2 (en) * | 2004-06-08 | 2010-10-06 | 株式会社日立製作所 | How to display splice variant sequence mapping |
WO2006126292A1 (en) * | 2005-05-25 | 2006-11-30 | National University Corporation NARA Institute of Science and Technology | Apparatus for converting microarray data |
US8428882B2 (en) * | 2005-06-14 | 2013-04-23 | Agency For Science, Technology And Research | Method of processing and/or genome mapping of diTag sequences |
-
2010
- 2010-10-29 US US13/504,386 patent/US20120302450A1/en not_active Abandoned
- 2010-10-29 JP JP2012537150A patent/JP2013509198A/en active Pending
- 2010-10-29 WO PCT/US2010/054857 patent/WO2011053864A2/en active Application Filing
- 2010-10-29 EP EP10827574.4A patent/EP2494052A4/en not_active Withdrawn
Non-Patent Citations (1)
Title |
---|
See references of EP2494052A4 * |
Also Published As
Publication number | Publication date |
---|---|
JP2013509198A (en) | 2013-03-14 |
WO2011053864A3 (en) | 2011-10-06 |
US20120302450A1 (en) | 2012-11-29 |
EP2494052A2 (en) | 2012-09-05 |
EP2494052A4 (en) | 2013-08-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Stephenson et al. | Direct detection of RNA modifications and structure using single-molecule nanopore sequencing | |
Babski et al. | Genome-wide identification of transcriptional start sites in the haloarchaeon Haloferax volcanii based on differential RNA-Seq (dRNA-Seq) | |
Iosub et al. | Hfq CLASH uncovers sRNA-target interaction networks linked to nutrient availability adaptation | |
Cho et al. | The transcription unit architecture of the Escherichia coli genome | |
Payen et al. | High-throughput identification of adaptive mutations in experimentally evolved yeast populations | |
Henn et al. | Analysis of high-throughput sequencing and annotation strategies for phage genomes | |
McManus et al. | Ribosome profiling reveals post-transcriptional buffering of divergent gene expression in yeast | |
Lahens et al. | IVT-seq reveals extreme bias in RNA sequencing | |
Chen et al. | The genome of Sulfolobus acidocaldarius, a model organism of the Crenarchaeota | |
Vivancos et al. | Strand-specific deep sequencing of the transcriptome | |
Hör et al. | Grad‐seq in a Gram‐positive bacterium reveals exonucleolytic sRNA activation in competence control | |
Shao et al. | RNA G-quadruplex structures mediate gene regulation in bacteria | |
Romero et al. | A comparison of key aspects of gene regulation in Streptomyces coelicolor and Escherichia coli using nucleotide‐resolution transcription maps produced in parallel by global and differential RNA sequencing | |
Moqtaderi et al. | Extensive structural differences of closely related 3′ mRNA isoforms: links to Pab1 binding and mRNA stability | |
Diehl et al. | Minimized combinatorial CRISPR screens identify genetic interactions in autophagy | |
Peschek et al. | A conserved RNA seed‐pairing domain directs small RNA‐mediated stress resistance in enterobacteria | |
Frank et al. | Pseudomonas putida KT2440 genome update by cDNA sequencing and microarray transcriptomics | |
Lee et al. | A genome-wide survey of highly expressed non-coding RNAs and biological validation of selected candidates in Agrobacterium tumefaciens | |
Hesketh et al. | New pleiotropic effects of eliminating a rare tRNA from Streptomyces coelicolor, revealed by combined proteomic and transcriptomic analysis of liquid cultures | |
Espinar et al. | Promoter architecture determines cotranslational regulation of mRNA | |
Tran et al. | De novo computational prediction of non-coding RNA genes in prokaryotic genomes | |
Huch et al. | Atlas of mRNA translation and decay for bacteria | |
US20090111099A1 (en) | Promoter Detection and Analysis | |
Lalanne et al. | Spurious regulatory connections dictate the expression‐fitness landscape of translation factors | |
Zink et al. | Comparative CRISPR type III-based knockdown of essential genes in hyperthermophilic Sulfolobales and the evasion of lethal gene silencing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 10827574; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | WIPO information: entry into national phase | Ref document number: 2012537150; Country of ref document: JP |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | REEP | Request for entry into the European phase | Ref document number: 2010827574; Country of ref document: EP |
| | WWE | WIPO information: entry into national phase | Ref document number: 2010827574; Country of ref document: EP |
| | WWE | WIPO information: entry into national phase | Ref document number: 13504386; Country of ref document: US |