WO2023070043A1 - Compositions and methods for targeted editing and evolution of repetitive genetic elements - Google Patents



Publication number
WO2023070043A1
Authority
WO
WIPO (PCT)
Prior art keywords
intron
sequence
cell
polynucleotide
splicing
Application number
PCT/US2022/078446
Other languages
French (fr)
Inventor
Farren ISAACS
Felix RADFORD
Original Assignee
Yale University
Application filed by Yale University

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113 Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/10 Type of nucleic acid
    • C12N2310/20 Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]

Definitions

  • The invention generally relates to the field of gene editing technology, and more particularly to methods for targeted editing and continuous evolution of repetitive genomic elements.
  • Genome editing introduces targeted modifications in the chromosomes of living cells, permitting the elucidation of causal links between genotype and phenotype, global reprogramming of cellular behavior, and emerging applications for gene therapy ( Komor, et al., Cell, 168, 20-36 (2017)).
  • Nuclease-dependent approaches to genome engineering, such as CRISPR/Cas9, generate DNA double-stranded breaks (DSBs) to introduce modifications into the genome (Gaj, et al., Trends in Biotechnology, 31, 397-405 (2013); Kim & Kim, Nature Reviews Genetics 15, 321-334 (2014)).
  • NHEJ: non-homologous end joining.
  • HDR: homology-directed repair.
  • Other approaches, such as prime editing (Anzalone, et al., Nature, 576, 149-157 (2019)), base editing (Gaudelli, et al., Nature, 551, 464-471 (2017)), and multiplex automated genome engineering (MAGE) (Wang, et al., Nature, 460, 894-898 (2009)), are nuclease-independent genome editing techniques. They can be used for multi-site genomic edits and for continuous evolution of a single genomic locus.
  • A major limitation of all genome editing approaches is that they cannot selectively edit or diversify a genetic element that shares high sequence homology with other genetic loci.
  • Such repetitive genetic elements make up large fractions of genomes across all domains of life. For example, repetitive elements constitute over two-thirds of the human genome (de Koning, et al., PLoS Genetics, 1 (2011)).
  • The inability to modify such loci for functional characterization, precise editing, or targeted diversification remains a defining challenge. Modifying repetitive genetic elements would allow their functional characterization and open new avenues for altering cellular physiology.
  • For example, deletion of transposable elements enhances genome stability (Dymond, et al., Nature, 477, 471 (2011); Posfai, et al., Science, 312, 1044-1046 (2006)). Mutagenesis of CRISPR arrays affects innate immunity (Sapranauskas, et al., Nucleic Acids Research, 39, 9275-9282 (2011)) and genome editing (Hsu, et al., Cell, 157, 1262-1278 (2014)). Translational components (e.g., tRNAs, ribosomes) can be evolved for genetic code expansion (Young & Schultz, ACS Chemical Biology 13, 854-870 (2016)).
  • Compositions and improved methods for genome engineering are described. It has been discovered that introns can be introduced into repetitive genomic sequences to form a unique genetic address. This address facilitates insertion or recombination of a template containing a desired sequence within the repetitive genomic sequence. The disclosed compositions and methods are therefore especially useful for editing genomic target sites that share high sequence similarity with other sites (e.g., repetitive genomic elements).
  • The compositions and methods can be used for multi-site, targeted editing and/or continuous evolution of target sites (e.g., repetitive genomic elements), in tandem or in parallel, in both prokaryotes and eukaryotes.
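The "unique genetic address" idea can be illustrated with a small sketch. It rests on several assumptions:

* SpCas9-style targeting: a 20-nt spacer immediately followed by an NGG PAM.
* PAMs are scanned on the forward strand only.
* Off-targets are checked by exact matching only. Real off-target analysis must also tolerate mismatches and scan both strands for PAMs.

All sequences and function names are hypothetical placeholders. Seven identical copies stand in for the native operons, and one copy carries an inserted intron. Spacers from the intron-bearing copy that match no native copy on either strand are candidate guides for that copy alone.

```python
# Minimal sketch: find Cas9 spacers that address only the intron-bearing copy
# of a repeated element. All sequences here are hypothetical placeholders.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def protospacers(seq: str, spacer_len: int = 20) -> list[str]:
    """Forward-strand spacers immediately followed by an NGG PAM."""
    hits = []
    for i in range(len(seq) - spacer_len - 2):
        pam = seq[i + spacer_len : i + spacer_len + 3]
        if pam[1:] == "GG":
            hits.append(seq[i : i + spacer_len])
    return hits


def unique_spacers(edited_copy: str, native_copies: list[str], spacer_len: int = 20) -> list[str]:
    """Keep spacers with no exact match (either strand) in any native copy."""
    unique = []
    for spacer in protospacers(edited_copy, spacer_len):
        rc = reverse_complement(spacer)
        if not any(spacer in copy or rc in copy for copy in native_copies):
            unique.append(spacer)
    return unique


# Hypothetical exons of a repeated element and a hypothetical intron.
EXON_5 = "ACGATCGTAGCTAGCATGCATGCTTGGA"
EXON_3 = "CCTAGCATCGATCGGATCCA"
INTRON = "GTAAGTCCATCGATTTCCCATGGAATTCAAGCTTAG"

native_copies = [EXON_5 + EXON_3] * 7   # seven identical native copies
edited_copy = EXON_5 + INTRON + EXON_3  # one copy interrupted by the intron

for spacer in unique_spacers(edited_copy, native_copies):
    print(spacer)
```

The spacer upstream of the TGG inside EXON_5 is found in every copy, so it is filtered out. The spacer that reads into the intron survives because no native copy contains it.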
  • Nucleic acids and compositions thereof are described.
  • A polynucleotide is provided that includes a sequence encoding an intron linked to a heterologous sequence.
  • The heterologous sequence does not include a sequence that flanks the intron in the intron's native context.
  • The heterologous sequence has its own native context (e.g., the sequences flanking it), and that context does not typically include an intron.
  • The intron is inserted or otherwise incorporated at a non-native locus so that it interrupts the continuous nucleic acid sequence found at that locus (referred to herein as the heterologous sequence(s)). The intron can then serve as an anchor for targeted mutation of the heterologous sequence adjacent to it.
  • The heterologous sequence is heterologous to the intron, but its native, uninterrupted form may be present in the host cells.
  • The intron can be positioned upstream and/or downstream (i.e., 5’ and/or 3’) of the heterologous sequence targeted for mutation, according to the constraints of the gene editing technology with which it is used.
  • The intron can be in any orientation as long as it is transcribed with the same sequence.
  • The intron is preferably a self-splicing intron, particularly in prokaryotic systems. In eukaryotic systems, it may alternatively be a spliceosomal intron.
  • In some embodiments, the self-splicing intron is a Group I intron.
  • Suitable self-splicing introns include naturally occurring self-splicing introns from or derived from Tetrahymena thermophila, Nostoc punctiforme, Bacillus anthracis, Tilletiopsis flava, Azoarcus, Pneumocystis carinii, Agrobacterium, T7-like bacteriophage, Bacteriophage T4, or combinations thereof.
  • The intron can also be a chimeric self-splicing or spliceosomal intron.
  • An exemplary chimeric self-splicing intron includes segments derived from Tetrahymena thermophila and Tilletiopsis flava.
  • In one embodiment, the chimeric self-splicing intron is encoded by the sequence of SEQ ID NO:1, or by a sequence having at least 85% identity to SEQ ID NO:1.
  • The heterologous sequence can be or include a repetitive element. Examples include a ribosomal gene, particularly a ribosomal RNA (rRNA) gene or portion thereof, a tRNA gene or portion thereof, a microsatellite, a minisatellite, an insertion sequence (IS), a transposable element, a pseudogene, a prophage, a telomere, or a CRISPR array.
  • The repetitive element can be naturally occurring or artificially introduced.
  • For example, the repetitive element can be a recombinant or other non-native nucleic acid sequence.
  • Any foreign (or heterologous or recombinant) sequence can be introduced into a cell that already contains that sequence, or one with high sequence similarity. This renders the native and introduced elements repetitive.
  • The repetitive sequence can be artificially or synthetically created.
  • The heterologous sequence, and thus the repetitive sequence, can be within a cell’s genome or extrachromosomal.
  • The heterologous sequence(s) need not be repetitive.
  • For example, the heterologous sequence(s) can be part of, or a fragment of, the coding or non-coding region of a non-repetitive gene, e.g., a gene encoding a protein.
  • The heterologous sequence(s) is or includes a sequence that is transcribable within the host.
  • A self-splicing intron alone, or a spliceosomal intron together with a spliceosome, can be removed scarlessly from the transcript during or after transcription. The gene product of the edited heterologous sequence is therefore expressed without the intron.
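The sketch below shows what "scarless" removal means at the sequence level. It is bookkeeping only: the sequences and function names are hypothetical, and no splicing chemistry is modeled. Once the intron is excised from the pre-RNA and the flanking exons are ligated, the mature RNA matches the transcript of the uninterrupted, edited sequence.

```python
def insert_intron(gene: str, intron: str, position: int) -> str:
    """Interrupt `gene` after `position` nucleotides with `intron` (DNA)."""
    return gene[:position] + intron + gene[position:]


def transcribe(dna: str) -> str:
    """Coding-strand DNA to RNA (T -> U)."""
    return dna.replace("T", "U")


def splice(pre_rna: str, intron_start: int, intron_len: int) -> str:
    """Excise the intron and ligate the 5' and 3' exons."""
    return pre_rna[:intron_start] + pre_rna[intron_start + intron_len:]


# Hypothetical edited target sequence and intron (placeholders).
edited_gene = "ATGGCTAGCAAGGGCGAGGAGCTG"
intron = "GTAAGTCCATCGATTTCCCATGGAATTCAAGCTTAG"
site = 12

pre_rna = transcribe(insert_intron(edited_gene, intron, site))
mature = splice(pre_rna, site, len(intron))
print(mature == transcribe(edited_gene))  # True: no intron sequence remains
```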
  • The polynucleotide can be a plasmid or portion thereof, or a viral vector or portion thereof.
  • A prokaryotic cell (e.g., a bacterium such as E. coli) or a eukaryotic cell can contain the disclosed polynucleotide (e.g., a plasmid or viral vector), which may or may not be integrated into the cell’s genome.
  • In some embodiments, the intron replaces an endogenous intron in the cell.
  • Methods of modifying cellular genomes are also provided.
  • A method of modifying the genome of a cell at one or more target sites includes two steps. First, (a sequence encoding) an intron is integrated adjacent to each target site. Then, incorporation of a donor oligonucleotide at each target site is induced using a gene editing technology.
  • The donor oligonucleotide(s) can include one or more mutations relative to the target sites where they are incorporated.
  • The donor oligonucleotide can be partially or completely homologous to the nucleotide sequence encoding the intron.
  • The donor oligonucleotide is DNA, such as single-stranded DNA (ssDNA) or double-stranded DNA (dsDNA). Any of the aforementioned naturally occurring or chimeric introns can be used in accordance with the method.
  • Suitable gene editing technologies for incorporating the donor oligonucleotide(s) include, without limitation, a CRISPR system (e.g., CRISPR/Cas9, base editors, prime editors), multiplex automated genome engineering (MAGE), ZFNs, and TALENs. In some embodiments, both a CRISPR system and MAGE are used.
  • The target site(s) to be modified can be or include a ribosomal gene (e.g., an rRNA gene), a tRNA gene, a microsatellite, a minisatellite, an insertion sequence (IS), a transposable element, a pseudogene, a prophage, a telomere, a CRISPR array, or any other desired repetitive or non-repetitive genetic target.
  • In some embodiments, the cell is modified at two or more target sites.
  • The self-splicing intron integrated adjacent to a first target site can be the same as, or different from, the intron integrated adjacent to a second target site.
  • The donor oligonucleotide(s) are typically specific for each target site. They can be:
    • multiple copies of a single donor, to induce specific mutation(s) at the target site; or
    • a pool of different donors, to induce random or semi-random mutation(s) at the target site or to generate a library of cells containing different mutations at the target site.
  • These strategies can be used separately or together to induce any of the following:
    • specific mutations at two or more target sites;
    • random or semi-random mutation(s) at one or more target sites, and/or a library of cells containing different mutations at one or more target sites; or
    • specific mutation(s) at one or more target sites combined with random or semi-random mutation(s), and/or a library of cells containing different mutations, at one or more target sites.
  • This allows complex combinations of specific, semi-random, and random mutations to be induced simultaneously at multiple target sites.
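Degenerate (IUPAC) donor templates show the difference between a single specific donor and a random or semi-random pool. The sketch below assumes each degenerate position is synthesized with equal representation of its allowed bases. It computes the theoretical number of distinct donors a template encodes and draws sample donors from it. The templates and function names are hypothetical. `N` positions give fully random windows, while two-base codes such as `K` or `S` give semi-random ones.

```python
import random

# IUPAC nucleotide codes and the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def library_complexity(template: str) -> int:
    """Number of distinct sequences a degenerate donor template encodes."""
    n = 1
    for code in template:
        n *= len(IUPAC[code])
    return n


def sample_donors(template: str, k: int, seed: int = 0) -> list[str]:
    """Draw k donors, choosing each degenerate position uniformly."""
    rng = random.Random(seed)
    return ["".join(rng.choice(IUPAC[code]) for code in template) for _ in range(k)]


specific = "GCAGTGCATTGACCCCG"   # hypothetical single-mutation donor
random_5n = "GCAGTGNNNNNACCCCG"  # hypothetical fully random 5-nt window
semi_ks = "GCAGTGKSKSKACCCCG"    # hypothetical semi-random 5-nt window

for t in (specific, random_5n, semi_ks):
    print(t, library_complexity(t))
```

The three templates encode 1, 1,024 (4^5), and 32 (2^5) sequences. A fully random 7-nt window encodes 4^7 = 16,384.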
  • For example, the donor oligonucleotide incorporated at the first target site can originate from a plurality of identical donor oligonucleotides. The donor oligonucleotide incorporated at the second target site can, alternatively or additionally, originate from a plurality of distinct oligonucleotides.
  • Incorporation of the donor oligonucleotide at a first target site can be mediated by the CRISPR system, and/or incorporation at a second target site can be mediated by MAGE.
  • The CRISPR system and MAGE can be used in parallel or in tandem.
  • A method of screening for one or more mutations that confer a desirable phenotype can include modifying the genome of a plurality of cells as described above and then selecting for a cell that exhibits the desirable phenotype.
  • For example, the desirable phenotype can be antibiotic resistance. In that case, selecting for a resistant cell can include exposing the plurality of cells to an effective amount of one or more antibiotics.
  • An engineered bacterial ribosome is also provided that contains one or more mutations in its 23S rRNA (e.g., compared to a wild-type ribosome), particularly within nucleotides 2030-2034 and/or nucleotides 2057-2061.
  • Exemplary mutations include those encoded by a sequence selected from TCACC, CGCCG, TAGCA, GCCTG, CATTG, AAGGT, ACCCG, TCCCG, GTACA, ATTCT, AATGT, and ACCGT.
  • The mutation(s) can confer resistance to one or more antibiotics, such as chloramphenicol, erythromycin, clindamycin, and lincomycin. In some embodiments, the mutation(s) enable the resulting ribosomes to accommodate synthetic or abiotic monomers and/or to facilitate the formation of polymers from them. Such monomers include non-L-amino acids, non-canonical L-alpha-amino acids, and D-amino acids. In some embodiments, the engineered bacterial ribosome includes a linker tethering the rRNA of the small subunit (16S rRNA) to the rRNA of the large subunit (23S rRNA).
  • Polynucleotides encoding the rRNAs of the engineered ribosome are provided.
  • The polynucleotide can be included in an expression vector.
  • Cells containing the engineered ribosome and/or the polynucleotide encoding its rRNAs are also provided.
  • Figure 1A is a schematic illustrating the seven native ribosomal operons in the E. coli genome. These operons share extensive sequence homology with an orthogonal tethered ribosome (oRiboT) that is also introduced into the genome.
  • Figure 1B is a schematic illustrating the secondary structure of oRiboT rRNA. Introns were introduced at four separate sites in oRibo-T. Intron insertion sites are marked with arrows, and areas that can be targeted with f-MAGE are highlighted in purple.
  • Figures 1C-1D are illustrations showing the general approach to filtered CRISPR (f-CRISPR; Fig. 1C) and filtered MAGE (f-MAGE; Fig. 1D).
  • An intron is introduced near the site targeted for mutagenesis to provide a unique address for hybridization. In f-CRISPR, an sgRNA hybridizes there to direct a Cas9-mediated double-stranded break (Fig. 1C). In f-MAGE, a pool of mutagenic ssODNs hybridizes there (Fig. 1D). After the mutation is introduced into the DNA, the intron is spliced out of the transcribed RNA to produce the desired product.
  • The Tetrahymena thermophila intron and a panel of other Group I self-splicing introns were introduced into the gene encoding oRiboT to distinguish it from the seven native ribosome genes. f-CRISPR and f-MAGE were then used to generate libraries of spliced oRiboT RNAs with targeted mutations.
  • Figure 2A is a schematic showing the oGFP reporter and oRibo-T constructs used for oGFP expression experiments.
  • Figure 2B is an illustration showing constructs used to validate in vivo intron function and verify the sequence of ligated exons. Sequencing primers are shown in black.
  • Figure 2D is a bar graph showing percentage editing and library complexity.
  • f-CRISPR was performed on genomic oRiboT-Tt2, using a dsDNA donor to replace an 832-bp region of oRibo-T and introduce a 7N mutagenic library. Conversion efficiency was determined by next-generation sequencing, where + conversion denotes a mutant and - conversion denotes a WT ribosome.
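Percent editing and library complexity can be tallied from sequencing reads that cover the target window. This sketch assumes the reads are already aligned so the window sits at fixed coordinates. A real pipeline would first align, quality-filter, and deduplicate the reads. The reads, coordinates, and function name are hypothetical. A read counts as converted if its window differs from the WT sequence. Complexity is taken as the number of distinct non-WT windows observed.

```python
from collections import Counter


def conversion_stats(reads: list[str], wt_window: str, start: int) -> tuple[float, int]:
    """Return (percent of reads converted away from WT, number of distinct mutant windows)."""
    end = start + len(wt_window)
    # Only reads long enough to cover the whole window are counted.
    windows = Counter(r[start:end] for r in reads if len(r) >= end)
    total = sum(windows.values())
    if total == 0:
        return 0.0, 0
    mutant_reads = total - windows.get(wt_window, 0)
    complexity = sum(1 for w in windows if w != wt_window)
    return 100.0 * mutant_reads / total, complexity


reads = ["GCAAGATTT", "GCCATTGTT", "GCCATTGTT", "GCTCACCTT"]  # hypothetical
pct, complexity = conversion_stats(reads, wt_window="AAGAT", start=2)
print(pct, complexity)  # 75.0 2
```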
  • Figure 2E is a line graph showing percentage editing as a function of the length of homology to the intron.
  • f-MAGE was performed with ssODNs having 0 to 70 nucleotides of homology to the intron. The conversion ratios of oRibo-T to WT ribosomes were determined by next-generation sequencing.
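Intron-homology ssODN design can be sketched as a function that joins two parts. The first is the 3′ end of the intron. The second is the downstream exon carrying the desired mutation. Only the intron-bearing copy then offers full-length homology.

The following are assumptions, not taken from this document:

* The oligo is 90 nt long, a length commonly used for MAGE.
* The intron lies immediately upstream of the edited window.
* An `antisense` flag stands in for choosing the lagging-strand orientation. In practice that choice depends on where the locus lies relative to the replication origin.

The sequences and function names are hypothetical. With this 36-nt placeholder intron, requested homologies above 36 nt raise an error, whereas a full-length intron supports the larger values.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def design_ssodn(intron: str, exon: str, mutant: str, window_start: int,
                 intron_homology: int, oligo_len: int = 90,
                 antisense: bool = False) -> str:
    """Intron 3' homology followed by the downstream exon carrying `mutant`."""
    if not 0 <= intron_homology <= len(intron):
        raise ValueError("intron_homology exceeds intron length")
    edited_exon = exon[:window_start] + mutant + exon[window_start + len(mutant):]
    exon_part = oligo_len - intron_homology
    if exon_part > len(edited_exon):
        raise ValueError("exon too short for requested oligo length")
    # Slice from len - k rather than -k: intron[-0:] would return the whole intron.
    oligo = intron[len(intron) - intron_homology:] + edited_exon[:exon_part]
    return reverse_complement(oligo) if antisense else oligo


INTRON = "GTAAGTCCATCGATTTCCCATGGAATTCAAGCTTAG"  # hypothetical 36-nt intron
EXON = "ACGT" * 25                             # hypothetical 100-nt exon

for k in (0, 10, 20, 30):
    print(k, design_ssodn(INTRON, EXON, "TTTTT", window_start=10, intron_homology=k))
```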
  • Figure 2F is a table showing 23S rRNA mutations and their effects on antibiotic resistance in oRiboT and WT ribosome backgrounds (Er, erythromycin; Cl, clindamycin; Cm, chloramphenicol; Ln, lincomycin).
  • Figure 2G is a series of line graphs showing growth kinetic profiles of E. coli MG1655 strains containing mutant RiboT or WT ribosomes, grown in chloramphenicol (7.74 µM).
  • Figures 3A-3E are bar graphs showing oGFP expression when group I self-splicing introns were introduced into oRibo-T at site 1 (Fig. 3A) or 2 (Fig. 3B), independently, or in combination with the Tetrahymena intron at site 2 (Fig. 3C), 3 (Fig. 3D), or 4 (Fig. 3E).
  • Intron abbreviations correspond to Table 3.
  • Figure 3F is a schematic showing construction of a chimeric CTt intron. The Tt intron was engineered to incorporate the P1 helix from the Tfa intron. This created an orthogonal intron whose unique 5’ sequence can be targeted by ssODNs or gRNAs.
  • The chimeric intron was inserted at site 2 in oRiboT and tested in two ways: whether it could self-splice, and whether oRiboT carrying the CTt intron remained functional (oGFP production assay).
  • The illustrated sequences are agucaucgugacuacaagc (SEQ ID NO: 13) (Tfa P1) and uuuccauuuauaacgauaaaa (SEQ ID NO: 14) (Tt P1).
  • Figure 4A is a schematic showing multisite introduction of introns in the same oRibo-T construct.
  • The intron at site 2 is an engineered (CTt) intron, and the intron at site 4 is the natural Tt intron.
  • At site 2, the illustrated sequences are:
    • ugaacucgcugugAAGAUgcaguguacccgcggcaagacgGAAAGaccccgga (SEQ ID NO: 15), showing the location of the intron insertion site;
    • ugaacucgcugugCAUUGgcaguguacccgcggcaagacgAAGGUaccccgga (SEQ ID NO: 16), for editing; and
    • ugaacucgcugugNNNNNNNgcaguguacccgcggcaagacgNNNNNaccccgga (SEQ ID NO: 17), for evolution.
  • At site 4, they are uuggaucaUUGUGGua (WT aSD; SEQ ID NO: 18) and uuggaucaCCUCCUua (O-aSD; SEQ ID NO: 19).
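A short sketch can check how SEQ ID NOs: 15, 16, and 17 relate. It assumes the uppercase runs mark the two edited windows at site 2 and the lowercase letters are unchanged flanks. The stray space inside each sequence is treated as a line-break artifact and removed. The function names are hypothetical. Substituting the two windows of SEQ ID NO: 15 reproduces SEQ ID NO: 16. The 12 N positions of SEQ ID NO: 17 encode 4^12 possible sequences.

```python
import re

SEQ15 = "ugaacucgcugugAAGAUgcaguguacccgcggcaagacgGAAAGaccccgga"    # insertion-site context
SEQ16 = "ugaacucgcugugCAUUGgcaguguacccgcggcaagacgAAGGUaccccgga"    # editing
SEQ17 = "ugaacucgcugugNNNNNNNgcaguguacccgcggcaagacgNNNNNaccccgga"  # evolution


def windows(seq: str) -> list[tuple[int, str]]:
    """(start, text) of each uppercase run, assumed here to mark an edited window."""
    return [(m.start(), m.group()) for m in re.finditer(r"[A-Z]+", seq)]


def edit_windows(seq: str, replacements: list[str]) -> str:
    """Substitute each window in order; equal lengths keep coordinates fixed."""
    wins = windows(seq)
    if len(wins) != len(replacements):
        raise ValueError("one replacement per window is required")
    out = seq
    for (start, old), new in zip(wins, replacements):
        if len(new) != len(old):
            raise ValueError("only same-length substitutions are supported")
        out = out[:start] + new + out[start + len(old):]
    return out


print(windows(SEQ15))                                    # [(13, 'AAGAU'), (40, 'GAAAG')]
print(edit_windows(SEQ15, ["CAUUG", "AAGGU"]) == SEQ16)  # True
print(4 ** SEQ17.count("N"))                             # 16777216
```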
  • Figure 4B is a graph showing oGFP expression in post-MAGE cultures from f-MAGE cycles 0-6.
  • Figure 4C is a graph showing survival rate in chloramphenicol (7.74 µM) from f-MAGE cycles 0-6. Chloramphenicol-selected variants were induced for oGFP expression and analyzed by flow cytometry. The percentage of oGFP-positive cells was quantified for all cycles.
  • Figure 4D is a graph showing oGFP expression in post-MAGE cultures from f-MAGE cycles 0-6.
  • Figure 4E is a graph showing survival rate in chloramphenicol (7.74 µM) from f-MAGE cycles 0-6.
  • FIG. 4F is a schematic illustrating in vivo ribosome mutagenesis with f-MAGE to validate in vivo evolution at the aSD (site 4).
  • Six cycles of f-MAGE were performed on strain C321 carrying oRiboT-CTt2-Tt4 and the oGFP reporter. Each cycle used an ssODN that converts the anti-oRBS to the WT anti-SD sequence.
  • Post-MAGE cultures were induced for oGFP expression and flow cytometry was performed on cultures from f-MAGE cycles 0-6.
  • Group I self-splicing introns were introduced into repetitive sequences to construct unique genetic addresses that can be selectively modified. Combined with CRISPR/Cas9 and filtered MAGE, these addresses enabled targeted editing and evolution of ribosomes in vivo. This was achieved without off-target edits to native genomic elements that share sequence homology.
  • The working Examples show that both naturally occurring self-splicing introns and engineered chimeric introns can be used. These methods allow multi-site evolution of repetitive genetic elements such as the ribosome.
  • The Examples show that repetitive genetic elements, such as orthogonal tethered ribosomes (oRiboT), can be evolved continuously in vivo. No laborious plasmid cloning and re-transformation is needed, and only oRiboT is edited, not the cell’s native translational apparatus.
  • This also allows much larger ribosomal libraries to be created. These libraries can include many mutations that would otherwise be toxic to the cell but are confined to oRiboT.
  • Filtered editing can be used not only to randomize certain portions of the ribosome but also to create discrete mutations while randomizing others. Rationally chosen mutations can be introduced at precise loci to evolve new functions, while other precise regions are diversified. This makes the approach more flexible than continuous evolution strategies such as EvolvR or PACE.
  • Evolving oRiboT continuously in vivo, without modifying the cell’s native ribosomes, makes ribosome evolution much more efficient. For example, it can be used to evolve ribosomes that catalyze chemistries beyond the peptide bond, creating platforms for preparing sequence-defined polymers in vivo. Polymers with potentially new functions could then be scaled up in vivo and produced for industrial, military, and medical applications. Growing interest in protein biomaterials that incorporate nonstandard amino acids could also benefit from metabolic encapsulation by oRiboT. Evolving oRiboT to be more efficient and to increase protein yields could provide an improved chassis for producing protein biomaterials.
  • “Introduce,” in the context of genome modification, refers to bringing a gene editing reagent (e.g., a vector containing an intron, or a CRISPR effector protein) into contact with a cell.
  • The term encompasses penetration of the contacted composition into the interior of the cell by any suitable means, e.g., transfection, electroporation, transduction, gene gun, or nanoparticle delivery.
  • “Homologous” refers to the sequence similarity or sequence identity between two polypeptides or between two nucleic acid molecules. When a position in both of the two compared sequences is occupied by the same base or amino acid monomer subunit, e.g., if a position in each of two DNA molecules is occupied by adenine, then the molecules are homologous at that position.
  • The percent homology between two sequences is the number of matching or homologous positions they share, divided by the number of positions compared, × 100. For example, if 6 of 10 positions in two sequences are matched or homologous, the two sequences are 60% homologous.
  • Likewise, the DNA sequences ATTGCC and TATGGC share 50% homology (3 of 6 positions match). The comparison is generally made with the two sequences aligned to give maximum homology.
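The worked homology examples can be reproduced in a few lines. The sketch assumes the two sequences are already aligned, equal in length, and ungapped. Choosing the alignment that maximizes homology is a separate step, and the function name is hypothetical.

```python
def percent_homology(a: str, b: str) -> float:
    """Matching positions / positions compared x 100 for aligned, ungapped sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


print(percent_homology("ATTGCC", "TATGGC"))          # 50.0
print(percent_homology("AAAAAAAAAA", "AAAAAACCCC"))  # 60.0
```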
  • “Operably linked” refers to a functional linkage between two elements (e.g., a regulatory sequence and a heterologous nucleic acid sequence) that lets them function as intended (e.g., resulting in expression of the latter).
  • The term encompasses positioning a regulatory region and a sequence to be transcribed so as to influence transcription or translation of that sequence. For example, to place a coding sequence under the control of a promoter, the translation initiation site of the polypeptide's reading frame is typically positioned between one and about fifty nucleotides downstream of the promoter.
  • A promoter can, however, be positioned as much as about 5,000 nucleotides upstream of the translation initiation site or about 2,000 nucleotides upstream of the transcription start site.
  • Endogenous refers to any material from or produced inside a specific organism, cell, tissue or system.
  • Exogenous refers to any material introduced from or produced outside an organism, cell, tissue or system. It is understood that the term is relative, but that a reference need not be specified. For example, a protein that is endogenous to a bacterial cell can be produced by the bacterial cell, but that same protein would be exogenous to a eukaryotic cell that does not natively express or produce that protein.
  • “Heterologous” is used herein for two or more elements that have a different, non-native relation, relative position, or structure.
  • The elements can include, but are not limited to, naturally occurring elements from the same or different organisms, chimeric elements, and synthetic or engineered elements, provided that the elements are not found in nature in the same relation, relative position, or structure.
  • Heterologous sequence refers to a nucleic acid sequence element having a different, non-native relation, relative position, or structure to a second sequence element.
  • Each of the heterologous sequence and the second sequence element can be selected from, but are not limited to, naturally occurring elements from the same or different organisms, chimeric elements, synthetic or engineered elements, etc., provided that the elements are not found in nature in the same relation, relative position, or structure.
  • For example, the second sequence element can be a naturally occurring self-splicing or spliceosomal intron. The heterologous sequence linked to it is not linked (e.g., directly) to that intron in nature, although it may itself be a naturally occurring sequence from the same or a different organism.
  • Heterologous sequence(s) can refer to naturally or non-naturally occurring sequences that flank (e.g., are interrupted by) a self-splicing or spliceosomal intron inserted at a non-native position in the same or a different organism.
  • “Chimeric,” as used for a nucleic acid, describes a non-naturally occurring polynucleotide whose sequence is an artificial combination of two or more otherwise separate sequence segments.
  • In some embodiments, the sequences combined to form the chimeric nucleic acid are derived from two or more different organisms or species. This artificial combination is often achieved by chemical synthesis or by artificially manipulating isolated nucleic acid segments. Genetic engineering techniques known in the art can be used for this purpose, e.g., to add, substitute, or delete a portion of the nucleic acid.
  • “Target sequence” refers to a nucleic acid sequence or region targeted for a specific manipulation or activity, such as modification (e.g., gene editing), amplification, or detection.
  • The target site can refer to a specific subsequence of a larger nucleic acid (e.g., an exon) or to the overall sequence (e.g., a gene). The intended usage will be apparent from context.
  • A “locus” is the specific physical location of a DNA sequence (e.g., a gene) on a chromosome. A locus of interest can include a nucleic acid sequence in the main body of a cell’s genetic material (e.g., a chromosome). It can also include genetic material that exists independently of that main body, such as plasmids, episomes, viruses, transposons, or the DNA of organelles such as mitochondria, as non-limiting examples.
  • “Isolated” means altered or removed from the natural state.
  • An isolated nucleic acid can exist in substantially purified form, or can exist in a non-native environment such as, for example, a host cell.
  • An “isolated nucleic acid” encompasses a nucleic acid segment or fragment which has been separated from sequences which flank it in a naturally occurring state, e.g., a DNA fragment which has been removed from the sequences which are normally adjacent to the fragment in a genome in which it naturally occurs.
  • The term also applies to nucleic acids that have been substantially purified from other components that naturally accompany them in the cell (e.g., RNA, DNA, or proteins).
  • The term therefore includes a recombinant DNA that exists independently of other sequences. Examples include DNA incorporated into a vector, into an autonomously replicating plasmid or virus, or into the genomic DNA of a prokaryote or eukaryote. It also includes DNA that exists as a separate molecule, such as a cDNA or a genomic or cDNA fragment produced by PCR or restriction enzyme digestion.
  • “Isolated” can also refer to a cell altered or removed from its natural state. An isolated cell is in an environment different from the one in which it naturally occurs, e.g., separated from its natural milieu, such as by concentrating it beyond any concentration found in nature. “Isolated cell” includes cells within samples that are substantially enriched for the cell of interest and/or in which the cell of interest is partially or substantially purified.
  • A “vector” is a composition of matter that comprises an isolated nucleic acid and can be used to deliver that nucleic acid to the interior of a cell.
  • Vectors include, but are not limited to, linear polynucleotides, polynucleotides associated with ionic or amphiphilic compounds, plasmids, and viruses.
  • The term “vector” includes an autonomously replicating plasmid or a virus. It also includes non-plasmid and non-viral compounds that facilitate transfer of nucleic acid into cells, such as polylysine compounds and liposomes.
  • Examples of viral vectors include, but are not limited to, adenoviral vectors, adeno-associated virus vectors, and retroviral vectors.
  • “Expression vector” refers to a vector containing a polynucleotide having expression control sequences operatively linked to a nucleotide sequence to be expressed.
  • An expression vector contains sufficient cis-acting elements for expression; other elements for expression can be supplied by the host cell or in an in vitro expression system.
  • Expression vectors include all those known in the art, such as cosmids, plasmids (e.g., naked or contained in liposomes), phagemids, BACs, YACs, and viral vectors (e.g., vectors derived from lentiviruses, retroviruses, adenoviruses, and adeno-associated viruses) that incorporate the recombinant polynucleotide.
  • a “mutation” refers to a change in a nucleotide (e.g., DNA) sequence resulting in an alteration from a given reference sequence.
  • the mutation can be a deletion, insertion, duplication, rearrangement, and/or substitution of at least one nucleotide base, such as a purine (adenine and/or guanine) and/or a pyrimidine (thymine, uracil, and/or cytosine). Mutations may or may not produce discernible changes in the observable characteristics (phenotype) of a subject.
  • percent (%) sequence identity describes the percentage of nucleotides or amino acids in a candidate sequence that are identical with the nucleotides or amino acids in a reference nucleic acid sequence, after aligning the sequences and introducing gaps, if necessary, to achieve the maximum percent sequence identity. Alignment for purposes of determining percent sequence identity can be achieved in various ways that are within the skill in the art, for instance, using publicly available computer software such as BLAST, BLAST-2, ALIGN, ALIGN-2 or Megalign (DNASTAR) software. Appropriate parameters for measuring alignment, including any algorithms needed to achieve maximal alignment over the full-length of the sequences being compared can be determined by known methods.
  • % sequence identity of a given nucleic acid or amino acid sequence C to, with, or against a given nucleic acid or amino acid sequence D (which can alternatively be phrased as a given sequence C that has or includes a certain % sequence identity to, with, or against a given sequence D) is calculated as follows:
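  • The formula that should follow “calculated as follows” is not reproduced in this text. As a minimal sketch only, the Python function below assumes a convention commonly used in patent specifications: % identity of C to D = 100 × X/Y, where X is the number of aligned positions at which C and D carry the same residue and Y is the number of residues in D. The function name and the use of '-' as the gap character are illustrative assumptions, and the alignment itself is taken as given (e.g., as produced by BLAST or ALIGN-2).
```python
def percent_identity(aligned_c: str, aligned_d: str) -> float:
    """Percent identity of sequence C to reference D from a precomputed alignment.

    Assumed convention: 100 * X / Y, where X counts aligned columns in which
    C and D carry the same residue, and Y is the number of residues in D
    (gap characters '-' excluded).
    """
    if len(aligned_c) != len(aligned_d):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(
        1
        for c, d in zip(aligned_c.upper(), aligned_d.upper())
        if c == d and c != "-"
    )
    length_d = sum(1 for d in aligned_d if d != "-")
    if length_d == 0:
        raise ValueError("reference sequence D is empty")
    return 100.0 * matches / length_d


# Hypothetical example: one gap and one mismatch in C against a 10-nt reference D.
c = "ACGT-CGTAA"
d = "ACGTACGTTA"
print(round(percent_identity(c, d), 1))  # prints 80.0
```
  • Because Y counts only the residues of the reference, the value is not symmetric: for the same alignment, percent_identity(d, c) gives 100 × 8/9 ≈ 88.9.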
  • the term “effective amount” means a quantity sufficient to provide a desired pharmacologic and/or physiologic effect.
  • the sub-groups A-E, B-F, and C-E are specifically contemplated and should be considered disclosed from the disclosure of A, B, and C; D, E, and F; and the example combination A-D.
  • each of the materials, compositions, components, etc. contemplated and disclosed as above can also be specifically and independently included or excluded from any group, subgroup, list, set, etc. of such materials.
  • Reagents and compositions thereof for use in the disclosed methods are provided.
  • nucleic acids and constructs thereof and gene editing technologies for use in methods of modifying the genome of a cell are provided.
  • Such modified cells, engineered ribosomes made according to the disclosed compositions and methods, and cells expressing the engineered ribosomes are also described.
  • polynucleotides are provided that include a nucleotide sequence encoding an intron, such as a self-splicing or spliceosomal intron.
  • the intron can be naturally occurring or non-naturally occurring, such as a chimeric intron containing sequences derived from two or more organisms or species.
  • the polynucleotide includes a sequence encoding an intron operably linked to a heterologous sequence in such a manner that the intron alone or in combination with the heterologous sequence can serve as an anchor for targeting gene editing technology, thus facilitating specific gene editing at a site in or adjacent to the heterologous sequence.
  • the polynucleotide can be single stranded or double stranded.
  • the polynucleotide can be composed of DNA, RNA, one or more synthetic nucleotides, or any combination thereof.
  • the polynucleotide can be integrated into the genome of a cell or can be extrachromosomal.
  • the polynucleotide is a vector, such as an expression vector.
  • Expression vectors include all those known in the art, such as cosmids, plasmids (e.g., naked or contained in liposomes), phagemids, artificial chromosomes (e.g., BACs, YACs), and viral vectors (e.g., vectors derived from lentiviruses, retroviruses, adenoviruses, and adeno-associated viruses) that incorporate the polynucleotide.
  • the polynucleotide is a plasmid or portion thereof, or a viral vector or portion thereof.
  • the polynucleotide is present in the genome of the cell at, for example, a target locus.
  • a nucleotide sequence encoding the intron and/or the heterologous sequence is operably linked to a control element, e.g., a transcriptional control element, such as a promoter.
  • the transcriptional control element may be functional in either a eukaryotic cell, e.g., a mammalian cell, or a prokaryotic cell (e.g., a bacterial or archaeal cell).
  • a nucleotide sequence encoding the intron and/or the heterologous sequence is operably linked to multiple control elements that allow expression of the nucleotide sequence encoding the intron and/or the heterologous sequence in either prokaryotic or eukaryotic cells.
  • any of a number of suitable transcription and translation control elements, including constitutive and inducible promoters, transcription enhancer elements, transcription terminators, etc., may be used in the expression vector (e.g., U6 promoter, H1 promoter, etc.).
  • control elements are endogenous to a cell harboring the polynucleotide.
  • the polynucleotide can form part of a larger unit of DNA, such that when the larger unit is transcribed, the intron sequence is scarlessly removed from the mature transcript.
  • RNA splicing is a process during which a precursor messenger RNA (pre-mRNA) transcript is transformed into a mature messenger RNA (mRNA). Introns (non-coding regions of RNA) are removed, and the exons (coding regions) are joined together. For nuclear-encoded genes, splicing occurs in the nucleus either during or immediately after transcription. For many eukaryotic introns, splicing occurs in a series of reactions catalyzed by the spliceosome, a complex of small nuclear ribonucleoproteins (snRNPs). However, there also exist self-splicing introns, that is, ribozymes that can catalyze their own excision from their parent RNA molecule.
  • the intron(s) utilized in the disclosed compositions, methods, and strategies are self-splicing introns.
  • Self-splicing introns have the capacity to splice themselves out from a precursor RNA.
  • Self-splicing introns are preferred because they do not require a spliceosome to scarlessly exit the disclosed constructs.
  • There are three kinds of self-splicing introns: Group I, Group II, and Group III.
  • the initial discovery of self-splicing ability was in the protozoan Tetrahymena thermophila.
  • the self-splicing introns found in T. thermophila are now referred to as Group I introns.
  • Group I introns are widespread but sporadically distributed in nature, and they are present in the genomes of some bacteria, protozoa, fungi, mitochondria, chloroplasts, bacteriophages, and eukaryotic viruses, and in the nuclei of eukaryotic microorganisms.
  • Group I introns all fold into a complex secondary structure with nine paired regions (P1-P9) and employ transesterification reactions to facilitate self-splicing.
  • self-splicing of Group I introns depends on two consecutive transesterification reactions initiated by a nucleophilic attack of the 3'-OH of an exogenous guanosine cofactor (exoG) at the 5' splice site (SS).
  • ExoG is specifically bound to the P7 segment of the catalytic core of the splicing ribozyme prior to the first splicing step. This first reaction leaves exoG covalently attached to the 5' end of the intron RNA, as well as a free 5' exon with an available 3'-OH group.
  • in the second step, exoG is replaced in P7 by the terminal guanosine of the intron, and the reaction is initiated when the 5' exon attacks the 3' SS, resulting in ligated exons and the released linear intron.
  • Group II self-splicing introns are excised by a mechanism that bears similarities to pre-mRNA splicing, including the production of lariats.
  • Group II introns catalyze two transesterification reactions to excise themselves from pre-messenger RNA.
  • the 2'-OH of a bulged adenosine residue is used as the nucleophile to attack the 5' splice site.
  • This is followed by a second step in which the free 3'-OH of the 5' exon attacks the 3' splice site to form ligated exons.
  • Group II introns have been found in bacteria and in the mitochondrial and chloroplast genomes of fungi, plants, protists, and an annelid worm.
  • Group II intron RNAs are characterized by a conserved secondary structure, which spans 400-800 nucleotides and is organized into six domains, DI-DVI, radiating from a central “wheel”. These domains interact to form a conserved tertiary structure that brings together distant sequences to form an active site.
  • the active site binds the splice sites and branch-point nucleotide residue and uses specifically bound Mg2+ ions to activate the appropriate bonds for catalysis. See Lambowitz AM, et al., Cold Spring Harb Perspect Biol., 3(8):a003616 (2011).
  • Group III introns perform self-splicing via a lariat structure mechanism, similar to intron excision as catalyzed by the spliceosome.
  • a 2'-OH of a defined residue initiates splicing by attacking the 5' splice site to form the lariat; this is followed by a second reaction that joins the 3'-OH of the 5' exon to the 3' splice site.
  • any type of the foregoing self-splicing introns may be included in the polynucleotide. Group I and Group II introns are particularly preferred, and of these, Group I introns are most preferred. Group I introns do not require any protein factors to splice. In the context of editing the ribosome, the complex 3-dimensional structure of the ribosome may interfere with splicing of a Group II intron, thus creating a preference for Group I introns. Group II introns may be effective for editing other, smaller noncoding RNAs.
  • Group I introns and Group II self-splicing introns are known in the art.
  • Exemplary Group I introns include: Tetrahymena thermophila rRNA intron, Neurospora crassa cytochrome b gene intron 1, Neurospora crassa mitochondrial rRNA intron, Neurospora crassa cytochrome oxidase subunit 1 gene oxi3 intron, phage T4 thymidylate synthase intron, Chlamydomonas reinhardtii 23S rRNA Cr.LSU intron, phage T4 nrdB intron, and Anabaena pre-tRNA(Leu) intron.
  • Group II self-splicing introns include the yeast mitochondrial oxi3 gene intron 5γ and the Podospora anserina cytochrome c oxidase I gene intron.
  • the self-splicing intron is a naturally occurring intron, e.g., a naturally occurring group I intron.
  • Suitable self-splicing introns include naturally occurring self-splicing introns from or derived from Tetrahymena thermophila, Nostoc punctiforme, Bacillus anthracis, Tilletiopsis flava, Azoarcus, Pneumocystis carinii, Agrobacterium, T7-like bacteriophages, or bacteriophage T4.
  • An exemplary suitable Tetrahymena thermophila self-splicing intron is the intron encoded by the following sequence:
  • the polynucleotide includes the sequence of SEQ ID NO:2 or a sequence having at least 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO:2.
  • the polynucleotide includes any sequence of Table 1 (i.e., any one of SEQ ID NOS:2-12), or a sequence having at least 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.
  • the self-splicing intron is a non-naturally occurring intron.
  • chimeric self-splicing introns can be used.
  • An exemplary chimeric self-splicing intron includes segments derived from Tetrahymena thermophila and Tilletiopsis flava, whose sequence is provided below:
  • the polynucleotide includes the sequence of SEQ ID NO:1 or a sequence having at least 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO:1.
  • spliceosomal introns can be used instead of self-splicing (e.g., Group I and II) introns.
  • Spliceosomal introns, which are found in most eukaryotic genes, are non-coding sequences excised from pre-mRNAs by a specialized complex called the spliceosome during mRNA splicing.
  • Introns occur in both protein- and RNA-coding genes and can be found in coding and untranslated gene regions. Such introns can be used directly (e.g., as naturally occurring introns), or used to derive (e.g., as chimeric introns), the intron(s) utilized in the disclosed compositions, methods, and strategies. See, e.g., Poverennaya and Roytberg, “Spliceosomal Introns: Features, Functions, and Evolution,” Biochemistry (Moscow), volume 85, pages 725-734 (2020), which is specifically incorporated by reference herein in its entirety.
  • the cells have functional spliceosomes.
  • this embodiment is most typically reserved for eukaryotic cells.
  • the polynucleotides include a sequence heterologous to the nucleotide sequence encoding the intron.
  • the heterologous sequence preferably does not include a sequence endogenous to or flanking the intron in its native context, such as the exon(s) or other coding or non-coding sequence(s) (e.g., 5, 10, 25, 50, 100, etc., bases) upstream and/or downstream of the intron in the organism from which it is derived.
  • the heterologous sequence is a sequence from an organism distinct from the source of the intron, e.g., a self-splicing intron.
  • the self-splicing intron can be a Tetrahymena thermophila intron and the heterologous sequence can be a bacterial or human exon sequence or other coding or non-coding sequence(s).
  • the heterologous sequence can be positioned upstream or downstream, or preferably, upstream and downstream, of the intron.
  • the intron is preferably inserted into or otherwise interrupts the heterologous sequence at its native genomic locus, thus providing an anchor or target for gene editing of the heterologous sequence.
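  • The anchoring idea can be illustrated with a toy count, as a sketch under stated assumptions: when a sequence exists in several identical copies, any editing target lying wholly within it matches every copy, whereas a target spanning the junction between that sequence and an inserted intron matches only the interrupted copy. All sequences below are hypothetical and far shorter than real introns or repeats, and count_occurrences is an illustrative helper, not part of any editing toolkit.
```python
def count_occurrences(genome: str, site: str) -> int:
    """Count (possibly overlapping) exact occurrences of `site` in `genome`."""
    count = 0
    start = genome.find(site)
    while start != -1:
        count += 1
        start = genome.find(site, start + 1)
    return count


# Hypothetical toy sequences: a 12-nt sequence present in three copies,
# one of which is interrupted by a 7-nt stand-in for an intron.
REPEAT_5 = "GATCCA"
REPEAT_3 = "TGGAAC"
INTRON = "TTTTTTG"
REPEAT = REPEAT_5 + REPEAT_3
GENOME = ("CC" + REPEAT + "CC" + REPEAT + "CC"
          + REPEAT_5 + INTRON + REPEAT_3 + "CC")

shared_site = REPEAT_3                       # present in every copy
junction_site = REPEAT_5[-3:] + INTRON[:3]   # spans the sequence/intron boundary

print(count_occurrences(GENOME, REPEAT))         # 2 uninterrupted copies
print(count_occurrences(GENOME, shared_site))    # 3
print(count_occurrences(GENOME, junction_site))  # 1
```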
  • the heterologous sequence is or includes a repetitive element (also referred to as “repeat element”).
  • the initial sequencing of the human genome revealed that repetitive DNA sequences account for ~55% of the genome. More recent computational approaches indicate that the proportion of repetitive elements in the human genome may be as high as two-thirds. Repetitive elements differ in their position in the genome, sequence, size, number of copies, and presence or absence of coding regions within them. Identified repetitive DNA sequences can be characterized using five broad categories. Four minor categories, accounting for ~10% of genomic DNA, are simple sequence repeats, segmental duplications, tandem repeats and satellite DNA sequences, and processed pseudogenes. The fifth category is transposable elements, accounting for ~45% of genomic DNA.
  • Microsatellites are tandemly repeated sequences containing units that are 1-6 base pairs long, repeated up to a length of 100 bp or more. Minisatellites form arrays of several hundred units, each 7 to 100 bp in length. Minisatellites are found throughout the genome, with increasing concentration toward the telomeres. They differ from satellites in that they occur in only moderate numbers of tandem repeats and are highly dispersed throughout the chromosomes.
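  • A minimal sketch of how microsatellites, as defined above (tandem units of 1-6 bp), might be located in a DNA string with a regular expression. The minimum copy number (five by default) is an assumption chosen for illustration, not a threshold stated in the source, and minisatellites (7-100 bp units) are deliberately out of scope. The function name is hypothetical.
```python
import re


def find_microsatellites(seq: str, min_copies: int = 5) -> list[tuple[int, str, int]]:
    """Report (start, unit, copies) for tandem runs of a 1-6 bp unit.

    A run is reported when the unit occurs at least `min_copies` times in a
    row. The lazy quantifier {1,6}? makes the regex try the shortest unit
    first at each position, so "AAAAAA" is reported as unit "A", not "AA".
    """
    if min_copies < 2:
        raise ValueError("min_copies must be at least 2")
    # Group 1 captures the unit; the backreference requires further copies.
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_copies - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


example = "GGATC" + "CA" * 6 + "TTGAC" + "A" * 7 + "GC"
print(find_microsatellites(example))  # [(5, 'CA', 6), (22, 'A', 7)]
```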
  • transposable elements can be divided into DNA transposons and retrotransposons; the latter predominate in most mammals. Retrotransposable elements (RTEs) are parasitic DNA sequences that can proliferate by a “copy and paste” mechanism and insert themselves into new genomic positions. RTEs are classified into Long Terminal Repeat (LTR) elements, whose structure and mechanism of retrotransposition resemble those of retroviruses, and non-LTR elements, which do not contain LTRs, resemble integrated mRNAs, and have a distinct mechanism of retrotransposition. The non-LTR elements can be classified as either Long Interspersed Nuclear Elements (LINEs) or Short Interspersed Nuclear Elements (SINEs), predominantly represented by the L1 and Alu families, respectively.
  • the heterologous sequence is or includes a repetitive element, such as a ribosomal gene, particularly a ribosomal RNA (rRNA) gene, or portion thereof, a tRNA gene or portion thereof, a microsatellite, a minisatellite, an insertion sequence (IS), a transposable element, a pseudogene, a prophage, a telomere, a CRISPR array, etc.
  • any foreign or heterologous or recombinant sequence can be introduced into a cell that already contains that sequence or one with high sequence similarity, rendering the native and introduced elements as repetitive.
  • the repetitive element is artificially or synthetically created.
  • the heterologous sequence(s) need not be repetitive elements.
  • the heterologous sequence(s) can be non-repetitive elements.
  • the heterologous sequence(s) can be coding or non-coding regions.
  • the heterologous sequence(s) can encode proteins or RNA or other functional or non-functional genetic elements.
  • the heterologous sequence(s) form part or all of a coding region(s), a non-coding region(s), or a combination thereof, of a gene.
  • the intron, e.g., a self-splicing intron, is scarlessly removed during or after transcription.
  • the cells contain any of the polynucleotides disclosed herein.
  • the cells harbor a polynucleotide including a sequence encoding an intron (e.g., a self-splicing intron or spliceosomal intron), which can be naturally occurring or chimeric, for example a chimeric self-splicing intron such as the self-splicing intron encoded by the sequence of SEQ ID NO:1 or a sequence having at least 85% identity to SEQ ID NO:1.
  • the cells harbor a polynucleotide including the sequence of any one of SEQ ID NOS:2-12 or a sequence having at least 85% identity to any one of SEQ ID NOS:2-12.
  • the cells harbor a polynucleotide including a sequence encoding an intron operably linked to a heterologous sequence (e.g., a repetitive element, such as, an rRNA gene or portion thereof, a tRNA gene or portion thereof, a microsatellite, a minisatellite, an insertion sequence (IS), a transposable element, a pseudogene, a prophage, a telomere, or a CRISPR array).
  • the intron replaces an endogenous intron in the cell, such as an intron endogenous to or contained within the heterologous sequence.
  • the heterologous sequence may be present in its uninterrupted form in the host cells prior to introduction of the intron alone or in combination with other uninterrupted, native instances of the sequences, particularly where the heterologous sequences are a repetitive element.
  • the heterologous sequence can be, and preferably is, derived from the host cell’s genome.
  • the genomic sequence can be a native (e.g., endogenous) sequence or it can be a foreign or recombinant sequence.
  • the heterologous sequence is exemplified with an orthogonal rRNA (oRiboT) in a recombinant E. coli genome.
  • the polynucleotides can be introduced into the cell by any suitable approach known in the art, including transformation, transduction, gene gun, microinjection, transfection, electroporation, and nucleofection.
  • Transfection techniques are known in the art. See, e.g., Angel and Yanik, PLoS ONE 5(7):e11756, doi:10.1371/journal.pone.0011756 (2010); the commercially available TransMessenger® reagents from Qiagen, Stemfect™ RNA Transfection Kit from Stemgent, and TransIT®-mRNA Transfection Kit from Mirus Bio LLC; and Clonetegration (e.g., St-Pierre, F., et al., ACS Synthetic Biology 2, 537-541 (2013)).
  • the polynucleotide can be in the form of a vector. Many vectors, e.g., plasmids, cosmids, minicircles, phage, viruses, etc., useful for transferring nucleic acids into target cells are available.
  • the vector may be maintained episomally, e.g., as plasmids, minicircle DNAs, or viruses such as cytomegalovirus, adenovirus, etc., or it may be integrated into the target cell genome through homologous recombination or random integration, e.g., by retrovirus-derived vectors such as MMLV, HIV-1, ALV, etc.
  • upon introduction into the cell, the polynucleotide can be expressed by the cellular machinery.
  • Suitable expression vectors include, but are not limited to, viral vectors (e.g., viral vectors based on vaccinia virus; poliovirus; adenovirus; adeno-associated virus; SV40; herpes simplex virus; human immunodeficiency virus; a retroviral vector (e.g., Murine Leukemia Virus, spleen necrosis virus, and vectors derived from retroviruses such as Rous Sarcoma Virus, Harvey Sarcoma Virus, avian leukosis virus, a lentivirus, human immunodeficiency virus, myeloproliferative sarcoma virus, and mammary tumor virus)), and the like.
  • Suitable expression vectors are known to those of skill in the art, and many are commercially available, including, pXT1, pSG5 (Stratagene), pSVK3, pBPV, pMSG, and pSVLSV40 (Pharmacia). However, any other vector may be used so long as it is compatible with the host cell.
  • the cell is a prokaryotic cell (e.g., an archaeal or bacterial cell). In some embodiments, the cell is E. coli.
  • the cell is a eukaryotic cell.
  • the cell can be a cell of a single-cell eukaryotic organism, a plant cell, an algal cell, or a fungal cell (e.g., a yeast cell).
  • the cell can be a mammalian cell.
  • the mammalian cell can be a human or non-human mammalian cell, e.g., a primate, bovine, ovine, porcine, canine, rodent, monkey, rat, or mouse cell.
  • the cell is a human cell including, but not limited to, skin cells, lung cells, heart cells, kidney cells, pancreatic cells, muscle cells, neuronal cells, human embryonic stem cells, blood cells (e.g., white blood cells), fibroblasts, bone cells, hepatocytes, and pluripotent stem cells.
  • the cell can be a T cell (e.g., CD8+ T cells, CD4+ T cells), hematopoietic stem cells (HSC), macrophages, natural killer cells (NK), B cells, dendritic cells (DC), or other immune cells.
  • the cell is from an established cell line or is a primary cell, where “primary cells” refers to cells and cell cultures that have been derived from a subject and allowed to grow in vitro for a limited number of passages or splittings of the culture.
  • primary cells may have been passaged 0 times, 1 time, 2 times, 4 times, 5 times, 10 times, or 15 times, but not enough times to go through the crisis stage.
  • the primary cell lines are maintained for fewer than 10 passages ex vivo.
  • the cells may be harvested from an individual by any convenient method.
  • leukocytes may be conveniently harvested by apheresis, leukocytapheresis, density gradient separation, etc., while cells from tissues such as skin, muscle, bone marrow, spleen, liver, pancreas, lung, intestine, stomach, etc. are most conveniently harvested by biopsy.
  • An appropriate solution may be used for dispersion or suspension of the harvested cells.
  • Such solution can be, for example, a balanced salt solution, e.g.
  • the cells may be used immediately, or they may be stored frozen for long periods of time and thawed for later use.
  • Gene editing technologies are preferably used to mediate incorporation of a donor oligonucleotide at one or more target sites in the disclosed methods for modifying cellular genomes.
  • Exemplary gene editing technologies include, without limitation, a CRISPR system (e.g., CRISPR/Cas9, base editing, prime editing, etc.), MAGE, zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), triplex-forming compositions, pseudocomplementary oligonucleotides, small fragment homologous replacement (e.g., polynucleotide small DNA fragments (SDFs)), single-stranded oligodeoxynucleotide-mediated gene modification (e.g., ssODN/SSOs), and intron-encoded meganucleases.
  • the gene editing technology is a CRISPR system, MAGE, zinc finger nucleases (ZFNs), or
  • MAGE refers to multiplex automated genome engineering, and generally includes introducing multiple nucleic acid sequences into one or more cells such that the entire cell culture approaches a state involving a set of changes to each genome or targeted region (Wang et al., Nature, 460:894 (2009)).
  • the method can be used to generate one specific configuration of alleles or can be used for combinatorial exploration of designed alleles optionally including additional random, i.e., not-designed, changes. This can be used with any of a variety of devices that allow the cyclic addition of many DNAs in parallel in random or specific order, with or without use of one or more selectable markers.
  • MAGE-based methods typically include introducing multiple nucleic acid sequences into a cell including the steps of transforming or transfecting a cell(s) using transformation medium or transfection medium including at least one nucleic acid oligomer (also referred to herein as a “donor oligonucleotide”) containing one or more mutations, replacing the transformation medium or transfection medium with growth medium, incubating the cell in the growth medium, and repeating the steps if necessary or desired until multiple nucleic acid sequences have been introduced into the cell.
  • the one or more nucleic acid oligomers is a pool of oligomers having a diversity of different random or non-random mutations at the location(s) of desired mutagenesis.
  • Cells are transfected with a variety of combinations of oligonucleotides, leading to the formation of a diverse genomic library of mutants.
  • the diversity of the library can be increased by increasing the number of MAGE cycles.
  • the oligomers can be single-stranded DNA.
  • multiple mutations are generated in a chromosome or in a genome.
  • the oligos are incorporated into the lagging strand of the replication fork during DNA replication, creating a new allele that will spread through the population as the bacteria divide.
  • the efficiency of oligo incorporation depends on several factors, but the frequency of the allele can be increased by performing multiple rounds of MAGE on the same cell culture.
  • genetic diversity of the mutants can be tuned by the number of cycles of mutagenesis.
  • increasing the number of cycles of mutagenesis generally increases the diversity of the library.
  • a library is prepared by one or more cycles of MAGE, for example, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or more cycles, with or without intervening cycles of selection.
  • a library of mutants is prepared by, for example, between 1 and 50, between 3 and 15, or between 5 and 9 cycles of MAGE. The cycles can occur without intervening rounds of selection to increase the diversity of the library prior to selection.
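  • The statement that repeated cycles raise allele frequency can be illustrated with a deliberately simple model, which is an assumption and not a result from the source: if each cycle independently converts a still-unmodified genome at a given target with a fixed probability p, the modified fraction after n cycles is 1 - (1 - p)^n, and with m independently targeted sites the expected number of modified sites per genome is m times that. The per-cycle efficiency of 0.25 used below is hypothetical; real efficiencies vary with oligonucleotide design, target, and strain.
```python
def allele_frequency(per_cycle_efficiency: float, cycles: int) -> float:
    """Fraction of genomes carrying the designed allele after `cycles` rounds.

    Assumes each cycle independently converts an unmodified genome with
    probability `per_cycle_efficiency`, and that edits are never lost.
    """
    if not 0.0 <= per_cycle_efficiency <= 1.0:
        raise ValueError("per-cycle efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return 1.0 - (1.0 - per_cycle_efficiency) ** cycles


def expected_edits_per_genome(n_targets: int, per_cycle_efficiency: float, cycles: int) -> float:
    """Expected number of modified targets per genome under the same model."""
    return n_targets * allele_frequency(per_cycle_efficiency, cycles)


# Hypothetical efficiency of 0.25 per cycle, for illustration only.
for n in (1, 2, 3, 5, 10):
    print(n, round(allele_frequency(0.25, n), 3))
```
  • The model ignores selection, fitness effects, and competition between oligonucleotides, so it only conveys the qualitative trend described above.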
  • the methods can also be modified to include additional or alternative steps to improve genetic diversity. See, for example, Carr, et al., Nucleic Acids Research, 40(17):e132, 12 pages (2012), and Gregg, et al., Nucleic Acids Research, 42(7):4779-90 (2014).
  • Genetic diversity can also be tuned by selecting the number and diversity of the oligonucleotides introduced during any step of the mutagenesis process. The number of oligonucleotides can be increased; each oligonucleotide can include one or multiple mutations and can therefore target multiple positions (e.g., amino acid positions encoded by the target DNA); and the oligonucleotides can introduce various types of mutations (mismatches, insertions, deletions) with varying degrees of degeneracy (4N, i.e., any of A, T, G, or C, or 2 or 3 selected therefrom) or specificity (N equals a specific nucleotide).
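  • Degenerate positions of the kind described above are conventionally written with IUPAC nucleotide codes (for example, N for A/C/G/T and K for G/T); IUPAC notation is used here for illustration and is not specified in the source. The sketch below computes how many distinct oligonucleotides a degenerate design encodes and enumerates them. The function names are hypothetical. The example codon NNK, a common saturation-mutagenesis design that encodes all 20 standard amino acids, yields 32 codons, including the TAG stop codon but not TAA or TGA.
```python
from itertools import product

# IUPAC nucleotide codes mapped to the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def library_size(oligo: str) -> int:
    """Number of distinct sequences encoded by a degenerate oligonucleotide."""
    size = 1
    for base in oligo.upper():
        size *= len(IUPAC[base])
    return size


def expand_degenerate(oligo: str) -> list[str]:
    """Enumerate every concrete sequence encoded by a degenerate oligonucleotide."""
    choices = [IUPAC[base] for base in oligo.upper()]
    return ["".join(combo) for combo in product(*choices)]


print(library_size("ATGNNKGCA"))       # 32
print(len(expand_degenerate("NNK")))   # 32
```
  • Enumeration grows multiplicatively with each degenerate position, so for long designs library_size is the practical check, while expand_degenerate is only sensible for short windows.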
  • MAGE experiments can be divided into three classes, characterized by varying degrees of scale and complexity: (i) many target sites, single genetic mutations; (ii) single target site, many genetic mutations; and (iii) many target sites, many genetic mutations.
  • MAGE has been used to recode all 321 instances of the TAG stop codon to the synonymous TAA codon using 321 discrete ssDNAs. This project yielded a strain of E. coli with only 63 ‘active’ codons and a 64th ‘blank’ codon available for site-specific incorporation of nonstandard amino acids.
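  • At the sequence level, the recoding described above is a synonymous swap of each affected gene's terminal stop codon: TAG and TAA both signal termination, so replacing one with the other leaves the encoded protein unchanged. The sketch below shows only this design step on hypothetical coding sequences, with illustrative function names; it does not model the genome-wide MAGE process, in which each change was introduced with its own ssDNA oligonucleotide.
```python
def recode_tag_stop(cds: str) -> str:
    """Return the coding sequence with a terminal TAG stop codon replaced by TAA."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    # Only the final in-frame codon is examined; out-of-frame TAG
    # trinucleotides elsewhere are not stop codons and are left alone.
    if cds.endswith("TAG"):
        return cds[:-3] + "TAA"
    return cds


def count_tag_stops(cds_list: list[str]) -> int:
    """Count coding sequences that terminate in TAG."""
    return sum(1 for cds in cds_list if cds.upper().endswith("TAG"))


genes = ["ATGAAATAG", "ATAGCATAG", "ATGCCCTAA"]  # hypothetical toy sequences
print(count_tag_stops(genes))                  # 2
print([recode_tag_stop(g) for g in genes])     # ['ATGAAATAA', 'ATAGCATAA', 'ATGCCCTAA']
```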
  • MAGE can be used to explore the effects of all possible amino acid substitutions at a single target locus.
  • MAGE can be used to construct diverse cell populations containing combinations of alleles across many loci involved, for example, in a biosynthetic pathway.
  • discrete oligos designed to knockout competing pathways by deletion can be mixed with degenerate oligos designed to randomize target positions in the coding sequence or regulatory regions of key pathway enzymes.
  • the highly diverse population resulting from a MAGE experiment can be used downstream to screen or select for mutants with a prescribed phenotype (e.g., overproduction of a metabolite or small molecule).
  • MAGE has also been developed in eukaryotic systems in the form of eMAGE; see, e.g., Barbieri, et al., Cell 171, 1-15 (2017). Like MAGE in bacteria, eukaryotic MAGE directs the annealing of synthetic ssDNA at the lagging strand of DNA replication. The mechanism is independent of Rad51-directed homologous recombination and avoids the creation of double-strand DNA breaks, allowing precise chromosome modifications at single base-pair resolution with an efficiency of >40%, without unintended mutagenic changes at the targeted genetic loci. Simultaneous incorporation of up to 12 oligonucleotides with as many as 60 targeted mutations has been observed in one transformation. Iterative transformations of a complex pool of oligonucleotides rapidly produced large combinatorial genomic diversity (>10^5).
  • This method was used to diversify a heterologous β-carotene biosynthetic pathway, producing genetic variants with precise mutations in promoters, genes, and terminators that led to altered carotenoid levels.
  • the approach of engineering the conserved processes of DNA replication, repair, and recombination can be automated and establishes a general strategy for multiplex combinatorial genome engineering in eukaryotes. Given the analogous mechanism of annealing ssDNA, the disclosed filtered editing approach can be easily applied for the modification of repetitive genetic elements in eukaryotes.
  • while MAGE-based mutagenesis is one example, suitable alternative methods of mutagenesis that are well known in the art can be used to create a library of variants.
  • Exemplary methods include, but are not limited to, error-prone PCR, PCR or overlap-elongation PCR with degenerate primers, and custom DNA synthesis of degenerate DNA fragments encoding the library of interest.
ii. CRISPR/Cas
  • the gene editing technology is the CRISPR/Cas system.
  • CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats.
  • the prokaryotic CRISPR/Cas system has been adapted for gene editing (silencing, enhancing, or changing specific genes) in eukaryotes (see, for example, Cong, et al., Science, 339(6121):819-823 (2013) and Jinek, et al., Science, 337(6096):816-821 (2012)).
  • the organism's genome can be cut and modified at any desired location.
  • Methods of preparing compositions for use in genome editing using the CRISPR/Cas systems are described in detail in WO 2013/176772 and WO 2014/018423.
  • CRISPR system refers collectively to transcripts and other elements involved in the expression of or directing the activity of CRISPR-associated (“Cas”) genes, including sequences encoding a Cas gene, a tracr (trans-activating CRISPR) sequence (e.g., tracrRNA or an active partial tracrRNA), a tracr-mate sequence (encompassing a “direct repeat” and a tracrRNA-processed partial direct repeat in the context of an endogenous CRISPR system), a guide sequence (also referred to as a “spacer” in the context of an endogenous CRISPR system), or other sequences and transcripts from a CRISPR locus.
  • One or more tracr mate sequences operably linked to a guide sequence can also be referred to as pre-crRNA (pre-CRISPR RNA) before processing or crRNA after processing by a nuclease.
  • a tracrRNA and crRNA are linked and form a chimeric crRNA-tracrRNA hybrid in which a mature crRNA is fused to a partial tracrRNA via a synthetic stem loop to mimic the natural crRNA:tracrRNA duplex, as described in Cong, et al., Science, 339(6121):819-823 (2013) and Jinek, et al., Science, 337(6096):816-821 (2012).
  • a single fused crRNA-tracrRNA construct can also be referred to as a guide RNA or gRNA (or single-guide RNA (sgRNA)).
  • the crRNA portion can be identified as the “target sequence” and the tracrRNA is often referred to as the “scaffold.”
  • the gRNA or sgRNA is designed to direct Cas endonuclease cleavage to the intron and/or the heterologous sequence adjacent to it, in a manner sufficient to increase or otherwise direct mutagenesis (preferably by recombination of a donor oligonucleotide) at a target site, typically also in the heterologous sequence(s) adjacent to the intron.
  • one or more vectors driving expression of one or more elements of a CRISPR system are introduced into a target cell such that expression of the elements of the CRISPR system direct formation of a CRISPR complex at one or more target sites. While the specifics can be varied in different engineered CRISPR systems, the overall methodology is similar.
  • a practitioner interested in using CRISPR technology to target a DNA sequence can insert a short DNA fragment containing the target sequence into a guide RNA expression plasmid.
  • the sgRNA expression plasmid contains the target sequence (about 20 nucleotides), a form of the tracrRNA sequence (the scaffold) as well as a suitable promoter and necessary elements for proper processing in eukaryotic cells.
  • Such vectors are commercially available (see, for example, Addgene). Many of the systems rely on custom, complementary oligomers that are annealed to form a double stranded DNA and then cloned into the sgRNA expression plasmid. Co-expression of the sgRNA and the appropriate Cas enzyme from the same or separate plasmids in transfected cells results in a single or double strand break (depending on the activity of the Cas enzyme) at the desired target site.
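As a hedged illustration of the guide-design and oligo-annealing steps described above, the sketch below scans both strands of a sequence (for example, an intron address) for 20-nt spacers immediately followed by an NGG PAM, then builds the pair of complementary cloning oligos. The NGG PAM assumes S. pyogenes Cas9, and the CACC/AAAC overhangs assume a BbsI-digested sgRNA plasmid of the pX330 type (some protocols also add a 5' G); neither detail comes from the text above, and the function names are hypothetical.

```python
# Hypothetical helpers; assumes SpCas9 (20-nt spacer, NGG PAM) and
# CACC/AAAC overhangs for a BbsI-cut sgRNA expression plasmid.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_protospacers(seq: str, spacer_len: int = 20) -> list[tuple[str, int, str]]:
    """Return (strand, start, spacer) for each spacer immediately followed by NGG.

    `start` is the 0-based index of the spacer's 5' end on the listed strand.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for i in range(len(s) - spacer_len - 2):
            pam = s[i + spacer_len : i + spacer_len + 3]
            if pam[1:] == "GG":
                hits.append((strand, i, s[i : i + spacer_len]))
    return hits


def cloning_oligos(spacer: str) -> tuple[str, str]:
    """Top and bottom oligos that anneal into a duplex with 5' CACC/AAAC overhangs."""
    return "CACC" + spacer, "AAAC" + revcomp(spacer)


if __name__ == "__main__":
    site = "AAAAA" + "GATTACAGATTACAGATTAC" + "TGG" + "AAAAA"  # placeholder sequence
    for strand, start, spacer in find_protospacers(site):
        print(strand, start, spacer, cloning_oligos(spacer))
```

In practice, a candidate spacer within the intron would also be checked for uniqueness against the rest of the genome, since that uniqueness is what makes the intron usable as an address.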
  • a vector includes a regulatory element operably linked to an enzyme-coding sequence encoding a CRISPR enzyme, such as a Cas protein.
  • Cas proteins include Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9 (also known as Csn1 and Csx12), Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, Cpf1, homologues thereof, or modified versions thereof.
  • the unmodified CRISPR enzyme has DNA cleavage activity, such as Cas9.
  • the CRISPR enzyme directs cleavage of one or both strands at the location of a target sequence, such as within the target sequence and/or within the complement of the target sequence. In some embodiments, the CRISPR enzyme directs cleavage of one or both strands within about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 50, 100, 200, 500, or more base pairs from the first or last nucleotide of a target sequence.
  • the CRISPR/Cas system may contain an enzyme that is mutated with respect to a corresponding wild-type enzyme such that the mutated CRISPR enzyme lacks the ability to cleave one or both strands of a target polynucleotide containing a target sequence.
  • the Cas9 nickase was developed. For example, an aspartate-to-alanine substitution (D10A) in the RuvC I catalytic domain of Cas9 from S. pyogenes converts Cas9 from a nuclease that cleaves both strands to a nickase (cleaves a single strand).
  • residues D10, G12, G17, E762, H840, N854, N863, H982, H983, A984, D986, and/or A987 can be substituted.
  • Specific mutations that render Cas9 a nickase include, without limitation, H840A, N854A, and N863A. Mutations other than alanine substitutions are also suitable.
  • Two or more catalytic domains of Cas9 (RuvC I, RuvC II, RuvC III, and HNH) can be mutated to produce a mutated Cas9 substantially lacking all DNA cleavage activity.
  • a D10A mutation may be combined with one or more of H840A, N854A, or N863A mutations to produce a Cas9 enzyme substantially lacking all DNA cleavage activity (e.g., when activity of the mutated enzyme is less than about 25%, 10%, 5%, 1%, 0.1%, 0.01%, or lower with respect to its non-mutated form).
  • variants of Cas9 such as for example, a Cas9 nickase are employed in the gene editing technologies containing a CRISPR/Cas system.
  • Nickases can lower the probability of off-target editing, for example, when used with two adjacent gRNAs.
  • a Cas9 nickase having a D10A mutation cleaves only the target strand.
  • a Cas9 nickase having an H840A mutation in the HNH domain creates a non-target strand-cleaving nickase.
  • whereas WT Cas9 and one gRNA create a blunt double-strand break, a Cas9 nickase and two gRNAs can be used to create a staggered cut.
  • the gene editing technology is CRISPR/Cas9 or a CRISPR/Cas9 nickase (e.g., a D10A, H840A, N854A, or N863A nickase).
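The mutation-to-activity relationships listed above can be summarized as a small lookup. This is a simplified sketch, assuming that each listed substitution fully inactivates its nuclease domain (RuvC cleaving the non-target strand, HNH cleaving the target strand) and ignoring partial activities and the other residues mentioned; the function names are hypothetical.

```python
# Simplified model (assumption): each listed substitution fully inactivates
# its nuclease domain; partial activities and other residues are ignored.
RUVC_INACTIVATING = {"D10A"}                    # RuvC cuts the non-target (PAM) strand
HNH_INACTIVATING = {"H840A", "N854A", "N863A"}  # HNH cuts the guide-paired target strand


def cleaved_strands(mutations: set[str]) -> set[str]:
    """Strands an SpCas9 variant carrying `mutations` can still cleave."""
    strands = set()
    if not mutations & RUVC_INACTIVATING:
        strands.add("non-target")
    if not mutations & HNH_INACTIVATING:
        strands.add("target")
    return strands


def describe(mutations: set[str]) -> str:
    """Classify a variant as nuclease, nickase, or catalytically dead."""
    cut = cleaved_strands(mutations)
    if len(cut) == 2:
        return "nuclease (double-strand break)"
    if len(cut) == 1:
        return f"nickase ({next(iter(cut))} strand)"
    return "catalytically dead"


if __name__ == "__main__":
    for variant in (set(), {"D10A"}, {"H840A"}, {"D10A", "H840A"}):
        print(sorted(variant) or "WT", "->", describe(variant))
```

The classification matches the statements above: D10A leaves only target-strand cleavage, H840A leaves only non-target-strand cleavage, and combining the two abolishes cleavage.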
  • the gene editing technology is, or includes, base editing or prime editing.
  • for base editing and prime editing, see, e.g., Kantor, et al., Int J Mol Sci. 2020 Sep; 21(17): 6240, which is specifically incorporated by reference herein in its entirety. Because it relies on homologous recombination, homology-directed repair (HDR)-mediated editing is restricted to dividing cell types, limiting the range of diseases that can be targeted. CRISPR/Cas-mediated single-base-pair editing systems have been devised to bypass these limitations.
  • DNA base-editors encompass two key components: a Cas enzyme for programmable DNA binding and a single-stranded DNA modifying enzyme for targeted nucleotide alteration.
  • DNA base-editors (BEs) include cytosine base-editors (CBEs) and adenine base-editors (ABEs).
  • Prime-editors are the latest addition to the CRISPR genome-engineering toolkit and represent an approach that expands the scope of donor-free precise DNA editing to all transition and transversion mutations, as well as small insertion and deletion mutations.
  • Prime-editing does not rely on double-strand breaks (DSBs).
  • Prime-editors use an engineered reverse transcriptase fused to Cas9 nickase and a prime-editing guide RNA (pegRNA).
  • PegRNA differs from regular sgRNAs and plays a major role in the system’s function.
  • the pegRNA contains not only (a) the sequence complementary to the target site that directs nCas9 to its target sequence, but also (b) an additional sequence encoding the desired sequence changes.
  • the 5' spacer of the pegRNA binds its target sequence on the DNA, exposing the non-complementary, PAM-containing strand.
  • the unbound DNA of the PAM-containing strand is nicked by Cas9, and the resulting 3' end hybridizes to the primer binding site (PBS) of the pegRNA, creating a primer for the reverse transcriptase (RT) that is linked to nCas9.
  • the nicked PAM-strand is then extended by the RT by using the interior of the pegRNA as a template, consequently modifying the target region in a programmable manner.
  • the result of this step is two redundant PAM-strand DNA flaps: the edited 3' flap that was reverse transcribed from the pegRNA and the original, unedited 5' flap.
  • the choice of which flap hybridizes with the non-PAM-containing DNA strand is an equilibrium process, in which the perfectly complementary 5' flap would likely be thermodynamically favored.
  • the 5' flaps are preferentially degraded by cellular endonucleases that are ubiquitous during lagging-strand DNA synthesis.
  • the resulting heteroduplex containing the unedited strand and edited 3' flap is resolved and stably integrated into the host genome via cellular replication and repair process.
  • DNA base-editing and prime-editing tools support precise nucleotide substitutions in a programmable manner, without requiring a donor template.
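The flap-priming logic described above determines how the pegRNA 3' extension is laid out: the PBS pairs with the nicked 3' end of the PAM strand, and the RT template encodes the edited flap that is written onto that end. The sketch below assumes an SpCas9(H840A)-based prime editor that nicks the PAM strand 3 nt upstream of the PAM, handles substitutions only, and uses illustrative default lengths (13-nt PBS, 16-nt RT template); none of these parameters or names come from the text above.

```python
# Hypothetical helper; assumes an SpCas9(H840A)-based prime editor that nicks
# the PAM strand 3 nt upstream of the PAM. DNA letters stand in for RNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def pegrna_extension(pam_strand: str, spacer_start: int, edited_pam_strand: str,
                     pbs_len: int = 13, rtt_len: int = 16) -> tuple[str, str]:
    """Return (rt_template, pbs), both 5'->3'; the extension reads RTT then PBS.

    pam_strand / edited_pam_strand: PAM-containing strand before and after the
    edit, 5'->3', sharing coordinates (substitutions only).
    spacer_start: 0-based index of the 20-nt protospacer's 5' end.
    """
    nick = spacer_start + 17  # 17 nt into the protospacer, 3 nt before the PAM
    if nick - pbs_len < 0 or nick + rtt_len > len(edited_pam_strand):
        raise ValueError("PBS or RT template runs past the supplied sequence")
    primer = pam_strand[nick - pbs_len : nick]          # nicked 3' end
    new_dna = edited_pam_strand[nick : nick + rtt_len]  # flap the RT should write
    # The PBS pairs with the primer; the RT template encodes the edited flap.
    return revcomp(new_dna), revcomp(primer)


if __name__ == "__main__":
    original = "TTTTT" + "GACGTCTAGCATCGATCGTA" + "CGG" + "ATCGATCGATCGATCG"
    edited = original[:25] + "T" + original[26:]  # placeholder edit in the PAM
    print(pegrna_extension(original, 5, edited))
```

Because the edit lies downstream of the nick, only the RT template changes between the unedited and edited designs; the PBS depends solely on the sequence upstream of the nick.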
  • the gene editing technology is a zinc finger nuclease (ZFN) that is engineered to specifically recognize the intron address sequence.
  • ZFNs are typically fusion proteins that include a DNA-binding domain derived from a zinc-finger protein linked to a cleavage domain.
  • the most common cleavage domain is the Type IIS enzyme Fok I.
  • Fok I catalyzes double-stranded cleavage of DNA at 9 nucleotides from its recognition site on one strand and 13 nucleotides from its recognition site on the other. See, for example, U.S. Pat. Nos. 5,356,802; 5,436,150 and 5,487,994; as well as Li et al. Proc. Natl. Acad. Sci. USA 89 (1992):4275-4279; Li et al. Proc. Natl. Acad. Sci.
  • Exemplary Type IIS restriction enzymes are described in International Publication WO 07/014275. Additional restriction enzymes also contain separable binding and cleavage domains. See, for example, Roberts et al. Nucleic Acids Res., 31:418-420 (2003).
  • the cleavage domain includes one or more engineered cleavage half-domains (also referred to as dimerization domain mutants) that minimize or prevent homodimerization, as described, for example, in U.S. Published Application Nos. 2005/0064474, 2006/0188987, and 2008/0131962.
  • the cleavage half domain is a mutant of the wild type Fok I cleavage half domain.
  • the cleavage half domain is a wild type Fok I mutant where one or more amino acid residues at positions 446, 447, 479, 483, 484, 486, 487, 490, 491, 496, 498, 499, 500, 531, 534, 537, and 538 are substituted. See, e.g., Example 1 of WO 07/139898, with amino acid residues in the Fok I protein numbered according to Wah et al, (1998) Proc. Natl. Acad. Sci. USA 95: 10564-10569.
  • the cleavage half domains are modified to include nuclear or other localization signals, peptide tags, or other binding domains.
  • the DNA-binding domain, which can in principle be designed to target any genomic location of interest, can be a tandem array of Cys2His2 zinc fingers, each of which generally recognizes three to four nucleotides in the target DNA sequence.
  • the Cys2His2 domain has the general structure: Phe (sometimes Tyr)-Cys-(2 to 4 amino acids)-Cys-(3 amino acids)-Phe (sometimes Tyr)-(5 amino acids)-Leu-(2 amino acids)-His-(3 amino acids)-His.
  • Another type of zinc finger that binds zinc between 2 pairs of cysteines has been found in a range of DNA binding proteins.
  • the general structure of this type of zinc finger is: Cys-(2 amino acids)-Cys-(13 amino acids)-Cys-(2 amino acids)-Cys. This is called a Cys2Cys2 zinc finger. It is found in a group of proteins known as the steroid receptor superfamily, each of which has two Cys2Cys2 zinc fingers.
  • the DNA-binding domain of a ZFN can be composed of two to six zinc fingers. Each zinc finger motif is typically considered to recognize and bind to a three-base pair sequence and as such, a protein including more zinc fingers targets a longer sequence and therefore may have a greater specificity and affinity to the target site.
  • Zinc finger binding domains can be “engineered” to bind to a predetermined nucleotide sequence. See, for example, Beerli et al. Nature Biotechnol. 20: 135-141 (2002); Pabo et al. Ann. Rev. Biochem. 70:313-340 (2001); Isalan et al., Nature Biotechnol. 19:656-660 (2001); Segal et al. Curr. Opin.
  • zinc finger binding domains can be engineered to have a different binding specificity, compared to a naturally-occurring zinc finger protein.
  • Standard ZFNs fuse the cleavage domain to the C-terminus of each zinc finger domain.
  • in order to allow the two cleavage domains to dimerize and cleave DNA, the two individual ZFNs must bind opposite strands of DNA with their C-termini a certain distance apart. As discussed above, the most commonly used linker sequences between the zinc finger domain and the cleavage domain require the 5' edge of each binding site to be separated by 5 to 7 bp.
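The design constraints above (2 to 6 fingers per ZFN, about 3 bp per finger, and a 5 to 7 bp spacer for the standard linker) can be checked with a back-of-envelope calculation that also shows why more fingers mean greater specificity. This sketch assumes exactly 3 bp per finger and treats the genome as random sequence with equal base frequencies, searched on both strands; the function name and the random-genome model are assumptions, not part of the text above.

```python
# Back-of-envelope model (assumptions): each finger reads exactly 3 bp, the
# 5-7 bp spacer rule of the standard linker applies, and the genome is random
# sequence with equal base frequencies, searched on both strands.
def zfn_pair_report(left_fingers: int, right_fingers: int,
                    spacer_bp: int, genome_bp: float) -> dict:
    for n in (left_fingers, right_fingers):
        if not 2 <= n <= 6:
            raise ValueError("each ZFN usually carries 2 to 6 fingers")
    recognized_bp = 3 * (left_fingers + right_fingers)
    return {
        "recognized_bp": recognized_bp,
        "spacer_ok": 5 <= spacer_bp <= 7,
        # Expected chance occurrences of the combined site in random sequence.
        "expected_random_sites": 2 * genome_bp / 4 ** recognized_bp,
    }


if __name__ == "__main__":
    print(zfn_pair_report(3, 3, spacer_bp=6, genome_bp=3e9))
```

For two three-finger ZFNs and a 3 x 10^9 bp genome, the estimate is roughly 0.09 chance sites; adding a fourth finger to each monomer lowers it by a further factor of 4^6 = 4096.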
  • fusion polypeptides are used for targeted double-stranded DNA cleavage.
  • fusion proteins target a single-stranded cleavage in a double-stranded section of DNA. Fusion proteins of this type are sometimes referred to as nickases, and can in some embodiments be preferred to limit undesired mutations.
  • a nickase is created by blocking or limiting the activity of one half of a fusion half-domain dimer.
  • Rational design includes, for example, using databases including triplet (or quadruplet) nucleotide sequences and individual zinc finger amino acid sequences, in which each triplet or quadruplet nucleotide sequence is associated with one or more amino acid sequences of zinc fingers which bind the particular triplet or quadruplet sequence. See, for example, U.S. Pat. Nos. 6,140,081; 6,453,242; 6,534,261; 6,610,512; 6,746,838; 6,866,997; 7,067,617; U.S. Published Application Nos.
  • the gene editing technology is a transcription activator-like effector nuclease (TALEN) that is engineered to specifically recognize the intron address sequence.
  • TALENs have an overall architecture similar to that of ZFNs, with the main difference that the DNA-binding domain comes from TAL effector proteins, transcription factors from plant pathogenic bacteria.
  • the DNA-binding domain of a TALEN is a tandem array of amino acid repeats, each about 34 residues long. The repeats are very similar to each other; typically they differ principally at two positions (amino acids 12 and 13, called the repeat variable diresidue, or RVD).
  • Each RVD specifies preferential binding to one of the four possible nucleotides, meaning that each TALEN repeat binds to a single base pair, though the NN RVD is known to bind adenine in addition to guanine.
  • TAL effector DNA binding is mechanistically less well understood than that of zinc-finger proteins, but their seemingly simpler code could prove very beneficial for engineered-nuclease design.
  • TALENs also cleave as dimers, have relatively long target sequences (the shortest reported so far binds 13 nucleotides per monomer) and appear to have less stringent requirements than ZFNs for the length of the spacer between binding sites.
  • Monomeric and dimeric TALENs can include more than 10, more than 14, more than 20, or more than 24 repeats.
  • TALENs using the +63 C-terminal truncation have been shown to cleave over a wide range of spacers. This makes design of TALENs easier and increases the number of potential sequences that can be targeted, but it also increases the number of potential regions of the genome that could be cleaved through off-target activity.
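The one-repeat-per-base code described above makes TALE design a direct translation, and the NN ambiguity shows one way off-target sites can arise. The sketch below assumes the commonly cited RVD assignments NI to A, HD to C, NG to T, and NN to G (with NN also accepting A), and ignores the 5' T usually found immediately upstream of TALE binding sites and the final half repeat; the function names are hypothetical.

```python
# Assumption: the commonly cited RVD code (NI->A, HD->C, NG->T, NN->G/A);
# the 5' T that usually precedes TALE sites and the final half repeat are ignored.
RVD_FOR_BASE = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}
BASES_FOR_RVD = {"NI": {"A"}, "HD": {"C"}, "NG": {"T"}, "NN": {"G", "A"}}


def rvd_array(target: str) -> list[str]:
    """One repeat (named by its RVD) per base of the target site."""
    return [RVD_FOR_BASE[base] for base in target.upper()]


def site_matches(rvds: list[str], site: str) -> bool:
    """True if every base in `site` is accepted by the corresponding RVD."""
    return len(rvds) == len(site) and all(
        base in BASES_FOR_RVD[rvd] for rvd, base in zip(rvds, site.upper())
    )


if __name__ == "__main__":
    rvds = rvd_array("TGACCTGAAGCTT")  # placeholder 13-bp target
    print("-".join(rvds))
    # NN also accepts A, so this G->A variant is a potential off-target site.
    print(site_matches(rvds, "TAACCTGAAGCTT"))
```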
  • cleavage domain is obtained when the zinc finger proteins bind to target sites separated by approximately 5-6 base pairs.
  • a linker, typically a flexible linker rich in glycine and serine, is used to join each zinc finger binding domain to the cleavage domain. See, e.g., U.S. Published Application No. 2005/0064474 and PCT Application WO 07/139898.
  • the engineered nuclease may use modified linkers, linkers that are longer or shorter, more or less rigid, etc.
  • the linker may form a stable alpha helix linker. See, e.g., Yan et al. Biochemistry, 46:8517-24 (2007) and Merutka and Stellwagen, Biochemistry, 30:4245-8 (1991). Although the methods described herein are flexible to produce nucleases having a range of linkers, in some embodiments the linkers will be preferentially less than 50 base pairs, less than 30 base pairs, less than 20 base pairs, less than 15 base pairs, or less than 10 base pairs in length.
E. Donor oligonucleotides
  • a donor oligonucleotide is incorporated at one or more target sites in a cell’s genome.
  • the donor oligonucleotide can include a sequence that can correct a mutation(s) in the genome, though in some embodiments, the donor introduces one or more mutations.
  • the donor oligonucleotide may also contain synonymous (silent) mutations, which can facilitate detection of the corrected target sequence using allele- specific PCR of genomic DNA isolated from modified cells.
  • the donor oligonucleotide can exist in single stranded (ss) or double stranded (ds) form (e.g., ssDNA, dsDNA).
  • the donor oligonucleotide can be of any length.
  • the size of the donor oligonucleotide may be between 1 and 1000 nucleotides.
  • the donor oligonucleotide is between 25 and 200 nucleotides.
  • the donor oligonucleotide is between 100 and 150 nucleotides.
  • the donor oligonucleotide is about 50 to 100 nucleotides in length.
  • the donor oligonucleotide may be about 60 nucleotides in length.
  • ssDNAs 25-200 nucleotides in length are active, e.g., ssDNAs 60-90 nucleotides in length.
  • the preferred length is about 90 nucleotides.
  • Donor oligonucleotides are also referred to as donor fragments, donor nucleic acids, donor DNA, or donor DNA fragments. It is understood in the art that a greater number of homologous positions within the donor fragment will increase the probability that the donor fragment will be inserted or recombined into the target sequence, target region, or target site.
  • Target sequences can be within the coding DNA sequence of a gene or within introns. Target sequences can also be within DNA sequences which regulate expression of the target gene, including promoter or enhancer sequences or sequences that regulate RNA splicing.
  • the donor sequence can contain one or more nucleic acid sequence alterations compared to the sequence of the region targeted for recombination, for example, a point mutation, a substitution, a deletion, or an insertion of one or more nucleotides. Deletions and insertions can result in frameshift mutations or deletions. Point mutations can cause missense or nonsense mutations. These mutations may disrupt, reduce, stop, increase, improve, or otherwise alter the expression of a gene contained in the target region or site.
  • the donor oligonucleotide may correspond to the wild type sequence of a gene (or a portion thereof), for example, a mutated gene involved with a disease or disorder.
  • One or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more) different donor oligonucleotide sequences may be used in accordance with the disclosed methods. This may be useful, for example, to create a heterozygous target gene where the two alleles contain different modifications or to create libraries of cells harboring different sequences at one or more target sites.
  • Donor oligonucleotides are preferably DNA oligonucleotides, composed of the principal naturally-occurring nucleotides (thymine, cytosine, adenine and guanine) as the heterocyclic bases, deoxyribose as the sugar moiety, and phosphate ester linkages.
  • Donor oligonucleotides may include modifications to nucleobases, sugar moieties, or backbone/linkages, depending on the desired structure of the replacement sequence at the site of recombination or to provide some resistance to degradation by nucleases.
  • the terminal two or three inter-nucleoside linkages at each end of a ssDNA oligonucleotide may be replaced with phosphorothioate linkages in lieu of the usual phosphodiester linkages, thereby providing increased resistance to exonucleases.
  • Modifications to the donor oligonucleotide should not prevent the donor oligonucleotide from successfully integrating at the target sequence.
  • the donor oligonucleotide includes 1, 2, 3, 4, 5, 6, or more optional phosphorothioate internucleoside linkages. In some embodiments, the donor includes phosphorothioate internucleoside linkages between the first 2, 3, 4, or 5 nucleotides, and/or the last 2, 3, 4, or 5 nucleotides in the donor oligonucleotide.
  • Donor oligonucleotides can be either single stranded or double stranded, and can target one or both strands of the genomic sequence at a target locus.
  • the donor oligonucleotides are typically single stranded DNA sequences for MAGE.
  • the reverse complement of each donor, and double stranded DNA sequences, based on the provided sequences may also be used.
  • the donor oligonucleotide is a functional fragment of the disclosed sequence, or the reverse complement, or double stranded DNA thereof.
  • the nuclease activity of some of the gene editing systems described herein cleaves target DNA to produce single or double strand breaks in the target DNA.
  • Double strand breaks can be repaired by the cell by non-homologous end joining or homology-directed repair.
  • in non-homologous end joining (NHEJ), the double-strand breaks are repaired by direct ligation of the break ends to one another. As such, no new nucleic acid material is inserted into the site, although some nucleic acid material may be lost, resulting in a deletion.
  • in homology-directed repair, a donor polynucleotide with homology to the cleaved target DNA sequence is used as a template for repair of the cleaved target DNA sequence, resulting in the transfer of genetic information from a donor polynucleotide to the target DNA.
  • new nucleic acid material can be inserted/copied into the site.
  • the modifications of the target DNA due to NHEJ and/or homology-directed repair can be used to induce gene correction, gene replacement, gene tagging, transgene insertion, nucleotide deletion, gene disruption, gene mutation, etc.
  • the donor polynucleotide typically contains sufficient homology to a genomic sequence at the cleavage site, e.g., 70%, 80%, 85%, 90%, 95%, or 100% homology with the nucleotide sequences flanking the cleavage site, e.g., within about 50 bases or less of the cleavage site, e.g., within about 30 bases, within about 15 bases, within about 10 bases, within about 5 bases, or immediately flanking the cleavage site, to support homology-directed repair between it and the genomic sequence to which it bears homology.
  • the donor sequence may or may not be identical to the genomic sequence that it replaces.
  • the donor sequence may correspond to the wild type sequence (or a portion thereof) of the target sequence (e.g. , a gene).
  • the donor sequence may contain at least one or more single base changes, insertions, deletions, inversions or rearrangements with respect to the genomic sequence, so long as sufficient homology is present to support homology-directed repair.
  • the donor sequence includes a non-homologous sequence flanked by two regions of homology, such that homology-directed repair between the target DNA region and the two flanking sequences results in insertion of the non-homologous sequence at the target region.
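The homology requirement described above (for example, 70% to 100% homology within about 50 bases or less of the cleavage site) can be checked with a simple identity count. This sketch assumes the donor and genome are already aligned with no indels inside the window; the function name and the default 30-base window are hypothetical choices within the range given above.

```python
# Hypothetical helper; assumes donor and genome are already aligned with no
# indels inside the window, and that `window` stays within about 50 bases.
def flank_identity(genome: str, cut: int, donor: str, donor_cut: int,
                   window: int = 30) -> float:
    """Percent identity over `window` bases on each side of the cleavage site."""
    if min(cut, donor_cut) < window:
        raise ValueError("window extends past the 5' end of a sequence")
    g = genome[cut - window : cut + window]
    d = donor[donor_cut - window : donor_cut + window]
    if len(g) != 2 * window or len(d) != 2 * window:
        raise ValueError("window extends past the 3' end of a sequence")
    same = sum(a == b for a, b in zip(g, d))
    return 100.0 * same / (2 * window)


if __name__ == "__main__":
    genome = "A" * 40 + "C" * 40                # placeholder locus, cut at 40
    donor = genome[:35] + "G" + genome[36:]     # one intended change near the cut
    print(round(flank_identity(genome, 40, donor, 40), 2))
```

With one intended change among the 60 bases examined, the donor retains about 98% identity, well above the lower thresholds listed above.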
  • the donor oligonucleotides can be used to add, i.e., insert or replace, nucleic acid material to a target DNA sequence (e.g., to “knock in” a nucleic acid that encodes for a protein, an siRNA, an miRNA, etc.), to add a tag (e.g., 6xHis, a fluorescent protein (e.g., a green fluorescent protein; a yellow fluorescent protein, etc.), hemagglutinin (HA), FLAG, etc.), to add a regulatory sequence to a gene (e.g., promoter, polyadenylation signal, internal ribosome entry sequence (IRES), 2A peptide, start codon, stop codon, splice signal, localization signal, etc.), insert random nucleotides (e.g., NNNNN, where N is any nucleotide), or to otherwise modify a nucleic acid sequence (e.g., introduce a mutation).
  • the donor oligonucleotide should possess sufficient sequence homology to both the nucleotide sequence encoding the intron and the heterologous sequence/target site.
  • the donor oligonucleotide contains the following components: a 5’ homology arm, a replacement sequence (e.g., the sequence desired to be integrated into the genome), and a 3’ homology arm.
  • the homology arms provide for insertion or recombination into the chromosome (e.g., at the target site), thus replacing a portion of the endogenous genomic sequence with the replacement sequence.
  • the 3’ end of the 5’ homology arm is the position next to the 5’ end of the replacement sequence.
  • the 5’ end of the 3’ homology arm is the position next to the 3’ end of the replacement sequence.
  • the 5’ homology arm of the donor oligonucleotide is homologous to the intron and the 3’ homology arm of the donor oligonucleotide is homologous to the target site. In some embodiments, the 5’ homology arm of the donor oligonucleotide is homologous to the target site and the 3’ homology arm of the donor oligonucleotide is homologous to the intron. The extent of homology to the intron and target site can vary.
  • the 5’ homology arm of a donor oligonucleotide can include about 10, 15, 20, 25, 50, 75, 100, 150, 200, 300, 400, 500, or more nucleotides homologous to the intron sequence or the target site.
  • the 3’ homology arm of a donor oligonucleotide can include about 10, 15, 20, 25, 50, 75, 100, 150, 200, 300, 400, 500, or more nucleotides homologous to the intron sequence or the target site.
  • the 5’ and/or 3’ homology arms of the donor oligonucleotide can overlap with one or more (e.g., about 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100 or more) nucleotides of the intron sequence or target sequence.
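The arm layout described above (5' homology arm, replacement sequence, 3' homology arm, with one arm reaching into the intron) can be expressed as slicing a reference locus. This is a minimal sketch, assuming the locus is written 5' to 3' as intron followed by target site so that the 5' arm can overlap the intron; the sequences and function names are placeholders.

```python
# Hypothetical helpers; the locus is written 5'->3' as intron + target site
# (intron upstream), so the 5' arm can reach back into the intron.
def build_donor(locus: str, edit_start: int, edit_end: int, replacement: str,
                arm5_len: int, arm3_len: int) -> str:
    """5' arm + replacement + 3' arm, replacing locus[edit_start:edit_end]."""
    if edit_start - arm5_len < 0 or edit_end + arm3_len > len(locus):
        raise ValueError("a homology arm runs past the end of the locus")
    arm5 = locus[edit_start - arm5_len : edit_start]
    arm3 = locus[edit_end : edit_end + arm3_len]
    return arm5 + replacement + arm3


def intron_bases_in_arm5(intron_len: int, edit_start: int, arm5_len: int) -> int:
    """How many 5'-arm bases are homologous to an intron at locus[0:intron_len]."""
    arm_start = max(0, edit_start - arm5_len)
    return max(0, min(intron_len, edit_start) - arm_start)


if __name__ == "__main__":
    intron, target = "G" * 50, "ACGT" * 10  # placeholder sequences
    locus = intron + target
    donor = build_donor(locus, 55, 56, "T", arm5_len=40, arm3_len=30)
    print(len(donor), intron_bases_in_arm5(len(intron), 55, 40))
```

In this placeholder case the 71-nt donor carries 35 nt of intron homology in its 5' arm; mirroring the layout (target site upstream of the intron) would move the intron homology into the 3' arm instead.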
  • Donor oligonucleotides should be designed based on the target site and the requirements and/or preferences of the gene editing technology with which it/they will be used.
  • MAGE donor oligonucleotides are typically about 90 bases long. The first four 5’ bases can be phosphorothioated.
  • the oligonucleotide is designed to match the sequence of the region of interest (with the exception of the desired mutations) such that it will be incorporated into the lagging strand during replication. To determine which genomic strand to use as the template, it is necessary to determine whether the gene is in replichore 1 or 2 and whether it is on the + or - strand. The mismatches, insertions, and/or deletions in the sequence must be centered on the oligonucleotide, and there should be as few alterations as possible, since each change will lower the efficiency of incorporation into the genome.
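The strand choice and centering rules above can be sketched as follows. The sketch assumes that coordinates follow the reference '+' strand, that the replichore 1 fork moves toward increasing coordinates and the replichore 2 fork toward decreasing ones, that the oligo should carry the lagging-strand (Okazaki-fragment) sequence, and that "*" marks a phosphorothioate bond after each of the first four 5' bases. These conventions are assumptions to be checked against the organism's actual origin and terminus coordinates, and the function name is hypothetical.

```python
# Hypothetical helper. Assumptions: coordinates follow the reference '+'
# strand; the replichore 1 fork moves toward increasing coordinates and the
# replichore 2 fork toward decreasing ones; '*' marks a phosphorothioate bond.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def mage_oligo(plus_ref: str, mut_pos: int, new_base: str, replichore: int,
               length: int = 90, n_ps: int = 4) -> tuple[str, str]:
    """Return (oligo, annotated oligo) centred on one base substitution."""
    if replichore not in (1, 2):
        raise ValueError("replichore must be 1 or 2")
    start = mut_pos - length // 2
    if start < 0 or start + length > len(plus_ref):
        raise ValueError("reference window is too short")
    window = (plus_ref[start:mut_pos] + new_base
              + plus_ref[mut_pos + 1 : start + length])
    # In replichore 1 the '+' strand templates the lagging strand, so the
    # Okazaki-fragment-like oligo carries '-' strand sequence; the reverse
    # holds in replichore 2.
    oligo = revcomp(window) if replichore == 1 else window
    annotated = "*".join(oligo[: n_ps + 1]) + oligo[n_ps + 1 :]
    return oligo, annotated


if __name__ == "__main__":
    ref = "A" * 50 + "C" + "G" * 49  # placeholder '+' strand sequence
    print(mage_oligo(ref, 50, "T", replichore=1)[1])
```

Keeping a single substitution at the centre of the 90-mer follows the guidance above that alterations should be centred and as few as possible.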
  • the extent of homology to the intron and target site can depend on the total length of the donor oligonucleotide, which in turn is impacted by the gene editing technology being used.
  • CRISPR/Cas donor oligonucleotides are typically about 1000 nucleotides long, thus the 5’ and/or 3’ homology arms may be longer for such oligos compared to MAGE oligos.
  • the 5’ and/or 3’ homology arm of a CRISPR/Cas donor oligonucleotide is about 10, 15, 20, 25, 50, 75, 100, 150, 200, 300, 400, 500, or more nucleotides in length.
  • MAGE donor oligonucleotides are typically about 90 nucleotides long.
  • the 5’ and/or 3’ homology arm of a MAGE donor oligonucleotide is about 25, 30, 35, 40, 45, 50, or more nucleotides in length.
  • in some embodiments, the Gibbs free energy of donor oligonucleotide homodimer formation is less than 12 kcal/mol.
  • compositions can be used in methods of genome engineering as well as modification of extragenomic targets. Such methods include targeted editing and/or continuous evolution of target sites. Modification may be performed in vivo, ex vivo, and/or in vitro (e.g., to produce genetically modified cells that can be reintroduced into a living organism). Engineering can be performed at one target site or multiple target sites, either in parallel or tandem. In preferred embodiments, the modification is genomic modification.
  • An exemplary method for modifying the genome or extragenomic site of a cell at one or more target sites includes integrating a sequence encoding an intron adjacent to each of the one or more target sites. Any of the disclosed introns can be used in accordance with the method.
  • a donor oligonucleotide is incorporated (e.g., by insertion or recombination (HDR)) at one or more target sites via a gene editing technology.
  • the gene editing technology can be, for example, a CRISPR system (e.g., CRISPR/Cas9) or multiplex automated genome engineering (MAGE).
  • Other gene editing systems, including but not limited to those discussed elsewhere herein, can also be used.
  • the methods are especially useful for editing of repetitive elements, e.g., repetitive genomic elements. Because the integration of the intron at a specific locus containing a repetitive element in effect constitutes a unique genetic address, the donor oligonucleotide can be preferably incorporated at the specific locus as compared to other loci where copies of the repetitive genomic element may be present.
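The "unique genetic address" argument above can be illustrated with a toy comparison of one donor against several copies of a repeat, only one of which carries the integrated intron. The model is an assumption: it scores each locus by the longest run of exact identity it shares with the donor, whereas real incorporation also depends on mismatches, strand, and replication context. All sequences are placeholders.

```python
# Toy model (assumption): preference is scored as the longest run of exact
# identity between donor and locus; real incorporation also depends on
# mismatches, strand, and replication context. Sequences are placeholders.
def longest_shared_block(a: str, b: str) -> int:
    """Length of the longest common substring of a and b (dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    repeat = "AT" * 8               # stand-in for one copy of a repetitive element
    intron = "G" * 15               # stand-in for the integrated intron "address"
    donor = intron + repeat[:15]    # 15 nt intron homology + 15 nt repeat homology
    loci = {"addressed copy": intron + repeat,
            "other copy 1": "C" * 15 + repeat,
            "other copy 2": "TC" * 8 + repeat}
    for name, seq in loci.items():
        print(name, longest_shared_block(donor, seq))
```

The addressed copy shares a 30-nt block with the donor, while the unaddressed copies share only the 15 nt of repeat homology, which is the asymmetry the intron is meant to create.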
  • the integration can be accomplished by any suitable means including, for example, traditional cloning methods, CRISPR/Cas, etc. In some embodiments, the integration is confirmed by allele specific PCR, sequencing, etc.
  • the one or more target sites to be modified can be or include a ribosomal gene, for example a native or non-native ribosomal RNA (rRNA) gene, a tRNA gene, a microsatellite, a minisatellite, an insertion sequence (IS), a transposable element, a pseudogene, a prophage, a telomere, a CRISPR array, or another endogenous, exogenous, foreign, recombinant, etc., repetitive element.
  • the donor oligonucleotide can be partially or completely homologous to the nucleotide sequence encoding the intron.
  • Working Example 2 shows that when introducing a 90 nt ssODN at a target site using MAGE, a preferred range of about 30-50 nucleotides of homology to the self-splicing intron increased efficiency without significant off-target integration (see Fig. 2E).
  • the donor oligonucleotide can be partially or completely homologous, except for any nucleotide(s) to be mutated, to the one or more target sites to be modified.
  • the donor oligonucleotide can be partially or completely homologous to both the intron and the target site to be modified (e.g., a ribosomal gene such as an rRNA gene, a tRNA gene, a microsatellite, a minisatellite, an insertion sequence (IS), a transposable element, a pseudogene, a prophage, a telomere, a CRISPR array, etc.), or a segment thereof.
  • the 5’ arm of the donor oligonucleotide can be partially or completely homologous to the intron and the 3’ arm of the donor oligonucleotide can be partially or completely homologous to the target site to be modified (i.e., except for the nucleotide(s) to be mutated).
  • the 3’ arm of the donor oligonucleotide can be partially or completely homologous to the intron and the 5 ’ arm of the donor oligonucleotide can be partially or completely homologous to the target site to be modified (i.e., except for the nucleotide(s) to be mutated).
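The two arm layouts above can be sketched in Python as a hypothetical helper, `design_donor`, that joins an intron-homology arm to a target-homology arm of fixed total length. The 90-nt default and the 44-nt intron arm in the usage line echo the MAGE experiments described later in this document; the function name, the placeholder sequences, and the way mutations are specified are illustrative assumptions, and the sketch does not choose the lagging-strand orientation that MAGE oligonucleotides are normally designed against.

```python
from typing import Dict, Optional


def design_donor(intron: str, target: str, intron_arm: int, length: int = 90,
                 intron_first: bool = True,
                 mutations: Optional[Dict[int, str]] = None) -> str:
    """Hypothetical sketch: one arm homologous to the intron, the other to the target.

    intron_first=True assumes the intron lies immediately 5' of the target, so
    the 5' arm is the 3' end of the intron and the 3' arm is the start of the
    target; intron_first=False assumes the reverse layout.
    `mutations` maps 0-based donor positions to replacement bases.
    """
    if not 0 <= intron_arm <= length:
        raise ValueError("intron_arm must lie between 0 and length")
    target_arm = length - intron_arm
    if intron_arm > len(intron) or target_arm > len(target):
        raise ValueError("input sequence is shorter than the requested arm")
    if intron_first:
        donor = intron[len(intron) - intron_arm:] + target[:target_arm]
    else:
        donor = target[len(target) - target_arm:] + intron[:intron_arm]
    bases = list(donor)
    for pos, base in (mutations or {}).items():
        bases[pos] = base  # introduce the desired mismatch(es)
    return "".join(bases)


# Placeholder sequences, not real intron or rRNA sequence.
intron = "A" * 100
target = "C" * 100
donor = design_donor(intron, target, intron_arm=44, mutations={50: "G"})
print(len(donor), donor[:44].count("A"), donor[50])  # 90 44 G
```

Setting `intron_arm=0` reproduces a conventional donor with homology only to the target, which is the comparison point for the intron-homology designs.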
  • One of ordinary skill in the art would be able to determine the suitable amount of homology needed to maximize the efficiency of integration.
  • the donor oligonucleotide can include one or more mutations relative to the target sites where it is incorporated.
  • the mutations can be targeted (e.g., a specific desirable sequence) or random.
  • the donor oligonucleotide is single-stranded DNA (ssDNA) or double-stranded DNA (dsDNA).
  • the methods for genome modification can be used to modify a genome at two or more target sites (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, or more).
  • the cell’s genome is modified at two or more target sites.
  • the intron integrated adjacent to the first target site can be the same or distinct (e.g., having a different sequence) from the intron integrated adjacent to the second and/or subsequent target site.
  • one or more introns (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more)
  • targeted editing is performed at a first target site (e.g., a specific alteration is introduced), while randomized diversification is performed at a second site.
  • the cell to be genetically modified can be contacted with a plurality of identical donor oligonucleotides such that the donor oligonucleotide incorporated at the first target site is selected therefrom, and/or the cell can be contacted with a plurality of distinct donor oligonucleotides such that the donor oligonucleotide incorporated at the second target site can be selected from a plurality of distinct oligonucleotides (e.g., having different sequences, such as random mutations relative to the genomic sequence to be replaced).
  • the targeted editing and randomized diversification can be performed in parallel or in tandem in any desirable order.
  • targeted editing at the first target site is mediated by a CRISPR system and/or randomized diversification at the second target site is mediated by MAGE.
  • the CRISPR system and MAGE can be used in parallel or in tandem.
  • the screens are designed to identify genetic alterations involved in one or more phenotypes of interest.
  • the screens can be loss of function or gain of function.
  • the screens can be performed in vitro (e.g., in cultured cells) or in vivo (e.g., in a subject such as a mouse or rat).
  • a method of screening for one or more mutations that confer a desirable phenotype includes modifying the genome of a plurality of cells in accordance with any of the gene editing methods described herein and subsequently selecting for a cell exhibiting the desirable phenotype.
  • the step of selecting can include applying selective pressure to the cells in order to enrich for cells that exhibit a desired phenotype.
  • the selection step involves negative selection, positive selection, or both negative and positive selection.
  • multiple rounds of genome modification and selection can be used.
  • MAGE is used to perform genome modification in screening methods
  • about 1-10 (e.g., 4, 5, 6, or 7), or 1-100, or 1-1,000 or more MAGE cycles can be performed.
  • Selective pressure can be applied after every cycle to enrich for cells exhibiting the desired phenotype.
  • Different cycles can utilize the same or different pools of donor oligonucleotides targeting the same or different locations.
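If each MAGE cycle is assumed to convert a given site independently with the same probability p, and selection and growth effects are ignored, the expected converted fraction after n cycles is 1 - (1 - p)^n. This is a simplifying model introduced here for illustration, not a result from the source; the Python sketch below evaluates it, together with the number of cycles needed to reach a chosen fraction, using an arbitrary example value of p.

```python
import math


def fraction_modified(p: float, cycles: int) -> float:
    """Expected fraction of cells converted after `cycles` independent cycles."""
    return 1.0 - (1.0 - p) ** cycles


def cycles_needed(p: float, goal: float) -> int:
    """Smallest number of cycles whose expected converted fraction is >= goal."""
    if not (0.0 < p < 1.0 and 0.0 <= goal < 1.0):
        raise ValueError("need 0 < p < 1 and 0 <= goal < 1")
    return math.ceil(math.log(1.0 - goal) / math.log(1.0 - p))


# Illustrative per-cycle conversion probability of 5%.
print(round(fraction_modified(0.05, 6), 3))  # 0.265
print(cycles_needed(0.05, 0.5))              # 14
```

Applying selective pressure after each cycle, as described above, would be expected to enrich converted cells beyond this unselected baseline.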
  • a desirable phenotype, particularly in the case of improved ribosomes, can be antibiotic resistance.
  • Selecting for a cell that exhibits antibiotic resistance can include exposing the plurality of cells to an effective amount of one or more antibiotics.
  • the bacteria can be plated onto agar containing an effective amount of one or more antibiotics.
  • Variants that are resistant to the antibiotic(s) can be isolated, propagated, and/or characterized. Results of the screen can be validated by independently generating cells containing the one or more genomic modifications (e.g., mutations) identified by the screen.
  • other desired phenotypes can include, but are not limited to, the ability to translate difficult-to-translate amino acid sequences, the ability to catalyze translation of non-natural polymers, and improved orthogonal mRNA recognition and translation.
  • Sequencing and allele-specific PCR can be used to determine whether gene modification has occurred.
  • PCR primers may be designed to distinguish between the original allele and the new predicted sequence following recombination. Other methods of determining whether a recombination event has occurred are known in the art and may be selected based on the type of modification made.
  • Methods include, but are not limited to, analysis of genomic DNA, for example by sequencing, allele-specific PCR, droplet digital PCR, or restriction endonuclease-mediated selective PCR (REMS-PCR); analysis of mRNA transcribed from the target gene, for example by northern blot, in situ hybridization, or real-time or quantitative reverse transcriptase (RT) PCR; and analysis of the polypeptide encoded by the target gene, for example by immunostaining, ELISA, or FACS. In some cases, modified cells will be compared to parental controls. Other methods may include testing for changes in the function of the RNA transcribed from, or the polypeptide encoded by, the target gene. For example, if the target gene encodes an enzyme, an assay designed to test enzyme function may be used.
IV. Ribosomes
  • the disclosed compositions, methods, and strategies can be utilized for directed mutagenesis and evolution.
  • rRNA was targeted for mutagenesis and improved ribosomes were engineered.
  • engineered ribosomes are also disclosed.
  • the engineered ribosome is a prokaryotic ribosome, e.g., a bacterial ribosome.
  • the engineered bacterial ribosome includes a linker tethering the rRNA of the small subunit (16S rRNA) with the rRNA of the large subunit (23S rRNA), herein referred to as tethered ribosomes.
  • Tethered ribosomes and methods of making thereof are known in the art. See, for example, International Published Application No. WO 2015/184283 and Orelle, C., et al., Nature 524: 119-124 (2015), which are hereby incorporated by reference in their entirety.
  • the engineered ribosome contains one or more mutations relative to a reference, such as a naturally occurring or non-engineered ribosome, or a known, previously engineered ribosome such as oRiboT.
  • the mutation is a gain-of-function mutation or a loss-of-function mutation.
  • a gain-of-function mutation may be any mutation that confers a new function.
  • a loss-of-function mutation may be any mutation that results in the loss or reduction of a function possessed by the parent.
  • the mutation may be in the peptidyl transferase center of the ribosome.
  • the mutation may be in an A-site of the peptidyl transferase center.
  • the mutation may be in the exit tunnel of the engineered ribosome.
  • the ribosome is a ribonucleoprotein machine responsible for protein synthesis. In all kingdoms of life it is composed of two subunits, each built on its own ribosomal RNA (rRNA) scaffold. The independent but coordinated functions of the subunits, including their ability to associate at initiation, rotate during elongation, and dissociate after protein release, are an established paradigm of protein synthesis.
  • the ribosome is an extraordinarily complex machine. This large particle, in which RNA is the main structural and functional component, is invariably composed of two subunits that coordinate distinct but complementary functions: the small subunit decodes the mRNA, while the large subunit catalyzes peptide-bond formation and provides the exit tunnel for the polypeptide. The association of the subunits is tightly regulated throughout the cycle of translation.
  • Bacterial 70S ribosomes are composed of two subunits, a small 30S subunit and a large 50S subunit, both of which are ribonucleoprotein particles.
  • the small subunit is assembled from 21 ribosomal proteins and a single 16S ribosomal RNA (rRNA) of 1541 nucleotides
  • the large subunit is assembled from 33 ribosomal proteins and two rRNAs, a 5S rRNA of 115 nucleotides, and a 23S rRNA of 2904 nucleotides.
  • the engineered ribosome contains one or more mutations in the large and/or the small subunit, such as in the rRNA therein.
  • the engineered ribosome contains one or more mutations in its 23S rRNA.
  • one or more mutations can be present at or within nucleotides 2030-2034 and/or nucleotides 2057-2061 of the 23S rRNA.
  • Exemplary mutations within nucleotides 2030-2034 and/or nucleotides 2057-2061 of the 23S rRNA that can be used include, without limitation, mutations encoded by a sequence selected from TCACC, CGCCG, TAGCA, GCCTG, CATTG, AAGGT, ACCCG, TCCCG, GTACA, ATTCT, AATGT, and ACCGT.
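A minimal Python sketch of placing one of the listed five-nucleotide sequences into the 2030-2034 or 2057-2061 window of a 23S rDNA sequence, using 1-based inclusive coordinates. The reference in the usage line is a placeholder with the 2904-nt 23S length given below, not the real 23S sequence. The text does not specify which window each listed sequence belongs to, so the hypothetical helper simply takes the window start as an argument.

```python
VARIANTS = ["TCACC", "CGCCG", "TAGCA", "GCCTG", "CATTG", "AAGGT",
            "ACCCG", "TCCCG", "GTACA", "ATTCT", "AATGT", "ACCGT"]
WINDOWS = {"2030-2034": 2030, "2057-2061": 2057}


def apply_window(ref: str, start: int, variant: str) -> str:
    """Replace the 1-based window beginning at `start` with `variant`."""
    i = start - 1
    end = i + len(variant)
    if i < 0 or end > len(ref):
        raise ValueError("window falls outside the reference")
    return ref[:i] + variant + ref[end:]


def to_rna(dna: str) -> str:
    """Convert an encoding DNA sequence to the corresponding rRNA sequence."""
    return dna.replace("T", "U")


ref = "A" * 2904  # placeholder, not the real 23S rDNA sequence
edited = apply_window(ref, WINDOWS["2030-2034"], VARIANTS[0])
print(edited[2029:2034], to_rna(edited[2029:2034]))  # TCACC UCACC
```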
  • the one or more mutations confer resistance to one or more antibiotics.
  • the antibiotics reduce or prevent bacterial protein synthesis.
  • antibiotics include, without limitation, tetracyclines (e.g., doxycycline), aminoglycosides (e.g., streptomycin, kanamycin and tobramycin), erythromycin, roxithromycin, clarithromycin, lincomycin, lincosamides (e.g., clindamycin), puromycin, phenicols (chloramphenicol), oxazolidinones (linezolid), pleuromutilins (tiamulin), hygromycin A, and hygromycin B.
  • the one or more mutations may render the engineered ribosome resistant to an aminoglycoside, a tetracycline, a pactamycin, a streptomycin, an edeine, or any other antibiotic that targets the small ribosomal subunit.
  • the one or more mutations may render the engineered ribosome resistant to a macrolide, a chloramphenicol, a lincosamide, an oxazolidinone, a pleuromutilin, a streptogramin, or any other antibiotic that targets the large ribosomal subunit.
  • the one or more mutations confer resistance to one or more antibiotics selected from chloramphenicol, erythromycin, clindamycin, and lincomycin.
  • the engineered ribosome can accommodate synthetic, abiological monomers. Such monomers include, but are not limited to, non-L-alpha-amino acids, noncanonical L-alpha-amino acids, and/or D-amino acids.
  • the ribosomes can polymerize the monomers into polymers.
  • f-MAGE (filtered MAGE), e.g., as exemplified in the experiments below
  • PTC: peptidyl transferase center
  • the ribosome is a eukaryotic ribosome.
  • Such evolution of ribosomes can be used to create an orthogonal ribosomal system in eukaryotic cells and use this, for example, for production of protein biomaterials in yeast incorporating nonstandard amino acids.
  • the engineered ribosome may be prepared by expressing a polynucleotide encoding the rRNA of the engineered ribosome.
  • polynucleotides encoding the rRNAs forming the engineered ribosomes are provided.
  • the polynucleotide is included in a vector, such as an expression vector.
  • kits useful for performing, or aiding in the performance of, the methods. It is useful if the kit components in a given kit are designed and adapted for use together in the method.
  • the kits may include instructions for dosages and dosing regimens.
  • kits containing polynucleotides (e.g., a plasmid or an expression vector)
  • the kits contain instructional material for use thereof.
  • the kit can contain a population of cells, such as prokaryotic or eukaryotic cells to be genetically modified or harboring a disclosed polynucleotide (e.g., plasmid, expression vector).
  • the instructional material can include a publication, a recording, a diagram, or any other medium of expression which can be used to communicate the usefulness of the compositions and methods of the kit.
  • the instructional material may provide instructions for methods using the kit components, such as performing transfections, transductions, infections, and conducting screens.
  • a polynucleotide comprising a nucleotide sequence encoding an intron flanked on one or both of the 5’ and 3’ ends by heterologous sequence(s).
  • the polynucleotide of any one of paragraphs 2-4, wherein the self-splicing intron is derived from an organism selected from the group comprising Tetrahymena thermophila, Nostoc punctiforme, Bacillus anthracis, Tilletiopsis flava, Azoarcus, Pneumocystis carinii, Agrobacterium, T7-like bacteriophage, Bacteriophage T4, or combinations thereof.
  • nucleotide sequence encoding the intron comprises the sequence of any one of SEQ ID NOS: 1-12 or a sequence comprising at least 85% identity to any one of SEQ ID NOS: 1-12.
  • nucleotide sequence encoding the chimeric self-splicing intron comprises the sequence of SEQ ID NO: 1 or a sequence comprising at least 85% identity to SEQ ID NO:1.
  • ribosomal RNA (rRNA) gene, tRNA gene or portion thereof, microsatellite, minisatellite, insertion sequence (IS), transposable element, pseudogene, prophage, telomere, or CRISPR array.
  • the polynucleotide of any one of paragraphs 1-14 comprised in a plasmid or viral vector.
  • a cell comprising the polynucleotide of any one of paragraphs 1-15.
  • An isolated cell comprising a polynucleotide encoding a self-splicing intron comprising the sequence of any one of SEQ ID NOS: 1-12 or a sequence comprising at least 85% identity to any one of SEQ ID NOS: 1-12, optionally integrated into the genome of the cell.
  • the isolated cell of paragraph 25 comprising the sequence of SEQ ID NO:1 or a sequence comprising at least 85% identity to SEQ ID NO:1, optionally integrated into the genome of the cell.
  • a method of modifying the genome of a cell at one or more target sites comprising integrating a nucleotide sequence encoding an intron adjacent to each of the one or more target sites, and subsequently inducing incorporation of a donor oligonucleotide at each of the one or more target sites via a gene editing technology.
  • a method of modifying the genome of a cell comprising integrating one or more of the polynucleotides of any one of paragraphs 1-15, and subsequently inducing incorporation of a donor oligonucleotide at or adjacent to the one or more sites of integration via a gene editing technology.
  • a method of modifying a nucleic acid in a cell comprising inducing incorporation of a donor oligonucleotide at or adjacent to the one or more sites of the polynucleotide in the cell of any one of paragraphs 16-26 via a gene editing technology.
  • the donor oligonucleotide is ssDNA or dsDNA.
  • the intron is a self-splicing intron derived from one or more organisms selected from the group comprising Tetrahymena thermophila, Nostoc punctiforme, Bacillus anthracis, Tilletiopsis flava, Azoarcus, Pneumocystis carinii, Agrobacterium, T7-like bacteriophage, Bacteriophage T4, or combinations thereof.
  • intron is a self-splicing intron that is a chimeric self-splicing intron, optionally derived from Tetrahymena thermophila and Tilletiopsis flava.
  • nucleotide sequence encoding the chimeric self-splicing intron comprises the sequence of any one of SEQ ID NOS: 1-12 or a sequence comprising at least 85% identity to any one of SEQ ID NOS: 1-12.
  • the gene editing technology comprises one or more of a CRISPR system, multiplex automated genome engineering (MAGE), zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), triplex-forming oligonucleotides, pseudocomplementary oligonucleotides, small fragment homologous replacement, single-stranded oligodeoxynucleotide-mediated gene modification, and intron-encoded meganucleases.
  • the CRISPR system comprises an engineered reverse transcriptase fused to Cas9 nickase and a prime-editing guide RNA (pegRNA).
  • the one or more target sites is selected from a ribosomal RNA (rRNA) gene, a tRNA gene, a microsatellite, a minisatellite, an insertion sequence (IS), a transposable element, a pseudogene, a prophage, a telomere, a CRISPR array, or another native or non-native repetitive element.
  • a method of screening for one or more mutations that confer a desirable phenotype comprising modifying the genome of a plurality of cells by the method of any one of paragraphs 27-48 and selecting for a cell exhibiting the desirable phenotype.
  • a ribosome comprising the rRNA and/or protein encoded by the one or more ribosomal genes can accommodate synthetic, abiological monomers, optionally non-L-alpha-amino acids, noncanonical L-alpha-amino acids and/or D-amino acids, and can optionally polymerize them into polymers.
  • An engineered bacterial ribosome comprising one or more mutations at nucleotides 2030-2034 and/or 2057-2061 in its 23S rRNA, wherein the one or more mutations are encoded by a sequence selected from TCACC, CGCCG, TAGCA, GCCTG, CATTG, AAGGT, ACCCG, TCCCG, GTACA, ATTCT, AATGT, and ACCGT.
  • the engineered ribosome of paragraphs 53 and 54 comprising a linker tethering the rRNA of the small subunit (16S rRNA) with the rRNA of the large subunit (23S rRNA).
  • a cell comprising the engineered ribosome of any of paragraphs 53-55 and/or the polynucleotide of paragraph 56.
  • Example 1 Integration of a self-splicing intron into oRiboT does not compromise catalytic functionality of oRiboT.
  • the C321.ΔA strain is derived from strain EcNR2 (ΔmutS:cat Δ(ybhB-bioAB):[λcI857 Δ(cro-ea59):tetR-bla]), modified from E. coli K-12 substr. MG1655 as previously described.
  • C321.ΔA was modified by replacing the carbenicillin resistance gene with a spectinomycin resistance gene using standard λ-Red recombination to permit compatibility with plasmid constructs containing the engineered ribosomes.
  • C321.ΔA_spec was grown in low-salt LB-min medium (10 g tryptone, 5 g yeast extract, 5 g NaCl in 1 L dH2O) at 34°C. Variants of this strain were constructed to contain the chromosomal- and plasmid-based oRiboT and wild type ribosome constructs described below. All strain variants were grown under the same conditions with the exception of supplementation with inducers (aTc, IPTG) or antibiotics as described.
  • Plasmids containing the tethered ribosome (RiboT, pRibo-T), the orthogonal-tethered ribosome (oRiboT, poRibo-T), and oGFP (poGFP) were obtained.
  • the promoters driving the expression of RiboT and oRiboT were replaced with PL-tetO such that transcription of the ribosome variants can be controlled by the TetR protein and anhydrotetracycline (aTc, Sigma).
  • oRiboT-Tt4 and oRiboT-Tt4Δ were constructed by amplifying the oRiboT-Tt1 and oRiboT-Tt1Δ plasmids, respectively, with primers containing the desired mutations and assembled using Gibson Assembly.
  • Clonetegration (St-Pierre, F. et al. ACS Synthetic Biology 2, 537-541 (2013)) was used to introduce oRiboT-Tt1 into the genome of C321.ΔA_spec using pOSIP-CH (Addgene plasmid #45980) and pE-FLP (Addgene plasmid #45978). Briefly, oRiboT-Tt1 was amplified by PCR and cloned into the pOSIP-CH plasmid using Gibson assembly. After electroporation and overnight recovery, cells were plated on chloramphenicol plates. Following overnight growth, correct integration was verified by colony PCR.
  • RNA was purified using a DNeasy Blood and Tissue Kit (Qiagen) following manufacturer’s instructions.
  • RT-PCR was performed using the SuperScript OneStep RT-PCR System with Platinum Taq DNA Polymerase on purified RNA from oRibo-T, oRiboT-Tt1, oRiboT-Tt1Δ, oRiboT-Tt4, and oRiboT-Tt4Δ.
  • RNA from cells containing oRiboT-Tt4 and oRiboT-Tt4Δ was treated with or without DNase. The rationale was that if DNA template were present, the amplified PCR product would match the size of the unspliced, intron-containing sequence rather than the size corresponding to the spliced RNA or WT ribosomes. Results showed that DNase-treated samples gave the expected products for oRiboT-Tt4 and oRiboT-Tt4Δ, whereas the same samples without DNase treatment showed an unspliced band consistent with genomic DNA contamination. All other RT-PCR reactions were therefore performed with DNase treatment to eliminate genomic DNA contamination. Products of RT-PCR were analyzed by agarose gel electrophoresis and sequenced by Sanger sequencing.
  • C321.ΔA_spec cells were transformed with the following plasmids: (1) oRibo-T and poGFP; (2) oRiboT-Tt1 and poGFP; (3) oRiboT-Tt1Δ and poGFP; (4) oRiboT-Tt2 and poGFP; (5) oRiboT-Tt2Δ and poGFP; (6) oRiboT-Tt3 and poGFP; (7) oRiboT-Tt3Δ and poGFP. All transformed strains were grown at 34°C in LB-min medium supplemented with 50 µg/mL carbenicillin and 30 µg/mL kanamycin.
  • Wells of a 96-well plate were filled with 150 µL of LB medium supplemented with 50 µg/mL carbenicillin and 30 µg/mL kanamycin. The wells were inoculated with colonies from each plasmid combination above (in triplicate) and incubated at 34°C for 16 h with shaking. Clear-bottom wells of another 96-well plate were filled with 150 µL of LB-min medium supplemented with 50 µg/mL carbenicillin, 30 µg/mL kanamycin, 1 mM IPTG, and 100 ng/mL anhydrotetracycline (aTc).
  • the plate was inoculated with 2 µL from the saturated initial inoculation plate and incubated with linear shaking (731 cycles per min) for 16 h at 34°C on a BioTek Synergy H1 plate reader, with continuous monitoring of cell density (OD600) and sfGFP fluorescence (excitation at 485 nm and emission at 528 nm, sensitivity setting 70).
  • oligonucleotides were purchased from Integrated DNA Technologies or the Yale University W.M. Keck Oligonucleotide Synthesis Facility with standard purification. MAGE oligonucleotides were 90 nucleotides in length and contained two phosphorothioated bases on the 5’ end. Depending on the experiment, degenerate bases or mutations were placed within the oligonucleotide. Additional primers were purchased for cloning and RT-PCR on oRibo-T constructs. Primers for next-generation sequencing (NGS) were designed with five degenerate bases at the 5’ end.
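Degenerate positions in oligonucleotides are conventionally written with IUPAC nucleotide ambiguity codes (for example, N for any base). The Python sketch below assumes that convention, rather than any vendor-specific ordering format, and counts and enumerates the sequences a degenerate design represents. Five N positions, as in the NGS primers above, correspond to 4^5 = 1,024 possible 5’ ends.

```python
from itertools import product

# Standard IUPAC nucleotide codes mapped to the bases they represent.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
         "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT"}


def pool_size(seq: str) -> int:
    """Number of distinct sequences encoded by a degenerate sequence."""
    size = 1
    for base in seq.upper():
        size *= len(IUPAC[base])
    return size


def expand(seq: str):
    """Yield every concrete sequence encoded by a degenerate sequence."""
    for combo in product(*(IUPAC[b] for b in seq.upper())):
        yield "".join(combo)


print(pool_size("NNNNN"))   # 1024
print(list(expand("ACR")))  # ['ACA', 'ACG']
```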
  • genomic DNA from ~2 × 10⁹ cells per sample after f-MAGE was extracted using a Qiagen Genomic DNA purification kit, and PCR was used for targeted amplification of the sequencing region. Up to two libraries were pooled for sequencing on an Illumina MiSeq. Data were analyzed with open-source software; briefly, after quality filtering, reads were searched for the primer sequence, the site of mutagenesis was determined, and WT and mutant reads were quantified.
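The read-classification step described above (locate the primer, read the mutagenized site, count WT and mutant reads) can be sketched in Python as follows. The function names, the fixed offset from the primer to the site, and the toy reads are assumptions; quality filtering is assumed to have been done already, and this is not the open-source software used in the work. Locating the primer with `str.find` rather than at a fixed read position tolerates the five degenerate bases at the 5’ end of the sequencing primers.

```python
from collections import Counter
from typing import Iterable, Optional


def classify_read(read: str, primer: str, offset: int, wt_site: str) -> Optional[str]:
    """Return 'WT', 'mutant', or None if the primer or full site is not found."""
    i = read.find(primer)
    if i < 0:
        return None
    start = i + len(primer) + offset
    site = read[start:start + len(wt_site)]
    if len(site) < len(wt_site):
        return None  # read too short to cover the site
    return "WT" if site == wt_site else "mutant"


def allelic_replacement_frequency(reads: Iterable[str], primer: str,
                                  offset: int, wt_site: str) -> float:
    """Fraction of classified reads carrying a non-WT site."""
    counts = Counter(classify_read(r, primer, offset, wt_site) for r in reads)
    total = counts["WT"] + counts["mutant"]
    return counts["mutant"] / total if total else 0.0


# Toy reads: 5 random bases, primer, 2-nt spacer, 3-nt site, tail.
reads = ["TTTTTACGTACAAGGGCC", "CCCCCACGTACAAGAGCC", "GGGGGGGGGG"]
print(allelic_replacement_frequency(reads, "ACGTAC", 2, "GGG"))  # 0.5
```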
  • a self-splicing intron was inserted into the rRNA gene encoding an orthogonal tethered ribosome (oRiboT) (Orelle, et al., Nature, 524, 119-124 (2015)) to break the sequence redundancy with the seven homologous rRNA genes native to E. coli (Figs. 1A-B), and to act as a unique addressable site to target genome editing to the desired locus and exclude all others in the cell.
  • the Tetrahymena thermophila group I self-splicing intron can be stably integrated and post-transcriptionally removed scarlessly to maintain the sequence and function of oRiboT.
  • the Tetrahymena intron was chosen because it is spliced naturally from the T. thermophila rRNA and functions effectively in both in vitro and in vivo contexts (Kruger, et al., Cell, 31, 147-157 (1982)).
  • the intron was inserted into oRiboT immediately after position U1926 of the 23S rRNA (Site 1 in Fig. 1B), as this position corresponds to the location of the intron within the T. thermophila ribosome and has previously been demonstrated to function in Escherichia coli (Zhang, et al., RNA, 1, 284 (1995)).
  • Two independent C321.ΔA_spec strains (Lajoie, et al., Science, 342, 357-360 (2013)) of E. coli containing wild type (WT) oRiboT (+ control) or oRiboT with the Tetrahymena thermophila intron inserted (oRiboT-Tt1) were cultured, and RT-PCR was performed on total purified RNA from each population using primers that amplified the region spanning the intron-exon junctions. A single band at 128 nucleotides (nt) was observed in both cases, indicating complete and scarless splicing of the intron from oRiboT-Tt1.
  • oRiboT-Tt1Δ, lacking the internal guide sequence (IGS) necessary for intron function, was created.
  • RT-PCR of total purified RNA from this strain revealed two bands, one at 128 nt indicating the presence of native ribosomes, and a second at 535 nt, indicating an unspliced product resulting from the IGS deletion. In this case, two bands were observed because the total purified RNA contained both the WT 23S sequence and oRiboT-Tt1Δ.
  • oRiboT-Tt1 and oRiboT-Tt1Δ were used as templates to construct two other mutants, oRiboT-Tt1b (+IGS) and oRiboT-Tt1bΔ (ΔIGS), which contain a unique sequence following the intron-exon splice junction (Fig. 2B) such that a primer can selectively amplify oRiboT-Tt1b or oRiboT-Tt1bΔ among the native ribosomes.
  • Following RNA purification and RT-PCR from cell culture, a single band at 128 nt was observed for oRiboT-Tt1b, corresponding to the expected spliced product, and a single band at 535 nt for oRiboT-Tt1bΔ, corresponding to the expected unspliced product from the ΔIGS mutant.
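The band sizes reported above differ by 535 - 128 = 407 nt, the extra length attributable to the intron retained in the unspliced ΔIGS product. A minimal Python sketch that assigns an observed gel band to the spliced or unspliced product using those two sizes; the tolerance is a hypothetical value chosen for illustration, not one stated in the source.

```python
from typing import Optional

SPLICED_NT = 128    # spliced product (and native ribosomes), from the text
UNSPLICED_NT = 535  # unspliced product from the IGS-deletion constructs, from the text


def call_band(observed_nt: float, tolerance: float = 20.0) -> Optional[str]:
    """Assign a band to 'spliced' or 'unspliced', or None if it matches neither.

    The tolerance is an illustrative assumption about gel sizing accuracy.
    """
    for label, size in (("spliced", SPLICED_NT), ("unspliced", UNSPLICED_NT)):
        if abs(observed_nt - size) <= tolerance:
            return label
    return None


print(UNSPLICED_NT - SPLICED_NT)                # 407
print([call_band(x) for x in (130, 540, 300)])  # ['spliced', 'unspliced', None]
```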
  • oRiboT contains an orthogonal anti-Shine-Dalgarno sequence in its 16S rRNA (Hui & De Boer, Proceedings of the National Academy of Sciences 84, 4762-4766 (1987); Rackham & Chin, Nat Chem Biol, 1, 159-166 (2005)).
  • a compatible GFP reporter: orthogonal GFP (oGFP)
  • oRBS: orthogonal ribosome binding site sequence
  • oGFP was driven from the IPTG/LacR-inducible PL-lacO promoter and oRibo-T from the aTc/TetR-inducible PL-tetO promoter (Fig. 2A).
  • Steady-state GFP expression was first assayed in cells containing either oRibo-T or oRiboT-Tt1 under all four induction conditions (-IPTG -aTc, +IPTG -aTc, -IPTG +aTc, +IPTG +aTc), and in each case equivalent GFP expression was observed in cells containing oRiboT or oRiboT-Tt1 (Fig. 2C).
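The four induction conditions form a 2 × 2 design. The text does not state how fluorescence was normalized; one common approach, assumed here, is blank-corrected fluorescence divided by blank-corrected OD600, averaged over replicate wells. The Python sketch below uses toy numbers, not measurements from this work.

```python
from itertools import product
from statistics import mean, stdev
from typing import List, Tuple

# (IPTG, aTc) induction states in the order -/-, -/+, +/-, +/+
CONDITIONS = list(product((False, True), repeat=2))


def label(iptg: bool, atc: bool) -> str:
    return f"{'+' if iptg else '-'}IPTG {'+' if atc else '-'}aTc"


def normalized(fluor: float, od600: float,
               blank_fluor: float = 0.0, blank_od: float = 0.0) -> float:
    """Blank-corrected fluorescence per unit of blank-corrected OD600."""
    return (fluor - blank_fluor) / (od600 - blank_od)


def summarize(wells: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Mean and sample standard deviation of normalized fluorescence."""
    values = [normalized(f, od) for f, od in wells]
    return mean(values), stdev(values)


print([label(*c) for c in CONDITIONS])
print(summarize([(100.0, 0.5), (200.0, 1.0), (300.0, 1.5)]))  # (200.0, 0.0)
```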
  • Example 2 Integration of a self-splicing intron permits targeted modification of oRiboT via CRISPR/Cas9 and MAGE.
  • the ssODN-cell mixture was transferred to a pre-chilled 1 mm gap electroporation cuvette (Bio-Rad) and electroporated using the following parameters: 1.8 kV, 200 Ω, and 25 µF.
  • LB-min medium (3 mL) was immediately added to the electroporated cells. The cells were recovered from electroporation and grown overnight with induction of both the cas9 and CRISPR plasmids.
  • MAGE was carried out as previously described (Wang, H.H. et al. Nature 460, 894-898 (2009)). Liquid cultures were inoculated from single colonies and grown to mid-logarithmic phase in a shaking incubator at 34°C. To induce expression of the lambda-Red recombination proteins (Exo, Beta and Gam), cell cultures were shifted to 42°C for 15 min and then immediately cooled on ice. In a 4°C environment, 1 mL of cells was centrifuged at 16,000 × g for 30 s. The supernatant was removed and the cells resuspended in Milli-Q water. The cells were spun down, the supernatant was removed, and the cells were washed a second time.
  • ssODNs prepared at a concentration of 5-6 µM in DNase-free water were added to the cell pellet.
  • the ssODN-cell mixture was transferred to a pre-chilled 1 mm gap electroporation cuvette (Bio-Rad) and electroporated using the following parameters: 1.8 kV, 200 Ω, and 25 µF.
  • LB-min medium (3 mL) was immediately added to the electroporated cells. The cells were recovered from electroporation and grown at 30°C for 3-3.5 h. Once cells reached mid-logarithmic growth, they were used in additional MAGE cycles.
  • the ssODNs used for filtered MAGE (f-MAGE) were designed to possess between 0 and 70 nt of homology to the intron in order to determine optimum design parameters for targeted mutagenesis and reduction of off-target mutations. All mutagenesis was performed on oRiboT-Tt1 genomically integrated into C321.ΔA_spec by clonetegration (containing a 22-nt mutation 108 nt upstream of the intron-exon junction to distinguish it from native ribosomes). For subsequent f-MAGE experiments, ssODNs with 44 nt of homology to the intron were used, following the general MAGE protocol above.
  • the intron-ribosome junction could serve as a unique addressable site for targeted modification of oRiboT by two commonly used gene editing methods, CRISPR/Cas9 (Halperin, et al., Nature, (2016)) and MAGE (Gaudelli, et al., Nature, 551, 464-471 (2017)).
  • a CRISPR array plasmid was designed with two spacers homologous to the 5’ region of the intron and to the linker between the 16S and 23S rRNA unique to oRibo-T, respectively.
  • a dsDNA donor was created containing 400 bp of homology directly upstream and downstream of the cut sites and a 7-bp degenerate region, allowing both the allelic replacement frequency (ARF) and the complexity of the generated library to be quantified by deep sequencing.
  • the CRISPR plasmid and dsDNA were introduced into a strain with genomically integrated oRiboT-Tt2 carrying an upstream distinguishing mutation for sequencing (oRiboT-Tt2-ed). Cells were grown to saturation and Cas9 expression was induced to select for cells in which the dsDNA had repaired the double-stranded breaks introduced into the E. coli chromosome.
  • the ARF for oRiboT-Tt2-ed and WT ribosomes was quantified. It was found that there was extensive editing of oRiboT-Tt2-ed and almost no mutagenesis of WT ribosomes, even though the region mutagenized was identical to all seven native ribosomes (Fig. 2D). The ARF was 98.27% for oRiboT and 0.29% for all seven native ribosomes. Furthermore, 14,147 unique mutants were obtained out of a theoretical complexity of 16,384 (86.35% library efficiency), demonstrating that complex libraries of mutants can be easily generated with this method while avoiding editing of unintended genomic sites sharing sequence similarity with the target locus.
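The library figures above follow from simple arithmetic: a 7-bp fully degenerate window encodes 4^7 = 16,384 sequences, and 14,147 observed unique mutants are 14,147 / 16,384 ≈ 86.35% of that space. The Python sketch below reproduces this and adds, as an assumption not made in the source, the expected number of distinct variants when N edited clones are drawn uniformly from a library of size K, K(1 - (1 - 1/K)^N).

```python
def theoretical_complexity(degenerate_positions: int, alphabet: int = 4) -> int:
    """Number of sequences encoded by fully degenerate positions."""
    return alphabet ** degenerate_positions


def library_efficiency(unique_observed: int, degenerate_positions: int) -> float:
    """Observed unique variants as a fraction of the theoretical complexity."""
    return unique_observed / theoretical_complexity(degenerate_positions)


def expected_unique(complexity: int, clones: int) -> float:
    """Expected distinct variants among `clones` uniform random draws (assumption)."""
    return complexity * (1.0 - (1.0 - 1.0 / complexity) ** clones)


print(theoretical_complexity(7))                     # 16384
print(round(100 * library_efficiency(14147, 7), 2))  # 86.35
```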
  • MAGE introduces mutations using single-stranded DNA (ssDNA) oligodeoxynucleotides (ssODNs) that complex with ssDNA annealing proteins (e.g., λ Red Beta recombinase (Costantino, Proceedings of the National Academy of Sciences, 100, 15748-15753 (2003))) and hybridize to the lagging strand of the replicating chromosome (Wang, et al., Nature, 460, 894-898 (2009); Barbieri, et al., Cell, 171, 1453-1467 (2017)).
  • MAGE permits higher depth and breadth of mutation by avoiding the toxicity associated with DNA double-strand breaks (DSBs) inherent in other genome editing methods (Komor, et al., Cell, 168, 20-36 (2017); Gaj, et al., Trends in Biotechnology, 31, 397-405 (2013); Kim & Kim, Nature Reviews Genetics 15, 321-334 (2014)).
  • MAGE generates multisite genome modifications and has been used for the molecular evolution of proteins (Amiram, et al., Nature Biotechnology, 33, 1272 (2015)), pathway diversification (Wang, et al., Nature, 460, 894-898 (2009); Barbieri, et al., Cell, 171, 1453-1467.e13 (2017)), and whole-genome recoding (Lajoie, et al., Science, 342, 357-360 (2013)).
  • ten 90-mer ssODNs were designed to target the intron-ribosome junction, containing varying homology to the intron and exon and harboring mismatch mutations targeting the 23S oRiboT sequence of a chromosomally integrated oRiboT-Tt variant (Fig. 2E).
  • One cycle of MAGE was performed for each ssODN and then deep sequencing was performed to quantify the frequency of conversion at the oRiboT-Tt locus and at the seven wild-type ribosome genes.
  • ssODNs targeting exclusively the 23S rRNA sequence with no homology to the intron demonstrated allelic replacement frequencies (ARFs) of 4% and 5% at the native ribosomes and oRiboT-Tt1, respectively (Fig. 2E).
  • The ARF measured at the native ribosomes is a frequency shared across all seven sites, so that, averaged over those sites, the frequency of a mutation at any one site is below 1% (4%/7 ≈ 0.6%).
  • Optimal parameters for f-MAGE ssODN design may be context specific: 44 nt of homology to the intron (46 nt to the ribosome) maximizes conversion of oRiboT, whereas 70 nt of homology to the intron (20 nt to the ribosome) renders off-target conversions at the seven native ribosome loci effectively undetectable (Fig. 2E, Table 2).
  • Ribo-T-intron or WT-ribosome-intron plasmids were diversified with six cycles of f-MAGE using degenerate ssODNs targeting regions 2030-2034 and 2057-2061, with homology to the 5’ or 3’ portion of the intron, respectively. Cells were grown for 16 h with aTc to induce ribosome expression.
  • Each culture was seeded into 3 mL of LB-min medium containing aTc and antibiotic (273 μM erythromycin, 1.3 mM clindamycin, 7.74 μM chloramphenicol, or 28.22 mM lincomycin) and grown overnight.
  • Plasmid DNA was isolated from each culture (Qiagen) and re-transformed into unselected C321 and MG1655 strains to confirm that the plasmid was causal to antibiotic resistance.
  • The cells were plated on carbenicillin plates. Individual clones were grown in a 96-well plate after overnight induction in LB + aTc, with aTc and one of the four antibiotics included at the concentrations specified above.
  • f-MAGE was first used to recreate the G2032A, G2057A, and A2058G mutations in non-orthogonal wild-type (WT-Tt2) and tethered (RiboT-Tt2) ribosomes containing the Tetrahymena intron at Site 2.
  • WT-Tt2-derived mutants at the three published sites displayed antibiotic resistance phenotypes when challenged with the panel of four antibiotics. These results demonstrate cell survival from WT-Tt2 ribosomes under antibiotic conditions that render the native ribosomes non-functional, and validate two key aspects of this study. First, new antibiotic-resistant ribosomes could be evolved by f-MAGE in the presence of the native translational machinery. Second, cells are capable of surviving solely from ribosomes transcribed from intron-containing genes.
  • f-MAGE was next applied to generate a complex library of RiboT-Tt2 ribosomes in order to discover new mutations that confer antibiotic resistance.
  • Mutagenic MAGE ssODNs containing five degenerate nucleotides were designed to target two 23S rRNA regions: Region 1 (2030-2034) and Region 2 (2057-2061).
  • Six cycles of MAGE were performed with this complex pool of ssODNs, followed by liquid selections in the four antibiotics and plating on solid media to isolate individual mutants. Seven mutant ribosomes were identified that showed varying degrees of resistance to the antibiotics (Fig. 2F).
  • Each mutant conferred resistance to a subset of the antibiotics (Fig. 2G). For example, some of the identified mutants (e.g., M4, M6) exhibited broad resistance to the panel of four antibiotics assayed, while others (e.g., M5, M7) were resistant to only one or two antibiotics.
  • Example 4 Integration of distinct self-splicing introns across multiple sites in the ribosome.
  • RT-PCR was also performed on total purified RNA from strains with Tetrahymena intron at sites 1-4, as above, and a single band indicating complete splicing at each site was observed. Sequencing of the RT-PCR products confirmed the scarless ligation of the ribosome at the intron-exon junction post-splicing in all four insertion sites.
  • oRiboT-Tt1, oRiboT-Tt2, and oRiboT-Tt3 were used as templates to construct ΔIGS mutants of each, yielding oRiboT-Tt1Δ, oRiboT-Tt2Δ, and oRiboT-Tt3Δ, respectively.
  • Ribosomes with newly tested introns at site 2 showed reduced function.
  • RT-PCR on total RNA from E. coli strains containing wild-type (WT) oRiboT (+ control), oRiboT-Tt2 (+ control), or oRiboT-CTt2 was performed using primers that amplified the region spanning the intron-exon junction. A single band was observed in all cases, indicating complete and scarless splicing of the engineered intron from oRiboT-CTt2, just as the natural intron was spliced from oRiboT-Tt2. Sequencing of the oRiboT-CTt2 RT-PCR products confirmed the scarless ligation of the ribosome at the intron-exon junction post-splicing.
  • Strains were constructed carrying an oGFP reporter under the IPTG/LacR-inducible PL-lacO promoter and WT oRibo-T, oRiboT-Tt2, or oRiboT-CTt2 under the aTc/TetR-inducible PL-tetO promoter.
  • Example 5 Multi-site intron integration permits targeted editing and randomized diversification in vivo.
  • Cells containing the oRiboT-CTt2-Tt4 plasmid were diversified with six cycles of f-MAGE using an ssODN (AntiSD-WT) that targets the aSD region of the 16S rRNA and switches the orthogonal aSD to the WT E. coli aSD sequence.
  • Cells recovered from each of the cycles of f-MAGE were grown for 16 h with aTc and IPTG to induce ribosome and oGFP expression, respectively.
  • The oGFP fluorescence of all members of the population was quantified with a BD FACSAria.
  • Cells containing the oRiboT-CTt2-Tt4 plasmid were diversified with six cycles of f-MAGE using ssODNs that introduce the M4 mutation into the 23S rRNA, with homology to the 5’ or 3’ portion of the CTt intron, respectively, together with an ssODN that switches the orthogonal aSD in the 16S rRNA to the WT E. coli aSD sequence.
  • Cells containing the oRiboT-CTt2-Tt4 plasmid were diversified with six cycles of f-MAGE using ssODNs targeting regions 2030-2034 and 2057-2061 in the 23S rRNA, with homology to the 5’ or 3’ portion of the CTt intron, respectively, together with an ssODN that switches the orthogonal aSD in the 16S rRNA to the WT E. coli aSD sequence.
  • Cells from each cycle of f-MAGE were grown for 16 h with 100 ng/mL aTc to induce ribosome expression.
  • 50 μL of confluent culture was plated on LB-min agar plates containing 100 ng/mL aTc and 15.48 μM chloramphenicol (selective plates), or on LB-min agar plates containing 50 mg/mL carbenicillin and 30 mg/mL kanamycin (non-selective plates).
  • CFUs were quantified for the selective and non-selective plates to calculate survival ratios.
  • This system was employed for continuous in vivo multi-site evolution.
  • An intron was positioned at site 2 in order to diversify the PTC/exit tunnel, and a second intron was positioned near the anti-Shine-Dalgarno sequence (aSD) in the 16S rRNA (designated site 4) (Fig. 4A, Fig. 1B), in order to evolve the orthogonality of the aSD simultaneously in vivo (Fig. 4A).
  • The rRNA was expressed from the strong inducible PL-tetO promoter on plasmids to enhance ribosome expression, in a strain containing an oGFP reporter under the IPTG/LacR-inducible PL-lacO promoter.
  • The chimeric CTt intron was inserted into site 2 of oRiboT, and the natural Tt intron into site 4, to form oRiboT-CTt2-Tt4.
  • f-MAGE was performed to validate in vivo evolution at the aSD (site 4).
  • Site 4 of oRiboT-CTt2-Tt4 was targeted with an ssODN that would switch the orthogonal aSD to the WT E. coli aSD sequence.
  • Six cycles of MAGE were performed. The cells were then induced for oGFP expression, and the cell populations from each MAGE cycle were visualized by flow cytometry (Fig. 4F).
  • Multi-site continuous editing was performed simultaneously on the large and small subunits of the same ribosome in vivo to enable evolution of antibiotic resistance and tuning of orthogonality.
  • Six cycles of f-MAGE were performed at two positions in the large subunit (PTC/exit tunnel, site 2) and one position in the small subunit (anti-oSD, site 4) simultaneously in a strain containing oRiboT-CTt2-Tt4 (Fig. 4A).
  • The 5’ and 3’ portions of the site 2 chimeric intron were targeted with ssODNs encoding the M4 mutation previously identified to confer chloramphenicol resistance (Fig. 2F), and the site 4 intron with an ssODN that switched the orthogonal aSD to the WT E. coli aSD sequence. The oGFP fluorescence was quantified by flow cytometry after each cycle of f-MAGE, and growth curves in chloramphenicol and CFU counts on chloramphenicol plates were obtained. Fluorescence decreased with each cycle as the ribosome was switched to no longer being orthogonal (Fig. 4B). Liquid growth and CFUs increased gradually with successive cycles of f-MAGE (no growth without f-MAGE cycling, peaking at a survival ratio of 0.02 at cycle 6) (Fig. 4C), indicating that two phenotypes could be evolved continuously in vivo in a repetitive genetic element while reducing off-target edits in the cell’s native context.
  • f-MAGE was performed in a strain containing oRiboT-CTt2-Tt4 with degenerate ssODNs targeting the 5’ and 3’ portions of the site 2 chimeric intron to create a diverse population (1.04 × 10^6 theoretical complexity), with the aim of evolving chloramphenicol resistance while simultaneously tuning the orthogonality of this population via an ssODN that switched the orthogonal aSD to the WT E. coli aSD sequence at site 4 (Fig. 4D). As before, fluorescence decreased with each cycle until the population became non-fluorescent at cycle 6.
  • Liquid growth of the populations after each cycle of f-MAGE was also measured in 7.74 μM chloramphenicol, and growth improved gradually in both the discrete editing experiment and the evolution experiment.
  • The observed accumulation of surviving mutants with each f-MAGE cycle demonstrates the rapid evolution of a population that can be achieved by targeting edits to a single repetitive genetic element to the exclusion of others in the cell.
  • Analysis of oGFP expression showed that the RiboT ribosomes of the selected populations were enriched for non-orthogonal ribosomes (Fig. 4C-4D, panels 2 and 3), as evidenced by the populations no longer being fluorescent. This is consistent with RiboT mutants carrying chloramphenicol-resistance mutations near site 2 needing to be non-orthogonal to support cell survival under chloramphenicol selection.
  • oGFP expression was induced in populations from f-MAGE cycles 0-6 (Fig. 4B, Fig. 4D), which were then sorted by FACS into low or high bins.
  • The cycle 6 data for the negative bin can be explained by an increase in escape mutants due to accumulated off-target mutations, but this does not offset the broader population-level trends observed in this or previous cycles. Importantly, these data demonstrate that phenotypic differences within complex populations, arising from both discrete edits and evolution of a complex population via filtered editing, can be functionally dissected. Furthermore, multi-site editing can be used to evolve desired traits owing to the combined functional roles of the multiple sites being targeted. This is anticipated to be important in efforts to evolve the ribosome and other complex repetitive genetic elements.
  • Filtered editing permits the co-evolution of multiple, distal sites of a single repetitive genetic element directly in the genome. This allows for iterative introduction of precise edits that drive continuous evolution of dynamic genotypic diversity, while leaving the remainder of the cell’s genome unperturbed. Such capabilities hold promise for current challenges in synthetic biology, such as the systematic repurposing of the cell’s translational apparatus, which spans multiple components (e.g., tRNAs, aaRS, EF-Tu, and the ribosome).
  • f-CRISPR can expand the space over which mutations can be introduced. While f-MAGE can be used to introduce deep edits near a chosen intron, f-CRISPR can be used to make distributed edits between two introns at a distance of 1 kb or more. This is ideal for evolving complex populations for desired phenotypes, where the mutagenic landscape of the population can be continuously refined and assayed in vivo.
  • f-CRISPR can be ported to eukaryotic genome engineering to edit and evolve repetitive genetic elements such as tRNAs, ncRNAs, and ribosomes.
  • Previously available tools required a compromise in specificity, time, and/or library complexity. It is believed that the filtered editing compositions and methods described herein, for the first time, allow genome editing technologies to be applied to precisely edit and evolve repetitive genetic elements in vivo.
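The library-efficiency figure reported for the f-CRISPR 7N library (14,147 unique mutants out of 16,384 possible, 86.35%) follows from simple counting. The sketch below is illustrative only: the function names are hypothetical, and it assumes that each degenerate position can take any of the four bases and that the ARF is the fraction of sequencing reads at a locus carrying the intended edit.

```python
"""Library-complexity and ARF arithmetic (hypothetical helper names)."""


def theoretical_complexity(n_degenerate: int, alphabet_size: int = 4) -> int:
    """Distinct sequences encoded by n fully degenerate (N) positions."""
    return alphabet_size ** n_degenerate


def library_efficiency(unique_mutants: int, n_degenerate: int) -> float:
    """Fraction of the theoretical sequence space observed by sequencing."""
    return unique_mutants / theoretical_complexity(n_degenerate)


def allelic_replacement_frequency(edited_reads: int, total_reads: int) -> float:
    """ARF: fraction of reads at a locus that carry the intended edit."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return edited_reads / total_reads


if __name__ == "__main__":
    print(theoretical_complexity(7))               # 16384 for a 7N library
    print(f"{library_efficiency(14_147, 7):.2%}")  # 86.35%
```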
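The f-MAGE ssODN design described above, in which a fixed-length 90-mer is split between homology to the intron and homology to the adjacent ribosomal exon (for example 44 nt + 46 nt, or 70 nt + 20 nt), can be sketched as follows. This is a minimal illustration, not the design procedure actually used: it assumes the intron lies immediately 5’ of the targeted exon on the strand written, uses placeholder sequences, and reduces the choice of lagging-strand orientation to a reverse-complement flag, whereas in practice that choice depends on the direction of replication through the target locus.

```python
"""Sketch of intron/exon junction ssODN design (placeholder sequences)."""
from typing import Dict, Optional

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_junction_ssodn(
    intron: str,
    exon: str,
    intron_nt: int,
    length: int = 90,
    mutations: Optional[Dict[int, str]] = None,
    as_reverse_complement: bool = False,
) -> str:
    """Join the last `intron_nt` bases of the intron to the first
    `length - intron_nt` bases of the downstream exon.

    `mutations` maps 0-based offsets within the exon arm to replacement
    bases (the mismatches the oligo is meant to introduce).
    """
    exon_nt = length - intron_nt
    if not (0 <= intron_nt <= len(intron) and 0 <= exon_nt <= len(exon)):
        raise ValueError("requested homology exceeds the available sequence")
    exon_arm = list(exon[:exon_nt])
    for offset, base in (mutations or {}).items():
        if not 0 <= offset < exon_nt:
            raise ValueError(f"offset {offset} lies outside the exon arm")
        exon_arm[offset] = base
    oligo = intron[len(intron) - intron_nt:] + "".join(exon_arm)
    return reverse_complement(oligo) if as_reverse_complement else oligo


if __name__ == "__main__":
    intron = "ACGT" * 30  # hypothetical 120-nt intron 3' region
    exon = "GGCA" * 30    # hypothetical 120-nt downstream rRNA exon
    for k in (44, 70):    # the two homology splits discussed in the text
        oligo = design_junction_ssodn(intron, exon, k, mutations={5: "T"})
        print(f"{k} nt intron + {len(oligo) - k} nt exon = {len(oligo)} nt")
```

Slicing with `intron[len(intron) - intron_nt:]` rather than `intron[-intron_nt:]` keeps the zero-homology case (an oligo entirely within the exon, as in the 0-nt ssODN of Fig. 2E) correct.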
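The gradual accumulation of edits over repeated f-MAGE cycles, and the accumulation of off-target escape mutants noted for cycle 6, can be approximated with a simple model. The sketch assumes, as a simplification not stated in the source, that each cycle independently converts a constant fraction of the alleles not yet converted; it ignores fitness effects, copy number, and segregation, and the per-cycle rates in the usage lines are invented.

```python
"""Toy model of allele conversion over repeated MAGE-style cycles."""


def cumulative_conversion(per_cycle: float, cycles: int) -> float:
    """Expected converted fraction after `cycles` rounds, assuming each round
    independently converts `per_cycle` of the still-unconverted alleles."""
    if not 0.0 <= per_cycle <= 1.0:
        raise ValueError("per_cycle must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return 1.0 - (1.0 - per_cycle) ** cycles


if __name__ == "__main__":
    # Illustrative rates only: an on-target rate and a much lower
    # off-target rate, both of which accumulate with every cycle.
    for n in range(7):
        print(n, round(cumulative_conversion(0.05, n), 3),
              round(cumulative_conversion(0.005, n), 4))
```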
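The size of the sequence space sampled by degenerate ssODNs, such as those carrying five N positions at each of 23S regions 2030-2034 and 2057-2061, can be computed by expanding the degenerate positions. The sketch is hypothetical and supports only A, C, G, T, and N. If both regions are diversified independently on the same molecule, ten fully degenerate positions give 4^10 = 1,048,576 variants, consistent with the 1.04 × 10^6 theoretical complexity quoted for the site 2 library.

```python
"""Expand degenerate (N) positions in a mutagenic oligo template."""
from itertools import product
from typing import Iterator

# Only the symbols needed here; other IUPAC codes (R, Y, K, ...) are omitted.
_ALLOWED = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT"}


def complexity(template: str) -> int:
    """Number of distinct sequences the template encodes."""
    total = 1
    for symbol in template.upper():
        total *= len(_ALLOWED[symbol])
    return total


def expand(template: str) -> Iterator[str]:
    """Yield every concrete sequence encoded by the template."""
    for combo in product(*(_ALLOWED[s] for s in template.upper())):
        yield "".join(combo)


if __name__ == "__main__":
    region_1 = "NNNNN"  # 23S rRNA 2030-2034
    region_2 = "NNNNN"  # 23S rRNA 2057-2061
    print(complexity(region_1))             # 1024
    print(complexity(region_1 + region_2))  # 1048576
```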
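Survival ratios, such as the 0.02 value reported at cycle 6, are obtained by dividing CFUs on selective plates by CFUs on non-selective plates. The sketch below assumes, hypothetically, that colony counts are corrected for the dilution and volume plated (50 μL in the protocol above); the function names, dilution factors, and colony counts in the usage line are invented for illustration.

```python
"""Survival ratio from selective vs non-selective plating (hypothetical counts)."""


def cfu_per_ml(colonies: int, plated_ul: float, dilution_factor: float = 1.0) -> float:
    """Colony-forming units per mL of the original culture."""
    if plated_ul <= 0:
        raise ValueError("plated volume must be positive")
    return colonies * dilution_factor * 1000.0 / plated_ul


def survival_ratio(selective: int, non_selective: int, plated_ul: float = 50.0,
                   selective_dilution: float = 1.0,
                   non_selective_dilution: float = 1.0) -> float:
    """CFU/mL on selective plates divided by CFU/mL on non-selective plates."""
    denominator = cfu_per_ml(non_selective, plated_ul, non_selective_dilution)
    if denominator == 0:
        raise ValueError("no colonies on the non-selective plate")
    return cfu_per_ml(selective, plated_ul, selective_dilution) / denominator


if __name__ == "__main__":
    # Invented example: 200 colonies undiluted vs 100 colonies at 1:100.
    print(survival_ratio(200, 100, non_selective_dilution=100.0))
```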
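The oGFP readout relies on base pairing between the Shine-Dalgarno (SD) region of an mRNA and the aSD at the 3’ end of the 16S rRNA. Switching an orthogonal aSD to the WT sequence stops the ribosome from recognizing the orthogonal RBS of the oGFP transcript, which is why fluorescence falls with successive f-MAGE cycles. The sketch below reduces that pairing to an exact reverse-complement match, a deliberate simplification (real recognition tolerates mismatches and depends on spacing and pairing free energy). The WT core CCUCCU pairs with the canonical SD AGGAGG; the orthogonal aSD core UGGUGG and its matching RBS are hypothetical placeholders, not the sequences used in the source.

```python
"""Exact-match model of SD/aSD pairing (a deliberate simplification)."""

_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA string."""
    return seq.upper().translate(_RNA_COMPLEMENT)[::-1]


def recognizes(asd_core: str, rbs: str) -> bool:
    """True if the mRNA ribosome-binding region contains the perfect
    complement of the ribosome's aSD core."""
    return reverse_complement_rna(asd_core) in rbs.upper()


WT_ASD = "CCUCCU"          # core of the E. coli 16S rRNA 3' end
O_ASD = "UGGUGG"           # hypothetical orthogonal aSD core
NATIVE_RBS = "AAGGAGGUAA"  # contains the canonical SD, AGGAGG
O_RBS = "AACCACCAUG"       # hypothetical orthogonal RBS placeholder

if __name__ == "__main__":
    for name, asd in (("orthogonal aSD", O_ASD), ("WT aSD", WT_ASD)):
        print(name, "oGFP RBS:", recognizes(asd, O_RBS),
              "native RBS:", recognizes(asd, NATIVE_RBS))
```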
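Sorting populations into low and high oGFP bins, and reporting the percentage of oGFP-positive cells, amounts to thresholding per-cell fluorescence. The sketch below is a hypothetical illustration; it assumes the gate is set from a non-fluorescent control population at a chosen upper quantile, which is a common convention but not one specified in the source, and all event values are invented.

```python
"""Threshold-based binning of per-cell oGFP fluorescence (illustrative)."""
from typing import List, Sequence, Tuple


def gate_from_control(control: Sequence[float], quantile: float = 0.99) -> float:
    """Simple upper-quantile estimate from a non-fluorescent control."""
    if not control or not 0.0 < quantile <= 1.0:
        raise ValueError("need control events and 0 < quantile <= 1")
    ordered = sorted(control)
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[index]


def bin_events(events: Sequence[float], gate: float) -> Tuple[List[float], List[float]]:
    """Split events into (low, high) bins around the gate."""
    low = [x for x in events if x < gate]
    high = [x for x in events if x >= gate]
    return low, high


def percent_positive(events: Sequence[float], gate: float) -> float:
    """Percentage of events at or above the gate."""
    if not events:
        raise ValueError("no events")
    return 100.0 * sum(x >= gate for x in events) / len(events)


if __name__ == "__main__":
    control = [float(x) for x in range(100)]  # invented negative control
    gate = gate_from_control(control)
    by_cycle = {0: [150.0, 220.0, 40.0, 300.0], 6: [20.0, 35.0, 180.0, 10.0]}
    for cycle, events in by_cycle.items():
        print(cycle, percent_positive(events, gate))
```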

Abstract

Compositions and methods for genomic engineering are described. It has been discovered that introns, preferably self-splicing introns, can be introduced into repetitive genomic sequences to form a unique genetic address that facilitates insertion or recombination of a template containing a desired sequence within the repetitive genomic sequence. Nucleic acid compositions such as polynucleotides including a sequence encoding an intron, donor oligonucleotides, gene editing technologies, and methods of use thereof are described. In particular, methods of performing targeted editing and randomized diversification to enable continuous evolution of target sites (e.g., repetitive genomic elements) are disclosed. The compositions and methods are especially useful for engineering, selection, and identification of cell variants exhibiting a desired phenotype.

Description

COMPOSITIONS AND METHODS FOR TARGETED EDITING AND EVOLUTION OF REPETITIVE GENETIC ELEMENTS
CROSS-REFERENCE TO RELATED APPLICATION
This application claims the benefit of and priority to U.S.S.N. 63/257,961 filed October 20, 2021, and which is specifically incorporated by reference herein in its entirety.
REFERENCE TO SEQUENCE LISTING
The Sequence Listing submitted as an .xml file named “YU8074PCT.xml”, created on October 20, 2022, and having a size of 22,517 bytes, is hereby incorporated by reference pursuant to 37 C.F.R. § 1.834(c)(1).
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
This invention was made with government support under EF- 1935120 by the National Science Foundation. The Government has certain rights in the invention.
FIELD OF THE INVENTION
The invention is generally related to the field of gene editing technology, and more particularly to methods for targeted editing and continuous evolution of repetitive genomic elements.
BACKGROUND OF THE INVENTION
Genome editing introduces targeted modifications in the chromosomes of living cells, permitting the elucidation of causal links between genotype and phenotype, global reprogramming of cellular behavior, and emerging applications in gene therapy (Komor, et al., Cell, 168, 20-36 (2017)). Nuclease-dependent approaches to genome engineering, such as CRISPR/Cas9, generate DNA double-strand breaks (DSBs) to introduce modifications into the genome (Gaj, et al., Trends in biotechnology, 31, 397-405 (2013), Kim & Kim, Nature Reviews Genetics 15, 321-334 (2014)). Such approaches are well suited for gene disruption by non-homologous end joining (NHEJ) and for gene editing at one or a few loci by homology-directed repair (HDR) across diverse organisms. Other approaches, such as prime editing (Anzalone, et al., Nature, 576, 149-157 (2019)), base editing (Gaudelli, et al., Nature, 551, 464-471 (2017)), and multiplex automated genome engineering (MAGE) (Wang, et al., Nature, 460, 894-898 (2009)), are nuclease-independent genome editing techniques and can be employed for multi-site genomic edits as well as continuous evolution of a single genomic locus.
A major limitation with all genome editing approaches is the inability to selectively edit or diversify a genetic element that possesses high sequence homology to other genetic loci. Such repetitive genetic elements form large fractions of genomes across all domains of life. For example, repetitive elements constitute over two-thirds of the human genome (de Koning, et al., PLoS genetics, 1 (2011)). The inability to modify such loci for functional characterization, precise editing, or targeted diversification remains a defining challenge. Modification of repetitive genetic elements would permit their functional characterization and establish new avenues to alter cellular physiology. For example, deletion of transposable elements enhances genome stability (Dymond, et al., Nature, 477, 471 (2011), Posfai, et al., Science, 312, 1044-1046 (2006)), mutagenesis of CRISPR arrays affects innate immunity (Sapranauskas, et al., Nucleic acids research, 39, 9275-9282 (2011)) and genome editing (Hsu, et al., Cell, 157, 1262-1278 (2014)), and translational components (e.g., tRNAs, ribosomes) can be evolved for genetic code expansion (Young & Schultz, ACS chemical biology 13, 854-870 (2018)).
There is an ongoing need for methods that overcome the limitations of existing genome engineering approaches especially regarding repetitive sequences.
It is an object of the invention to provide compositions and methods for selective editing of repetitive genetic elements.
It is another object of the invention to provide compositions and methods for diversifying repetitive genetic elements.
SUMMARY OF THE INVENTION
Compositions and improved methods for genomic engineering are described. It has been discovered that introns can be introduced into repetitive genomic sequences to form a unique genetic address that facilitates insertion or recombination of a template containing a desired sequence within the repetitive genomic sequence. Thus, the disclosed compositions and methods are especially useful for editing genomic target sites that possess high sequence similarity to other target sites (e.g., repetitive genomic elements). The compositions and methods can be used for multi-site, targeted editing and/or continuous evolution of target sites (e.g., repetitive genomic elements) in tandem or in parallel, and in both prokaryotes and eukaryotes.
Nucleic acids and compositions thereof are described. In particular, disclosed is a polynucleotide that includes a sequence encoding an intron linked to a heterologous sequence. Typically, the heterologous sequence does not include a sequence flanking the intron in its native context. Likewise, where the heterologous sequence has a native context (e.g., sequences flanking the heterologous sequence), such a context does not typically include an intron. The intron is inserted or otherwise incorporated at a non-native locus in such a way that it disrupts the continuous nucleic acid sequence found at that locus (referred to herein as heterologous sequence(s)), and can serve as an anchor for targeted mutation of the heterologous sequence adjacent to the inserted intron. Thus, the heterologous sequence is heterologous to the intron, but in its native, uninterrupted form may be present in the host cells. The intron can be positioned upstream and/or downstream (i.e., 5’ and/or 3’) of the heterologous sequence targeted for mutation, according to the constraints of the gene editing technology with which it is used. The intron can be in any orientation as long as it is transcribed with the same sequence.
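The idea of an inserted intron acting as a unique genetic address can be illustrated by counting exact matches of a probe (for example, a gRNA spacer or the homology arm of a donor oligonucleotide) across several identical copies of a repetitive element, one of which carries an inserted intron. In the sketch below, all sequences and function names are hypothetical placeholders: a probe lying entirely within the repeat matches every copy, whereas a probe spanning the intron-exon junction matches only the intron-marked copy.

```python
"""Count exact probe matches across copies of a repetitive element."""
from typing import Sequence


def insert_intron(seq: str, position: int, intron: str) -> str:
    """Insert the intron before index `position` of the element."""
    return seq[:position] + intron + seq[position:]


def count_matches(loci: Sequence[str], probe: str) -> int:
    """Total exact (possibly overlapping) occurrences of `probe` in all loci."""
    total = 0
    for locus in loci:
        start = locus.find(probe)
        while start != -1:
            total += 1
            start = locus.find(probe, start + 1)
    return total


if __name__ == "__main__":
    repeat = "GATTACAGGCATCGATCGGA"  # hypothetical repeated element
    intron = "GGGTTTAAACCC"          # hypothetical intron
    loci = [repeat] * 7 + [insert_intron(repeat, 10, intron)]
    exonic_probe = repeat[:8]                     # lies within the repeat
    junction_probe = intron[-6:] + repeat[10:16]  # spans intron-exon junction
    print(count_matches(loci, exonic_probe))      # 8: every copy
    print(count_matches(loci, junction_probe))    # 1: intron-marked copy only
```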
The intron is preferably a self-splicing intron, particularly for prokaryotic systems, but in eukaryotic systems it may alternatively be a spliceosomal intron. In preferred embodiments, the self-splicing intron is a Group I intron. Suitable self-splicing introns include naturally occurring self-splicing introns from, or derived from, Tetrahymena thermophila, Nostoc punctiforme, Bacillus anthracis, Tilletiopsis flava, Azoarcus, Pneumocystis carinii, Agrobacterium, T7-like bacteriophage, Bacteriophage T4, or combinations thereof. Examples include SEQ ID NOS:2-12 and variants thereof having, e.g., 85% sequence identity thereto. In some embodiments, the intron is a chimeric self-splicing or spliceosomal intron. Thus, compositions, methods, and strategies for making chimeric introns, and the chimeric introns themselves, are provided. An exemplary chimeric self-splicing intron includes segments derived from Tetrahymena thermophila and Tilletiopsis flava. In a preferred embodiment, the chimeric self-splicing intron is encoded by the sequence of SEQ ID NO:1 or a sequence having at least 85% identity to SEQ ID NO:1.
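Whether a candidate intron falls within a limit such as at least 85% identity to SEQ ID NO:1 depends on how identity is computed. The sketch below shows one hypothetical way to compute it: a Needleman-Wunsch global alignment with a linear gap penalty, reporting identical columns as a percentage of alignment length. The scoring parameters and function names are assumptions; the source does not specify an alignment method.

```python
"""Percent identity from a Needleman-Wunsch global alignment (sketch)."""


def global_identity(a: str, b: str, match: int = 1, mismatch: int = -1,
                    gap: int = -2) -> float:
    """Identical columns as a percentage of one optimal global alignment."""
    a, b = a.upper(), b.upper()
    if not a and not b:
        raise ValueError("both sequences are empty")
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back one optimal alignment, counting columns and identities.
    i, j, identical, columns = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    return 100.0 * identical / columns


def meets_identity(candidate: str, reference: str, threshold: float = 85.0) -> bool:
    """True if the candidate reaches the identity threshold."""
    return global_identity(candidate, reference) >= threshold


if __name__ == "__main__":
    reference = "ACGTACGTACGTACGTACGT"                   # placeholder sequence
    variant = reference[:10] + "T" + reference[11:]      # one substitution
    print(global_identity(reference, variant))           # 95.0
    print(meets_identity(variant, reference))            # True
```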
In some embodiments, the heterologous sequence is or includes a repetitive element, such as a ribosomal gene, particularly a ribosomal RNA (rRNA) gene or portion thereof, a tRNA gene or portion thereof, a microsatellite, a minisatellite, an insertion sequence (IS), a transposable element, a pseudogene, a prophage, a telomere, or a CRISPR array. The repetitive element can be a naturally occurring or artificially introduced repetitive element. Thus, in some embodiments, the repetitive element is a recombinant or other non-native nucleic acid sequence. Any foreign (or heterologous or recombinant) sequence can be introduced into a cell that already contains that sequence or one with high sequence similarity, rendering the native and introduced elements repetitive. The repetitive sequence can be one that is artificially or synthetically created. The heterologous sequence, and thus the repetitive sequence, can be within a cell’s genome or extrachromosomal.
Although the disclosed compositions and methods are particularly advantageous for targeting specific instances of repetitive elements, the heterologous sequence(s) need not necessarily be repetitive. Thus, the heterologous sequence(s) can be e.g., a part of, or fragment of, the coding or non-coding region of a non-repetitive gene, e.g., a gene encoding a protein. In some embodiments, the heterologous sequence(s) is or includes transcribable sequence within the host. Thus, while gene editing occurs in the host’s genome, the self-splicing intron alone, or spliceosomal intron in combination with a spliceosome, can be scarlessly removed from the transcript during or after transcription, such that the gene product of the edited heterologous sequence is expressed without the intron.
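The scarless removal described above can be illustrated with string operations: an edit made in the intron-containing gene persists in the mature RNA, while the intron leaves no trace once it is removed. This is a toy model under stated assumptions: it locates the intron by exact match and works on DNA letters, whereas a real group I intron is excised by its own catalytic activity at defined splice sites. Sequences and function names are hypothetical placeholders.

```python
"""Toy model: an edit survives splicing while the intron leaves no scar."""


def insert_intron(seq: str, position: int, intron: str) -> str:
    """Insert the intron before index `position`."""
    return seq[:position] + intron + seq[position:]


def substitute(seq: str, position: int, base: str) -> str:
    """Replace the single base at `position`."""
    return seq[:position] + base + seq[position + 1:]


def splice_out(transcript: str, intron: str) -> str:
    """Remove the first exact copy of `intron` and join the flanking exons."""
    start = transcript.find(intron)
    if start == -1:
        raise ValueError("intron not found")
    return transcript[:start] + transcript[start + len(intron):]


if __name__ == "__main__":
    exon = "GATTACAGGCATCGATCGGA"  # hypothetical gene without intron
    intron = "GGGTTTAAACCC"        # hypothetical intron
    gene = insert_intron(exon, 10, intron)
    # Edit 2 nt into the downstream exon (index 24 of the intron-containing gene).
    edited_gene = substitute(gene, 10 + len(intron) + 2, "A")
    mature = splice_out(edited_gene, intron)
    print(mature == substitute(exon, 12, "A"))  # True: edit kept, no scar
```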
In some embodiments, the polynucleotide is a plasmid or portion thereof, or a viral vector or portion thereof.
Cells (i.e., host cells) containing the disclosed polynucleotides are also provided. In some embodiments, a prokaryotic cell (e.g., a bacterium such as E. coli) or eukaryotic cell contains the disclosed polynucleotide (e.g., plasmid or viral vector), which may or may not be integrated into the cell’s genome. In some embodiments, the intron replaces an endogenous intron in the cell. In particular embodiments, disclosed is an isolated cell containing a polynucleotide encoding a self-splicing intron, wherein the self-splicing intron is encoded by the sequence of any one of SEQ ID NOS: 1-12 or a sequence having at least 85% identity to any one of SEQ ID NOS: 1-12, and a population of cells thereof.
Methods of modifying cellular genomes are also provided. In particular, disclosed is a method of modifying the genome of a cell at one or more target sites. Typically, the method includes integrating (a sequence encoding) an intron adjacent to each of the one or more target sites, and subsequently inducing incorporation of a donor oligonucleotide at each of the one or more target sites via a gene editing technology.
The donor oligonucleotide(s) can include one or more mutations relative to the target sites where they are incorporated. The donor oligonucleotide can be partially or completely homologous to the nucleotide sequence encoding the intron. In some embodiments, the donor oligonucleotide is DNA, such as single-stranded DNA (ssDNA) or double-stranded DNA (dsDNA). Any of the aforementioned naturally occurring or chimeric introns can be used in accordance with the method.
Suitable gene editing technologies that can be used to incorporate the donor oligonucleotide(s) include, without limitation, a CRISPR system (e.g., CRISPR/Cas9, base editors, prime editors, etc.), multiplex automated genome engineering (MAGE), ZFNs, TALENs, etc. In some embodiments, both a CRISPR system and MAGE are used. The one or more target sites to be modified can be or include a ribosomal gene, e.g., a ribosomal RNA (rRNA) gene, a tRNA gene, a microsatellite, a minisatellite, an insertion sequence (IS), a transposable element, a pseudogene, a prophage, a telomere, a CRISPR array, or any other desired repetitive or non-repetitive genetic target.
In particular embodiments, the cell is modified at two or more target sites. In such embodiments, the self-splicing intron integrated adjacent to the first target site can be the same as or distinct from the intron integrated adjacent to a second target site. The donor oligonucleotide(s) are typically specific for each target site, and may be a plurality of a single donor to induce a specific mutation(s) at the target site, or a pool of different donors to induce a random or semi-random mutation(s) at the target site or to generate a library of cells containing different mutations at the target site. When introns are integrated at two or more target sites, the foregoing strategies can be employed separately or together to induce specific mutations at two or more target sites; random or semi-random mutation(s) at one or more target sites and/or a library of cells containing different mutations at one or more target sites; or a specific mutation(s) at one or more target sites in combination with a random or semi-random mutation(s) at one or more target sites and/or a library of cells containing different mutations at one or more target sites. This allows for simultaneous induction of complex combinations of specific, semi-random, and random mutations at multiple target sites. For example, in a particular embodiment, the donor oligonucleotide incorporated at the first target site originates from a plurality of identical donor oligonucleotides and/or the donor oligonucleotide incorporated at the second target site originates from a plurality of distinct oligonucleotides.
In some embodiments, incorporation of the donor oligonucleotide at a first target site is mediated by the CRISPR system and/or incorporation of the donor oligonucleotide at a second target site is mediated by MAGE. The CRISPR system and MAGE can be used in parallel or tandem.
Screening methods are also described. A method of screening for one or more mutations that confer a desirable phenotype can include modifying the genome of a plurality of cells as described by the methods above and subsequently selecting for a cell exhibiting the desirable phenotype. A desirable phenotype can be antibiotic resistance. Selecting for a cell that exhibits antibiotic resistance can include exposing the plurality of cells to an effective amount of one or more antibiotics.
The disclosed compositions and methods were employed in directed evolution to create improved bacterial ribosomes. Thus, also disclosed are improved bacterial ribosomes. In an exemplary embodiment, an engineered bacterial ribosome contains one or more mutations in its 23S rRNA (e.g., compared to a wildtype ribosome), particularly within nucleotides 2030-2034 and/or nucleotides 2057-2061. Exemplary mutations include those encoded by a sequence selected from TCACC, CGCCG, TAGCA, GCCTG, CATTG, AAGGT, ACCCG, TCCCG, GTACA, ATTCT, AATGT, and ACCGT. In some embodiments, the mutation(s) confers resistance to one or more antibiotics, such as chloramphenicol, erythromycin, clindamycin, and lincomycin. In some embodiments, the mutation(s) enables the mutated ribosomes formed therefrom to accommodate synthetic or abiotic monomers, such as non-L-amino acids, non-canonical L-alpha-amino acids, D-amino acids, etc., and/or to facilitate the formation of polymers therefrom. In some embodiments, the engineered bacterial ribosome includes a linker tethering the rRNA of the small subunit (16S rRNA) with the rRNA of the large subunit (23S rRNA).
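Mutations within 23S rRNA nucleotides 2030-2034 and 2057-2061 are specified in gene numbering, so applying them to a sequence string requires converting 1-based positions into string offsets. The sketch below is hypothetical: the function name is illustrative, the fragment is a placeholder numbered from position 2025 rather than real 23S sequence, and the two 5-mers used (CATTG and AAGGT) are simply taken from the exemplary list above.

```python
"""Apply block substitutions given in 1-based rRNA gene numbering."""
from typing import Dict


def apply_region_mutations(rrna: str, edits: Dict[int, str],
                           first_position: int = 1) -> str:
    """Return `rrna` with each {start_position: bases} block substituted.

    `first_position` is the gene coordinate of rrna[0].
    """
    seq = list(rrna)
    for start, bases in sorted(edits.items()):
        offset = start - first_position
        if offset < 0 or offset + len(bases) > len(seq):
            raise ValueError(f"edit at {start} falls outside the sequence")
        seq[offset:offset + len(bases)] = bases
    return "".join(seq)


if __name__ == "__main__":
    fragment = "ACGT" * 10 + "A"  # hypothetical 41-nt stand-in for 2025-2065
    mutant = apply_region_mutations(
        fragment, {2030: "CATTG", 2057: "AAGGT"}, first_position=2025
    )
    print(mutant[5:10], mutant[32:37])  # CATTG AAGGT
```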
Polynucleotides encoding the rRNAs of the engineered ribosome are provided. In some embodiments, the polynucleotide is included in an expression vector. Also provided are cells containing the engineered ribosome and/or the polynucleotide encoding the rRNAs of the engineered ribosome.
Additional advantages of the disclosed methods and compositions will be set forth in part in the description which follows, and in part will be understood from the description, or can be learned by practice of the disclosed methods and compositions. The advantages of the disclosed methods and compositions will be realized and attained by means of the elements and combinations particularly pointed out in the appended claims. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate several embodiments of the disclosed methods and compositions and together with the description, serve to explain the principles of the disclosed method and compositions.
Figure 1A is a schematic illustrating the seven native ribosomal operons in the E. coli genome, which share extensive sequence homology to an orthogonal tethered ribosome (oRiboT), which is also introduced into the genome. The cellular context, as well as the rDNA for oRiboT, is represented. Figure 1B is a schematic illustrating the secondary structure of oRiboT rRNA. Introns were introduced at four separate sites in oRibo-T. Intron insertion sites are designated with arrows, and areas that can be targeted with f-MAGE highlighted in purple. Figures 1C-1D are illustrations showing the general approach to filtered CRISPR (Fig. 1C) and MAGE (Fig. ID). An intron is introduced near the site targeted for mutagenesis to provide a unique address for hybridization of sgRNA to introduce a cas9-mediated double- stranded break (Fig. 1C), or a pool of mutagenic MAGE ssODNs (Fig. 1D). After the mutation is introduced into the DNA, the intron is spliced out of the transcribed RNA to produce the desired product. The Tetrahymena thermophila and a panel of other group 1 self-splicing introns were introduced into the gene encoding an orthogonal tethered ribosome (oRiboT) to distinguish it from the seven native ribosome genes; f-CRISPR and f-MAGE were then used to generate libraries of spliced oRiboT RNAs with targeted mutations.
Figure 2A is a scehmatic showing the oGFP reporter and oRibo-T constructs used for oGFP expression experiments. Figure 2B is an illustration showing constructs used to validate in vivo intron function and verify the sequence of ligated exons. Sequencing primers are shown in black. Figure 2C is a bar graph showing expression of oGFP by WT oRibo-T or variants whose genes contained Tt intron at sites 1, 2, 3, or 4. Values and error bars represent the mean and standard deviation of n=3 biologically independent replicates. Figure 2D is a bar graph showing percentage editing and library complexity. f-CRISPR was performed on genomic oRiboT-Tt2 to introduce dsDNA to replace a 832-bp region of oRibo-T and introduce a 7N mutagenic library; conversion-efficiency was determined by next-generation sequencing (where + conversion denotes a mutant, and - conversion denotes a WT ribosome). Figure 2E is a line graph showing percentage editing as a function of amount of homology to the intron. f-MAGE was performed with ssODNs having 0 to 70 nucleotides of homology to the intron. The conversion ratios of oRibo-T to WT ribosomes were determined with next generation sequencing. Figure 2F is a table showing 23S RNA mutations and their effects on antibiotic resistance in an oRiboT and WT ribosome background. (Er = erythromycin, Cl = clindamycin, Cm = chloramphenicol, and Ln = lincomycin). Figure 2G is a series of line graphs showing growth kinetic profiles of E. coli MG 1655 strain containing RiboT or WT ribosomes with mutations in chloramphenicol (7.74 μM).
Figures 3A-3E are bar graphs showing oGFP expression when group I self-splicing introns were introduced into oRibo-T at site 1 (Fig. 3A) or site 2 (Fig. 3B) independently, or in combination with the Tetrahymena intron at site 2 (Fig. 3C), site 3 (Fig. 3D), or site 4 (Fig. 3E). Intron abbreviations correspond to Table 3. Figure 3F is a schematic showing construction of a chimeric CTt intron. The Tt intron was engineered to incorporate the P1 helix from the Tfa intron in order to create an orthogonal intron that could be distinguished by unique 5’ homology for ssODNs or gRNAs. The chimeric intron (CTt) was inserted at site 2 in oRiboT and assayed for its ability to self-splice and for whether oRiboT carrying the CTt intron would be functional (via an oGFP production assay). The illustrated sequences are agucaucgugacuacaagc (SEQ ID NO: 13) (Tfa P1) and uuuccauuuauaacgauaaa (SEQ ID NO: 14) (Tt P1).
Figure 4A is a schematic showing multisite introduction of introns into the same oRibo-T construct. The intron at site 2 is an engineered (CTt) intron, and the intron at site 4 is the natural Tt intron. These orthogonal introns enable independent mutagenesis of two subunits of the ribosome, at a total of three parallel sites. Libraries of ribosomes with varying genotypic and phenotypic landscapes were created by performing editing at site 4 and editing and evolution at site 2. The illustrated sequences are ugaacucgcugugAAGAUgcaguguacccgcggcaagacgGAAAGaccccgga (SEQ ID NO: 15), showing the location of the intron insertion site; ugaacucgcugugCAUUGgcaguguacccgcggcaagacgAAGGUaccccgga (SEQ ID NO: 16) for editing; and ugaacucgcugugNNNNNgcaguguacccgcggcaagacgNNNNNaccccgga (SEQ ID NO: 17) for evolution at site 2; and uuggaucaUUGUGGua (WT aSD) (SEQ ID NO: 18) and uuggaucaCCUCCUua (O-aSD) (SEQ ID NO: 19) at site 4. Figure 4B is a graph showing oGFP expression in post-MAGE cultures from f-MAGE cycles 0-6. Figure 4C is a graph showing survival rate in chloramphenicol (7.74 μM) from f-MAGE cycles 0-6. Chloramphenicol-selected variants were induced for oGFP expression and flow cytometry was performed. The percentage of oGFP-positive cells was quantified for all cycles. Figure 4D is a graph showing oGFP expression in post-MAGE cultures from f-MAGE cycles 0-6. Figure 4E is a graph showing survival rate in chloramphenicol (7.74 μM) from f-MAGE cycles 0-6. Chloramphenicol-selected variants were induced for oGFP expression and flow cytometry was performed. Figure 4F is a schematic illustrating in vivo ribosome mutagenesis with f-MAGE to validate in vivo evolution at the aSD (site 4). Six cycles of f-MAGE with an ssODN converting the anti-oRBS to the WT anti-SD sequence were performed on the C321 strain carrying oRiboT-CTt2-Tt4 and the oGFP reporter. Post-MAGE cultures were induced for oGFP expression and flow cytometry was performed on cultures from f-MAGE cycles 0-6.
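The evolution template at site 2 (SEQ ID NO: 17) contains two stretches of fully degenerate N positions. As a rough illustration only, the theoretical sequence space of such a template is 4 raised to the number of N positions; the sketch below assumes every N is independently A, C, G, or U and treats "7N" (Fig. 2D) as seven randomized positions. The helper name is hypothetical, and the value is an upper bound on diversity, not the measured library complexity reported in the figures.

```python
def theoretical_diversity(template: str) -> int:
    """Return 4**k, where k is the number of fully degenerate 'N' positions.

    Assumes each N independently encodes any of the four bases; the real
    library size is also limited by editing efficiency and cell number.
    """
    n_positions = template.upper().count("N")
    return 4 ** n_positions


# Site 2 evolution template (SEQ ID NO: 17) as written in the legend above.
site2_evolution = "ugaacucgcugugNNNNNgcaguguacccgcggcaagacgNNNNNaccccgga"

print(theoretical_diversity(site2_evolution))  # 1048576, i.e. 4**10
print(theoretical_diversity("N" * 7))          # 16384, i.e. 4**7 for a 7N library
```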
DETAILED DESCRIPTION OF THE INVENTION
The disclosed methods and compositions can be understood more readily by reference to the following detailed description of particular embodiments and the Examples included therein and to the Figures and their previous and following descriptions.
Current genome editing technologies are limited by an inability to make edits to repetitive genetic elements for functional characterization, precise editing, or targeted diversification. The working Examples describe the development and validation of a method, termed Filtered Editing, that overcomes such limitations.
Group I self-splicing introns were introduced into repetitive sequences to construct unique genetic addresses that can be selectively modified. This approach was used in combination with CRISPR/Cas9 and filtered MAGE to enable targeted editing and evolution of ribosomes in vivo without making off-target edits to native genomic elements that share sequence homology. The working Examples show that naturally occurring self-splicing introns, as well as engineered chimeric introns, can be used. Using these methods, multi-site evolution of repetitive genetic elements such as the ribosome can be performed.
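Why an intron creates a unique address can be illustrated with a toy sequence search. The sketch below assumes a genome containing seven identical native copies of a repeated element plus one engineered copy interrupted by an intron; a probe (standing in for an sgRNA protospacer or ssODN homology arm) that spans the native exon-exon junction matches every native copy, whereas a probe spanning the exon-intron junction matches only the engineered copy. All exon sequences are hypothetical placeholders, the intron stand-in is simply the first 26 nt of SEQ ID NO:2, and only the forward strand is searched (real guide design also considers the reverse complement, PAMs, and mismatch tolerance).

```python
# Hypothetical placeholder sequences, not real rRNA sequences.
EXON_5 = "GGTACCTTAGCATCGATCGA"        # hypothetical 5' exon segment
EXON_3 = "TTGACCGATGCATGCAAGCT"        # hypothetical 3' exon segment
INTRON = "AAATAGCAATATTTACCTTTGCACTG"  # first 26 nt of SEQ ID NO:2, as a stand-in

native_copy = EXON_5 + EXON_3
tagged_copy = EXON_5 + INTRON + EXON_3

# "|" separates copies so no probe can match across copy boundaries.
genome = "|".join([native_copy] * 7 + [tagged_copy])


def count_occurrences(haystack: str, probe: str) -> int:
    """Count (possibly overlapping) exact forward-strand matches of probe."""
    count, start = 0, 0
    while True:
        idx = haystack.find(probe, start)
        if idx == -1:
            return count
        count += 1
        start = idx + 1


exon_probe = EXON_5[-10:] + EXON_3[:10]      # spans the native exon-exon junction
junction_probe = EXON_5[-10:] + INTRON[:10]  # spans the exon-intron junction

print(count_occurrences(genome, exon_probe))      # 7: every native copy is hit
print(count_occurrences(genome, junction_probe))  # 1: only the intron-tagged copy
```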
The Examples demonstrate the ability to drive evolution of repetitive genetic elements, such as orthogonal tethered ribosomes (oRiboT), continuously in vivo without the need for laborious plasmid cloning and re-transformation, while at the same time allowing selective editing of only oRiboT and not the cell's native translational apparatus. This can also allow much larger ribosomal libraries to be created, including many mutations that would otherwise be toxic to the cell but are targeted only to oRiboT. Furthermore, Filtered Editing can be used not only to randomize certain portions of the ribosome but also to create discrete mutations while randomizing others. This makes it more flexible than continuous evolution strategies such as EvolvR or PACE, because rationally determined mutations can be introduced at precise loci to evolve new functions, alongside diversification of other precise regions.
The ability to evolve oRiboT continuously in vivo without modifying the native ribosomes of the cell allows for much more efficient evolution of the ribosome, and can be applied to, for example, evolve ribosomes that can catalyze chemistries beyond the peptide bond, and thus create platforms for preparation of sequence-defined polymers in vivo. Polymers of potentially new functions could then be scaled-up in vivo and produced for industrial, military, and medical applications. Increasing interest in producing protein biomaterials incorporating nonstandard amino acids could also benefit from metabolic encapsulation by oRiboT. The ability to evolve oRiboT to be more efficient, and to increase protein yields, could allow an improved chassis for protein biomaterial production.
Additional advantages of the disclosed method and compositions will be set forth in part in the description which follows, and in part will be understood from the description, or can be learned by practice of the disclosed method and compositions. The advantages of the disclosed method and compositions will be realized and attained by means of the elements and combinations particularly pointed out in the appended claims. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.
It is to be understood that the disclosed method and compositions are not limited to specific synthetic methods, specific analytical techniques, or to particular reagents unless otherwise specified, and, as such, can vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting.
I. Definitions
“Introduce” in the context of genome modification refers to bringing into contact. For example, to introduce a gene editing reagent (e.g., a vector containing an intron or a CRISPR effector protein) to a cell is to provide contact between the cell and the composition. The term encompasses penetration of the contacted composition to the interior of the cell by any suitable means, e.g., via transfection, electroporation, transduction, gene gun, nanoparticle delivery, etc.
“Homologous” refers to the sequence similarity or sequence identity between two polypeptides or between two nucleic acid molecules. When a position in both of the two compared sequences is occupied by the same base or amino acid monomer subunit, e.g., if a position in each of two DNA molecules is occupied by adenine, then the molecules are homologous at that position. The percent homology between two sequences is the number of matching or homologous positions shared by the two sequences divided by the number of positions compared, × 100. For example, if 6 of 10 of the positions in two sequences are matched or are homologous, then the two sequences are 60% homologous. By way of example, the DNA sequences ATTGCC and TATGGC share 50% homology. Generally, a comparison is made when two sequences are aligned to give maximum homology.
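The percent-homology arithmetic can be checked with a short position-by-position comparison. This is a minimal sketch that assumes the two sequences are already aligned to the same length (no gaps); the function name is illustrative only. For ATTGCC versus TATGGC, positions 3, 4, and 6 match, giving 3/6 × 100 = 50%.

```python
def percent_homology(seq_a: str, seq_b: str) -> float:
    """Matching positions divided by positions compared, times 100.

    Assumes seq_a and seq_b are already aligned to the same length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


print(percent_homology("ATTGCC", "TATGGC"))  # 50.0 (positions 3, 4 and 6 match)
```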
The term “operably linked” or “operationally linked” refers to functional linkage between two elements (e.g., a regulatory sequence and a heterologous nucleic acid sequence) permitting them to function in their intended manner (e.g., resulting in expression of the latter). The term encompasses positioning of a regulatory region and a sequence to be transcribed in a nucleic acid so as to influence transcription or translation of such a sequence. For example, to bring a coding sequence under the control of a promoter, the translation initiation site of the translational reading frame of the polypeptide is typically positioned between one and about fifty nucleotides downstream of the promoter. A promoter can, however, be positioned as much as about 5,000 nucleotides upstream of the translation initiation site or about 2,000 nucleotides upstream of the transcription start site.
“Endogenous” refers to any material from or produced inside a specific organism, cell, tissue or system. “Exogenous” refers to any material introduced from or produced outside an organism, cell, tissue or system. It is understood that the term is relative, but that a reference need not be specified. For example, a protein that is endogenous to a bacterial cell can be produced by the bacterial cell, but that same protein would be exogenous to a eukaryotic cell that does not natively express or produce that protein.
“Heterologous” is used herein in the context of two or more elements having a different, non-native relation, relative position, or structure. The elements can include, but are not limited to, naturally occurring elements from the same or different organisms, chimeric elements, synthetic or engineered elements, etc., provided that the elements are not found in nature in the same relation, relative position, or structure.
“Heterologous sequence” refers to a nucleic acid sequence element having a different, non-native relation, relative position, or structure to a second sequence element. Each of the heterologous sequence and the second sequence element can be selected from, but are not limited to, naturally occurring elements from the same or different organisms, chimeric elements, synthetic or engineered elements, etc., provided that the elements are not found in nature in the same relation, relative position, or structure. By way of non-limiting illustration, in some embodiments, a second sequence element is a naturally occurring self-splicing or spliceosomal intron and the heterologous sequence linked thereto is not linked (e.g., directly) to the intron in nature, though it may also be a naturally occurring sequence from the same or different organism. Thus, heterologous sequence(s) can refer to naturally or non-naturally occurring sequences that flank (e.g., are interrupted by) a self-splicing or spliceosomal intron that has been inserted into a non-native position in the same or a different organism.
“Chimeric” as used in the context of a nucleic acid describes a non-naturally occurring polynucleotide that is or has a sequence that is made by an artificial combination of two or more otherwise separated segments of sequence. In some embodiments, the sequences combined to form the chimeric nucleic acid are derived from two or more different organisms or species. This artificial combination is often accomplished by chemical synthesis or by the artificial manipulation of isolated segments of nucleic acids, e.g., by genetic engineering techniques known in the art (e.g., to facilitate addition, substitution, or deletion of a portion of the nucleic acid).
The terms “target sequence”, “target region”, and “target site” are used interchangeably and refer to a nucleic acid sequence or region which is targeted for a specific manipulation or activity, such as, modification (e.g., gene editing), amplification, detection, and the like. The target site can refer to a specific subsequence of a larger nucleic acid (e.g., an exon) or to the overall sequence (e.g., a gene). The difference in usage will be apparent from context.
The term “locus” is the specific physical location of a DNA sequence (e.g., of a gene) on a chromosome. It is understood that a locus of interest can include a nucleic acid sequence that exists in the main body of genetic material (e.g., in a chromosome) of a cell and also a portion of genetic material that can exist independently of said main body of genetic material, such as plasmids, episomes, viruses, transposons, or in organelles such as mitochondria, as non-limiting examples.
“Isolated” means altered or removed from the natural state. An isolated nucleic acid can exist in substantially purified form, or can exist in a non-native environment such as, for example, a host cell. An “isolated nucleic acid” encompasses a nucleic acid segment or fragment which has been separated from sequences which flank it in a naturally occurring state, e.g., a DNA fragment which has been removed from the sequences which are normally adjacent to the fragment in a genome in which it naturally occurs. The term also applies to nucleic acids which have been substantially purified from other components which naturally accompany the nucleic acid (e.g., RNA or DNA or proteins, which naturally accompany it in the cell). The term therefore includes, for example, a recombinant DNA which is incorporated into a vector, into an autonomously replicating plasmid or virus, or into the genomic DNA of a prokaryote or eukaryote, or which exists as a separate molecule (e.g., as a cDNA or a genomic or cDNA fragment produced by PCR or restriction enzyme digestion) independent of other sequences. In the context of cells, the term “isolated” refers to a cell altered or removed from its natural state. An isolated cell is thus in an environment different from that in which the cell naturally occurs, e.g., separated from its natural milieu such as by concentrating to a concentration at which it is not found in nature. “Isolated cell” is meant to include cells that are within samples that are substantially enriched for the cell of interest and/or in which the cell of interest is partially or substantially purified.
A “vector” is a composition of matter which comprises an isolated nucleic acid and which can be used to deliver the isolated nucleic acid to the interior of a cell. Examples of vectors include but are not limited to, linear polynucleotides, polynucleotides associated with ionic or amphiphilic compounds, plasmids, and viruses. Thus, the term “vector” includes an autonomously replicating plasmid or a virus. The term is also construed to include non-plasmid and non-viral compounds which facilitate transfer of nucleic acid into cells, such as, for example, polylysine compounds, liposomes, and the like. Examples of viral vectors include, but are not limited to, adenoviral vectors, adeno-associated virus vectors, retroviral vectors, and the like. “Expression vector” refers to a vector containing a polynucleotide having expression control sequences operatively linked to a nucleotide sequence to be expressed. An expression vector contains sufficient cis-acting elements for expression; other elements for expression can be supplied by the host cell or in an in vitro expression system. Expression vectors include all those known in the art, such as cosmids, plasmids (e.g., naked or contained in liposomes), phagemids, BACs, YACs, and viral vectors (e.g., vectors derived from lentiviruses, retroviruses, adenoviruses, and adeno-associated viruses) that incorporate the recombinant polynucleotide.
A “mutation” refers to a change in a nucleotide (e.g., DNA) sequence resulting in an alteration from a given reference sequence. The mutation can be a deletion, insertion, duplication, rearrangement, and/or substitution of at least one nucleobase, such as a purine (adenine and/or guanine) and/or a pyrimidine (thymine, uracil and/or cytosine). Mutations may or may not produce discernible changes in the observable characteristics (phenotype) of a subject.
The term “percent (%) sequence identity” describes the percentage of nucleotides or amino acids in a candidate sequence that are identical with the nucleotides or amino acids in a reference nucleic acid sequence, after aligning the sequences and introducing gaps, if necessary, to achieve the maximum percent sequence identity. Alignment for purposes of determining percent sequence identity can be achieved in various ways that are within the skill in the art, for instance, using publicly available computer software such as BLAST, BLAST-2, ALIGN, ALIGN-2 or Megalign (DNASTAR) software. Appropriate parameters for measuring alignment, including any algorithms needed to achieve maximal alignment over the full length of the sequences being compared, can be determined by known methods.
The % sequence identity of a given nucleic acid or amino acid sequence C to, with, or against a given nucleic acid or amino acid sequence D (which can alternatively be phrased as a given sequence C that has or includes a certain % sequence identity to, with, or against a given sequence D) is calculated as follows:
100 times the fraction W/Z, where W is the number of nucleotides or amino acids scored as identical matches by the sequence alignment program in that program’s alignment of C and D, and where Z is the total number of nucleotides or amino acids in D. It will be appreciated that where the length of sequence C is not equal to the length of sequence D, the % sequence identity of C to D will not equal the % sequence identity of D to C.
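The W/Z calculation and its asymmetry can be sketched as follows. As a simplifying assumption, W is taken as the largest number of identical aligned positions obtainable when gaps may be inserted freely, which equals the length of the longest common subsequence of C and D (an alignment scored with match = 1 and no mismatch or gap penalties). Programs such as BLAST use substitution matrices and gap penalties, so their W can differ; the function names here are illustrative only.

```python
def max_identical_matches(c: str, d: str) -> int:
    """Largest number of identical aligned positions when gaps are free.

    Computed as the longest common subsequence by dynamic programming.
    """
    prev = [0] * (len(d) + 1)
    for ch_c in c:
        curr = [0]
        for j, ch_d in enumerate(d, start=1):
            if ch_c == ch_d:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def percent_identity(c: str, d: str) -> float:
    """% identity of C to D = 100 * W / Z, where Z is the length of D."""
    w = max_identical_matches(c, d)
    return 100.0 * w / len(d)


# C (5 nt) aligns to D (6 nt) as ACGT-A / ACGTTA, so W = 5 in both directions.
print(percent_identity("ACGTA", "ACGTTA"))  # ~83.33 (Z = 6)
print(percent_identity("ACGTTA", "ACGTA"))  # 100.0  (Z = 5)
```

Because Z is the length of the second sequence, swapping C and D changes the denominator, which is exactly the asymmetry described above.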
The term “effective amount” means a quantity sufficient to provide a desired pharmacologic and/or physiologic effect.
Recitation of ranges of values is merely intended to serve as a shorthand method of referring individually to each separate value falling within the range, unless otherwise indicated herein, and each separate value is incorporated into the specification as if it were individually recited herein.
Use of the term “about” is intended to describe values either above or below the stated value in a range of approximately +/- 10%; in other embodiments the values can range in value either above or below the stated value in a range of approximately +/- 5%; in other embodiments the values may range in value either above or below the stated value in a range of approximately +/- 2%; in other embodiments the values may range in value either above or below the stated value in a range of approximately +/- 1%. The preceding ranges are intended to be made clear by context, and no further limitation is implied.
Disclosed are materials, compositions, and components that can be used for, can be used in conjunction with, can be used in preparation for, or are products of the disclosed method and compositions. These and other materials are disclosed herein, and it is understood that when combinations, subsets, interactions, groups, etc. of these materials are disclosed that while specific reference to each of the various individual and collective combinations and permutations of these compounds may not be explicitly disclosed, each is specifically contemplated and described herein. For example, if a ligand is disclosed and discussed and a number of modifications that can be made to a number of molecules including the ligand are discussed, each and every combination and permutation of ligand and the modifications that are possible are specifically contemplated unless specifically indicated to the contrary. Thus, if a class of molecules A, B, and C are disclosed as well as a class of molecules D, E, and F and an example of a combination molecule, A-D is disclosed, then even if each is not individually recited, each is individually and collectively contemplated. Thus, in this example, each of the combinations A-E, A-F, B-D, B-E, B-F, C-D, C-E, and C-F are specifically contemplated and should be considered disclosed from disclosure of A, B, and C; D, E, and F; and the example combination A-D. Likewise, any subset or combination of these is also specifically contemplated and disclosed. Thus, for example, the sub-group of A-E, B-F, and C-E are specifically contemplated and should be considered disclosed from disclosure of A, B, and C; D, E, and F; and the example combination A-D. Further, each of the materials, compositions, components, etc. contemplated and disclosed as above can also be specifically and independently included or excluded from any group, subgroup, list, set, etc. of such materials.
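The pairwise combinations referred to above are simply the Cartesian product of the two classes. A short enumeration, using the placeholder labels A-F from the text, shows that the explicit example A-D plus the eight recited pairs account for all nine possible combinations.

```python
from itertools import product

class_1 = ["A", "B", "C"]  # placeholder labels from the text
class_2 = ["D", "E", "F"]

combinations = [f"{x}-{y}" for x, y in product(class_1, class_2)]
print(combinations)
# ['A-D', 'A-E', 'A-F', 'B-D', 'B-E', 'B-F', 'C-D', 'C-E', 'C-F']
```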
These concepts apply to all aspects of this application including, but not limited to, steps in methods of making and using the disclosed compositions. Thus, if there are a variety of additional steps that can be performed it is understood that each of these additional steps can be performed with any specific embodiment or combination of embodiments of the disclosed methods, and that each such combination is specifically contemplated and should be considered disclosed.
All methods described herein can be performed in any suitable order unless otherwise indicated or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g., “such as”) provided herein, is intended merely to better illuminate the embodiments and does not pose a limitation on the scope of the embodiments unless otherwise claimed. No language in the specification should be construed as indicating any non-claimed element as essential to the practice of the invention.
II. Compositions
Reagents and compositions thereof for use in the disclosed methods are provided. For example, nucleic acids and constructs thereof and gene editing technologies for use in methods of modifying the genome of a cell are provided. Such modified cells, engineered ribosomes made according to the disclosed compositions and methods, and cells expressing the engineered ribosomes are also described.
Disclosed are polynucleotides including a nucleotide sequence encoding an intron, such as a self-splicing or spliceosomal intron. As described in more detail below, the intron can be naturally occurring or non-naturally occurring, such as a chimeric intron containing sequences derived from two or more organisms or species.
In some embodiments, the polynucleotide includes a sequence encoding an intron operably linked to a heterologous sequence in such a manner that the intron alone or in combination with the heterologous sequence can serve as an anchor for targeting gene editing technology, thus facilitating specific gene editing at a site in or adjacent to the heterologous sequence.
The polynucleotide can be single stranded or double stranded. The polynucleotide can be composed of DNA, RNA, one or more synthetic nucleotides, or any combination thereof. The polynucleotide can be integrated into the genome of a cell or can be extrachromosomal. In certain embodiments, the polynucleotide is a vector, such as an expression vector. Expression vectors include all those known in the art, such as cosmids, plasmids (e.g., naked or contained in liposomes), phagemids, artificial chromosomes (e.g., BACs, YACs), and viral vectors (e.g., vectors derived from lentiviruses, retroviruses, adenoviruses, and adeno-associated viruses) that incorporate the polynucleotide. In some embodiments, the polynucleotide is a plasmid or portion thereof, or a viral vector or portion thereof.
In some embodiments, the polynucleotide is present in the genome of the cell at, for example, a target locus. In some embodiments, a nucleotide sequence encoding the intron and/or the heterologous sequence is operably linked to a control element, e.g., a transcriptional control element, such as a promoter. The transcriptional control element may be functional in either a eukaryotic cell, e.g., a mammalian cell, or a prokaryotic cell (e.g., a bacterial or archaeal cell). In some embodiments, a nucleotide sequence encoding the intron and/or the heterologous sequence is operably linked to multiple control elements that allow expression of the nucleotide sequence encoding the intron and/or the heterologous sequence in either prokaryotic or eukaryotic cells. Depending on the host/vector system utilized, any of a number of suitable transcription and translation control elements, including constitutive and inducible promoters, transcription enhancer elements, transcription terminators, etc. may be used in the expression vector (e.g., U6 promoter, H1 promoter, etc.).
In some embodiments, the control elements are endogenous to a cell harboring the polynucleotide.
The polynucleotide can form part of a larger unit of DNA, such that when the larger unit is transcribed, the intron sequence is scarlessly removed from the mature transcript.
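Scarless removal means that the mature transcript is identical to the transcript the host element would produce with no intron present. The sketch below models this as idealized string operations: transcription replaces T with U, and splicing excises exactly the inserted intron and joins the flanking sequences. The host sequence, intron placeholder, and insertion site are hypothetical, and the model assumes complete and precise splicing; real splicing efficiency depends on intron folding and splice-site context.

```python
def transcribe(dna: str) -> str:
    """Coding-strand DNA -> RNA (T replaced by U)."""
    return dna.replace("T", "U")


def insert_intron(host: str, site: int, intron: str) -> str:
    """Insert the intron immediately after position `site` of the host."""
    return host[:site] + intron + host[site:]


def splice(pre_rna: str, intron_start: int, intron_len: int) -> str:
    """Idealized splicing: remove the intron and join the flanking exons."""
    return pre_rna[:intron_start] + pre_rna[intron_start + intron_len:]


HOST = "ATGGCTAGCTTACGGATCCGTA"  # hypothetical host element
INTRON = "GTAAGTCTTTGCATTAACG"   # hypothetical intron placeholder
SITE = 10                        # hypothetical insertion site

pre_rna = transcribe(insert_intron(HOST, SITE, INTRON))
mature = splice(pre_rna, SITE, len(INTRON))
print(mature == transcribe(HOST))  # True: no scar remains after splicing
```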
A. Introns
Introns are genetic elements that interrupt functional RNA- or protein-coding genes and are removed post-transcriptionally in a process termed splicing. RNA splicing is a process during which a precursor messenger RNA (pre-mRNA) transcript is transformed into a mature messenger RNA (mRNA): introns (non-coding regions of RNA) are removed and exons (coding regions) are joined together. For nuclear-encoded genes, splicing occurs in the nucleus either during or immediately after transcription. For many eukaryotic introns, splicing occurs in a series of reactions which are catalyzed by the spliceosome, a complex of small nuclear ribonucleoproteins (snRNPs). However, there also exist self-splicing introns, that is, ribozymes that can catalyze their own excision from their parent RNA molecule.
1. Self-splicing introns
In some embodiments, the intron(s) utilized in the disclosed compositions, methods, and strategies are self-splicing introns. Self-splicing introns have the capacity to splice themselves out from a precursor RNA. Self-splicing introns are preferred because they do not require a spliceosome to scarlessly exit the disclosed constructs.
There are three kinds of self-splicing introns: Group I, Group II, and Group III. The initial discovery of self-splicing ability was in the protozoan Tetrahymena thermophila. The self-splicing introns found in T. thermophila are now referred to as Group I introns. Group I introns are widespread but sporadically distributed in nature, and they are present in the genomes of some bacteria, protozoa, fungi, mitochondria, chloroplasts, bacteriophages, and eukaryotic viruses, and in the nuclei of eukaryotic microorganisms. Group I introns all fold into a complex secondary structure with nine paired regions (P1-P9) and employ transesterification reactions to facilitate self-splicing.
Generally, self-splicing of Group I introns depends on two consecutive transesterification reactions initiated by a nucleophilic attack of the 3'-OH of an exogenous guanosine cofactor (exoG) at the 5' splice site (SS). ExoG is specifically bound to the P7 catalytic core segment of the splicing ribozyme prior to the first splicing step. This reaction leaves exoG covalently attached to the 5' end of the intron RNA, as well as a free 5' exon with an available 3'-OH group. For the second transesterification reaction, exoG is replaced at P7 by the terminal guanosine of the intron, and the reaction is initiated when the 3'-OH of the 5' exon attacks the 3' SS, resulting in ligated exons and the released linear intron. See Hedberg A., et al., Mob DNA, 4(1): 17 (2013).
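The covalent bookkeeping of the two Group I transesterification steps can be traced with strings. The sketch below models only which segments end up joined after each step (it does not model folding, P7 guanosine binding, or metal-ion chemistry); the exon and intron sequences are hypothetical placeholders, with the intron required to end in G to stand in for the terminal guanosine.

```python
from typing import Tuple


def first_transesterification(exon5: str, intron: str, exon3: str,
                              exo_g: str = "G") -> Tuple[str, str]:
    """Step 1: the 3'-OH of exoG attacks the 5' splice site.

    Returns (free 5' exon with a 3'-OH, intermediate), where the intermediate
    is exoG covalently joined to the 5' end of the intron plus 3' exon.
    """
    return exon5, exo_g + intron + exon3


def second_transesterification(free_exon5: str, intermediate: str,
                               intron_len: int) -> Tuple[str, str]:
    """Step 2: the 3'-OH of the 5' exon attacks the 3' splice site,
    located immediately after the intron's terminal guanosine.

    Returns (ligated exons, released linear intron carrying exoG at its 5' end).
    """
    cut = 1 + intron_len  # exoG plus the intron
    return free_exon5 + intermediate[cut:], intermediate[:cut]


# Hypothetical placeholder sequences (RNA alphabet), not a real intron.
EXON5 = "GCUCUCU"
INTRON = "AAAUAGCAAUAUUUACCG"  # ends in G, standing in for the terminal guanosine
EXON3 = "UAAGGUAGC"
assert INTRON.endswith("G"), "Group I introns end with a terminal guanosine"

free_exon5, intermediate = first_transesterification(EXON5, INTRON, EXON3)
ligated, linear_intron = second_transesterification(free_exon5, intermediate, len(INTRON))
print(ligated)        # GCUCUCUUAAGGUAGC
print(linear_intron)  # GAAAUAGCAAUAUUUACCG
```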
On the other hand, Group II self-splicing introns, often found in mitochondrial genes, are excised by a mechanism that bears similarities to pre-mRNA splicing, including the production of lariats. Group II introns catalyze two transesterification reactions to excise themselves from pre-messenger RNA. In the first step of splicing, the 2'-OH of a bulged adenosine residue is used as the nucleophile to attack the 5' splice site. This is followed by a second step in which the free 3'-OH of the 5' exon attacks the 3' splice site to form ligated exons. Group II introns have been found in bacteria and in the mitochondrial and chloroplast genomes of fungi, plants, protists, and an annelid worm. Group II intron RNAs are characterized by a conserved secondary structure, which spans 400-800 nucleotides and is organized into six domains, DI-DVI, radiating from a central “wheel”. These domains interact to form a conserved tertiary structure that brings together distant sequences to form an active site. The active site binds the splice sites and branch-point nucleotide residue and uses specifically bound Mg2+ ions to activate the appropriate bonds for catalysis. See Lambowitz AM, et al., Cold Spring Harb Perspect Biol., 3(8):a003616 (2011).
Group III introns perform self-splicing via a lariat structure mechanism, similar to intron excision as catalyzed by the spliceosome. The 2’-OH of a defined residue initiates splicing by attacking the 5’ splice site to form the lariat, which is followed by a second reaction that joins the 3’-OH of the 5’ exon to the 3’ splice site.
Thus, any type of the foregoing self-splicing introns may be included in the polynucleotide. Particularly preferred for use are Group I and Group II introns; and of these, Group I introns are most preferred. Group I introns do not require any protein factors to splice. In the context of editing the ribosome, the complex 3-dimensional structure of the ribosome may interfere with splicing of a Group II intron, thus creating a preference for Group I introns. Group II introns may be effective for editing other, smaller noncoding RNAs.
Group I introns and Group II self-splicing introns are known in the art. Exemplary Group I introns include: Tetrahymena thermophila rRNA intron, Neurospora crassa cytochrome b gene intron 1, Neurospora crassa mitochondrial rRNA intron, Neurospora crassa cytochrome oxidase subunit 1 gene oxi3 intron, phage T4 thymidylate synthase intron, Chlamydomonas reinhardtii 23S rRNA Cr.LSU intron, phage T4 nrdB intron, and Anabaena pre-tRNA(Leu) intron. Group II self-splicing introns include yeast mitochondrial oxi3 gene intron 5γ and Podospora anserina cytochrome c oxidase I gene intron.
In some embodiments, the self-splicing intron is a naturally occurring intron, e.g., a naturally occurring group I intron. Suitable self-splicing introns include naturally occurring self-splicing introns from or derived from Tetrahymena thermophila, Nostoc punctiforme, Bacillus anthracis, Tilletiopsis flava, Azoarcus, Pneumocystis carinii, Agrobacterium, T7-like bacteriophage, or bacteriophage T4. An exemplary suitable Tetrahymena thermophila self-splicing intron is the intron encoded by the following sequence:
AAATAGCAATATTTACCTTTGCACTGAAAAGTTATCAGGCATGCACCTGG TAGCTAGTCTTTAAACCAATAGATTGCATCGGTTTAAAAGGCAAGACCGT CAAATTGCGGGAAAGGGGTCAACAGCCGTTCAGTACCAAGTCTCAGGGGA AACTTTGAGATGGCCTTGCAAAGGGTATGGTAATAAGCTGACGGACATGG TCCTAACCACGCAGCCAAGTCCTAAGTCAACAGATCTTCTGTTGATATGG ATGCAGTTCACAGACTAAATGTCGGTCGGGGAAGATGTATTCTTCTCATA AGATATAGTCGGACCTCTCCTTAATGGGAGCTAGCGGATGAAGTGATGCA ACACTGGAGCCGCTGGGAACTAATTTGTATGCGAAAGTATATTGATTAGT TTTGGAGTACTCG (SEQ ID NO:2, with P1 sequence indicated with italics and IGS indicated with bolding).
Thus, in some embodiments, the polynucleotide includes the sequence of SEQ ID NO:2 or a sequence having at least 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO:2.
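To make the identity thresholds concrete, the sketch below computes how many substituted positions an equal-length, ungapped variant can carry while still meeting a given percent identity to a reference, using identity = 100 × (identical positions) / (reference length). Restricting to substitutions (no insertions or deletions) is an assumption for illustration, and the 400-nt reference length in the usage line is hypothetical rather than the length of any SEQ ID NO. Integer arithmetic is used because floating-point expressions such as 1 - 0.90 do not evaluate exactly and can shift the result at a threshold.

```python
def max_substitutions(reference_length: int, min_identity_pct: int) -> int:
    """Largest number of substitutions keeping identity >= min_identity_pct.

    Derivation: 100 * (L - s) / L >= p  <=>  s <= L * (100 - p) / 100.
    Assumes an equal-length, ungapped variant and an integer percentage.
    """
    return reference_length * (100 - min_identity_pct) // 100


# Usage with a hypothetical 400-nt reference.
for pct in (85, 90, 95, 96, 97, 98, 99):
    print(pct, max_substitutions(400, pct))
# 85 60, 90 40, 95 20, 96 16, 97 12, 98 8, 99 4
```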
Additional naturally occurring self-splicing introns are provided in Table 1. In some embodiments, the polynucleotide includes the sequence of any sequence of Table 1 (i.e., any one of SEQ ID NOS:2-12), or a sequence having at least 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.
Table 1: Exemplary intron sequences
[Table 1 (SEQ ID NOS:2-12) is reproduced as images in the original publication.]
In some embodiments, the self-splicing intron is a non-naturally occurring intron. For example, chimeric self-splicing introns can be used. An exemplary chimeric self-splicing intron includes segments derived from Tetrahymena thermophila and Tilletiopsis flava, whose sequence is provided below:
GAACATCAGTGCTACTGACGCACTGAAAAGTTATCAGGCATGCACCTGGT AGCTAGTCTTTAAACCAATAGATTGCATCGGTTTAAAAGGCAAGACCGTC AAATTGCGGGAAAGGGGTCAACAGCCGTTCAGTACCAAGTCTCAGGGGAA ACTTTGAGATGGCCTTGCAAAGGGTATGGTAATAAGCTGACGGACATGGT CCTAACCACGCAGCCAAGTCCTAAGTCAACAGATCTTCTGTTGATATGGA TGCAGTTCACAGACTAAATGTCGGTCGGGGAAGATGTATTCTTCTCATAA GATATAGTCGGACCTCTCCTTAATGGGAGCTAGCGGATGAAGTGATGCAA CACTGGAGCCGCTGGGAACTAATTTGTATGCGAAAGTATATTGATTAGTT TTGGAGTACTCG (SEQ ID NO:1, with P1 sequence indicated with italics and IGS indicated with bolding).
Thus, in some embodiments, the polynucleotide includes the sequence of SEQ ID NO:1 or a sequence having at least 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO:1.
2. Spliceosomal Introns
Although discussed herein primarily with respect to self-splicing introns, it is believed that similar strategies and principles can also be applied in eukaryotic cells (e.g., for modification of eukaryotic genomes) using spliceosomal introns instead of self-splicing (e.g., group I and II) introns. This is because the RNA will be spliced by the time it is exported from the nucleus, and thus the intron should not interfere with RNA function. Spliceosomal introns, which are found in most eukaryotic genes, are non-coding sequences excised from pre-mRNAs by a specialized complex, the spliceosome, during mRNA splicing. Introns occur in both protein- and RNA-coding genes and can be found in coding and untranslated gene regions. Such introns can be used directly (e.g., as naturally occurring introns) or used to derive (e.g., as chimeric introns) the intron(s) utilized in the disclosed compositions, methods, and strategies. See, e.g., Poverennaya and Roytberg, “Spliceosomal Introns: Features, Functions, and Evolution,” Biochemistry (Moscow), volume 85, pages 725-734 (2020), which is specifically incorporated by reference herein in its entirety.
Typically, when spliceosomal intron(s) are utilized, the cells have functional spliceosomes. Thus, this embodiment is most typically reserved for eukaryotic cells.
B. Heterologous sequences
The polynucleotides include a sequence heterologous to the nucleotide sequence encoding the intron. For example, the heterologous sequence preferably does not include a sequence endogenous to or flanking the intron in its native context, such as the exon(s) or other coding or non-coding sequence(s) (e.g., 5, 10, 25, 50, 100, etc., bases) upstream and/or downstream of the intron in the organism from which it is derived. In some embodiments, the heterologous sequence is a sequence from an organism distinct from the source of the intron, e.g., a self-splicing intron. For example, the self-splicing intron can be a Tetrahymena thermophila intron and the heterologous sequence can be a bacterial or human exon sequence or other coding or non-coding sequence(s). The heterologous sequence can be positioned upstream or downstream, or preferably, upstream and downstream, of the intron. The intron is preferably inserted into or otherwise interrupts the heterologous sequence at its native genomic locus, thus providing an anchor or target for gene editing of the heterologous sequence.
In some embodiments, the heterologous sequence is or includes a repetitive element (also referred to as a “repeat element”). The initial sequencing of the human genome revealed that repetitive DNA sequences account for ~55% of the genome. More recent computational approaches indicate that the proportion of repetitive elements in the human genome may be as high as two-thirds. Repetitive elements differ in their genomic position, sequence, size, copy number, and presence or absence of coding regions within them. Identified repetitive DNA sequences can be grouped into five broad categories. Four minor categories, accounting for ~10% of genomic DNA, are simple sequence repeats, segmental duplications, tandem repeats and satellite DNA sequences, and processed pseudogenes. The fifth category, transposable elements, accounts for ~45% of genomic DNA.
Microsatellites are tandemly repeated sequences composed of units 1-6 base pairs long, repeated up to a total length of 100 bp or more. Minisatellites form arrays of several hundred units, each 7 to 100 bp in length. They are present throughout the genome, with increasing concentration toward the telomeres. They differ from satellites in that they occur in only moderate numbers of tandem repeats and are highly dispersed throughout the chromosomes.
On the basis of transposition mechanism, transposable elements can be divided into DNA transposons and retrotransposons; the latter predominate in most mammals. Retrotransposable elements (RTEs) are parasitic DNA sequences that can proliferate by a “copy and paste” mechanism, inserting themselves into new genomic positions. RTEs are classified into Long Terminal Repeat (LTR) elements, whose structure and mechanism of retrotransposition resemble those of retroviruses, and non-LTR elements, which lack LTRs, resemble integrated mRNAs, and have a distinct mechanism of retrotransposition. Non-LTR elements can be classified as either Long Interspersed Nuclear Elements (LINEs) or Short Interspersed Nuclear Elements (SINEs), predominantly represented by the L1 and Alu families, respectively.
Any of the foregoing repeat elements are suitable heterologous sequences. In some embodiments, the heterologous sequence is or includes a repetitive element, such as a ribosomal gene, particularly a ribosomal RNA (rRNA) gene, or a portion thereof; a tRNA gene or portion thereof; a microsatellite; a minisatellite; an insertion sequence (IS); a transposable element; a pseudogene; a prophage; a telomere; a CRISPR array; etc. Additionally or alternatively, any foreign, heterologous, or recombinant sequence can be introduced into a cell that already contains that sequence or one with high sequence similarity, rendering the native and introduced elements repetitive. Thus, in some embodiments, the repetitive element is artificially or synthetically created.
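The microsatellite definition (perfect tandem copies of a 1-6 bp unit) translates directly into a sequence scan. The Python sketch below is a simplified illustration, not a method from this disclosure: it finds only perfect repeats, reports each run under its shortest unit, and uses a minimum run length of 12 bp chosen arbitrarily. Dedicated tools such as Tandem Repeats Finder also handle imperfect repeats and longer (minisatellite) units.

```python
def find_microsatellites(seq: str, max_unit: int = 6, min_length: int = 12):
    """Return (start, end, unit) for perfect tandem repeats with 1..max_unit bp units.

    end is exclusive. Runs shorter than min_length or with fewer than two
    full copies of the unit are ignored. min_length is an arbitrary cutoff.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    i = 0
    while i < n:
        best = None
        for k in range(1, max_unit + 1):
            if i + k > n:
                break
            j = i + k
            # Extend while each base equals the base one unit earlier
            while j < n and seq[j] == seq[j - k]:
                j += 1
            length = j - i
            # Keep the longest run; ties keep the shorter unit found first
            if length >= min_length and length >= 2 * k:
                if best is None or length > best[1] - best[0]:
                    best = (i, j, seq[i:i + k])
        if best:
            hits.append(best)
            i = best[1]  # resume scanning after the reported run
        else:
            i += 1
    return hits


print(find_microsatellites("GGT" + "CA" * 8 + "TTG" + "A" * 12 + "CG"))
# [(3, 19, 'CA'), (22, 34, 'A')]
```

Reporting the shortest unit means a poly(A) run is labeled "A" rather than "AA" or "AAA", which matches the usual way microsatellites are named by repeat unit.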
Although the disclosed compositions and methods are particularly advantageous for specifically targeting a specific instance (e.g., locus) of a repetitive element, the heterologous sequence(s) need not be repetitive elements. Thus, in some embodiments, the heterologous sequence(s) can be non-repetitive elements. The heterologous sequence(s) can be coding or non-coding regions. The heterologous sequence(s) can encode proteins or RNA or other functional or non-functional genetic elements.
In some embodiments, the heterologous sequence(s) form part or all of a coding region(s), a non-coding region(s), or a combination thereof, of a gene. In some embodiments, part or all of the heterologous sequence (e.g., a gene) is transcribed. Preferably the intron, e.g., a self-splicing intron, is scarlessly removed during or after transcription.
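The logic of using an intron as a unique address within a repeat can be illustrated with plain string operations. In the hypothetical Python sketch below, one copy of a toy repeat is interrupted by a toy intron; every k-mer spanning an intron-exon junction is checked against the uninterrupted repeat, and excision of the intron is checked to restore the native sequence. The sequences and function names are invented for illustration, and real splicing depends on intron structure (for example, P1 pairing and the terminal guanosine of group I introns), which this model ignores.

```python
def interrupt(repeat: str, intron: str, pos: int) -> str:
    """Insert the intron into one copy of the repeat at position pos."""
    return repeat[:pos] + intron + repeat[pos:]


def excise(precursor: str, start: int, end: int) -> str:
    """Model scarless splicing: remove precursor[start:end] and join the flanks."""
    return precursor[:start] + precursor[end:]


def junction_kmers(repeat: str, intron: str, pos: int, k: int) -> set:
    """All k-mers of the interrupted copy that straddle either intron boundary."""
    seq = interrupt(repeat, intron, pos)
    left, right = pos, pos + len(intron)
    return {seq[i:i + k] for i in range(len(seq) - k + 1)
            if i < left < i + k or i < right < i + k}


repeat = "ATCATCTTACATCAATCTAC"   # hypothetical repeat unit (no G)
intron = "GTAAGTCCTTAGCTTACGG"    # hypothetical intron, starts and ends with G
pos = 10

interrupted = interrupt(repeat, intron, pos)
# After splicing, the transcript is identical to the uninterrupted repeat copies
assert excise(interrupted, pos, pos + len(intron)) == repeat
# At the DNA level, every junction-spanning k-mer is absent from the native repeat
address = junction_kmers(repeat, intron, pos, k=12)
assert address and all(kmer not in repeat for kmer in address)
```

The two checks capture the design goal stated above: the interrupted locus is distinguishable from its sibling copies in DNA, while the spliced product is not.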
C. Cells
Cells are also provided. In some embodiments, the cells contain any of the polynucleotides disclosed herein. For example, disclosed are cells harboring a polynucleotide including a sequence encoding an intron (e.g., a self-splicing intron or spliceosomal intron, which can be naturally occurring or chimeric, for example a chimeric self-splicing intron such as the self-splicing intron encoded by the sequence of SEQ ID NO:1 or a sequence having at least 85% identity to SEQ ID NO:1). In some embodiments, the cells harbor a polynucleotide including the sequence of any one of SEQ ID NOS:2-12 or a sequence having at least 85% identity to any one of SEQ ID NOS:2-12.
In some embodiments, the cells harbor a polynucleotide including a sequence encoding an intron operably linked to a heterologous sequence (e.g., a repetitive element, such as an rRNA gene or portion thereof, a tRNA gene or portion thereof, a microsatellite, a minisatellite, an insertion sequence (IS), a transposable element, a pseudogene, a prophage, a telomere, or a CRISPR array). In some embodiments, the intron replaces an endogenous intron in the cell, such as an intron endogenous to or contained within the heterologous sequence. The heterologous sequence may be present in its uninterrupted form in the host cells prior to introduction of the intron, alone or in combination with other uninterrupted, native instances of the sequence, particularly where the heterologous sequence is a repetitive element. Thus, the heterologous sequence can be, and preferably is, derived from the host cell’s genome. The genomic sequence can be a native (e.g., endogenous) sequence or a foreign or recombinant sequence. In the experiments below, the heterologous sequence is exemplified by an orthogonal rRNA (oRiboT) in a recombinant E. coli genome.
Also provided are cells containing a disclosed engineered ribosome and/or the polynucleotide encoding one or more rRNAs that form the engineered ribosome.
The polynucleotides can be introduced into the cell by any suitable approach known in the art, including transformation, transduction, gene gun, microinjection, transfection, electroporation, and nucleofection. Transfection techniques are known in the art. See, e.g., Angel and Yanik, PLoS ONE 5(7): e11756. doi: 10.1371/journal.pone.0011756 (2010); the commercially available TransMessenger® reagents from Qiagen, Stemfect™ RNA Transfection Kit from Stemgent, and TransIT®-mRNA Transfection Kit from Mirus Bio LLC; and Clonetegration (e.g., St-Pierre, F. et al., ACS Synthetic Biology 2, 537-541 (2013)).
The polynucleotide can be in the form of a vector. Many vectors useful for transferring nucleic acids into target cells, e.g., plasmids, cosmids, minicircles, phage, and viruses, are available. The polynucleotide (e.g., plasmid or viral vector) may or may not be integrated into the cell’s genome. For example, the vector may be maintained episomally, e.g., as plasmids, minicircle DNAs, or viruses such as cytomegalovirus or adenovirus, or it may be integrated into the target cell genome through homologous recombination or random integration, e.g., retrovirus-derived vectors such as MMLV, HIV-1, ALV, etc. Upon introduction into the cell, the polynucleotide can be expressed by the cellular machinery.
Suitable expression vectors include, but are not limited to, viral vectors (e.g., viral vectors based on vaccinia virus; poliovirus; adenovirus; adeno-associated virus; SV40; herpes simplex virus; human immunodeficiency virus; or a retroviral vector (e.g., Murine Leukemia Virus, spleen necrosis virus, and vectors derived from retroviruses such as Rous Sarcoma Virus, Harvey Sarcoma Virus, avian leukosis virus, a lentivirus, human immunodeficiency virus, myeloproliferative sarcoma virus, and mammary tumor virus)), and the like.
Numerous suitable expression vectors are known to those of skill in the art, and many are commercially available, including pXT1, pSG5 (Stratagene), pSVK3, pBPV, pMSG, and pSVLSV40 (Pharmacia). However, any other vector may be used so long as it is compatible with the host cell.
Any cell may be used in accordance with the foregoing. In some embodiments, the cell is a prokaryotic cell (e.g., an archaeal or bacterial cell). In some embodiments, the cell is E. coli.
In other embodiments, the cell is a eukaryotic cell. For example, the cell can be a cell of a single-cell eukaryotic organism, a plant cell, an algal cell, or a fungal cell (e.g., a yeast cell). The cell can be a mammalian cell. The mammalian cell can be a human or non-human mammalian cell, e.g., a primate, bovine, ovine, porcine, canine, rodent, monkey, rat, or mouse cell.
In preferred embodiments, the cell is a human cell including, but not limited to, skin cells, lung cells, heart cells, kidney cells, pancreatic cells, muscle cells, neuronal cells, human embryonic stem cells, blood cells (e.g., white blood cells), fibroblasts, bone cells, hepatocytes, and pluripotent stem cells. The cell can be a T cell (e.g., a CD8+ or CD4+ T cell), a hematopoietic stem cell (HSC), a macrophage, a natural killer (NK) cell, a B cell, a dendritic cell (DC), or another immune cell.
In some embodiments, the cells are from an established cell line or are primary cells, where “primary cells” refers to cells and cell cultures that have been derived from a subject and allowed to grow in vitro for a limited number of passages or splittings of the culture. For example, primary cells may have been passaged 0 times, 1 time, 2 times, 4 times, 5 times, 10 times, or 15 times, but not enough times to go through the crisis stage. Typically, the primary cell lines are maintained for fewer than 10 passages ex vivo.
If the cells are primary cells, they may be harvested from an individual by any convenient method. For example, leukocytes may be conveniently harvested by apheresis, leukocytapheresis, density gradient separation, etc., while cells from tissues such as skin, muscle, bone marrow, spleen, liver, pancreas, lung, intestine, stomach, etc. are most conveniently harvested by biopsy. An appropriate solution may be used for dispersion or suspension of the harvested cells. Such a solution can be, for example, a balanced salt solution, e.g., normal saline, phosphate-buffered saline (PBS), Hank's balanced salt solution, etc., conveniently supplemented with fetal calf serum or other naturally occurring factors, in conjunction with an acceptable buffer at low concentration, generally from 5-25 mM. The cells may be used immediately, or they may be stored frozen for long periods of time and then thawed and reused.
D. Gene editing technologies
Gene editing technologies are preferably used to mediate incorporation of a donor oligonucleotide at one or more target sites in the disclosed methods for modifying cellular genomes. Exemplary gene editing technologies include, without limitation, a CRISPR system (e.g., CRISPR/Cas9, base editing, prime editing, etc.), MAGE, zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), triplex-forming compositions, pseudocomplementary oligonucleotides, small fragment homologous replacement (e.g., polynucleotide small DNA fragments (SDFs)), single-stranded oligodeoxynucleotide-mediated gene modification (e.g., ssODNs/SSOs), intron-encoded meganucleases, etc. In some embodiments, the gene editing technology is a CRISPR system, MAGE, zinc finger nucleases (ZFNs), or TALENs, each of which is discussed in more detail below.
i. MAGE
MAGE refers to multiplex automated genome evolution, and generally includes introducing multiple nucleic acid sequences into one or more cells such that the entire cell culture approaches a state involving a set of changes to each genome or targeted region (Wang et al., Nature, 460:894 (2009)). The method can be used to generate one specific configuration of alleles or can be used for combinatorial exploration of designed alleles optionally including additional random, i.e., not-designed, changes. This can be used with any of a variety of devices that allow the cyclic addition of many DNAs in parallel in random or specific order, with or without use of one or more selectable markers.
Compositions and methods for carrying out MAGE are described in U.S. Patent 8,153,432. Briefly, MAGE-based methods typically include introducing multiple nucleic acid sequences into a cell by transforming or transfecting the cell(s) using a transformation or transfection medium including at least one nucleic acid oligomer (also referred to herein as a “donor oligonucleotide”) containing one or more mutations, replacing the transformation or transfection medium with growth medium, incubating the cell in the growth medium, and repeating these steps if necessary or desired until multiple nucleic acid sequences have been introduced into the cell. In some embodiments, the one or more nucleic acid oligomers are a pool of oligomers having a diversity of different random or non-random mutations at the location(s) of desired mutagenesis. Cells are transfected with various combinations of oligonucleotides, leading to the formation of a diverse genomic library of mutants. The diversity of the library can be increased by increasing the number of MAGE cycles. The oligomers can be single-stranded DNA. In preferred embodiments, multiple mutations are generated in a chromosome or in a genome.
Mediated by λ-Red ssDNA-binding protein β, the oligos are incorporated into the lagging strand of the replication fork during DNA replication, creating a new allele that will spread through the population as the bacteria divide. The efficiency of oligo incorporation depends on several factors, but the frequency of the allele can be increased by performing multiple rounds of MAGE on the same cell culture.
For example, genetic diversity of the mutants can be tuned by the number of cycles of mutagenesis; increasing the number of cycles generally increases the diversity of the library. In particular embodiments, a library is prepared by one or more cycles of MAGE, for example, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or more cycles, with or without intervening cycles of selection. In a particular embodiment, a library of mutants is prepared by, for example, between 1 and 50, between 3 and 15, or between 5 and 9 cycles of MAGE. The cycles can occur without intervening rounds of selection to increase the diversity of the library prior to selection. The methods can also be modified to include additional or alternative steps to improve genetic diversity. See, for example, Carr, et al., Nucleic Acids Research, 40(17):e132, 12 pages (2012), and Gregg, et al., Nucleic Acids Research, 42(7):4779-90 (2014).
Genetic diversity can also be tuned by selecting the number and diversity of the oligonucleotides introduced during any step of the mutagenesis process. It will be appreciated that the number of oligonucleotides can be increased; that the oligonucleotides can include one or multiple mutations per oligonucleotide and therefore target multiple positions (e.g., amino acid positions encoded by the target DNA); and that the oligonucleotides can introduce various types of mutations (mismatches, insertions, and deletions) with varying degrees of degeneracy (N = any of A, T, G, or C, or 2 or 3 bases selected therefrom) or specificity (N equals a specific nucleotide).
In general, MAGE experiments can be divided into three classes, characterized by varying degrees of scale and complexity: (i) many target sites, single genetic mutations; (ii) single target site, many genetic mutations; and (iii) many target sites, many genetic mutations. In the first class, MAGE has been used to recode all 321 instances of the TAG stop codon to the synonymous TAA codon using 321 discrete ssDNAs. This project yielded a strain of E. coli with only 63 ‘active’ codons and a 64th ‘blank’ codon available for site-specific incorporation of nonstandard amino acids. In the second class, MAGE can be used to explore the effects of all possible amino acid substitutions at a single target locus. In such an experiment, it is possible, for example, to use a single degenerate ssDNA containing an NNN triplet at its center to introduce all possible amino acid substitutions. In the third class, MAGE can be used to construct diverse cell populations containing combinations of alleles across many loci involved, for example, in a biosynthetic pathway. In this implementation, discrete oligos designed to knock out competing pathways by deletion can be mixed with degenerate oligos designed to randomize target positions in the coding sequence or regulatory regions of key pathway enzymes. The highly diverse population resulting from a MAGE experiment can be used downstream to screen or select for mutants with a prescribed phenotype (e.g., overproduction of a metabolite or small molecule).
MAGE has also been developed for eukaryotic systems in the form of eMAGE; see, e.g., Barbieri, et al., Cell 171, 1-15 (2017). Like MAGE in bacteria, eukaryotic MAGE directs the annealing of synthetic ssDNA at the lagging strand of DNA replication. The mechanism is independent of Rad51-directed homologous recombination and avoids the creation of double-strand DNA breaks, allowing precise chromosome modifications at single-base-pair resolution with an efficiency of >40%, without unintended mutagenic changes at the targeted genetic loci. Simultaneous incorporation of up to 12 oligonucleotides with as many as 60 targeted mutations has been observed in one transformation. Iterative transformations of a complex pool of oligonucleotides rapidly produced combinatorial genomic diversity greater than 10⁵.
This method was used to diversify a heterologous β-carotene biosynthetic pathway, producing genetic variants with precise mutations in promoters, genes, and terminators that led to altered carotenoid levels. The approach of engineering the conserved processes of DNA replication, repair, and recombination can be automated and establishes a general strategy for multiplex combinatorial genome engineering in eukaryotes. Given the analogous mechanism of ssDNA annealing, the disclosed filtered editing approach can be readily applied to the modification of repetitive genetic elements in eukaryotes. Although MAGE-based mutagenesis is one example, suitable alternative methods of mutagenesis that are well known in the art can be used to create a library of variants. Exemplary methods include, but are not limited to, error-prone PCR, PCR or overlap-extension PCR with degenerate primers, and custom synthesis of degenerate DNA fragments encoding the library of interest.
ii. CRISPR/Cas
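The effect of cycle number can be approximated with a simple model that is not taken from this disclosure: if each cycle independently converts a fixed fraction p of the remaining unmodified alleles at a locus, the fraction carrying the allele after n cycles is 1 - (1 - p)^n. The Python sketch below assumes constant per-cycle efficiency, independent loci, and no fitness cost of the edit; actual efficiencies depend on oligo design, strain background, and target, and the values used here are purely illustrative.

```python
def allele_fraction(p: float, cycles: int) -> float:
    """Expected fraction of alleles converted after `cycles` rounds at per-cycle efficiency p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return 1.0 - (1.0 - p) ** cycles


def cycles_to_reach(p: float, target: float, max_cycles: int = 1000) -> int:
    """Smallest number of cycles for which the expected allele fraction reaches target."""
    for n in range(max_cycles + 1):
        if allele_fraction(p, n) >= target:
            return n
    raise ValueError("target not reached within max_cycles")


def expected_edits_per_cell(loci: int, p: float, cycles: int) -> float:
    """Expected number of converted loci per genome when `loci` sites are targeted in parallel."""
    return loci * allele_fraction(p, cycles)


# Illustrative values only: 25% per-cycle efficiency, 10 multiplexed targets
print(allele_fraction(0.25, 2))              # 0.4375
print(cycles_to_reach(0.25, 0.5))            # 3
print(expected_edits_per_cell(10, 0.25, 2))  # 4.375
```

The diminishing return per cycle (each cycle can only convert alleles not yet converted) is one reason diversity keeps rising with cycle number but at a slowing rate.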
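The degenerate-codon strategy of the second class can be checked by enumerating the codons a degenerate triplet encodes. The Python sketch below uses IUPAC nucleotide codes and the standard genetic code (NCBI translation table 1). It compares NNN, as described above, with NNK (K = G or T), a widely used alternative that is not part of this disclosure; NNK halves the number of codons while still covering all 20 amino acids with a single stop codon. Function names are illustrative.

```python
from itertools import product

# IUPAC nucleotide codes -> possible bases
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "S": "CG",
         "W": "AT", "K": "GT", "M": "AC", "B": "CGT", "D": "AGT", "H": "ACT",
         "V": "ACG", "N": "ACGT"}

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order, first base slowest
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def expand(degenerate: str) -> list:
    """All concrete codons encoded by a degenerate codon such as 'NNK'."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in degenerate.upper()))]


def coverage(degenerate: str) -> tuple:
    """Return (number of codons, distinct amino acids, number of stop codons)."""
    translated = [CODON_TABLE[c] for c in expand(degenerate)]
    return len(translated), len(set(translated) - {"*"}), translated.count("*")


print(coverage("NNN"))  # (64, 20, 3)
print(coverage("NNK"))  # (32, 20, 1)
```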
In some embodiments, the gene editing technology is the CRISPR/Cas system. CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) is an acronym for DNA loci that contain multiple, short, direct repetitions of base sequences. The prokaryotic CRISPR/Cas system has been adapted for gene editing (silencing, enhancing, or changing specific genes) in eukaryotes (see, for example, Cong, et al., Science, 339(6121):819-823 (2013) and Jinek, et al., Science, 337(6096):816-21 (2012)). By transfecting a cell with the required elements, including a cas gene and specifically designed CRISPRs, the organism's genome can be cut and modified at a chosen location adjacent to a suitable protospacer-adjacent motif (PAM). Methods of preparing compositions for use in genome editing using the CRISPR/Cas systems are described in detail in WO 2013/176772 and WO 2014/018423.
In general, “CRISPR system” refers collectively to transcripts and other elements involved in the expression of, or directing the activity of, CRISPR-associated (“Cas”) genes, including sequences encoding a Cas gene, a tracr (trans-activating CRISPR) sequence (e.g., tracrRNA or an active partial tracrRNA), a tracr-mate sequence (encompassing a “direct repeat” and a tracrRNA-processed partial direct repeat in the context of an endogenous CRISPR system), a guide sequence (also referred to as a “spacer” in the context of an endogenous CRISPR system), or other sequences and transcripts from a CRISPR locus. One or more tracr mate sequences operably linked to a guide sequence (e.g., direct repeat-spacer-direct repeat) can also be referred to as pre-crRNA (pre-CRISPR RNA) before processing or crRNA after processing by a nuclease.
In some embodiments, a tracrRNA and crRNA are linked to form a chimeric crRNA-tracrRNA hybrid in which a mature crRNA is fused to a partial tracrRNA via a synthetic stem loop to mimic the natural crRNA:tracrRNA duplex, as described in Cong, et al., Science, 339(6121):819-823 (2013) and Jinek, et al., Science, 337(6096):816-21 (2012). A single fused crRNA-tracrRNA construct can also be referred to as a guide RNA or gRNA (or single-guide RNA (sgRNA)). Within an sgRNA, the crRNA portion can be identified as the “target sequence” and the tracrRNA portion is often referred to as the “scaffold.”
There are many resources available for helping practitioners determine suitable target sites once a desired DNA target sequence is identified. For example, numerous public resources, including a bioinformatically generated list of about 190,000 potential sgRNAs targeting more than 40% of human exons, are available to aid practitioners in selecting target sites and designing the associated sgRNA to effect a nick or double-strand break at the site. See also crispr.u-psud.fr/, a tool designed to help scientists find CRISPR targeting sites in a wide range of species and generate the appropriate crRNA sequences. In the disclosed embodiments, the gRNA or sgRNA is designed to direct Cas endonuclease cleavage in the intron and/or the heterologous sequence adjacent thereto in a manner sufficient to increase or otherwise direct mutagenesis (preferably by recombination of a donor oligonucleotide) at a target site, typically also in the heterologous sequence(s) adjacent to the intron.
In some embodiments, one or more vectors driving expression of one or more elements of a CRISPR system are introduced into a target cell such that expression of the elements of the CRISPR system directs formation of a CRISPR complex at one or more target sites. While the specifics can vary among engineered CRISPR systems, the overall methodology is similar. A practitioner interested in using CRISPR technology to target a DNA sequence can insert a short DNA fragment containing the target sequence into a guide RNA expression plasmid. The sgRNA expression plasmid contains the target sequence (about 20 nucleotides), a form of the tracrRNA sequence (the scaffold), a suitable promoter, and the elements necessary for proper processing in eukaryotic cells. Such vectors are readily available (see, for example, Addgene). Many of the systems rely on custom, complementary oligomers that are annealed to form a double-stranded DNA and then cloned into the sgRNA expression plasmid. Co-expression of the sgRNA and the appropriate Cas enzyme from the same or separate plasmids in transfected cells results in a single- or double-strand break (depending on the activity of the Cas enzyme) at the desired target site.
In some embodiments, a vector includes a regulatory element operably linked to an enzyme-coding sequence encoding a CRISPR enzyme, such as a Cas protein. Non-limiting examples of Cas proteins include Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9 (also known as Csn1 and Csx12), Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, Cpf1, homologues thereof, or modified versions thereof. In some embodiments, the unmodified CRISPR enzyme has DNA cleavage activity, such as Cas9. In some embodiments, the CRISPR enzyme directs cleavage of one or both strands at the location of a target sequence, such as within the target sequence and/or within the complement of the target sequence. In some embodiments, the CRISPR enzyme directs cleavage of one or both strands within about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 50, 100, 200, 500, or more base pairs from the first or last nucleotide of a target sequence.
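The guide-design principle above, placing the cut where the intron makes one copy of a repeat sequence-unique, can be sketched as a search for protospacers whose spacer occurs exactly once in the genome. The Python sketch assumes an S. pyogenes Cas9-style 5'-NGG-3' PAM and counts only exact matches on both strands; practical design would also score mismatched off-target sites. The toy genome, repeat, and intron are hypothetical, and a short spacer length is used only to keep the example readable.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def protospacers(seq: str, spacer_len: int = 20):
    """Yield (spacer, strand, start) for each spacer followed by an NGG PAM.

    start is the 0-based index of the spacer on the strand where it was found.
    """
    for strand, s in (("+", seq.upper()), ("-", revcomp(seq))):
        for i in range(len(s) - spacer_len - 2):
            if s[i + spacer_len + 1:i + spacer_len + 3] == "GG":
                yield s[i:i + spacer_len], strand, i


def count_occurrences(genome: str, kmer: str) -> int:
    """Count overlapping exact matches of kmer on both strands of genome."""
    total = 0
    for s in (genome.upper(), revcomp(genome)):
        hit = s.find(kmer)
        while hit != -1:
            total += 1
            hit = s.find(kmer, hit + 1)
    return total


def unique_guides(genome: str, region: str, spacer_len: int = 20):
    """Protospacers in `region` whose spacer occurs exactly once in `genome`."""
    return [(sp, strand, pos) for sp, strand, pos in protospacers(region, spacer_len)
            if count_occurrences(genome, sp) == 1]


# Hypothetical example: three copies of a repeat, one interrupted by an intron
repeat = "ATCATCTTACATCAATCTAC"
interrupted = repeat[:10] + "TGGCATGG" + repeat[10:]
genome = repeat + "TTTT" + interrupted + "TTTT" + repeat
print(unique_guides(genome, interrupted, spacer_len=8))  # [('TACTGGCA', '+', 7)]
```

In this toy case, a second candidate (`CATCTTAC`) also sits next to a PAM in the interrupted copy, but it lies entirely within repeat sequence, occurs in all three copies, and is filtered out; only the junction-spanning spacer survives.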
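The annealed-oligo cloning step can be sketched as follows. This assumes a BbsI-based U6 sgRNA backbone of the pX330 type, in which the top oligo carries a 5'-CACC overhang and the bottom oligo a 5'-AAAC overhang, and in which an extra G is prepended when the spacer does not already start with G to support U6 transcription initiation. Overhangs differ between backbones, so the defaults are assumptions to check against the chosen vector's documentation; the function name is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def cloning_oligos(spacer: str, top_overhang: str = "CACC",
                   bottom_overhang: str = "AAAC", add_g: bool = True):
    """Return (top, bottom) oligos, 5'->3', encoding a spacer for annealed-oligo cloning."""
    spacer = spacer.upper()
    # U6 promoters start transcription at a G; prepend one if the spacer lacks it
    insert = "G" + spacer if add_g and not spacer.startswith("G") else spacer
    top = top_overhang + insert
    bottom = bottom_overhang + revcomp(insert)
    return top, bottom


top, bottom = cloning_oligos("TGACCTAGCATGCATTACGA")
print(top)     # CACCGTGACCTAGCATGCATTACGA
print(bottom)  # AAACTCGTAATGCATGCTAGGTCAC
```

After annealing, the two oligos pair over the spacer (plus the optional G) and leave single-stranded CACC and AAAC ends that match the digested backbone.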
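For S. pyogenes Cas9, which is typically described as cutting about 3 bp upstream of the PAM to give a mostly blunt break, the expected cut site can be computed from a protospacer's coordinates. The Python sketch assumes that offset and a 20-nt spacer; both are parameters because other enzymes differ (Cas12a/Cpf1, for example, makes a staggered cut distal to its PAM). Coordinates follow a hypothetical convention: a cut at index k lies between top-strand bases k-1 and k.

```python
def cut_index(strand: str, start: int, seq_len: int,
              spacer_len: int = 20, pam_offset: int = 3) -> int:
    """Top-strand index k of a blunt cut lying between bases k-1 and k.

    strand/start describe the protospacer: start is its 0-based position on
    the strand where it was found ('+' = the given sequence, '-' = its reverse
    complement). The cut is pam_offset bases 5' of the PAM.
    """
    boundary = start + spacer_len - pam_offset  # cut boundary on the protospacer's own strand
    if strand == "+":
        return boundary
    if strand == "-":
        return seq_len - boundary  # map a reverse-complement boundary to top-strand coordinates
    raise ValueError("strand must be '+' or '-'")


# 33-bp sequence: 5 bp, a 20-nt protospacer at index 5, a TGG PAM, 5 bp
print(cut_index("+", 5, 33))  # 22
# A protospacer found at index 5 of the reverse complement of a 33-bp sequence
print(cut_index("-", 5, 33))  # 11
```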
The CRISPR/Cas system may contain an enzyme that is mutated with respect to a corresponding wild-type enzyme such that the mutated CRISPR enzyme lacks the ability to cleave one or both strands of a target polynucleotide containing a target sequence. By independently mutating one of the two Cas9 nuclease domains, the Cas9 nickase was developed. For example, an aspartate-to-alanine substitution (D10A) in the RuvC I catalytic domain of Cas9 from S. pyogenes converts Cas9 from a nuclease that cleaves both strands to a nickase (cleaves a single strand). Other residues can be mutated to achieve the above effects (i.e. inactivate one or the other nuclease portions). As non-limiting examples, residues D10, G12, G17, E762, H840, N854, N863, H982, H983, A984, D986, and/or A987 can be substituted. Specific mutations that render Cas9 a nickase include, without limitation, H840A, N854A, and N863A. Mutations other than alanine substitutions are also suitable. Two or more catalytic domains of Cas9 (RuvC I, RuvC II, and RuvC III) can be mutated to produce a mutated Cas9 substantially lacking all DNA cleavage activity. A D10A mutation may be combined with one or more of H840A, N854A, or N863A mutations to produce a Cas9 enzyme substantially lacking all DNA cleavage activity (e.g., when activity of the mutated enzyme is less than about 25%, 10%, 5%>, 1%>, 0.1 %>, 0.01%, or lower with respect to its non-mutated form).
Preferably, variants of Cas9, such as for example, a Cas9 nickase are employed in the gene editing technologies containing a CRISPR/Cas system. Nickases can lower the probability of off-target editing, for example, when used with two adjacent gRNAs. A Cas9 nickase having a D10A mutation cleaves only the target strand. Conversely, a Cas9 nickase having an H840A mutation in the HNH domain creates a non-target strand-cleaving nickase. Instead of cutting both strands bluntly with WT Cas9 and one gRNA, one can create a staggered cut using a Cas9 nickase and two gRNAs. This provides even greater control over precise gene integration and insertion. Because both nicking Cas9 enzymes must effectively nick their target DNA, paired nickases have significantly lower off-target effects compared to the double-strand-cleaving Cas9 system, and are generally more effective tools. In a preferred embodiment, the gene editing technology is a Crispr/Cas9 or Crispr/Cas9 nickase (e.g., D10A, H840A, N854A, and N863A nickase).
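The staggered cut produced by paired nickases can be characterized from the two nick positions alone. A minimal Python sketch, using a hypothetical coordinate convention in which a nick at index k lies between top-strand bases k-1 and k, and assuming the short duplex between the two nicks dissociates: when the top-strand nick lies 5' (left) of the bottom-strand nick, both ends carry 5' overhangs; the reverse arrangement gives 3' overhangs. Which strand each nickase cuts follows from the text (D10A nicks the target strand, H840A the non-target strand); the function name is illustrative.

```python
def staggered_ends(top_nick: int, bottom_nick: int):
    """Classify the ends left by nicks on opposite strands.

    Returns ('blunt', 0), ("5'", length) or ("3'", length).
    """
    offset = bottom_nick - top_nick
    if offset == 0:
        return ("blunt", 0)
    # Top nick to the left: each fragment keeps a protruding 5' end
    return ("5'", offset) if offset > 0 else ("3'", -offset)


print(staggered_ends(40, 40))  # ('blunt', 0)
print(staggered_ends(40, 48))  # ("5'", 8)
print(staggered_ends(48, 40))  # ("3'", 8)
```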
In some embodiments, the gene editing technology is, or includes, base editing or prime editing. See, e.g., Kantor, et al., Int J Mol Sci. 2020 Sep; 21(17): 6240, which is specifically incorporated by reference herein in its entirety. Due to reliance on homologous recombination, HDR-mediated editing is restricted to dividing cell types, limiting the range of diseases that can be targeted. CRISPR/Cas-mediated single-base-pair editing systems have been devised to bypass these limitations. DNA base-editors encompass two key components: a Cas enzyme for programmable DNA binding and a single- stranded DNA modifying enzyme for targeted nucleotide alteration. Two classes of DNA base-editors (BEs) include: cytosine base-editors (CBEs) and adenine base-editors (ABEs). These BEs can install all four transition mutations, and dual base-editor systems can be utilized for combinatorial editing.
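The behavior of the two base-editor classes can be modeled, very roughly, as deterministic conversions inside an activity window of the protospacer. The Python sketch assumes the commonly cited window of about positions 4-8 (counting the PAM-distal end as position 1) and converts every target base in it; in reality the window varies between editor variants, and editing of bystander bases is probabilistic. Only the protospacer strand is modeled, so a CBE's C-to-T here corresponds to a C:G-to-T:A change in the duplex. The function name and example spacer are hypothetical.

```python
CONVERSIONS = {"CBE": ("C", "T"), "ABE": ("A", "G")}


def predict_base_edit(protospacer: str, editor: str = "CBE", window=(4, 8)) -> str:
    """Convert every target base within the (1-based, inclusive) editing window."""
    target, product = CONVERSIONS[editor]
    first, last = window
    bases = list(protospacer.upper())
    for pos in range(first, last + 1):
        if pos <= len(bases) and bases[pos - 1] == target:
            bases[pos - 1] = product
    return "".join(bases)


spacer = "GACCATGCATCCAGTACGTA"  # hypothetical 20-nt protospacer
print(predict_base_edit(spacer, "CBE"))  # GACTATGTATCCAGTACGTA
print(predict_base_edit(spacer, "ABE"))  # GACCGTGCATCCAGTACGTA
```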
Prime-editors (PEs) are the latest addition to the CRISPR genome- engineering toolkit and represents an approach to expand the scope of donor- free precise DNA editing to not only all transition and transversion mutations, but small insertion and deletion mutations as well. As with CRISPR-mediated base-editing, prime-editing does not rely on DSBs. Prime-editors use an engineered reverse transcriptase fused to Cas9 nickase and a prime-editing guide RNA (pegRNA). PegRNA differs from regular sgRNAs and plays a major role in the system’s function. The pegRNA contains not only (a) the sequence complimentary to the target sites that directs nCas9 to its target sequence, but also (b) an additional sequence spelling the desired sequence changes. The 5' of the pegRNA binds to the primer binding site (PBS) region on the DNA, exposing the non- complimentary strand. The unbound DNA of the PAM-containing strand is nicked by Cas9, creating a primer for the reverse transcriptase (RT) that is linked to nCas9. The nicked PAM-strand is then extended by the RT by using the interior of the pegRNA as a template, consequently modifying the target region in a programmable manner. The result of this step is two redundant PAM DNA flaps: the edited 3' flap that was reverse transcribed from the pegRNA and the original, unedited 5' flap. The choice of which flap hybridizes with the non-PAM containing DNA-strand is an equilibrium process, in which the perfectly complimentary 5' would likely be thermodynamically favored. However, the 5' flaps are preferentially degraded by cellular endonucleases that are ubiquitous during lagging- strand DNA synthesis. Finally, the resulting heteroduplex containing the unedited strand and edited 3' flap is resolved and stably integrated into the host genome via cellular replication and repair process.
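The pegRNA 3' extension described above can be derived from the target sequence and the desired edit. In this Python sketch, written in DNA letters (T for U), the primer binding site (PBS) is the reverse complement of the nicked strand immediately 5' of the nick, and the reverse transcription template (RTT) is the reverse complement of the desired sequence immediately 3' of the nick, giving an extension of RTT followed by PBS. The nick position is supplied by the caller, and the default PBS (13 nt) and RTT (15 nt) lengths are common starting points rather than values from this disclosure; the function name is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def peg_extension(pam_strand: str, nick: int, edited: str,
                  pbs_len: int = 13, rtt_len: int = 15) -> str:
    """Return the pegRNA 3' extension (RTT then PBS), 5'->3', as DNA.

    pam_strand: original PAM-containing (nicked) strand, 5'->3'.
    nick: the nick lies between pam_strand[nick - 1] and pam_strand[nick].
    edited: desired sequence of the same strand; it may differ only 3' of the nick.
    """
    pam_strand, edited = pam_strand.upper(), edited.upper()
    if nick < pbs_len or edited[:nick] != pam_strand[:nick]:
        raise ValueError("PBS must fit upstream of the nick and edits must lie 3' of it")
    if len(edited) < nick + rtt_len:
        raise ValueError("edited sequence too short for the requested RTT length")
    pbs_target = pam_strand[nick - pbs_len:nick]  # 3' end of the nicked strand
    new_flap = edited[nick:nick + rtt_len]        # sequence the RT should synthesize
    return revcomp(new_flap) + revcomp(pbs_target)


# Toy example: T -> C substitution at the second base 3' of the nick
print(peg_extension("AAACCCGGGTTTACGT", 8, "AAACCCGGGCTTACGT", pbs_len=4, rtt_len=6))
# GTAAGCCCGG
```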
Collectively, DNA base-editing and prime-editing tools support precise nucleotide substitutions in a programmable manner, without requiring a donor template.
iii. Zinc Finger Nucleases
In some embodiments, the gene editing technology is a zinc finger nuclease (ZFN) that is engineered to specifically recognize the intron address sequence.
ZFNs are typically fusion proteins that include a DNA-binding domain derived from a zinc-finger protein linked to a cleavage domain. The most common cleavage domain is that of the Type IIS enzyme Fok I. Fok I catalyzes double-stranded cleavage of DNA 9 nucleotides from its recognition site on one strand and 13 nucleotides from its recognition site on the other. See, for example, U.S. Pat. Nos. 5,356,802; 5,436,150; and 5,487,994; as well as Li et al. Proc. Natl. Acad. Sci. USA 89:4275-4279 (1992); Li et al. Proc. Natl. Acad. Sci. USA 90:2764-2768 (1993); Kim et al. Proc. Natl. Acad. Sci. USA 91:883-887 (1994a); Kim et al. J. Biol. Chem. 269:31,978-31,982 (1994b). One or more of these enzymes (or enzymatically functional fragments thereof) can be used as a source of cleavage domains.
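The 9/13 cleavage geometry of native Fok I can be illustrated with a short sketch, assuming the commonly cited Fok I recognition sequence GGATG (not stated in the text above) and measuring both cut positions from the 3' end of that site in top-strand coordinates. Only forward-orientation sites are scanned, and the function name is hypothetical. In a ZFN, the zinc finger array rather than this recognition sequence positions the cleavage domain.

```python
import re


def foki_cut_sites(seq: str):
    """Return (top_cut, bottom_cut) for each forward-orientation GGATG site.

    Positions are 0-based in top-strand coordinates; a cut "at i" falls
    between seq[i-1] and seq[i]. The top strand is cut 9 nt and the bottom
    strand 13 nt beyond the site, leaving a 4-nt staggered end.
    Reverse-orientation sites (CATCC) are not scanned in this sketch.
    """
    cuts = []
    for m in re.finditer("GGATG", seq.upper()):
        cuts.append((m.end() + 9, m.end() + 13))
    return cuts
```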
Exemplary Type IIS restriction enzymes are described in International Publication WO 07/014275. Additional restriction enzymes also contain separable binding and cleavage domains. See, for example, Roberts et al. Nucleic Acids Res. 31:418-420 (2003). In certain embodiments, the cleavage domain includes one or more engineered cleavage half-domains (also referred to as dimerization domain mutants) that minimize or prevent homodimerization, as described, for example, in U.S. Published Application Nos. 2005/0064474, 2006/0188987, and 2008/0131962. In certain embodiments, the cleavage half-domain is a mutant of the wild-type Fok I cleavage half-domain. In some embodiments, the cleavage half-domain is a wild-type Fok I mutant in which one or more amino acid residues at positions 446, 447, 479, 483, 484, 486, 487, 490, 491, 496, 498, 499, 500, 531, 534, 537, and 538 are substituted. See, e.g., Example 1 of WO 07/139898, with amino acid residues in the Fok I protein numbered according to Wah et al. (1998) Proc. Natl. Acad. Sci. USA 95:10564-10569. In some embodiments, the cleavage half-domains are modified to include nuclear or other localization signals, peptide tags, or other binding domains.
The DNA-binding domain, which can, in principle, be designed to target any genomic location of interest, can be a tandem array of Cys2His2 zinc fingers, each of which generally recognizes three to four nucleotides in the target DNA sequence. The Cys2His2 domain has a general structure: Phe (sometimes Tyr)-Cys-(2 to 4 amino acids)-Cys-(3 amino acids)-Phe (sometimes Tyr)-(5 amino acids)-Leu-(2 amino acids)-His-(3 amino acids)-His. By linking together multiple fingers (the number varies: three to six fingers have been used per monomer in published studies), ZFN pairs can be designed to bind to genomic sequences 18-36 nucleotides long. Another type of zinc finger, which binds zinc between two pairs of cysteines, has been found in a range of DNA-binding proteins. The general structure of this type of zinc finger is: Cys-(2 amino acids)-Cys-(13 amino acids)-Cys-(2 amino acids)-Cys. This is called a Cys2Cys2 zinc finger. It is found in a group of proteins known as the steroid receptor superfamily, each of which has two Cys2Cys2 zinc fingers.
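A scan for Cys2His2 fingers in a protein sequence can be sketched with a regular expression. The sketch below assumes the PROSITE C2H2 signature PS00028, C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H, in which x(8) spans the "(5 amino acids)-Leu-(2 amino acids)" stretch of the general structure given above; it does not require a leading Phe/Tyr and so is somewhat more permissive than that description. The test input is an illustrative three-finger array, not a sequence from this disclosure, and the function name is hypothetical.

```python
import re

# Assumed PROSITE-style C2H2 signature (PS00028).
C2H2 = re.compile(r"C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H")


def find_c2h2_fingers(protein: str):
    """Return (start, end, motif) for each non-overlapping C2H2 match."""
    return [(m.start(), m.end(), m.group())
            for m in C2H2.finditer(protein.upper())]
```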
The DNA-binding domain of a ZFN can be composed of two to six zinc fingers. Each zinc finger motif is typically considered to recognize and bind to a three-base-pair sequence; as such, a protein including more zinc fingers targets a longer sequence and therefore may have greater specificity and affinity for the target site. Zinc finger binding domains can be “engineered” to bind to a predetermined nucleotide sequence. See, for example, Beerli et al. Nature Biotechnol. 20:135-141 (2002); Pabo et al. Ann. Rev. Biochem. 70:313-340 (2001); Isalan et al., Nature Biotechnol. 19:656-660 (2001); Segal et al. Curr. Opin. Biotechnol. 12:632-637 (2001); Choo et al., Curr. Opin. Struct. Biol. 10:411-416 (2000). Consequently, zinc finger binding domains can be engineered to have a different binding specificity compared to a naturally occurring zinc finger protein.
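The one-finger-per-triplet model described above implies a simple modular assembly step, sketched here under stated assumptions: the lookup table is hypothetical (its module names are placeholders, not recognition helices from any reference cited herein), the target length is a multiple of three, and the first listed module is taken for each triplet. The triplet order is reversed because a zinc finger array is generally described as binding antiparallel to its target strand, with the N-terminal finger contacting the 3'-most triplet.

```python
# Hypothetical triplet -> finger-module table (placeholder names only).
FINGER_TABLE = {
    "GCG": ["module_GCG_a", "module_GCG_b"],
    "TGG": ["module_TGG_a"],
    "GAC": ["module_GAC_a"],
}


def assemble_zinc_finger_array(target: str, table=FINGER_TABLE):
    """Return finger modules, N- to C-terminus, for a 5'->3' target.

    The N-terminal finger reads the 3'-most triplet, so triplets are
    consumed in reverse order.
    """
    target = target.upper()
    if len(target) % 3:
        raise ValueError("target length must be a multiple of 3")
    triplets = [target[i:i + 3] for i in range(0, len(target), 3)]
    fingers = []
    for t in reversed(triplets):
        if t not in table:
            raise KeyError(f"no module for triplet {t}")
        fingers.append(table[t][0])  # naive choice: first listed module
    return fingers
```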
Standard ZFNs fuse the cleavage domain to the C-terminus of each zinc finger domain. To allow the two cleavage domains to dimerize and cleave DNA, the two individual ZFNs must bind opposite strands of DNA with their C-termini a certain distance apart. As discussed above, the most commonly used linker sequences between the zinc finger domain and the cleavage domain require the 5' edge of each binding site to be separated by 5 to 7 bp.
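The spacer requirement above can be expressed as a scan of a DNA sequence for candidate ZFN pair sites. This minimal sketch assumes both half-sites are supplied as top-strand sequences, ignores each monomer's binding orientation and any mismatch tolerance, and uses the 5 to 7 bp spacer range stated above; the function name is hypothetical.

```python
def find_zfn_pair_sites(seq, left_site, right_site, spacers=range(5, 8)):
    """Return (start, spacer_len) for each left_site + spacer + right_site."""
    seq, left_site, right_site = seq.upper(), left_site.upper(), right_site.upper()
    hits = []
    for i in range(len(seq) - len(left_site) + 1):
        if seq[i:i + len(left_site)] != left_site:
            continue
        for s in spacers:
            j = i + len(left_site) + s  # start of the right half-site
            if seq[j:j + len(right_site)] == right_site:
                hits.append((i, s))
    return hits
```

With three fingers per monomer (9 bp per half-site), each hit corresponds to 18 bound base pairs plus the spacer, consistent with the 18-36 nucleotide range for three- to six-finger pairs.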
Both single-stranded cleavage and double-stranded cleavage are possible, and double-stranded cleavage can occur as a result of two distinct single-stranded cleavage events. DNA cleavage can produce either blunt ends or staggered ends. In certain embodiments, fusion polypeptides are used for targeted double-stranded DNA cleavage. In certain embodiments, fusion proteins direct a single-stranded cleavage (a nick) within a double-stranded section of DNA. Fusion proteins of this type are sometimes referred to as nickases and can, in some embodiments, be preferred to limit undesired mutations. In some cases, a nickase is created by blocking or limiting the activity of one cleavage half-domain of the dimer.
Engineering methods include, but are not limited to, rational design and various types of empirical selection methods. Rational design includes, for example, using databases including triplet (or quadruplet) nucleotide sequences and individual zinc finger amino acid sequences, in which each triplet or quadruplet nucleotide sequence is associated with one or more amino acid sequences of zinc fingers which bind the particular triplet or quadruplet sequence. See, for example, U.S. Pat. Nos. 6,140,081; 6,453,242; 6,534,261; 6,610,512; 6,746,838; 6,866,997; 7,067,617; U.S. Published Application Nos. 2002/0165356; 2004/0197892; 2007/0154989; 2007/0213269; and International Patent Application Publication Nos. WO 98/53059 and WO 2003/016496.
iv. Transcription Activator-Like Effector Nucleases
In some embodiments, the gene editing technology is a transcription activator-like effector nuclease (TALEN) that is engineered to specifically recognize the intron address sequence. TALENs have an overall architecture similar to that of ZFNs, the main difference being that the DNA-binding domain comes from TAL effector proteins, which are transcription factors from plant-pathogenic bacteria. The DNA-binding domain of a TALEN is a tandem array of amino acid repeats, each about 34 residues long. The repeats are very similar to each other; typically they differ principally at two positions (amino acids 12 and 13, called the repeat variable diresidue, or RVD). Each RVD specifies preferential binding to one of the four possible nucleotides, meaning that each TALEN repeat binds to a single base pair, though the NN RVD is known to bind adenine in addition to guanine. TAL effector DNA binding is mechanistically less well understood than that of zinc-finger proteins, but its seemingly simpler code could prove very beneficial for engineered-nuclease design. TALENs also cleave as dimers, have relatively long target sequences (the shortest reported so far binds 13 nucleotides per monomer), and appear to have less stringent requirements than ZFNs for the length of the spacer between binding sites. Monomeric and dimeric TALENs can include more than 10, more than 14, more than 20, or more than 24 repeats.
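The one-repeat-per-base code described above can be sketched as a lookup in each direction. The mapping assumes the commonly cited preferences NI for A, HD for C, NG for T, and NN for G (with NN also accepting A, as noted above); repeats are listed N- to C-terminal against a 5' to 3' target, and other RVDs and any sequence requirements outside the repeat-bound bases are not modeled. Function names are hypothetical.

```python
# Assumed RVD preferences; NN is treated as accepting G or A.
BASE_TO_RVD = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}
RVD_TO_BASES = {"NI": "A", "HD": "C", "NN": "GA", "NG": "T"}


def rvds_for_target(target: str):
    """One RVD per base, N- to C-terminal repeat, for a 5'->3' target."""
    return [BASE_TO_RVD[b] for b in target.upper()]


def targets_matching(rvds, candidate: str) -> bool:
    """True if every base of `candidate` is accepted by the matching RVD."""
    return len(rvds) == len(candidate) and all(
        b in RVD_TO_BASES[r] for r, b in zip(rvds, candidate.upper()))
```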
Methods of engineering TAL effectors to bind to specific nucleic acids are described in Cermak et al., Nucl. Acids Res. 1-11 (2011), and in US Published Application No. 2011/0145940, which discloses TAL effectors and methods of using them to modify DNA. Miller et al. Nature Biotechnol. 29:143 (2011) reported constructing site-specific TALENs by linking truncated TAL variants to the catalytic domain of Fok I nuclease. The resulting TALENs were shown to induce gene modification in immortalized human cells. General design principles for TALE binding domains can be found in, for example, WO 2011/072246.
TALENs using the +63 C-terminal truncation have been shown to cleave over a wide range of spacers. This makes design of TALENs easier and increases the number of potential sequences that can be targeted, but it also increases the number of potential regions of the genome that could be cleaved through off-target activity.
There are numerous strategies for creating the fusion proteins described above. These typically involve joining the DNA-binding domain to the cleavage domain or half-domain by an operable linker. For instance, in a typical ZFN with a Fok I cleavage domain, cleavage is obtained when the zinc finger proteins bind to target sites separated by approximately 5-6 base pairs. A linker, typically a flexible linker rich in glycine and serine, is used to join each zinc finger binding domain to the cleavage domain. See, e.g., U.S. Published Application No. 2005/0064474 and PCT Application WO 07/139898. In some embodiments, the engineered nuclease may use modified linkers, for example linkers that are longer or shorter, or more or less rigid, than those conventionally employed for creating ZFN or TALEN fusion proteins. The linker may form a stable alpha helix. See, e.g., Yan et al. Biochemistry 46:8517-24 (2007) and Merutka and Stellwagen, Biochemistry 30:4245-8 (1991). Although the methods described herein are flexible enough to produce nucleases having a range of linkers, in some embodiments the linkers are preferably less than 50 amino acids, less than 30 amino acids, less than 20 amino acids, less than 15 amino acids, or less than 10 amino acids in length.
E. Donor oligonucleotides
In some embodiments, a donor oligonucleotide is incorporated at one or more target sites in a cell’s genome. The donor oligonucleotide can include a sequence that can correct a mutation(s) in the genome, though in some embodiments, the donor introduces one or more mutations. In addition to containing a sequence designed to introduce a desired correction or mutation, the donor oligonucleotide may also contain synonymous (silent) mutations, which can facilitate detection of the corrected target sequence using allele-specific PCR of genomic DNA isolated from modified cells.
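Allele-specific PCR of the kind mentioned above is commonly designed so that the primer's 3'-terminal base falls on the altered position. The sketch below assumes a single-base change, a fixed primer length of 20 nt, and forward-strand primers only; it does not model melting temperature or any additional deliberate mismatches that some assays use to sharpen discrimination. The function name is hypothetical.

```python
def allele_specific_primers(seq: str, pos: int, wt: str, mut: str,
                            length: int = 20):
    """Return (wt_primer, mut_primer) ending on the base at `pos`.

    The two primers share the same 5' stem and differ only at the 3'
    terminal base, so each preferentially primes from its own allele.
    """
    if len(wt) != 1 or len(mut) != 1 or seq[pos].upper() != wt.upper():
        raise ValueError("expects a single-base change matching seq at pos")
    if pos + 1 < length:
        raise ValueError("not enough upstream sequence for the primer length")
    stem = seq[pos + 1 - length:pos].upper()
    return stem + wt.upper(), stem + mut.upper()
```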
The donor oligonucleotide can exist in single-stranded (ss) or double-stranded (ds) form (e.g., ssDNA, dsDNA). The donor oligonucleotide can be of any length. For example, the donor oligonucleotide may be between 1 and 1000 nucleotides long. In one embodiment, the donor oligonucleotide is between 25 and 200 nucleotides. In some embodiments, the donor oligonucleotide is between 100 and 150 nucleotides. In a further embodiment, the donor oligonucleotide is about 50 to 100 nucleotides in length. The donor oligonucleotide may be about 60 nucleotides in length. ssDNAs 25-200 nucleotides in length are active, e.g., ssDNAs 60-90 nucleotides in length. In some embodiments, the preferred length is about 90 nucleotides.
Successful insertion or recombination of the donor sequence results in a change of the sequence of the target region. Donor oligonucleotides are also referred to as donor fragments, donor nucleic acids, donor DNA, or donor DNA fragments. It is understood in the art that a greater number of homologous positions within the donor fragment will increase the probability that the donor fragment will be inserted or recombined into the target sequence, target region, or target site. Target sequences can be within the coding DNA sequence of a gene or within introns. Target sequences can also be within DNA sequences which regulate expression of the target gene, including promoter or enhancer sequences or sequences that regulate RNA splicing.
The donor sequence can contain one or more nucleic acid sequence alterations compared to the sequence of the region targeted for recombination, for example, a point mutation, a substitution, a deletion, or an insertion of one or more nucleotides. Deletions and insertions can result in frameshift mutations or deletions. Point mutations can cause missense or nonsense mutations. These mutations may disrupt, reduce, stop, increase, improve, or otherwise alter the expression of a gene contained in the target region or site.
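The relationships above between sequence changes and their effects on a coding sequence can be sketched as a small classifier. It assumes an in-frame coding sequence read with the standard genetic code, treats any length change that is not a multiple of three as a frameshift, and otherwise compares codons; it does not locate the change or consider splicing or regulatory effects. Function names are hypothetical. A "synonymous" result corresponds to the silent marker mutations described earlier for donor detection.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate complete codons of an in-frame DNA coding sequence."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def classify_variant(ref_cds: str, alt_cds: str) -> str:
    """Classify alt_cds relative to ref_cds."""
    delta = len(alt_cds) - len(ref_cds)
    if delta % 3:
        return "frameshift"
    if delta:
        return "in-frame insertion" if delta > 0 else "in-frame deletion"
    diffs = [(r, a) for r, a in zip(translate(ref_cds), translate(alt_cds)) if r != a]
    if not diffs:
        return "synonymous" if ref_cds.upper() != alt_cds.upper() else "identical"
    return "nonsense" if any(a == "*" for _, a in diffs) else "missense"
```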
The donor oligonucleotide may correspond to the wild type sequence of a gene (or a portion thereof), for example, a mutated gene involved with a disease or disorder.
One or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more) different donor oligonucleotide sequences may be used in accordance with the disclosed methods. This may be useful, for example, to create a heterozygous target gene where the two alleles contain different modifications or to create libraries of cells harboring different sequences at one or more target sites.
Donor oligonucleotides are preferably DNA oligonucleotides, composed of the principal naturally-occurring nucleotides (thymine, cytosine, adenine and guanine) as the heterocyclic bases, deoxyribose as the sugar moiety, and phosphate ester linkages. Donor oligonucleotides may include modifications to nucleobases, sugar moieties, or backbone/linkages, depending on the desired structure of the replacement sequence at the site of recombination or to provide some resistance to degradation by nucleases. For example, the terminal two or three inter-nucleoside linkages at each end of a ssDNA oligonucleotide (both 5’ and 3’ ends) may be replaced with phosphorothioate linkages in lieu of the usual phosphodiester linkages, thereby providing increased resistance to exonucleases. Modifications to the donor oligonucleotide should not prevent the donor oligonucleotide from successfully integrating at the target sequence.
In some embodiments, the donor oligonucleotide includes 1, 2, 3, 4, 5, 6, or more optional phosphorothioate internucleoside linkages. In some embodiments, the donor includes phosphorothioate internucleoside linkages between the first 2, 3, 4, or 5 nucleotides and/or the last 2, 3, 4, or 5 nucleotides of the donor oligonucleotide.
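Terminal phosphorothioate modifications of the kind described above are often written by placing an asterisk after each base whose downstream linkage is a phosphorothioate, a convention used by some oligonucleotide vendors (for example, Integrated DNA Technologies). The sketch below assumes that notation; the function name and the default of two linkages at each end are illustrative only.

```python
def add_phosphorothioates(seq: str, n5: int = 2, n3: int = 2) -> str:
    """Mark the first n5 and last n3 internucleoside linkages with '*'.

    Linkage i joins seq[i] and seq[i + 1]; a '*' after a base means the
    linkage to the next base is a phosphorothioate.
    """
    n_links = len(seq) - 1
    ps = set(range(min(n5, n_links))) | set(range(max(n_links - n3, 0), n_links))
    return "".join(b + ("*" if i in ps else "") for i, b in enumerate(seq))
```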
Donor oligonucleotides can be either single-stranded or double-stranded and can target one or both strands of the genomic sequence at a target locus. For MAGE, for example, donor oligonucleotides are typically single-stranded DNA sequences. However, even where not expressly provided, the reverse complement of each donor, and double-stranded DNA sequences based on the provided sequences, may also be used. In some embodiments, the donor oligonucleotide is a functional fragment of the disclosed sequence, or of the reverse complement or double-stranded DNA thereof.
The nuclease activity of some of the gene editing systems described herein (e.g., CRISPR/Cas) cleaves target DNA to produce single- or double-strand breaks in the target DNA. Double-strand breaks can be repaired by the cell by non-homologous end joining or homology-directed repair. In non-homologous end joining (NHEJ), the double-strand breaks are repaired by direct ligation of the break ends to one another. As such, no new nucleic acid material is inserted into the site, although some nucleic acid material may be lost, resulting in a deletion. In homology-directed repair, a donor polynucleotide with homology to the cleaved target DNA sequence is used as a template for repair of the cleaved target DNA sequence, resulting in the transfer of genetic information from the donor polynucleotide to the target DNA. As such, new nucleic acid material can be inserted or copied into the site. The modifications of the target DNA due to NHEJ and/or homology-directed repair can be used to induce gene correction, gene replacement, gene tagging, transgene insertion, nucleotide deletion, gene disruption, gene mutation, etc.
The donor polynucleotide typically contains sufficient homology to a genomic sequence at the cleavage site, e.g., 70%, 80%, 85%, 90%, 95%, or 100% homology with the nucleotide sequences flanking the cleavage site, e.g., within about 50 bases or less of the cleavage site, e.g., within about 30 bases, within about 15 bases, within about 10 bases, within about 5 bases, or immediately flanking the cleavage site, to support homology-directed repair between it and the genomic sequence to which it bears homology.
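The homology criterion above can be checked with an ungapped identity calculation between each donor arm and the genomic sequence immediately flanking the cleavage site. This is a minimal sketch, assuming equal-length, ungapped comparison and a cut position given as a 0-based index between two bases; practical designs may use an aligner instead. Function names are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and the same length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def flank_homology(genome: str, cut: int, arm5: str, arm3: str):
    """Identity of each donor arm to the sequence immediately flanking `cut`."""
    if cut < len(arm5) or cut + len(arm3) > len(genome):
        raise ValueError("arms extend beyond the available genomic sequence")
    left = genome[cut - len(arm5):cut]
    right = genome[cut:cut + len(arm3)]
    return percent_identity(arm5, left), percent_identity(arm3, right)
```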
The donor sequence may or may not be identical to the genomic sequence that it replaces. The donor sequence may correspond to the wild type sequence (or a portion thereof) of the target sequence (e.g., a gene). The donor sequence may contain one or more single base changes, insertions, deletions, inversions or rearrangements with respect to the genomic sequence, so long as sufficient homology is present to support homology-directed repair. In some embodiments, the donor sequence includes a non-homologous sequence flanked by two regions of homology, such that homology-directed repair between the target DNA region and the two flanking sequences results in insertion of the non-homologous sequence at the target region. The donor oligonucleotides can be used to add, i.e., insert or replace, nucleic acid material to a target DNA sequence (e.g., to “knock in” a nucleic acid that encodes for a protein, an siRNA, an miRNA, etc.), to add a tag (e.g., 6xHis, a fluorescent protein (e.g., a green fluorescent protein; a yellow fluorescent protein, etc.), hemagglutinin (HA), FLAG, etc.), to add a regulatory sequence to a gene (e.g., promoter, polyadenylation signal, internal ribosome entry sequence (IRES), 2A peptide, start codon, stop codon, splice signal, localization signal, etc.), to insert random nucleotides (e.g., NNNNN, where N is any nucleotide), or to otherwise modify a nucleic acid sequence (e.g., introduce a mutation).
To facilitate specific or targeted insertion or recombination of the donor oligonucleotide at the target site, the donor oligonucleotide should possess sufficient sequence homology to both the nucleotide sequence encoding the intron and the heterologous sequence/target site. In particular embodiments, the donor oligonucleotide contains the following components: a 5’ homology arm, a replacement sequence (e.g., the sequence desired to be integrated into the genome), and a 3’ homology arm. The homology arms provide for insertion or recombination into the chromosome (e.g., at the target site), thus replacing a portion of the endogenous genomic sequence with the replacement sequence. In some embodiments, the 3’ end of the 5’ homology arm is the position next to the 5’ end of the replacement sequence. In some embodiments, the 5’ end of the 3’ homology arm is the position next to the 3’ end of the replacement sequence.
In some embodiments, the 5’ homology arm of the donor oligonucleotide is homologous to the intron and the 3’ homology arm of the donor oligonucleotide is homologous to the target site. In some embodiments, the 5’ homology arm of the donor oligonucleotide is homologous to the target site and the 3’ homology arm of the donor oligonucleotide is homologous to the intron. The extent of homology to the intron and target site can vary. For example, the 5’ homology arm of a donor oligonucleotide can include about 10, 15, 20, 25, 50, 75, 100, 150, 200, 300, 400, 500, or more nucleotides homologous to the intron sequence or the target site. Alternatively or additionally, the 3’ homology arm of a donor oligonucleotide can include about 10, 15, 20, 25, 50, 75, 100, 150, 200, 300, 400, 500, or more nucleotides homologous to the intron sequence or the target site. When optimally aligned, the 5’ and/or 3’ homology arms of the donor oligonucleotide can overlap with one or more (e.g., about 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100 or more) nucleotides of the intron sequence or target sequence.
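Either arm arrangement described above can be assembled with one helper. In this sketch (function and argument names are hypothetical), `upstream` is whichever sequence lies 5' of the replacement (the intron or the target site) and `downstream` is the sequence 3' of it; swapping the two arguments gives the opposite orientation. The example builds a 90-nt donor with 30 nt of 5' homology, in the size range of the MAGE oligos discussed herein, but the sequences themselves are arbitrary.

```python
def build_donor(upstream: str, replacement: str, downstream: str,
                arm5_len: int, arm3_len: int) -> str:
    """Return 5' arm + replacement + 3' arm.

    The 5' arm is the last `arm5_len` bases of `upstream`; the 3' arm is
    the first `arm3_len` bases of `downstream`.
    """
    if arm5_len > len(upstream) or arm3_len > len(downstream):
        raise ValueError("homology arm longer than the available flank")
    # Slice from an explicit start so that arm5_len == 0 yields an empty arm.
    arm5 = upstream[len(upstream) - arm5_len:]
    return arm5 + replacement + downstream[:arm3_len]
```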
Donor oligonucleotides should be designed based on the target site and the requirements and/or preferences of the gene editing technology with which it/they will be used. For example, MAGE donor oligonucleotides are typically about 90 bases long. The first four 5’ bases can be phosphorothioated. The oligonucleotide is designed to match the sequence of the region of interest (with the exception of the desired mutations) such that it will be incorporated into the lagging strand during replication. To determine which genomic strand to use as the template, it is necessary to determine whether the gene is in replichore 1 or 2 and whether it is on the + or - strand. The mismatches, insertions, and/or deletions in the sequence must be centered on the oligonucleotide, and there should be as few alterations as possible, since each change will lower the efficiency of incorporation into the genome.
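The centering rule above can be sketched as extracting a fixed-length window around the edit from the edited sequence. The sketch assumes a 90-nt oligo and leaves the strand choice to the caller through a `reverse` flag, because determining the lagging-strand orientation (replichore and gene strand) is outside its scope; it also omits phosphorothioation. Names are hypothetical.

```python
COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.upper().translate(COMP)[::-1]


def centered_mage_oligo(genome: str, pos: int, ref: str, alt: str,
                        length: int = 90, reverse: bool = False) -> str:
    """Return a `length`-nt oligo carrying ref->alt at `pos`, edit centered."""
    if genome[pos:pos + len(ref)].upper() != ref.upper():
        raise ValueError("ref does not match genome at pos")
    edited = genome[:pos] + alt + genome[pos + len(ref):]
    center = pos + len(alt) // 2
    start = center - length // 2
    if start < 0 or start + length > len(edited):
        raise ValueError("not enough flanking sequence to center the edit")
    oligo = edited[start:start + length].upper()
    # Strand choice (lagging-strand targeting) is supplied by the caller.
    return revcomp(oligo) if reverse else oligo
```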
The extent of homology to the intron and target site can depend on the total length of the donor oligonucleotide, which in turn is impacted by the gene editing technology being used. For example, CRISPR/Cas donor oligonucleotides are typically about 1000 nucleotides long, thus the 5’ and/or 3’ homology arms may be longer for such oligos compared to MAGE oligos. In some embodiments, the 5’ and/or 3’ homology arm of a CRISPR/Cas donor oligonucleotide is about 10, 15, 20, 25, 50, 75, 100, 150, 200, 300, 400, 500, or more nucleotides in length.
On the other hand, MAGE donor oligonucleotides are typically about 90 nucleotides long. In some embodiments, the 5’ and/or 3’ homology arm of a MAGE donor oligonucleotide is about 25, 30, 35, 40, 45, 50, or more nucleotides in length. In some embodiments, the donor oligonucleotide has a Gibbs free energy of homodimer formation of less than 12 kcal/mol.
III. Methods
A. Gene and Genome modification
The disclosed compositions can be used in methods of genome engineering as well as modification of extragenomic targets. Such methods include targeted editing and/or continuous evolution of target sites. Modification may be performed in vivo, ex vivo, and/or in vitro (e.g., to produce genetically modified cells that can be reintroduced into a living organism). Engineering can be performed at one target site or multiple target sites, either in parallel or tandem. In preferred embodiments, the modification is genomic modification.
An exemplary method for modifying the genome or extragenomic site of a cell at one or more target sites includes integrating a sequence encoding an intron adjacent to each of the one or more target sites. Any of the disclosed introns can be used in accordance with the method. Subsequently, a donor oligonucleotide is incorporated (e.g., by insertion or recombination (HDR)) at one or more target sites via a gene editing technology. For example, in some embodiments, a CRISPR system (e.g., CRISPR/Cas9) and/or multiplex automated genome engineering (MAGE) is used. Other gene editing systems, including but not limited to those discussed elsewhere herein, can also be used.
The methods are especially useful for editing of repetitive elements, e.g., repetitive genomic elements. Because the integration of the intron at a specific locus containing a repetitive element in effect constitutes a unique genetic address, the donor oligonucleotide can be preferably incorporated at the specific locus as compared to other loci where copies of the repetitive genomic element may be present.
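The unique-address effect can be illustrated with a toy model that counts the loci at which a donor's 5' arm, a fixed gap (the replaced segment), and its 3' arm all match. The sketch assumes exact matching on one strand, ignores mismatch tolerance and recombination biology, and uses arbitrary sequences for the repeat and the intron; names are hypothetical. A donor whose arms match only the repeat finds every copy, whereas one whose 5' arm extends into the intron finds only the addressed copy.

```python
def count_donor_sites(genome: str, arm5: str, arm3: str, gap: int) -> int:
    """Count loci where arm5, then `gap` bases, then arm3 occur on this strand."""
    g, arm5, arm3 = genome.upper(), arm5.upper(), arm3.upper()
    n = 0
    for i in range(len(g) - len(arm5) - gap - len(arm3) + 1):
        j = i + len(arm5) + gap  # expected start of the 3' arm
        if g[i:i + len(arm5)] == arm5 and g[j:j + len(arm3)] == arm3:
            n += 1
    return n
```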
For most repetitive genetic elements, there is ample unique homology to the genome at least 1 kb upstream and downstream of the genetic element (e.g., ribosomal RNA operons), which permits knockout and replacement of any one of the native operons with a synthetic sequence containing the intron in the desired location. While replacement of native repetitive genetic elements can be performed with current techniques, performing numerous iterative edits on these elements is not currently possible. However, generating a strain containing the repetitive element with introns next to the sites to be edited provides a platform for, e.g., filtered editing that can be used to perform many rounds of edits coupled with selections or screens for function.
In addition, in most contexts in which practitioners will perform editing on repetitive genetic elements, they will prefer to edit them to investigate effects on their phenotype or evolve them to impart a new phenotype, independent of the operation of their native homologs within the cell. For this purpose, (as illustrated in the experiments below) there is often no need to replace one of the native elements. Instead, a copy of the repetitive element with introns already in the sequence can be provided into the genome of the cell or episomally, and then used to perform continuous in vivo editing.
Integration can be accomplished by any suitable means, including, for example, traditional cloning methods, CRISPR/Cas, etc. In some embodiments, the integration is confirmed by allele-specific PCR, sequencing, etc. In some embodiments, the one or more target sites to be modified can be or include a ribosomal gene, for example a native or non-native ribosomal RNA (rRNA) gene, a tRNA gene, a microsatellite, a minisatellite, an insertion sequence (IS), a transposable element, a pseudogene, a prophage, a telomere, a CRISPR array, or another endogenous, exogenous, foreign, recombinant, etc., repetitive element.
The donor oligonucleotide can be partially or completely homologous to the nucleotide sequence encoding the intron. Working Example 2 shows that when introducing a 90-nt ssODN at a target site using MAGE, a preferred range of about 30-50 nucleotides of homology to the self-splicing intron increased efficiency without significant off-target integration (see Fig. 2E). In some embodiments, the donor oligonucleotide can be partially or completely homologous, except for any nucleotide(s) to be mutated, to the one or more target sites to be modified. Thus, the donor oligonucleotide can be partially or completely homologous to both the intron and the target site to be modified (e.g., a ribosomal gene such as an rRNA gene, a tRNA gene, a microsatellite, a minisatellite, an insertion sequence (IS), a transposable element, a pseudogene, a prophage, a telomere, a CRISPR array, etc.), or a segment thereof. For example, the 5’ arm of the donor oligonucleotide can be partially or completely homologous to the intron and the 3’ arm of the donor oligonucleotide can be partially or completely homologous to the target site to be modified (i.e., except for the nucleotide(s) to be mutated). In some embodiments, the 3’ arm of the donor oligonucleotide can be partially or completely homologous to the intron and the 5’ arm of the donor oligonucleotide can be partially or completely homologous to the target site to be modified (i.e., except for the nucleotide(s) to be mutated). One of ordinary skill in the art would be able to determine the suitable amount of homology needed to maximize the efficiency of integration.
The donor oligonucleotide can include one or more mutations relative to the target sites where it is incorporated. The mutations can be targeted (e.g., a specific desirable sequence) or random. In preferred embodiments, the donor oligonucleotide is single-stranded DNA (ssDNA) or double-stranded DNA (dsDNA).
The methods for genome modification can be used to modify a genome at two or more target sites (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, or more). In a particular embodiment, the cell’s genome is modified at two or more target sites. In such cases, the intron integrated adjacent to the first target site can be the same or distinct (e.g., having a different sequence) from the intron integrated adjacent to the second and/or subsequent target site. Thus, one or more introns (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more) can be used at the two or more different target sites.
In some embodiments, targeted editing is performed at a first target site (e.g., a specific alteration is introduced), while randomized diversification is performed at a second site. For example, the cell to be genetically modified can be contacted with a plurality of identical donor oligonucleotides such that the donor oligonucleotide incorporated at the first target site is selected therefrom, and/or the cell can be contacted with a plurality of distinct donor oligonucleotides such that the donor oligonucleotide incorporated at the second target site can be selected from a plurality of distinct oligonucleotides (e.g., having different sequences, such as random mutations relative to the genomic sequence to be replaced). The targeted editing and randomized diversification can be performed in parallel or in tandem in any desirable order.
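For randomized diversification, a pool of distinct donors can be sampled from a degenerate template written in the NNNNN notation described above. This sketch assumes that each N is drawn independently and uniformly from A, C, G, and T, and it uses a seeded generator so that a pool is reproducible; the function name is hypothetical. A pool of identical donors for targeted editing corresponds to a template with no N positions.

```python
import random


def sample_degenerate_donors(template: str, n: int, seed: int = 0):
    """Draw n donors from `template`, replacing each 'N' with a random base."""
    rng = random.Random(seed)
    return ["".join(rng.choice("ACGT") if b == "N" else b
                    for b in template.upper())
            for _ in range(n)]
```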
In some embodiments, targeted editing at the first target site is mediated by a CRISPR system and/or randomized diversification at the second target site is mediated by MAGE. The CRISPR system and MAGE can be used in parallel or tandem.
B. Screens
Methods of screening for desired mutants are also provided. Typically, the screens are designed to identify genetic alterations involved in one or more phenotypes of interest. The screens can be loss of function or gain of function. The screens can be performed in vitro (e.g., in cultured cells) or in vivo (e.g., in a subject such as a mouse or rat).
In particular, a method of screening for one or more mutations that confer a desirable phenotype is described. The method includes modifying the genome of a plurality of cells in accordance with any of the gene editing methods described herein and subsequently selecting for a cell exhibiting the desirable phenotype. The step of selecting can include applying selective pressure to the cells in order to enrich for cells that exhibit a desired phenotype. In certain embodiments, the selection step involves a negative selection step, a positive selection step, or both negative and positive selection.
In some embodiments, multiple rounds of genome modification and selection can be used. For example, when MAGE is used to perform genome modification in screening methods, about 1-10 (e.g., 4, 5, 6, or 7), or 1-100, or 1-1,000 or more MAGE cycles can be performed. Selective pressure can be applied after every cycle to enrich for cells exhibiting the desired phenotype. Different cycles can utilize the same or different pools of donor oligonucleotides targeting the same or different locations.
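If each cycle independently converts a constant fraction of the still-unedited cells, the expected edited fraction after several cycles follows a simple geometric expression, 1 - (1 - e)^n for per-cycle efficiency e and n cycles. The sketch below assumes a constant per-cycle efficiency, no reversion, and no growth advantage or selection; selective pressure applied between cycles, as described above, would raise the observed fraction beyond this baseline. The function name is hypothetical, and no efficiency values are taken from the examples herein.

```python
def cumulative_edit_fraction(per_cycle_efficiency: float, cycles: int) -> float:
    """Expected edited fraction after `cycles` independent, identical cycles."""
    if not 0.0 <= per_cycle_efficiency <= 1.0 or cycles < 0:
        raise ValueError("efficiency must be in [0, 1] and cycles >= 0")
    # Probability a cell escapes editing in every cycle is (1 - e) ** cycles.
    return 1.0 - (1.0 - per_cycle_efficiency) ** cycles
```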
For example, a desirable phenotype, particularly in the case of improved ribosomes, can be antibiotic resistance. Selecting for a cell that exhibits antibiotic resistance can include exposing the plurality of cells to an effective amount of one or more antibiotics. For example, in the case of screening bacteria, the bacteria can be plated onto agar containing an effective amount of one or more antibiotics. Variants that are resistant to the antibiotic(s) can be isolated, propagated, and/or characterized. Results of the screen can be validated by independently generating cells containing the one or more genomic modifications (e.g., mutations) identified by the screen.
Other screens designed to select for other cellular improvements can be similarly designed and executed based on the desired trait using a similar overall strategy.
For example, other desired phenotypes can include, but are not limited to, the ability to translate difficult-to-translate amino acid sequences, the ability to catalyze translation of non-natural polymers, and improved orthogonal mRNA recognition and translation.
C. Determining genome modification
Sequencing and allele-specific PCR can be used to determine whether gene modification has occurred. PCR primers may be designed to distinguish between the original allele and the new predicted sequence following recombination. Other methods of determining whether a recombination event has occurred are known in the art and may be selected based on the type of modification made. Methods include, but are not limited to, analysis of genomic DNA, for example by sequencing, allele-specific PCR, droplet digital PCR, or restriction endonuclease-mediated selective PCR (REMS-PCR); analysis of mRNA transcribed from the target gene, for example by northern blot, in situ hybridization, or real-time or quantitative reverse transcriptase (RT) PCR; and analysis of the polypeptide encoded by the target gene, for example by immunostaining, ELISA, or FACS. In some cases, modified cells will be compared to parental controls. Other methods may include testing for changes in the function of the RNA transcribed by, or the polypeptide encoded by, the target gene. For example, if the target gene encodes an enzyme, an assay designed to test enzyme function may be used.
IV. Ribosomes
As discussed in more detail above and exemplified below, the disclosed compositions, methods, and strategies can be used for directed mutagenesis and evolution. For example, in the experiments below, rRNA was targeted for mutagenesis and improved ribosomes were engineered. Thus, engineered ribosomes are also disclosed. In some embodiments, the engineered ribosome is a prokaryotic ribosome, e.g., a bacterial ribosome. In some embodiments, the engineered bacterial ribosome includes a linker tethering the rRNA of the small subunit (16S rRNA) to the rRNA of the large subunit (23S rRNA); such ribosomes are referred to herein as tethered ribosomes. Tethered ribosomes and methods of making them are known in the art. See, for example, International Published Application No. WO 2015/184283 and Orelle, C., et al., Nature 524: 119-124 (2015), which are hereby incorporated by reference in their entirety.
Preferably, the engineered ribosome contains one or more mutations relative to a reference, such as a naturally occurring or non-engineered ribosome, or a known, previously engineered ribosome such as oRiboT. In specific embodiments, the mutation is a gain-of-function mutation or a loss-of-function mutation. A gain-of-function mutation may be any mutation that confers a new function. A loss-of-function mutation may be any mutation that results in the loss or reduction of a function possessed by the parent. In certain embodiments, the mutation may be in the peptidyl transferase center of the ribosome. In some embodiments, the mutation may be in an A-site of the peptidyl transferase center. In other embodiments, the mutation may be in the exit tunnel of the engineered ribosome.
The ribosome is a ribonucleoprotein machine responsible for protein synthesis. In all kingdoms of life, it is composed of two subunits, each built on its own ribosomal RNA (rRNA) scaffold. The independent but coordinated functions of the subunits, including their ability to associate at initiation, rotate during elongation, and dissociate after protein release, are an established paradigm of protein synthesis. In this large and extraordinarily complex particle, RNA is the main structural and functional component, and the two subunits carry out distinct but complementary functions: the small subunit decodes the mRNA, while the large subunit catalyzes peptide-bond formation and provides the exit tunnel for the nascent polypeptide. Subunit association is tightly regulated throughout the translation cycle.
Bacterial 70S ribosomes are composed of two subunits, a small 30S subunit and a large 50S subunit, both of which are ribonucleoprotein particles. In the bacterium Escherichia coli, the small subunit is assembled from 21 ribosomal proteins and a single 16S ribosomal RNA (rRNA) of 1541 nucleotides, whereas the large subunit is assembled from 33 ribosomal proteins and two rRNAs, a 5S rRNA of 115 nucleotides, and a 23S rRNA of 2904 nucleotides. See Arenz S, et al., Cold Spring Harb Perspect Med., 6(9):a025361 (2016).
Thus, in some embodiments, the engineered ribosome contains one or more mutations in the large and/or the small subunit, such as in the rRNA therein. In a particular embodiment, the engineered ribosome contains one or more mutations in its 23S rRNA. For example, one or more mutations can be present at or within nucleotides 2030-2034 and/or nucleotides 2057-2061 of the 23S rRNA. Exemplary mutations within nucleotides 2030-2034 and/or nucleotides 2057-2061 of the 23S rRNA that can be used include, without limitation, mutations encoded by a sequence selected from TCACC, CGCCG, TAGCA, GCCTG, CATTG, AAGGT, ACCCG, TCCCG, GTACA, ATTCT, AATGT, and ACCGT.
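To make the coordinate convention concrete, the following Python sketch writes one of the listed sequences into one of the two 5-nt windows. It assumes 1-based, inclusive numbering and that each listed pentanucleotide replaces an entire window; the names `apply_variant`, `LISTED_SEQUENCES`, and `WINDOWS` are hypothetical, and the all-A placeholder stands in for a real 23S rDNA sequence.

```python
# Hypothetical sketch: write one listed pentanucleotide into a 5-nt window of a
# 23S rRNA gene. Assumes 1-based, inclusive coordinates and that the listed
# sequence replaces the whole window; the reference used below is a placeholder.

LISTED_SEQUENCES = {
    "TCACC", "CGCCG", "TAGCA", "GCCTG", "CATTG", "AAGGT",
    "ACCCG", "TCCCG", "GTACA", "ATTCT", "AATGT", "ACCGT",
}
WINDOWS = {"2030-2034": (2030, 2034), "2057-2061": (2057, 2061)}


def apply_variant(gene: str, window: str, variant: str) -> str:
    """Return a copy of `gene` with the named 5-nt window replaced by `variant`."""
    if variant not in LISTED_SEQUENCES:
        raise ValueError(f"{variant} is not one of the listed sequences")
    start, end = WINDOWS[window]
    if len(gene) < end:
        raise ValueError("gene is shorter than the target window")
    # Convert 1-based inclusive coordinates to 0-based Python slice bounds.
    return gene[:start - 1] + variant + gene[end:]


# Usage with a placeholder of E. coli 23S length (2904 nt), not a real sequence.
placeholder = "A" * 2904
mutant = apply_variant(placeholder, "2030-2034", "TCACC")
```

Because each window spans exactly five nucleotides, the substitution preserves gene length, so downstream coordinates (such as the 2057-2061 window) are unaffected by an edit at 2030-2034.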
In some embodiments, the one or more mutations confer resistance to one or more antibiotics. Preferably, the antibiotics reduce or prevent bacterial protein synthesis. Exemplary antibiotics include, without limitation, tetracyclines (e.g., doxycycline), aminoglycosides (e.g., streptomycin, kanamycin, and tobramycin), erythromycin, roxithromycin, clarithromycin, lincomycin, lincosamides (e.g., clindamycin), puromycin, phenicols (e.g., chloramphenicol), oxazolidinones (e.g., linezolid), pleuromutilins (e.g., tiamulin), hygromycin A, and hygromycin B. See Arenz S, et al., Cold Spring Harb Perspect Med., 6(9):a025361 (2016), which is hereby incorporated by reference in its entirety, for a discussion of bacterial protein synthesis as a target for antibiotic inhibition.
In certain embodiments, the one or more mutations may render the engineered ribosome resistant to an aminoglycoside, a tetracycline, a pactamycin, a streptomycin, an edeine, or any other antibiotic that targets the small ribosomal subunit. In certain embodiments, the one or more mutations may render the engineered ribosome resistant to a macrolide, a chloramphenicol, a lincosamide, an oxazolidinone, a pleuromutilin, a streptogramin, or any other antibiotic that targets the large ribosomal subunit. In preferred embodiments, the one or more mutations confer resistance to one or more antibiotics selected from chloramphenicol, erythromycin, clindamycin, and lincomycin. In some embodiments, the engineered ribosome can accommodate synthetic, abiological monomers. Such monomers include, but are not limited to, non-L-alpha-amino acids, noncanonical L-alpha-amino acids, and/or D-amino acids. In some embodiments, the ribosomes can polymerize these monomers into polymers.
In some embodiments, f-MAGE (e.g., as exemplified in the experiments below) is employed to precisely edit and evolve the entire ribosome, including the peptidyl transferase center (PTC), where catalysis occurs. Doing so would allow the practitioner to remold the PTC to accommodate abiological monomers (beyond the natural L-alpha-amino acids), enabling the genetically encoded, or template-directed, production of entirely new polymers and materials.
In some embodiments, the ribosome is a eukaryotic ribosome. Evolving eukaryotic ribosomes in this way can be used to create an orthogonal ribosomal system in eukaryotic cells, which can then be used, for example, to produce protein biomaterials incorporating nonstandard amino acids in yeast.
The engineered ribosome may be prepared by expressing a polynucleotide encoding the rRNA of the engineered ribosome. Thus, polynucleotides encoding the rRNAs forming the engineered ribosomes are provided. In some embodiments, the polynucleotide is included in a vector, such as an expression vector.
V. Kits
The nucleic acids, gene editing compositions, cells, reagents, and other disclosed materials can be packaged together in any suitable combination as a kit useful for performing, or aiding in the performance of, the methods. The components of a given kit are preferably designed and adapted for use together in the method. The kits may include instructions for dosages and dosing regimens.
Provided are kits containing polynucleotides (e.g., plasmids, expression vectors) encoding one or more introns, rRNAs, and/or CRISPR/Cas components (e.g., sgRNA and Cas9 protein). In some embodiments, the kits contain instructional material for use thereof. In some embodiments, the kit can contain a population of cells, such as prokaryotic or eukaryotic cells, to be genetically modified or harboring a disclosed polynucleotide (e.g., plasmid, expression vector). The instructional material can include a publication, a recording, a diagram, or any other medium of expression that can be used to communicate the usefulness of the compositions and methods of the kit. For example, the instructional material may provide instructions for methods using the kit components, such as performing transfections, transductions, and infections, and conducting screens.
It is to be understood that the methods and compositions are not limited to specific synthetic methods, specific analytical techniques, or to particular reagents unless otherwise specified, and, as such, can vary. It is also to be understood that the terminology used is for the purpose of describing particular embodiments only and is not intended to be limiting.
The invention can be further understood by the following numbered paragraphs:
1. A polynucleotide comprising a nucleotide sequence encoding an intron flanked on one or both of the 5’ and 3’ ends by heterologous sequence(s).
2. The polynucleotide of paragraph 1, wherein the intron is a self-splicing intron.
3. The polynucleotide of paragraph 2, wherein the self-splicing intron is a Group I or Group II self-splicing intron.
4. The polynucleotide of paragraph 3, wherein the self-splicing intron is a Group I self-splicing intron.
5. The polynucleotide of any one of paragraphs 2-4, wherein the self-splicing intron is derived from an organism selected from the group comprising Tetrahymena thermophila, Nostoc punctiforme, Bacillus anthracis, Tilletiopsis flava, Azoarcus, Pneumocystis carinii, Agrobacterium, T7-like bacteriophage, Bacteriophage T4, or combinations thereof.
6. The polynucleotide of any one of paragraphs 2-5, wherein the self-splicing intron is a chimeric self-splicing intron, optionally derived from Tetrahymena thermophila and Tilletiopsis flava.
7. The polynucleotide of any one of paragraphs 2-6, wherein the nucleotide sequence encoding the intron comprises the sequence of any one of SEQ ID NOS: 1-12 or a sequence comprising at least 85% identity to any one of SEQ ID NOS: 1-12.
8. The polynucleotide of any one of paragraphs 1-7, wherein the nucleotide sequence encoding the chimeric self-splicing intron comprises the sequence of SEQ ID NO: 1 or a sequence comprising at least 85% identity to SEQ ID NO:1.
9. The polynucleotide of any one of paragraphs 1-8, wherein the intron is a spliceosomal intron.
10. The polynucleotide of any one of paragraphs 1-9, wherein a sequence to be targeted for mutation is 5’, 3’, or a combination thereof relative to the intron sequence, optionally wherein the sequence to be targeted for mutation is comprised in the heterologous sequence(s).
11. The polynucleotide of any one of paragraphs 1-10, wherein the heterologous sequence is native or non-native to the cell into which it will be introduced.
12. The polynucleotide of any one of paragraphs 1-11, wherein the heterologous sequence comprises a repetitive element.
13. The polynucleotide of paragraph 12, wherein the repetitive element is native or non-native to the cell into which it will be introduced.
14. The polynucleotide of paragraph 12 or 13, wherein the repetitive element is selected from a ribosomal RNA (rRNA) gene or portion thereof, a tRNA gene or portion thereof, a microsatellite, a minisatellite, an insertion sequence (IS), a transposable element, a pseudogene, a prophage, a telomere, or a CRISPR array.
15. The polynucleotide of any one of paragraphs 1-14 comprised in a plasmid or viral vector.
16. A cell comprising the polynucleotide of any one of paragraphs 1-15.
17. The cell of paragraph 16, wherein the cell is a prokaryotic cell.
18. The cell of paragraph 17, wherein the cell is a bacterium.
19. The cell of paragraphs 16 or 17, wherein the intron is not a spliceosomal intron.
20. The cell of paragraph 19, wherein the intron is derived from a Group I self-splicing intron.
21. The cell of paragraph 16, wherein the cell is a eukaryotic cell.
22. The cell of paragraph 16, wherein the intron is a spliceosomal or self-splicing intron.
23. The cell of any one of paragraphs 16-22, wherein the polynucleotide is integrated into the genome of the cell.
24. The cell of any one of paragraphs 16-23 comprising the intron in place of an endogenous intron of the cell.
25. An isolated cell comprising a polynucleotide encoding a self-splicing intron comprising the sequence of any one of SEQ ID NOS: 1-12 or a sequence comprising at least 85% identity to any one of SEQ ID NOS: 1-12, optionally integrated into the genome of the cell.
26. The isolated cell of paragraph 25 comprising the sequence of SEQ ID NO:1 or a sequence comprising at least 85% identity to SEQ ID NO:1, optionally integrated into the genome of the cell.
27. A method of modifying the genome of a cell at one or more target sites comprising integrating a nucleotide sequence encoding an intron adjacent to each of the one or more target sites, and subsequently inducing incorporation of a donor oligonucleotide at each of the one or more target sites via a gene editing technology.
28. A method of modifying the genome of a cell comprising integrating one or more of the polynucleotides of any one of paragraphs 1-15, and subsequently inducing incorporation of a donor oligonucleotide at or adjacent to the one or more sites of integration via a gene editing technology.
29. A method of modifying a nucleic acid in a cell comprising inducing incorporation of a donor oligonucleotide at or adjacent to the one or more sites of the polynucleotide in the cell of any one of paragraphs 16-26 via a gene editing technology.
30. The method of any one of paragraphs 27-29, wherein the donor oligonucleotide comprises one or more mutations relative to the target locus where it is incorporated.
31. The method of any one of paragraphs 27-30, wherein at least a segment of the donor oligonucleotide is partially or completely homologous to the nucleotide sequence encoding the intron.
32. The method of any one of paragraphs 27-31, wherein the donor oligonucleotide is ssDNA or dsDNA.
33. The method of any one of paragraphs 27-32, wherein the intron is a self-splicing intron derived from one or more organisms selected from the group comprising Tetrahymena thermophila, Nostoc punctiforme, Bacillus anthracis, Tilletiopsis flava, Azoarcus, Pneumocystis carinii, Agrobacterium, T7-like bacteriophage, Bacteriophage T4, or combinations thereof.
34. The method of any one of paragraphs 27-33, wherein the intron is a self-splicing intron that is a chimeric self-splicing intron, optionally derived from Tetrahymena thermophila and Tilletiopsis flava.
35. The method of paragraph 34, wherein the nucleotide sequence encoding the chimeric self-splicing intron comprises the sequence of any one of SEQ ID NOS: 1-12 or a sequence comprising at least 85% identity to any one of SEQ ID NOS: 1-12.
36. The method of any one of paragraphs 27-35, wherein the gene editing technology comprises one or more of a CRISPR system, multiplex automated genome engineering (MAGE), zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), triplex-forming oligonucleotides, pseudocomplementary oligonucleotides, small fragment homologous replacement, single-stranded oligodeoxynucleotide-mediated gene modification, and intron-encoded meganucleases.
37. The method of any one of paragraphs 27-36, wherein the gene editing technology is selected from a CRISPR system, MAGE, or a combination thereof.
38. The method of paragraphs 36 or 37, wherein the CRISPR system comprises one or more sgRNAs and a Cas protein, optionally wherein the Cas protein is Cas9.
39. The method of any one of paragraphs 36-38, wherein the CRISPR system comprises prime editing or base editing.
40. The method of any one of paragraphs 36-39, wherein the CRISPR system comprises a single-stranded DNA modifying enzyme.
41. The method of any one of paragraphs 36-40, wherein the CRISPR system comprises an engineered reverse transcriptase fused to Cas9 nickase and a prime-editing guide RNA (pegRNA).
42. The method of any one of paragraphs 27-41, wherein the one or more target sites is selected from a ribosomal RNA (rRNA) gene, a tRNA gene, a microsatellite, a minisatellite, an insertion sequence (IS), a transposable element, a pseudogene, a prophage, a telomere, a CRISPR array, or another native or non-native repetitive element.
43. The method of any one of paragraphs 27-42 comprising modifying the genome of the cell at two, three, four, five, six, seven, eight, nine, ten, or more target sites.
44. The method of paragraph 43, wherein the intron integrated adjacent to a first target site is distinct from the intron integrated adjacent to a second target site, optionally wherein the intron at each site is different from all of the other introns.
45. The method of paragraphs 43 or 44, wherein the donor oligonucleotide incorporated at one or more target sites is selected from a plurality of identical oligonucleotides and/or wherein the donor oligonucleotide incorporated at one or more target sites is selected from a plurality of distinct oligonucleotides.
46. The method of paragraph 45, wherein incorporation of the donor oligonucleotides at two or more sites relies on different gene editing technologies.
47. The method of paragraph 46, wherein incorporation of the donor oligonucleotide at one or more target sites is mediated by the CRISPR system and/or incorporation of the donor oligonucleotide at one or more target sites is mediated by MAGE.
48. The method of paragraph 47, wherein the CRISPR system and MAGE are used in parallel or tandem.
49. A method of screening for one or more mutations that confer a desirable phenotype comprising modifying the genome of a plurality of cells by the method of any one of paragraphs 27-48 and selecting for a cell exhibiting the desirable phenotype.
50. The method of paragraph 49, wherein the desirable phenotype is antibiotic resistance, wherein selection comprises exposing the plurality of cells to an effective amount of one or more antibiotics.
51. The method of any one of paragraphs 27-50, wherein a donor oligonucleotide introduces a mutation into one or more ribosomal genes.
52. The method of paragraph 51, wherein a ribosome comprising the rRNA and/or protein encoded by the one or more ribosomal genes can accommodate synthetic, abiological monomers, optionally non-L-alpha-amino acids, noncanonical L-alpha-amino acids, and/or D-amino acids, and optionally polymerize polymers formed therefrom.
53. An engineered bacterial ribosome comprising one or more mutations at nucleotides 2030-2034 and/or 2057-2061 in its 23S rRNA, wherein the one or more mutations are encoded by a sequence selected from TCACC, CGCCG, TAGCA, GCCTG, CATTG, AAGGT, ACCCG, TCCCG, GTACA, ATTCT, AATGT, and ACCGT.
54. The engineered ribosome of paragraph 53, wherein the mutation confers resistance to one or more antibiotics selected from chloramphenicol, erythromycin, clindamycin, and lincomycin.
55. The engineered ribosome of paragraph 53 or 54 comprising a linker tethering the rRNA of the small subunit (16S rRNA) to the rRNA of the large subunit (23S rRNA).
56. A polynucleotide encoding the rRNAs of the engineered ribosome of any of paragraphs 53-55, optionally wherein the polynucleotide is comprised in an expression vector.
57. A cell comprising the engineered ribosome of any of paragraphs 53-55 and/or the polynucleotide of paragraph 56.
The present invention will be further understood by reference to the following non-limiting examples.
Examples
The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how the compounds, compositions, articles, devices and/or methods claimed herein are made and evaluated, and are intended to be purely exemplary and are not intended to limit the disclosure.
Example 1: Integration of a self-splicing intron into oRiboT does not compromise catalytic functionality of oRiboT.
Materials and Methods
Strains and culture conditions
Two strains of E. coli were used and modified in this study: the wild-type MG1655 (CP027060.1, GI:1352181442) and the genomically recoded organism (GRO) C321.ΔA (CP006698.1, GI:54981157), which lacks all UAG codons and release factor 1. The C321.ΔA strain is derived from strain EcNR2 (ΔmutS::cat Δ(ybhB-bioAB)::[λcI857 Δ(cro-ea59)::tetR-bla]), which was modified from E. coli K-12 substr. MG1655 as previously described. The original C321.ΔA was modified by replacing the carbenicillin resistance gene with a spectinomycin resistance gene using standard λ-Red recombination, to permit compatibility with plasmid constructs containing the engineered ribosomes. C321.ΔA_spec was grown at 34°C in low-salt LB-min medium (10 g tryptone, 5 g yeast extract, and 5 g NaCl in 1 L dH2O). Variants of this strain were constructed to contain the chromosomal and plasmid-based oRiboT and wild-type ribosome constructs described below. All strain variants were grown under the same conditions, except for supplementation with inducers (aTc, IPTG) or antibiotics as described.
Plasmid construction
Plasmids containing the tethered ribosome (RiboT, pRibo-T), the orthogonal-tethered ribosome (oRiboT, poRibo-T), and oGFP (poGFP) were obtained. The promoters driving the expression of RiboT and oRiboT were replaced with PL-tetO such that transcription of the ribosome variants can be controlled by the TetR protein and anhydrotetracycline (aTc, Sigma). These plasmids were used as templates to construct all ribosome variants described in this study.
Plasmids containing oRibo-T with intron variants were constructed by inserting gBlocks (Integrated DNA Technologies) encoding introns whose IGS was modified to pair with the exon sequence immediately 5' of the insertion site, at sites 1 (after 23S U1926, IGS = gggacc), 2 (after 23S U2041, IGS = gcactg), 3 (after 23S U2489, IGS = gccgcc), and 4 (after 16S U1522, IGS = gggttc), into the oRiboT plasmid harboring a ColE1 origin of replication and a carbenicillin resistance marker. All plasmids were assembled using Gibson assembly (NEB). oRiboT-Tt4 and oRiboT-Tt4Δ were constructed by amplifying the oRiboT-Tt1 and oRiboT-Tt1Δ plasmids, respectively, with primers containing the desired mutations and assembling the products using Gibson assembly.
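The IGS values listed for the four sites follow a simple pairing pattern. The Python sketch below assumes the standard group I P1 rule: the 6-nt IGS pairs antiparallel with the last six nucleotides of the 5' exon, and the conserved U at the splice site forms a G·U wobble with the first IGS base. `design_igs` is a hypothetical helper, and the exon endings in the usage loop were back-calculated from the listed IGS values under this rule rather than read from the oRiboT sequence.

```python
# Minimal sketch of the group I P1 pairing rule used to retarget an IGS.
# Assumptions: the 6-nt IGS pairs antiparallel with the last 6 nt of the 5' exon,
# and the exon's terminal U (T in DNA) at the splice site pairs with G (G.U wobble).

DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def design_igs(exon_upstream: str, length: int = 6) -> str:
    """Return an IGS (5'->3', DNA letters) for the 5' exon ending at the insertion site."""
    site = exon_upstream.upper().replace("U", "T")[-length:]
    if len(site) != length or site[-1] != "T":
        raise ValueError("expected a 5' exon ending in U/T at the insertion site")
    # Antiparallel pairing: reverse the exon segment and complement each base.
    igs = [DNA_COMPLEMENT[base] for base in reversed(site)]
    igs[0] = "G"  # G.U wobble partner for the exon's terminal U
    return "".join(igs)


# Hypothetical 5'-exon endings, back-calculated from the listed IGS values under
# this rule; they have not been checked against the oRiboT sequence.
for exon_end in ("GGTCCT", "CAGTGT", "GGCGGT", "GAACCT"):
    print(exon_end, "->", design_igs(exon_end))
```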
Genomic integration of oRibo-T and variants
Clonetegration (St-Pierre, F.o. et al. ACS synthetic biology 2, 537-541 (2013)) was used to introduce oRiboT-Til into the genome of C321.ΔA_spec, using pOSIP-CH (Addgene plasmid # 45980) and pE-FLP (Addgene plasmid # 45978). Briefly, oRiboT-Til was amplified by PCR and cloned into the pOSIP-CH plasmid using Gibson assembly. After electroporation and overnight recovery, cells were plated on chloramphenicol plates. Following overnight growth, correct integration was verified by colony PCR. The attP sites of the correct integrants were then removed with MAGE to prevent re-excision. Following PCR and sequencing verification, pE-FLP (flippase), which contains a heat-sensitive origin of replication, was transformed and used to remove the FLP-excisable integration module. pE-FLP was then cured by overnight growth at 37°C.
RNA isolation and RT-PCR
Cells expressing WT oRiboT or its variants were grown overnight in LB medium supplemented with 50 μg/mL carbenicillin, diluted 1:100, and grown to mid-logarithmic phase with aTc induction for RNA isolation. Total cellular RNA was purified using a DNeasy Blood and Tissue Kit (Qiagen) following the manufacturer's instructions. RT-PCR was performed using the SuperScript OneStep RT-PCR System with Platinum Taq DNA Polymerase on purified RNA from oRibo-T, oRiboT-Tt1, oRiboT-Tt1Δ, oRiboT-Tt4, and oRiboT-Tt4Δ.
As a control to rule out PCR amplification from the DNA template, RNA from cells containing oRiboT-Tt4 and oRiboT-Tt4Δ was treated with or without DNase. The rationale was that, if DNA template were present, the amplified product would match the size of the unspliced, intron-containing sequence rather than that of the spliced RNA or WT ribosomes. DNase-treated samples yielded the expected products for oRiboT-Tt4 and oRiboT-Tt4Δ, whereas the same samples without DNase treatment showed an unspliced band consistent with genomic DNA contamination. All other RT-PCR reactions were therefore performed with DNase treatment to eliminate genomic DNA contamination. RT-PCR products were analyzed by agarose gel electrophoresis and Sanger sequencing.
Testing oRibo-T activity in vivo
C321.ΔA_spec cells were transformed with the following plasmid pairs: (1) oRibo-T and poGFP, (2) oRiboT-Tt1 and poGFP, (3) oRiboT-Tt1Δ and poGFP, (4) oRiboT-Tt2 and poGFP, (5) oRiboT-Tt2Δ and poGFP, (6) oRiboT-Tt3 and poGFP, and (7) oRiboT-Tt3Δ and poGFP. All transformed strains were grown at 34°C in LB-min medium supplemented with 50 μg/mL carbenicillin and 30 μg/mL kanamycin. Wells of a 96-well plate were filled with 150 μL of LB medium supplemented with 50 μg/mL carbenicillin and 30 μg/mL kanamycin, inoculated in triplicate with colonies carrying each plasmid combination above, and incubated at 34°C for 16 h with shaking. Clear-bottom wells of a second 96-well plate were filled with 150 μL of LB-min medium supplemented with 50 μg/mL carbenicillin, 30 μg/mL kanamycin, 1 mM IPTG, and 100 ng/mL anhydrotetracycline (aTc). This plate was inoculated with 2 μL of culture from the saturated starter plate and incubated with linear shaking (731 cycles per min) for 16 h at 34°C on a BioTek Synergy H1 plate reader, with continuous monitoring of cell density (OD600) and sfGFP fluorescence (excitation 485 nm, emission 528 nm, sensitivity setting 70).
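The text does not state how the OD600 and fluorescence traces were reduced to a single value per strain. One common approach, assumed here and sketched in Python, is to blank-correct each reading, divide GFP by OD600 per well, and average over the triplicate wells. `normalized_gfp` and the placeholder readings are hypothetical and are not data from the study.

```python
# Sketch of one common way to summarize plate-reader data. Assumptions: GFP is
# blank-corrected and divided by blank-corrected OD600 for each well, then
# averaged over triplicate wells. The readings shown are placeholders.
from statistics import mean, stdev


def normalized_gfp(wells, blank_od=0.0, blank_gfp=0.0):
    """Return (mean, sample SD) of GFP/OD600 over replicate wells.

    wells -- list of (od600, gfp) readings taken at the same time point
    """
    ratios = []
    for od, gfp in wells:
        od_corr = od - blank_od
        if od_corr <= 0:
            continue  # skip wells with no measurable growth
        ratios.append((gfp - blank_gfp) / od_corr)
    if len(ratios) < 2:
        raise ValueError("need at least two usable replicate wells")
    return mean(ratios), stdev(ratios)


# Placeholder triplicate readings for one strain at one time point.
m, sd = normalized_gfp([(0.75, 1100.0), (0.75, 1100.0), (0.75, 1100.0)],
                       blank_od=0.25, blank_gfp=100.0)
```

Normalizing to OD600 separates differences in per-cell translation from differences in growth, which matters when comparing strains that carry different ribosome constructs.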
Oligonucleotides and DNA sequencing
All oligonucleotides were purchased from Integrated DNA Technologies or the Yale University W.M. Keck Oligonucleotide Synthesis Facility with standard purification. MAGE oligonucleotides were 90 nucleotides long and contained two phosphorothioated bases at the 5' end. Depending on the experiment, degenerate bases or specific mutations were placed within the oligonucleotide. Additional primers were purchased for cloning and for RT-PCR of oRibo-T constructs. Primers for next-generation sequencing (NGS) were designed with five degenerate bases at the 5' end. To create NGS libraries, genomic DNA was extracted from ~2 × 10^9 cells of each population after f-MAGE using a Qiagen Genomic DNA purification kit, and the sequencing region was amplified by targeted PCR. Up to two libraries were pooled for sequencing on an Illumina MiSeq. Data were analyzed with open-source software: briefly, after quality filtering, reads were searched for the primer sequence, the site of mutagenesis was located, and WT and mutant reads were quantified.
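The text names open-source software without specifying it. The Python sketch below mirrors the described steps under stated assumptions: reads arrive as (sequence, quality) pairs with Phred+33 qualities, the primer appears verbatim on the read's forward strand (after the five degenerate bases), and the mutagenized window starts at a fixed, hypothetical offset after the primer. `count_alleles`, its parameters, and the quality cutoff are illustrative; reverse-strand reads and primer mismatches are not handled.

```python
# Sketch of the read-counting steps described above (quality filter, primer
# search, locate the mutagenized site, tally WT vs mutant). The primer, offset,
# window length, and quality cutoff are hypothetical parameters.
from collections import Counter


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def count_alleles(reads, primer, site_offset, site_len, wt_site, min_q=30.0):
    """Tally reads by outcome; reads is an iterable of (sequence, quality) pairs.

    site_offset -- distance from the primer's 3' end to the first base of the
                   mutagenized window (depends on amplicon design)
    """
    tally = Counter()
    for seq, qual in reads:
        if mean_phred(qual) < min_q:
            tally["low_quality"] += 1
            continue
        # str.find tolerates the degenerate bases that precede the primer.
        hit = seq.find(primer)
        if hit < 0:
            tally["no_primer"] += 1
            continue
        start = hit + len(primer) + site_offset
        site = seq[start:start + site_len]
        if len(site) < site_len:
            tally["truncated"] += 1
        elif site == wt_site:
            tally["wt"] += 1
        else:
            tally["mutant"] += 1
    return tally
```

The fraction of mutant reads, `tally["mutant"] / (tally["wt"] + tally["mutant"])`, then gives the allele frequency at the targeted site.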
Results
A self-splicing intron was inserted into the rRNA gene encoding an orthogonal tethered ribosome (oRiboT) (Orelle, et al., Nature, 524, 119-124 (2015)) to break its sequence redundancy with the seven homologous rRNA genes native to E. coli (Figs. 1A-B) and to act as a unique, addressable site for directing genome editing to the desired locus while excluding all others in the cell. Experiments were designed to determine whether placing the intron within the oRiboT gene would allow a 90-mer MAGE ssODN to hybridize across the junction of the intron and the repetitive genetic element (exon). The rationale was that, by using CRISPR/Cas9 to cut at the intron-exon junction together with a dsDNA pool (harboring multiple mutations spread throughout a 1 kb stretch) that hybridizes upstream and downstream of the cut sites (Fig. 1C), mutations could be introduced exclusively into the oRiboT gene at one or multiple loci, distinguishing it from the other repetitive genetic elements. Similarly, by performing MAGE mutagenesis cycles with pools of ssODNs that target the intron-oRiboT junction and harbor mutations in the repetitive-element region (Fig. 1D), mutations could be confined to the oRiboT gene, to the exclusion of other ribosomal targets. When the targeted oRiboT gene is transcribed, the intron is spliced out, releasing an rRNA containing the targeted mutations.
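The filtering logic can be illustrated with a toy model in Python. All sequences are hypothetical stand-ins (short, made-up strings rather than rRNA or intron sequence), and exact single-strand substring matching replaces real hybridization. The point is only that an oligo lying entirely within the repeated exon matches all eight copies, whereas an oligo spanning the exon-intron junction matches only the intron-bearing copy.

```python
# Toy illustration of junction addressability with hypothetical sequences.
# Exact substring matching on one strand is a deliberate simplification.


def matching_copies(oligo: str, copies: dict[str, str]) -> list[str]:
    """Names of the gene copies that contain the oligo sequence."""
    return [name for name, seq in copies.items() if oligo in seq]


exon_5 = "GATTACAGGTCCT"    # hypothetical repeated 5' exon segment
exon_3 = "TAGCCATGGAATC"    # hypothetical repeated 3' exon segment
intron = "AAATTTGGGCCCAAA"  # hypothetical stand-in for the intron

# Seven native copies plus one intron-bearing copy.
copies = {f"native_{i}": exon_5 + exon_3 for i in range(7)}
copies["oRiboT"] = exon_5 + intron + exon_3

exon_only_oligo = exon_5[:10]                # lies entirely within the repeat
junction_oligo = exon_5[-6:] + intron[:6]    # spans the 5' exon-intron junction

print(len(matching_copies(exon_only_oligo, copies)))  # every copy
print(matching_copies(junction_oligo, copies))        # only the tagged copy
```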
It was next investigated whether the Tetrahymena thermophila group I self-splicing intron can be stably integrated and then removed post-transcriptionally without leaving a scar, preserving the sequence and function of oRiboT. The Tetrahymena intron was chosen because it is naturally spliced from T. thermophila rRNA and functions effectively both in vitro and in vivo (Kruger, et al., Cell, 31, 147-157 (1982)). The intron was inserted into oRiboT immediately after position U1926 of the 23S rRNA (Site 1 in Fig. 1B), because this position corresponds to the location of the intron in the T. thermophila ribosome and has previously been shown to function in Escherichia coli (Zhang, et al., Rna, 1, 284 (1995)). Two independent E. coli C321.ΔA_spec strains (Lajoie, et al., Science, 342, 357-360 (2013)) containing either wild-type (WT) oRiboT (positive control) or oRiboT with the Tetrahymena thermophila intron inserted (oRiboT-Tt1) were cultured, and RT-PCR was performed on total purified RNA from each population using primers that amplify the region spanning the intron-exon junctions. A single band at 128 nucleotides (nt) was observed in both cases, consistent with complete and scarless splicing of the intron from oRiboT-Tt1.
To confirm this finding, an oRiboT-Tt1 mutant (oRiboT-Tt1Δ) lacking the internal guide sequence (IGS) necessary for intron function was created. RT-PCR of total purified RNA from this strain revealed two bands, one at 128 nt indicating the presence of native ribosomes, and a second at 535 nt, indicating an unspliced product resulting from the IGS deletion. In this case, two bands were observed because the total purified RNA contained both the WT 23S sequence and oRiboT-Tt1Δ.
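Assuming both RT-PCR primers anneal in exon sequence flanking the insertion site, an unspliced insert adds its full length to the amplicon, so the 535 nt and 128 nt products differ by the 407 nt of retained insert. The Python sketch below encodes that reading; `classify_band` and the ±15 nt tolerance are hypothetical choices, not part of the described analysis.

```python
# Sketch of how the RT-PCR bands can be read. Assumption: both primers sit in
# exon sequence, so a retained insert adds exactly its length to the amplicon.
# Sizes are those reported above; the tolerance is an arbitrary choice.

SPLICED_NT = 128    # spliced oRiboT-Tt1 product (same size as native 23S)
UNSPLICED_NT = 535  # unspliced product from the IGS-deleted construct
RETAINED_INSERT_NT = UNSPLICED_NT - SPLICED_NT  # 407 nt


def classify_band(size_nt: int, tol: int = 15) -> str:
    """Assign an observed band (nt) to an expected product, if close enough."""
    if abs(size_nt - SPLICED_NT) <= tol:
        return "spliced or native"
    if abs(size_nt - UNSPLICED_NT) <= tol:
        return "unspliced"
    return "unexpected"


# Band patterns reported for oRiboT-Tt1 (one band) and oRiboT-Tt1Δ (two bands).
tt1 = [classify_band(b) for b in (128,)]
tt1_delta = [classify_band(b) for b in (128, 535)]
```

Because a spliced oRiboT product and native 23S rRNA give the same size here, band size alone cannot separate them, which is why the oRiboT-Tt1b constructs with a unique primer site are needed.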
To assess whether oRiboT-Tt1 was spliced correctly and to distinguish it from the cellular background, oRiboT-Tt1 and oRiboT-Tt1Δ were used as templates to construct two further mutants, oRiboT-Tt1b (+IGS) and oRiboT-Tt1bΔ (ΔIGS), which contain a unique sequence following the intron-exon splice junction (Fig. 2B) so that a primer can selectively amplify oRiboT-Tt1b or oRiboT-Tt1bΔ in the presence of native ribosomes. Following RNA purification and RT-PCR from cell culture, a single band was observed at 128 nt for oRiboT-Tt1b, corresponding to the expected spliced product, and a single band at 535 nt for oRiboT-Tt1bΔ, corresponding to the expected unspliced product of the ΔIGS mutant.
Finally, sequencing of oRiboT-Tt1b RT-PCR products confirmed the scarless ligation of the ribosome at the intron-exon junction post-splicing. Together, these results demonstrate stable integration of the Tetrahymena intron into the ribosome followed by scarless self-splicing from the oRiboT RNA precursor.
The catalytic functionality of oRiboT ribosomes whose genes contain a stably integrated intron was then assessed by assaying protein production in vivo. Because oRiboT contains an orthogonal anti-Shine-Dalgarno sequence in its 16S rRNA (Hui & De Boer, Proceedings of the National Academy of Sciences 84, 4762-4766 (1987), Rackham & Chin, Nat Chem Biol, 1, 159-166 (2005)), a compatible GFP reporter (orthogonal GFP, oGFP) containing an orthogonal ribosome binding site sequence (oRBS) (Orelle, et al., Nature, 524, 119-124 (2015), Rackham & Chin, Nat Chem Biol, 1, 159-166 (2005)) was constructed to establish an orthogonal ribosome-mRNA translation channel designed to limit cross-reactivity with the native translational machinery. In this scheme, oGFP expression was driven from the IPTG/LacR-inducible PL-lacO promoter and oRibo-T expression from the aTc/TetR-inducible PL-tetO promoter (Fig. 2A). Steady-state GFP expression was first assayed in cells containing either oRibo-T or oRiboT-Tt1 under all four induction conditions (-IPTG -aTc, +IPTG -aTc, -IPTG +aTc, +IPTG +aTc), and in each case GFP expression was equivalent in cells containing oRiboT or oRiboT-Tt1 (Fig. 2C). Specifically, low levels of GFP were observed without oGFP induction (-IPTG -aTc and -IPTG +aTc). Upon full induction (+IPTG +aTc), high levels of GFP were observed in cells containing either oRiboT or oRiboT-Tt1, demonstrating that insertion of the Tetrahymena intron into the oRiboT-Tt1 ribosomal RNA and its subsequent removal do not interfere with protein translation. Notably, slightly elevated oGFP fluorescence was observed in cells induced with IPTG alone, and this was verified in cells containing the oGFP construct alone, indicating basal translation of oGFP mRNAs by cross-reacting native ribosomes.
This is consistent with previously reported characterization of tethered ribosomes (Carlson, et al., Nature communications, 10, 1-13 (2019), Schmied, et al., Nature, 564, 444-448 (2018)), and is amplified when oRiboT-Tt1 is induced by aTc (Fig. 2C).
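The four-condition comparison above is a 2 × 2 factorial design (IPTG for the reporter, aTc for the ribosome). A minimal Python sketch, assuming each strain is summarized by one GFP/OD600 value per condition; `fold_vs_reference` and `label` are hypothetical helpers, and the equal placeholder values are not data from the study.

```python
# Sketch of the 2 x 2 induction design with a per-condition comparison of a
# test construct (e.g., oRiboT-Tt1) against a reference (e.g., oRiboT).
from itertools import product

# (IPTG, aTc) states: IPTG induces oGFP, aTc induces the ribosome.
CONDITIONS = list(product((False, True), repeat=2))


def label(iptg: bool, atc: bool) -> str:
    """Human-readable label such as '+IPTG -aTc'."""
    return f"{'+' if iptg else '-'}IPTG {'+' if atc else '-'}aTc"


def fold_vs_reference(test, reference):
    """Per-condition ratio of test to reference GFP/OD600 (dicts keyed by condition)."""
    return {label(*cond): test[cond] / reference[cond] for cond in CONDITIONS}


# Placeholder values only; equal inputs give a ratio of 1.0 in every condition.
reference = {cond: 100.0 for cond in CONDITIONS}
tt1 = {cond: 100.0 for cond in CONDITIONS}
ratios = fold_vs_reference(tt1, reference)
```

Ratios near 1.0 in all four conditions would correspond to the "equivalent GFP expression" reported for oRiboT and oRiboT-Tt1.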
Example 2: Integration of a self-splicing intron permits targeted modification of oRiboT via CRISPR/Cas9 and MAGE.
Materials and Methods
Filtered-CRISPR (f-CRISPR)
C321.ΔA_spec carrying a genomically integrated oRiboT-Tt2 was transformed with a plasmid encoding S. pyogenes cas9 under a pLtetO inducible promoter. This oRiboT-Tt2 carries a distinguishing mutation upstream of the intron for NGS validation (oRiboT-Tt2-ed). Cells were inoculated from single colonies and grown to mid-logarithmic phase in a shaking incubator at 34°C with induction of cas9. To induce expression of the lambda-Red recombination proteins (Exo, Beta and Gam), cultures were shifted to 42°C for 15 min and then immediately cooled on ice. In a 4°C environment, 1 mL of cells was centrifuged at 16,000g for 30 s. The supernatant was removed and the cells were resuspended in Milli-Q water. The cells were spun down, the supernatant was removed, and the cells were washed a second time. After a final 30 s spin, the supernatant was removed. A dsDNA donor carrying 400 nt of homology 5’ and 3’ of the Cas9 cut sites, dissolved in DNase-free water, was then added to the cell pellet. A plasmid containing a CRISPR array was added at the same time; the array carries two spacers, specific for the 3’ terminus of the Tetrahymena intron and for the tether of oRiboT, respectively. The DNA-cell mixture was transferred to a pre-chilled 1 mm gap electroporation cuvette (Bio-Rad) and electroporated using the following parameters: 1.8 kV, 200 Ω and 25 μF. LB-min medium (3 mL) was immediately added to the electroporated cells. The cells were recovered from electroporation and grown overnight with induction of both the cas9 and CRISPR plasmids.
Filtered-MAGE (f-MAGE)
MAGE was carried out as previously described (Wang, H.H. et al. Nature 460, 894-898 (2009)). Liquid cultures were inoculated from single colonies and grown to mid-logarithmic phase in a shaking incubator at 34°C. To induce expression of the lambda-Red recombination proteins (Exo, Beta and Gam), cultures were shifted to 42°C for 15 min and then immediately cooled on ice. In a 4°C environment, 1 mL of cells was centrifuged at 16,000g for 30 s. The supernatant was removed and the cells were resuspended in Milli-Q water. The cells were spun down, the supernatant was removed, and the cells were washed a second time. After a final 30 s spin, the supernatant was removed, and ssODNs prepared at a concentration of 5-6 μM in DNase-free water were added to the cell pellet. The ssODN-cell mixture was transferred to a pre-chilled 1 mm gap electroporation cuvette (Bio-Rad) and electroporated using the following parameters: 1.8 kV, 200 Ω and 25 μF. LB-min medium (3 mL) was immediately added to the electroporated cells. The cells were recovered from electroporation and grown at 30°C for 3-3.5 h. Once cells reached mid-logarithmic phase, they were used in additional MAGE cycles. The ssODNs used for filtered MAGE were designed with 0 to 70 nt of homology to the intron in order to determine optimal design parameters for targeted mutagenesis and reduction of off-target mutations. All mutagenesis was performed on a genomically integrated oRiboT-Tt1, which was inserted by clonetegration into C321.ΔA_spec. This strain contains a 22-nt mutation 108 nt upstream of the intron-exon junction to distinguish oRiboT-Tt1 from the native ribosomes. For subsequent f-MAGE experiments, 44 nt of overlap with the intron was used, following the general MAGE protocol above. For MAGE mutagenesis of RiboT plasmids for antibiotic-resistance selections, the same oligo design strategy was employed, with oligo homology targeted to the 5’ or 3’ side of the intron, respectively. Cells were recovered after every MAGE cycle.
Because plasmid targeting is less efficient, a minimum of 3 MAGE cycles was used. Cells were then recovered overnight before plating or selections.
Results
It was next investigated whether the intron-ribosome junction could serve as a unique addressable site for targeted modification of oRiboT by two commonly used gene editing methods: CRISPR/Cas9 (Halperin, et al., Nature, (2018)) and MAGE (Gaudelli, et al., Nature, 551, 464-471 (2017)). A CRISPR array plasmid was designed with two spacers. One has homology to the 5’ region of the intron, and the other to the linker between the 16S and 23S rRNA that is unique to oRiboT. A dsDNA donor was then created containing 400 bp of homology directly upstream and downstream of the cut sites. The donor also carries a 7-bp degenerate region, so that both the allelic replacement frequency (ARF) and the complexity of the generated library could be quantified by deep sequencing. The CRISPR plasmid and dsDNA donor were introduced into a strain with genomically integrated oRiboT-Tt2 carrying an upstream distinguishing mutation for sequencing (oRiboT-Tt2-ed). Cells were grown to saturation with Cas9 induced. This selects for cells in which the dsDNA donor has replaced the sequence at the Cas9-induced double-stranded breaks in the E. coli chromosome. Genomic DNA was isolated from the selected cultures and analyzed by paired-end NGS to quantify the ARF for oRiboT-Tt2-ed and WT ribosomes. Extensive editing of oRiboT-Tt2-ed was found, with almost no mutagenesis of WT ribosomes. This held even though the mutagenized region was identical in sequence to all seven native ribosomal loci (Fig. 2D). The ARF was 98.27% for oRiboT and 0.29% for all seven native ribosomes. Furthermore, 14,147 unique mutants were obtained out of a theoretical complexity of 16,384 (86.35% library efficiency). This demonstrates that complex mutant libraries can be readily generated with this method while avoiding editing of unintended genomic sites that share sequence identity with the target locus.
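The library-efficiency figure above is simple arithmetic. A fully degenerate 7-bp window encodes 4^7 = 16,384 sequences, and 14,147 observed variants cover about 86.35% of that space. The sketch below performs that calculation in Python and adds a hypothetical ARF helper. It assumes that every degenerate position is an unbiased N (A/C/G/T) and that sequencing reads have already been assigned to loci. Neither assumption is stated in the protocol.

```python
"""Sketch: theoretical complexity, library efficiency and ARF for a degenerate donor.

Assumptions (not from the source protocol): each degenerate position is a
fully random N (A/C/G/T), and read counts have already been assigned to loci.
"""


def theoretical_complexity(n_degenerate: int, alphabet_size: int = 4) -> int:
    """Number of distinct sequences encoded by n fully degenerate positions."""
    return alphabet_size ** n_degenerate


def library_efficiency(observed_unique: int, n_degenerate: int) -> float:
    """Fraction of the theoretical sequence space observed at least once."""
    return observed_unique / theoretical_complexity(n_degenerate)


def allelic_replacement_frequency(edited_reads: int, total_reads: int) -> float:
    """Hypothetical helper: fraction of reads at a locus carrying the donor edit."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return edited_reads / total_reads


if __name__ == "__main__":
    print(theoretical_complexity(7))                 # 16384
    print(f"{library_efficiency(14_147, 7):.2%}")    # 86.35%
```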
This approach was then applied to MAGE to assess its ability to drive molecular evolution across multiple target sites in oRiboT in vivo. MAGE introduces mutations using single-stranded DNA oligodeoxynucleotides (ssODNs). These complex with ssDNA annealing proteins (e.g., λ Red Beta recombinase; Costantino, Proceedings of the National Academy of Sciences, 100, 15748-15753 (2003)) and hybridize to the lagging strand of the replicating chromosome (Wang, et al., Nature, 460, 894-898 (2009), Barbieri, et al., Cell, 171, 1453-1467 (2017)). MAGE cyclically introduces complex ssODN pools. It therefore permits greater depth and breadth of mutation, because it avoids the toxicity associated with the DNA double-strand breaks (DSBs) inherent in other genome editing methods (Komor, et al., Cell, 168, 20-36 (2017), Gaj, et al., Trends in biotechnology, 31, 397-405 (2013), Kim & Kim, Nature Reviews Genetics 15, 321-334 (2014)). As such, MAGE generates multisite genome modifications. It has been used for the molecular evolution of proteins (Amiram, et al., Nature biotechnology, 33, 1272 (2015)), pathway diversification (Wang, et al., Nature, 460, 894-898 (2009), Barbieri, et al., Cell, 171, 1453-1467 (2017)), and whole-genome recoding (Lajoie, et al., Science, 342, 357-360 (2013)). The optimal parameters for filtered MAGE (f-MAGE) ssODN construction were determined first. Ten 90-mer ssODNs were designed to target the intron-ribosome junction (Fig. 2E). The ssODNs contained varying homology to the intron and exon and harbored mismatch mutations targeting the 23S sequence of a chromosomally integrated oRiboT-Tt variant. One cycle of MAGE was performed with each ssODN. Deep sequencing was then used to quantify the frequency of conversion at the oRiboT-Tt locus and at the seven wild-type ribosomal genes. To restrict PCR screening to the oRiboT-Tt1 locus only, a 22-nt mutation was introduced 108 nt upstream of the intron-exon junction.
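The homology scan can be illustrated with a small helper that tiles a fixed-length oligo across the 3’ intron-exon junction. The oligo takes k nt from the end of the intron and the remaining 90 - k nt from the start of the exon, so 44 nt of intron homology leaves 46 nt on the ribosome side and 70 nt leaves 20 nt. This is a hypothetical sketch, not the authors' design code. The function names and the edit representation are assumptions, as is the omission of strand selection (in practice, MAGE oligos must be chosen to anneal to the lagging-strand template).

```python
"""Hypothetical sketch: tile a 90-mer ssODN across a 3' intron-exon junction."""
from typing import Dict, Optional

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement, e.g. to flip an oligo into the lagging-strand orientation."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_junction_ssodn(intron: str, exon: str, intron_overlap: int,
                          edits: Optional[Dict[int, str]] = None,
                          length: int = 90) -> str:
    """Return a sense-strand oligo spanning the 3' intron-exon junction.

    intron:         intron sequence ending at the junction (5'->3')
    exon:           exon (rRNA) sequence starting at the junction (5'->3')
    intron_overlap: nt of homology to the intron; the remainder goes to the exon
    edits:          exon-relative, 0-based positions -> mismatch base to install
    """
    if not 0 <= intron_overlap <= length:
        raise ValueError("intron_overlap must lie between 0 and the oligo length")
    exon_overlap = length - intron_overlap
    if intron_overlap > len(intron) or exon_overlap > len(exon):
        raise ValueError("template too short for the requested overlaps")

    exon_arm = list(exon[:exon_overlap])
    for pos, base in (edits or {}).items():
        if not 0 <= pos < exon_overlap:
            raise ValueError(f"edit at exon position {pos} falls outside the oligo")
        exon_arm[pos] = base

    # Explicit start index so intron_overlap == 0 yields "" (intron[-0:] would return everything).
    intron_arm = intron[len(intron) - intron_overlap:]
    return intron_arm + "".join(exon_arm)


if __name__ == "__main__":
    # Placeholder sequences for illustration only.
    intron, exon = "A" * 100, "C" * 100
    for k in (0, 44, 70):
        oligo = design_junction_ssodn(intron, exon, k, edits={5: "G"})
        print(k, oligo.count("A"), len(oligo) - k)  # overlap, intron nt, exon nt
```

For the 5’ junction the exon lies upstream of the intron, so the two arms would be concatenated in the opposite order.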
ssODNs targeting exclusively the 23S rRNA sequence, with no homology to the intron, yielded allelic replacement frequencies (ARFs) of 4% at the native ribosomes and 5% at oRiboT-Tt1 (Fig. 2E). Notably, the ARF measured at the native ribosomes is pooled across all seven loci, so the frequency of a mutation at any one of those sites is <1%. As ssODN homology to the intron increased to 44 nt, the ARF at the oRiboT-Tt1 locus rose to 11%. At the same time, the ARF at the native ribosomal genes decreased strikingly, to <1%. As homology to the intron increased further to 70 nt (20 nt of homology to the ribosome), ARFs at the native ribosomes became almost undetectable (<0.0066%), while the ARF at oRiboT decreased to 1.1%. Thus, optimal parameters for f-MAGE ssODN design may be context specific. 44 nt of homology to the intron (46 nt to the ribosome) maximizes conversion of oRiboT. By contrast, 70 nt of homology to the intron (20 nt to the ribosome) renders off-target conversions at the seven native ribosomal loci effectively undetectable (Fig. 2E, Table 2).
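The native-ribosome ARF is pooled across seven identical loci. Two targeting ratios can therefore be read from the numbers above: oRiboT versus the pooled native signal, and oRiboT versus an average single native locus. The sketch below assumes that edits are spread evenly across the seven native operons; the per-locus split was not measured. The 44 nt and 70 nt native values were reported only as upper bounds ("<"), so the ratios computed from them are lower bounds.

```python
"""Sketch: on-target vs off-target ratios from pooled ARFs (values in percent)."""

N_NATIVE_LOCI = 7  # seven native rRNA operons sharing the edited sequence


def per_locus_arf(pooled_native_arf: float, n_loci: int = N_NATIVE_LOCI) -> float:
    """Average ARF per native locus, assuming edits are spread evenly across loci."""
    return pooled_native_arf / n_loci


def targeting_ratio(target_arf: float, pooled_native_arf: float,
                    per_locus: bool = False) -> float:
    """ARF at the intron-tagged target divided by the native (off-target) ARF."""
    off_target = per_locus_arf(pooled_native_arf) if per_locus else pooled_native_arf
    if off_target == 0:
        return float("inf")
    return target_arf / off_target


# (intron overlap in nt, oRiboT ARF %, pooled native ARF %) as reported above;
# the 44 nt and 70 nt native values are upper bounds, so those ratios are lower bounds.
REPORTED = [(0, 5.0, 4.0), (44, 11.0, 1.0), (70, 1.1, 0.0066)]

if __name__ == "__main__":
    for overlap, target, native in REPORTED:
        print(f"{overlap:>2} nt: pooled ratio {targeting_ratio(target, native):.2f}, "
              f"per-locus ratio {targeting_ratio(target, native, per_locus=True):.2f}")
```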
Table 2. Ratio of targeting of oRiboT over WT ribosomes with each level of intron overlap.
ARF = allelic replacement frequency. As overlap with the intron increases, the specificity of f-MAGE editing for the desired locus also increases.
Example 3: Targeted diversity via filtered MAGE enables generation of antibiotic resistance variants.
Materials and Methods
Antibiotic selections
Following the f-MAGE procedure described above, cells containing Ribo-T-intron or WT-ribosome-intron plasmids were diversified with six cycles of f-MAGE. The degenerate ssODNs targeted regions 2030-2034 and 2057-2061 and had homology to the 5’ or 3’ portion of the intron, respectively. Cells were grown for 16 h with aTc to induce ribosome expression. Then 50 μL of each culture was seeded into 3 mL of LB-min medium containing aTc and antibiotic (273 μM erythromycin, 1.3 mM clindamycin, 7.74 μM chloramphenicol, or 28.22 mM lincomycin) and grown overnight. Plasmid DNA was isolated from each culture (Qiagen) and re-transformed into unselected C321 and MG1655 strains to confirm that the plasmid was causal for antibiotic resistance. The cells were plated on carbenicillin plates. Individual clones were induced overnight in LB + aTc and then grown in a 96-well plate in medium containing aTc and one of the four antibiotics at the concentrations specified above.
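The antibiotic doses here are given as molar concentrations. Converting them to the μg/mL units common in microbiology is a one-line calculation: μM × g/mol ÷ 1000 = μg/mL. The sketch below uses the free-base molecular weights of chloramphenicol (323.13 g/mol) and erythromycin A (733.93 g/mol). Clindamycin and lincomycin are omitted because the text does not state which salt form was used, and salt forms change the molecular weight.

```python
"""Sketch: convert micromolar antibiotic concentrations to ug/mL."""

# Free-base molecular weights (g/mol). Salt forms (e.g. hydrochlorides) differ.
MOLECULAR_WEIGHT = {
    "chloramphenicol": 323.13,
    "erythromycin": 733.93,
}


def micromolar_to_ug_per_ml(conc_um: float, mw_g_per_mol: float) -> float:
    """umol/L * g/mol = ug/L; divide by 1000 to obtain ug/mL."""
    return conc_um * mw_g_per_mol / 1000.0


if __name__ == "__main__":
    doses_um = [("erythromycin", 273.0), ("chloramphenicol", 7.74), ("chloramphenicol", 15.48)]
    for drug, conc in doses_um:
        ug_ml = micromolar_to_ug_per_ml(conc, MOLECULAR_WEIGHT[drug])
        print(f"{drug} {conc} uM = {ug_ml:.1f} ug/mL")
```

Under these molecular-weight assumptions, the chloramphenicol doses used in this study (7.74 μM and 15.48 μM) correspond to roughly 2.5 and 5 μg/mL, and the erythromycin dose to roughly 200 μg/mL.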
Results
To demonstrate the utility of selectively evolving ribosomes with new functions by f-MAGE while preserving native ribosomes in the same cell, it was tested whether targeted diversity could be introduced within the PTC to confer resistance to ribosome-targeting antibiotics. F-MAGE was first used to recreate the G2032A, G2057A, and A2058G mutations in non-orthogonal wild-type (WT-Tt2) and tethered (RiboT-Tt2) ribosomes containing the Tetrahymena intron at Site 2. These mutations were chosen because they confer resistance to erythromycin, clindamycin, chloramphenicol, and lincomycin (Douthwaite, Journal of bacteriology, 174, 1333-1338 (1992), Ettayebi, et al., Journal of bacteriology, 162, 551-557 (1985)). The rRNA was expressed from the strong inducible PL-tetO promoter on plasmids to enhance ribosome expression. F-MAGE introduced targeted modifications by directing mutagenic ssODNs at the 5’ and 3’ ribosome-intron junctions at Site 2. WT-Tt2-derived mutants at the three published sites displayed antibiotic resistance phenotypes when challenged with the panel of four antibiotics. These results demonstrate that cells survive on WT-Tt2 ribosomes under antibiotic conditions that render the native ribosomes non-functional. They also validate two key aspects of this study. First, new antibiotic-resistant ribosomes can be evolved by f-MAGE in the presence of the native translational machinery. Second, cells can survive solely on ribosomes transcribed from intron-containing genes. Interestingly, differences in antibiotic sensitivity were observed in RiboT-Tt2 (e.g., the unmutated RiboT-Tt2 variant was resistant to erythromycin). This indicates that the tether may cause functional changes in the ribosome that are not completely understood.
F-MAGE was next applied to generate a complex library of RiboT-Tt2 ribosomes in order to discover new mutations that confer antibiotic resistance. Based on the positions of known antibiotic resistance mutations, mutagenic MAGE ssODNs containing five degenerate nucleotides were designed to target two 23S rRNA regions: Region 1 (2030-2034) and Region 2 (2057-2061). Six cycles of MAGE were performed with this complex pool of ssODNs. These were followed by liquid selections in each of the four antibiotics, and individual mutants were isolated by plating on solid media. Seven mutant ribosomes were identified that showed varying degrees of resistance to the antibiotics (Fig. 2F). It was then confirmed that these ribosome mutants conferred antibiotic resistance independently of any potential mutation(s) in the cell’s native ribosomes or elsewhere in the genome. To do so, the plasmids carrying the mutant ribosomes were re-transformed into a clean MG1655 genetic background. Each mutant conferred resistance to a subset of the antibiotics (Fig. 2G). For example, some mutants exhibited broad resistance to the panel of four antibiotics (e.g., M4, M6), while others (e.g., M5, M7) were resistant to only one or two antibiotics.
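Two five-nucleotide degenerate windows give 4^5 = 1,024 variants per region. When both regions are diversified in the same molecule, they give 4^10 = 1,048,576 combinations. How much of that space a population samples depends on how many independently edited cells are screened. A standard estimate assumes uniform sampling: N draws from U equally likely variants cover an expected fraction of about 1 - e^(-N/U). Uniform sampling is an idealization here, because neither MAGE conversion nor degenerate-oligo synthesis is perfectly unbiased. The sketch below computes both quantities.

```python
"""Sketch: degenerate-library size and expected coverage under uniform sampling."""
import math
from typing import Iterable


def library_size(degenerate_windows: Iterable[int], alphabet_size: int = 4) -> int:
    """Combinations across independent, fully degenerate windows (lengths in nt)."""
    size = 1
    for n in degenerate_windows:
        size *= alphabet_size ** n
    return size


def expected_coverage(n_sampled: int, library: int) -> float:
    """Poisson approximation of the fraction of variants seen at least once."""
    return 1.0 - math.exp(-n_sampled / library)


if __name__ == "__main__":
    per_region = library_size([5])         # Region 1 or Region 2 alone
    combined = library_size([5, 5])        # both regions in the same molecule
    print(per_region, combined)
    for n in (10**5, 10**6, 10**7):        # hypothetical numbers of edited cells
        print(n, f"{expected_coverage(n, combined):.1%}")
```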
To further validate these RiboT antibiotic-resistant mutants and assess the impact of the tether, the mutations were reconstructed in the natural ribosome carrying the intron (WT-Tt2). Most mutations mapped qualitatively to WT-Tt2. However, they exhibited a slightly weaker resistance phenotype there, likely due to the higher basal antibiotic resistance observed in RiboT-Tt2 (Fig. 2G). In addition, the antibiotic resistance of several mutants was tether-dependent (e.g., M2 for erythromycin and lincomycin, M3 for chloramphenicol, and M6 for clindamycin). Collectively, seven new ribosome mutations were identified that can be used to confer conditional orthogonality to native ribosomes and/or RiboT. These experiments also revealed functional differences in RiboT attributable to its tether.
Example 4: Integration of distinct self-splicing introns across multiple sites in the ribosome.
Two main sets of experiments were conducted to assess whether self-splicing introns can be introduced at multiple sites of the ribosome simultaneously, enabling multi-site Filtered Editing. The first set tested whether the Tetrahymena intron could be placed at multiple distinct positions in the ribosome. Three additional oRiboT constructs were built in which the intron was inserted at three additional sites within the peptidyl transferase center (PTC) (Fig. 1B). These variants (Site 2 = oRiboT-Tt2, Site 3 = oRiboT-Tt3, Site 4 = oRiboT-Tt4) were profiled for oGFP expression under the same growth and induction conditions described for oRiboT-Tt1, and qualitatively equivalent results were observed. Importantly, oRiboT-Tt2 and oRiboT-Tt3 both supported robust oGFP expression. This indicates that the introns were successfully spliced to yield functional ribosomes (Fig. 2C). RT-PCR was also performed on total purified RNA from strains carrying the Tetrahymena intron at sites 1-4. A single band indicating complete splicing was observed at each site. Sequencing of the RT-PCR products confirmed scarless ligation of the ribosome at the intron-exon junction after splicing at all four insertion sites.
To further characterize the Tetrahymena ribosomes, oRiboT-Tt1, oRiboT-Tt2, and oRiboT-Tt3 were used as templates to construct a ΔIGS mutant of each, yielding oRiboT-Tt1Δ, oRiboT-Tt2Δ, and oRiboT-Tt3Δ, respectively. All ΔIGS oRiboT mutants showed low GFP fluorescence, demonstrating that an unspliced Tetrahymena intron impairs translation. By contrast, full ribosomal function is observed when the intact Tetrahymena intron is spliced from sites throughout the large or small subunits of the ribosome. These results indicate that introns can be placed modularly into the ribosome at many sites, enabling filtered editing at functionally significant ribosomal positions. In a second set of experiments, bioprospecting was performed to identify ten group I self-splicing introns. These additional introns were investigated as potential unique addressable sites for multiplex editing of two or more repetitive sites. oRiboT variants carrying the candidate self-splicing introns at site 1 or site 2 were built and tested (Table 3, Figs. 3A-3B). The introns were selected using a combined bioinformatics and literature-based approach. Group I introns previously shown to self-splice in vitro or in vivo were chosen, with a preference for those shown to splice in vivo (Table 3).
Table 3. Introns employed in this study.
All introns tested were functional at site 1 (Fig. 3A) but showed reduced function at site 2 (Fig. 3B). By contrast, the Tetrahymena intron was functional at all sites tested. Nevertheless, several introns at site 1 supported WT levels of oRiboT function. This indicates that using these introns at site 1 in parallel with Tetrahymena at another site could enable editing at several locations simultaneously. The introns that performed best at site 1 were subsequently used to build oRiboT constructs carrying the new intron at site 1 and the Tetrahymena intron at site 2, 3, or 4. All constructs with Tetrahymena at site 4 and a candidate intron at site 1 had robust activity (Fig. 3E). However, introducing Tetrahymena at site 2 or 3 ablated ribosomal function (Figs. 3C-3D).
To further validate splicing of these new intron candidates, as was done previously for Tetrahymena, E. coli carrying oRiboT with the Tfa, Tfb, or Tfc intron at site 1 or site 2 were grown. RT-PCR was performed on total purified RNA from each population using primers that amplified the region spanning the intron-exon junctions. All of the introns were spliced at both sites 1 and 2. Even so, ribosomes with the newly tested introns at site 2 showed reduced function. These data indicate that the highly complex 3-dimensional structure of the ribosome may be a barrier to simultaneous splicing and ribosome assembly at some ribosomal locations (e.g., site 2). It is believed that the additional space provided by the intersubunit bridge at site 1 could relieve this inhibition.
Having demonstrated the ability to introduce distinct introns at multiple sites in the ribosome, the system was expanded to target multiple sites within a repetitive genetic element in parallel. A strategy was developed to create orthogonal introns with two properties: the ability to splice at any location (as demonstrated for the Tetrahymena intron), and unique 5’ or 3’ sequences that differentiate the introns from each other. Since all introns tested were group I introns, chimeric introns were created by swapping elements from one intron into another. Variants of the Tetrahymena intron were developed in which the P1 and P9 helices were replaced with homologues from the other introns tested in this study. Changes to P1 have been shown to be extremely deleterious to Tetrahymena intron function (Guo & Cech, RNA, 8, 647-658 (2002)). However, it was hypothesized that a P1 transplanted from an intron with demonstrated self-splicing activity could functionally compensate for the loss of the native P1 in Tetrahymena. An oRiboT variant was created carrying, at site 2, a chimeric Tetrahymena intron that contains the P1 helix from Tfa (oRiboT-CTt2) (Fig. 3F). RT-PCR was performed on total purified RNA from E. coli strains containing wild-type (WT) oRiboT (+ control), oRiboT-Tt2 (+ control), or oRiboT-CTt2, using primers that amplified the region spanning the intron-exon junction. A single band was observed in all cases. This indicates that the engineered intron was completely and scarlessly spliced from oRiboT-CTt2, just as the natural intron was spliced from oRiboT-Tt2. Sequencing of the oRiboT-CTt2 RT-PCR products confirmed scarless ligation of the ribosome at the intron-exon junction after splicing. Furthermore, strains were constructed that carry an oGFP reporter under the IPTG/LacR-inducible PL-lacO promoter. Each strain also carries WT oRiboT, oRiboT-Tt2, or oRiboT-CTt2 under the aTc/TetR-inducible PL-tetO promoter.
Steady-state GFP expression was assayed under full induction (+IPTG +aTc). oRiboT-CTt2 produced oGFP at the same level as WT oRiboT or oRiboT-Tt2, indicating that the ribosome assembled correctly and is functional after intron splicing.
Example 5: Multi-site intron integration permits targeted editing and randomized diversification in vivo.
Materials and Methods
Anti-Shine-Dalgarno site editing experiments
Cells containing the oRiboT-CTt2-Tt4 plasmid were diversified with six cycles of f-MAGE using an ssODN (AntiSD-WT) that targets the aSD region of the 16S rRNA. This ssODN switches the orthogonal aSD to the WT E. coli aSD sequence. Cells recovered from each f-MAGE cycle were grown for 16 h with aTc and IPTG to induce ribosome and oGFP expression, respectively. The oGFP fluorescence of each population was quantified on a BD FACSAria.
Multi-site editing and evolution experiments
For multisite-editing experiments, cells containing the oRiboT-CTt2-Tt4 plasmid were diversified with six cycles of f-MAGE. One set of ssODNs introduces the M4 mutation into the 23S rRNA and has homology to the 5’ or 3’ portion of the CTt intron, respectively. An additional ssODN targets the aSD region of the 16S rRNA and switches the orthogonal aSD to the WT E. coli aSD sequence.
For multisite-evolution experiments, cells containing the oRiboT-CTt2-Tt4 plasmid were diversified with six cycles of f-MAGE. One set of ssODNs targets regions 2030-2034 and 2057-2061 in the 23S rRNA and has homology to the 5’ or 3’ portion of the CTt intron, respectively. An additional ssODN targets the aSD region of the 16S rRNA and switches the orthogonal aSD to the WT E. coli aSD sequence.
Cells recovered from each f-MAGE cycle were grown for 16 h with aTc and IPTG to induce ribosome and oGFP expression, respectively. The oGFP fluorescence of cells across the population was quantified on the BD FACSAria. Control experiments were performed to dissect unique phenotypes arising from continuous f-MAGE editing. In these, cells from f-MAGE cycles 0-6 were FACS-sorted into negative and positive bins and grown overnight. Confluent cultures were grown for 16 h with 100 ng/mL aTc to induce ribosome expression and plated on chloramphenicol plates to obtain CFU counts (see below).
For antibiotic selection experiments, cells from each f-MAGE cycle were grown for 16 h with 100 ng/mL aTc to induce ribosome expression. 50 μL of confluent culture was plated on LB-min agar plates containing 100 ng/mL aTc and 15.48 μM chloramphenicol. The same volume was plated on non-selective LB-min agar plates containing 50 μg/mL carbenicillin and 30 μg/mL kanamycin. CFUs were counted on the selective and non-selective plates to calculate survival ratios. To obtain kinetic growth curves, 1.5 μL of each overnight culture was seeded into 150 μL of LB-min medium containing 100 ng/mL aTc and 7.74 μM chloramphenicol in a 96-well clear-bottom plate. The plate was incubated with linear shaking (731 cycles per min) for 16 h at 34°C on a BioTek Synergy H1 plate reader, with continuous monitoring of cell density (OD600).
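The survival ratio is the CFU density on selective chloramphenicol plates divided by the CFU density on non-selective plates from the same culture. The sketch below shows that calculation. The colony counts and dilution factor in the example are hypothetical: the protocol above plates 50 μL of culture, and any additional dilution is an assumption.

```python
"""Sketch: CFU/mL and survival ratio from selective vs non-selective plates."""


def cfu_per_ml(colonies: int, plated_volume_ul: float, dilution_factor: float = 1.0) -> float:
    """Colony-forming units per mL of the original (undiluted) culture."""
    if plated_volume_ul <= 0:
        raise ValueError("plated volume must be positive")
    return colonies * dilution_factor / (plated_volume_ul / 1000.0)


def survival_ratio(selective_cfu_ml: float, nonselective_cfu_ml: float) -> float:
    """Fraction of viable cells that survive antibiotic selection."""
    if nonselective_cfu_ml <= 0:
        raise ValueError("non-selective CFU density must be positive")
    return selective_cfu_ml / nonselective_cfu_ml


if __name__ == "__main__":
    # Hypothetical counts: 40 colonies from 50 uL undiluted on chloramphenicol,
    # 200 colonies from 50 uL of a 1:10 dilution on non-selective plates.
    selective = cfu_per_ml(40, 50)
    total = cfu_per_ml(200, 50, dilution_factor=10)
    print(selective, total, round(survival_ratio(selective, total), 4))
```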
Results
After establishing that this technology enables in vivo evolution of noncoding RNAs with extensive sequence homology to native loci, the system was employed for continuous multi-site evolution in vivo (Fig. 4A, Fig. 1B). One intron was positioned at site 2 to diversify the PTC/exit tunnel. A second intron was positioned near the anti-Shine-Dalgarno sequence (aSD) in the 16S rRNA (designated site 4), so that the orthogonality of the aSD could be evolved simultaneously in vivo. As before, the rRNA was expressed from the strong inducible PL-tetO promoter on plasmids to enhance ribosome expression. The host strain contained an oGFP reporter under the IPTG/LacR-inducible PL-lacO promoter. It was first validated that a ribosome with this placement of introns would be functional. The chimeric CTt intron was inserted into site 2 of oRiboT and the natural Tt intron into site 4, forming oRiboT-CTt2-Tt4. Upon full induction (+IPTG +aTc), oRiboT-CTt2-Tt4 produced oGFP at the same level as WT oRiboT, indicating efficient intron self-splicing and ribosome assembly. To conclusively validate complete splicing of both introns within oRiboT-CTt2-Tt4, the RT-PCR assay was performed as before on total purified RNA. The strains tested contained wild-type (WT) oRiboT (+ control), oRiboT-Tt2 (+ control), or oRiboT-CTt2-Tt4. The primers amplified the region spanning the intron-exon junctions of each intron. Analysis and subsequent sequencing of the RT-PCR products indicated complete splicing at both sites 2 and 4 and seamless ligation of the exons. A single band (~128 nt) was observed for each of the site 2 and site 4 introns, with no evidence of the unspliced (535 nt) product.
Having validated an oRiboT construct with two mutually orthogonal introns, f-MAGE was performed to validate in vivo evolution at the aSD (site 4). Site 4 of oRiboT-CTt2-Tt4 was targeted with an ssODN that switches the orthogonal aSD to the WT E. coli aSD sequence, and six cycles of MAGE were performed. The cells were then induced for oGFP expression, and the cell populations from each MAGE cycle were analyzed by flow cytometry (Fig. 4F). Fluorescence shifted noticeably after each cycle, and a dramatic enrichment of non-fluorescent cells was observed after cycle 6. This indicates that the aSD of most members of the oRiboT-CTt2-Tt4 population had been converted to WT, preventing oGFP translation. Sequencing of individual clones plated from the 6th f-MAGE cycle confirmed that they contained the WT anti-SD sequence. This finding highlights how readily ribosomal populations can be evolved in vivo with f-MAGE. A similar approach can be employed to diversify several positions in parallel.
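The shift toward non-fluorescent cells can be quantified by gating each cycle's population against a reference. The sketch below sets the gate at the 1st percentile of the unedited (cycle 0) population's oGFP signal and reports the fraction of events falling below it. Both the gate choice and the example intensities are hypothetical; this is not the gating strategy used in the study.

```python
"""Sketch: fraction of oGFP-negative events per f-MAGE cycle, gated on cycle 0."""
from statistics import quantiles
from typing import Dict, Sequence


def negative_gate(reference: Sequence[float], percentile: int = 1) -> float:
    """Intensity below which `percentile`% of the reference events fall."""
    if not 1 <= percentile <= 99:
        raise ValueError("percentile must be between 1 and 99")
    cut_points = quantiles(reference, n=100)  # 99 cut points: 1st..99th percentile
    return cut_points[percentile - 1]


def fraction_negative(events: Sequence[float], gate: float) -> float:
    """Fraction of events with fluorescence below the gate."""
    return sum(value < gate for value in events) / len(events)


def summarize(cycles: Dict[int, Sequence[float]]) -> Dict[int, float]:
    """Gate on cycle 0, then report the negative fraction for every cycle."""
    gate = negative_gate(cycles[0])
    return {cycle: fraction_negative(events, gate) for cycle, events in sorted(cycles.items())}


if __name__ == "__main__":
    # Hypothetical intensities (arbitrary units) for three cycles.
    data = {
        0: [float(v) for v in range(100, 200)],
        3: [20.0] * 40 + [150.0] * 60,
        6: [20.0] * 90 + [150.0] * 10,
    }
    print(summarize(data))
```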
Multi-site continuous editing was then performed simultaneously on the large and small subunits of the same ribosome in vivo, to evolve antibiotic resistance and tune orthogonality. Six cycles of f-MAGE were performed in a strain containing oRiboT-CTt2-Tt4 (Fig. 4A). Two positions in the large subunit (PTC/exit tunnel, site 2) and one position in the small subunit (orthogonal aSD, site 4) were targeted simultaneously. The 5’ and 3’ junctions of the site 2 chimeric intron were targeted with ssODNs encoding the M4 mutation, previously identified to confer chloramphenicol resistance (Fig. 2F). The site 4 intron was targeted with the ssODN that switches the orthogonal aSD to the WT E. coli aSD sequence. After each f-MAGE cycle, oGFP fluorescence was quantified by flow cytometry. Growth curves were also measured in chloramphenicol, and cells were plated on chloramphenicol plates for CFU counts. Fluorescence decreased with each cycle as the ribosome was switched to a non-orthogonal state (Fig. 4B). Liquid growth and CFUs increased gradually with successive f-MAGE cycles (Fig. 4C). No growth was seen without f-MAGE cycling, and the survival ratio peaked at 0.02 at cycle 6. This indicates that two phenotypes could be evolved continuously in vivo in a repetitive genetic element while reducing off-target edits in the cell’s native context. Furthermore, the ability to edit the large and small subunits simultaneously enabled the evolution of convergent phenotypes. Cell survival in chloramphenicol required both the switch of oRiboT to non-orthogonal RiboT and the introduction of mutations in the exit tunnel.
To demonstrate the capability to evolve populations continuously at multiple sites in vivo, f-MAGE was performed in a strain containing oRiboT-CTt2-Tt4 (Fig. 4D). Degenerate ssODNs targeting the 5’ and 3’ junctions of the site 2 chimeric intron were used to create a diverse population (1.04 × 10^6 theoretical complexity), with the aim of evolving chloramphenicol resistance. At the same time, the orthogonality of this population was tuned via an ssODN that switches the orthogonal aSD to the WT E. coli aSD sequence at site 4. As before, fluorescence decreased with each cycle until the population became non-fluorescent at cycle 6. This was paralleled by increased growth in chloramphenicol and increasing numbers of surviving clones on solid medium (Fig. 4E). Antibiotic resistance was rarer in the diversified library than with editing by a discrete oligo at site 2 (survival ratio 0.0003 vs. 0.02 at cycle 6). As in the discrete editing experiment (Fig. 4C), the ratio of surviving mutants increased dramatically after cycle 4 (Fig. 4E). Survival of a population member depends on two events: conversion of the anti-SD to WT (site 4), and a mutation at site 2 that allows chloramphenicol resistance. Population members must therefore accumulate both mutations to survive, which can explain the delay in survival until cycle 4. Liquid growth of the populations after each f-MAGE cycle was also measured in 7.74 μM chloramphenicol. Growth improved gradually in both the discrete editing and the evolution experiments. The accumulation of surviving mutants with each f-MAGE cycle demonstrates the rapid population evolution that can be achieved by targeting edits to a single repetitive genetic element to the exclusion of others in the cell.
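The delayed rise in survival can be rationalized with a toy model. Suppose a cell must carry both the site 4 aSD conversion and a resistance allele at site 2. Suppose further that each edit is acquired independently, with a fixed per-cycle probability, and persists once made. The doubly edited fraction is then the product of two rising curves, so it starts out growing much more slowly than either single edit. This is an illustrative sketch with hypothetical probabilities, not a model fitted to the data. It ignores plasmid copy number, segregation, and fitness effects.

```python
"""Toy model: fraction of cells carrying two independently acquired edits."""
from typing import List


def cumulative_edit_fraction(p_per_cycle: float, cycles: int) -> float:
    """Probability a site has been converted after `cycles` rounds (edits persist)."""
    return 1.0 - (1.0 - p_per_cycle) ** cycles


def double_edit_fraction(p_site_a: float, p_site_b: float, cycles: int) -> float:
    """Both sites converted, assuming the two sites are edited independently."""
    return (cumulative_edit_fraction(p_site_a, cycles)
            * cumulative_edit_fraction(p_site_b, cycles))


def trajectory(p_site_a: float, p_site_b: float, max_cycles: int = 6) -> List[float]:
    """Double-edit fraction after 0..max_cycles cycles."""
    return [double_edit_fraction(p_site_a, p_site_b, n) for n in range(max_cycles + 1)]


if __name__ == "__main__":
    # Hypothetical per-cycle conversion probabilities.
    p_asd, p_resistance = 0.3, 0.05
    for n, frac in enumerate(trajectory(p_asd, p_resistance)):
        single = cumulative_edit_fraction(p_resistance, n)
        print(f"cycle {n}: resistance allele {single:.3f}, both edits {frac:.4f}")
```

In this toy model, lowering the per-cycle probability of acquiring a resistance allele lowers the double-edit fraction at every cycle. A lower probability would be expected when only a small fraction of a degenerate library confers resistance.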
Further induction of oGFP and the RiboT ribosomes in the selected populations showed enrichment for non-orthogonal ribosomes (Fig. 4C-4D, panels 2 and 3), as evidenced by the populations no longer being fluorescent. This is consistent with the requirement that RiboT mutants carrying chloramphenicol-resistance mutations near site 2 be non-orthogonal to support cell survival under chloramphenicol selection. A further experiment tested whether unique phenotypes can be dissected via continuous f-MAGE editing or evolution. oGFP expression was induced in populations from f-MAGE cycles 0-6 (Fig. 4B, Fig. 4D), which were then sorted by FACS into low and high bins. The cells were then plated on chloramphenicol plates to obtain CFU counts. Cells from all cycles that were sorted for high fluorescence survived consistently. Cells from the low bin did not grow, with the exception of a small number of CFUs at cycle 6 (0.37% survival ratio relative to the positive sorted bin). These trends in chloramphenicol growth for subpopulations sorted by oGFP fluorescence highlight the functional interdependence between sites 2 and 4 within the ribosome. Both editing to a non-orthogonal aSD and a favorable mutation within the exit tunnel near site 2 are necessary for the evolution of chloramphenicol resistance. The cycle 6 data for the negative bin can be explained by an increase in escape mutants due to accumulated off-target mutations. However, this does not offset the broader population-level trends observed in this or previous cycles. Importantly, these data demonstrate that phenotypic differences can be functionally dissected within complex populations generated via Filtered Editing, whether by discrete edits or by evolution. Furthermore, multi-site editing can be used to evolve desired traits that depend on the combined functional roles of the multiple targeted sites.
It is anticipated that this will be important in efforts to evolve the ribosome and other complex repetitive genetic elements.
Summary
A method was developed that allows targeted genome editing and evolution of selected repetitive genetic elements while preserving the integrity of native loci containing identical sequences. The utility of f-MAGE and f-CRISPR was demonstrated through targeted editing of the most complex noncoding RNA in the cell, the ribosome. This achieved in vivo editing of ribosome variants and rapid generation of a ribosome library across a cell population, without the need for laborious plasmid cloning, site-directed mutagenesis, or re-transformation. This study enabled three advances: isolation of new antibiotic-resistant ribosome variants, and elucidation of subtle effects of tethered subunits on ribosome function. It also establishes a new approach for rapidly evolving next-generation ribosomes that could accommodate nonstandard monomers or catalyze new chemical bond formation in the future (Maini, et al. Journal of the American Chemical Society, 137, 11206-11209 (2015), Dedkova, et al., Biochemistry, 51, 401-415 (2012), Melo Czekster, et al., Journal of the American Chemical Society, 138, 5194-5197 (2016), Fujino, et al., Journal of the American Chemical Society, 138, 1962-1969 (2016), Ohta, et al., Current opinion in chemical biology 12, 159-167 (2008)). Similar strategies may also be applied to evolve noncoding RNAs or other repetitive genetic elements. Furthermore, the multi-site editing of the ribosome highlights two capabilities of Filtered Editing: creating randomized diversity, and introducing discrete edits with high efficiency, continuously across a population in vivo.
Such precision in both the type and location of edits, together with the targeting of only one repetitive genetic element out of many containing identical sequence within the cell, sets this method apart from previous continuous evolution (Halperin, et al., Nature, 564, 444-448 (2018), Esvelt, et al., Nature, 472, 499-503 (2011)) or genome engineering (Gaj, et al., Trends in Biotechnology, 31, 397-405 (2013), Kim & Kim, Nature Reviews Genetics, 15, 321-334 (2014), Anzalone, et al., Nature, 576, 149-157 (2019), Gaudelli, et al., Nature, 551, 464-471 (2017), Wang, et al., Nature, 460, 894-898 (2009)) approaches, and these criteria are anticipated to be important for next-generation engineering efforts aimed at complex nanomachines such as ribosomes.
Of the Group I introns tested other than the Tt intron, each yielded functional ribosomes only when inserted at site 1. These data indicate that the highly complex three-dimensional structure of the ribosome may be a barrier to simultaneous splicing and ribosome assembly at some ribosomal locations (e.g., site 2). The slower catalytic rate of some of these introns compared to the Tt intron (Zarrinkar, et al., Nucleic Acids Research, 24, 854-858 (1996), Doudna & Szostak, Molecular and Cellular Biology, 9, 5480-5483 (1989)) could hinder the coordination of intron splicing with correct ribosomal folding. The additional space provided by the intersubunit bridge at site 1 could act to relieve this inhibition. Similarly, while the Tt intron was tolerated at every ribosomal location where it was positioned, placing two Tt introns in close proximity to each other (at sites 1 and 2) inhibited ribosomal function (Fig. 3). Placement of an intron at site 1, 2, or 3 together with a second intron at site 4 was, however, well tolerated, and this arrangement was used for multi-site evolution across the ribosome. It is also anticipated that in other noncoding RNAs with less three-dimensional structure than ribosomes, the permissible locations for intron insertion will be broader, allowing wider application of this technique and of the introns identified in this study.
The precision afforded by the disclosed filtered editing compositions and methods, in both the type and location of edits, together with the targeting of only one repetitive genetic element out of multiple identical genomic sequences, distinguishes these compositions and methods from previous evolution and genome engineering approaches. It is believed that previously, only in vitro mutagenesis techniques could be used to evolve repetitive genetic elements such as ribosomes, at the cost of reduced library complexity and an inability to edit native loci. Genome editing methods, while allowing more complex libraries, could not be used because they would edit all instances of a repetitive genetic element throughout the genome. It is believed that none of the traditional approaches, such as CRISPR/Cas9, base editing, prime editing, MAGE, overlap PCR, error-prone PCR, QuickChange mutagenesis, MP6 random mutagenesis, evolvR, and PACE, can on its own accomplish all of the following: mutate a genomic locus; mutate an episomal locus; perform in vivo mutagenesis; generate randomized stretches of sequence; make defined edits; distinguish among repetitive genetic elements; minimize the time between selection or screening of a repetitive genetic element and the next round of mutagenesis (e.g., 2 hours); and maximize the in vivo complexity generated within a locus per round of mutagenesis (e.g., estimated at 10^9).
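The filtering principle can be illustrated with a toy model: if one homology arm of the donor oligonucleotide lies inside the inserted intron, the donor can only anneal at the copy of the repetitive element that carries that intron, whereas a donor whose arms lie entirely in shared sequence matches every copy. A minimal Python sketch, assuming exact-match annealing on either strand (real recombination tolerates some mismatches); the sequences, the insertion site, and the function names are hypothetical stand-ins, not the rRNA or intron sequences used in this work.

```python
from typing import Dict, List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def arms_flank(seq: str, left_arm: str, right_arm: str, gap: int) -> bool:
    """True if left_arm and right_arm occur in order on seq with exactly
    `gap` nucleotides (the region the donor rewrites) between them."""
    start = seq.find(left_arm)
    while start != -1:
        if seq.startswith(right_arm, start + len(left_arm) + gap):
            return True
        start = seq.find(left_arm, start + 1)
    return False


def targetable_copies(copies: Dict[str, str], left_arm: str,
                      right_arm: str, gap: int) -> List[str]:
    """Names of copies where the donor could anneal to either strand."""
    return [name for name, seq in copies.items()
            if arms_flank(seq, left_arm, right_arm, gap)
            or arms_flank(revcomp(seq), left_arm, right_arm, gap)]


# Hypothetical toy sequences.
NATIVE = "GATTACAGGCATCGAATTCCGGTTAACCGT"  # stands in for a repeated element
INTRON = "TAACTATAACGGTCCTAAGG"            # stands in for an inserted intron
SITE = 14                                   # intron insertion point

copies = {f"native_{i}": NATIVE for i in range(1, 7)}
copies["engineered"] = NATIVE[:SITE] + INTRON + NATIVE[SITE:]

# Filtered donor: left arm inside the intron, 3 nt edited just downstream.
gap = 3
left_arm = INTRON[-8:]
right_arm = NATIVE[SITE + gap:SITE + gap + 8]
print(targetable_copies(copies, left_arm, right_arm, gap))  # ['engineered']

# Conventional donor with both arms in shared sequence: every copy matches.
print(len(targetable_copies(copies, NATIVE[:5], NATIVE[7:12], 2)))  # 7
```

Because a self-splicing intron is removed from the transcript, it can act as a unique address in the DNA without being present in the mature RNA of the edited element.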
Filtered editing permits the co-evolution of multiple, distal sites of a single repetitive genetic element directly in the genome. This allows for iterative introduction of precise edits that drive continuous evolution of dynamic genotypic diversity, while leaving the remainder of the cell’s genome unperturbed. Such capabilities hold promise for current challenges in synthetic biology, such as the systematic repurposing of the cell’s translational apparatus, which spans multiple components (e.g., tRNAs, aaRS, EF-Tu, and the ribosome).
In addition to the use of native and engineered introns to provide additional addresses for genome editing within the ribosome or other repetitive genetic elements, f-CRISPR can expand the space over which mutations can be introduced. While f-MAGE can be used to introduce deep edits near a chosen intron, f-CRISPR can be used to make distributed edits between two introns separated by 1 kb or more. This is ideal for evolving complex populations toward desired phenotypes, where the mutational landscape of the population can be continuously refined and assayed in vivo. Because both CRISPR/Cas and the Group I introns tested in this study are species-independent, it is contemplated that f-CRISPR can be ported to eukaryotic genome engineering to edit and evolve repetitive genetic elements such as tRNAs, ncRNAs, and ribosomes.
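Assuming that the f-CRISPR guides are placed inside the inserted introns, the key design constraint is that each protospacer must be absent from every native copy of the repeated element, so that Cas9 cuts only the intron-bearing copy. A minimal Python sketch of that filter, assuming SpCas9 with an NGG PAM and exact-match counting on both strands (practical guide design also scores near-matching off-target sites); the toy sequences, the 8-nt protospacer length used to keep the example short (SpCas9 guides are normally 20 nt), and all function names are hypothetical.

```python
from typing import List

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (N is kept as N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_both_strands(genome: str, kmer: str) -> int:
    """Count (possibly overlapping) exact occurrences of kmer on both strands."""
    total = 0
    for strand in (genome, revcomp(genome)):
        start = strand.find(kmer)
        while start != -1:
            total += 1
            start = strand.find(kmer, start + 1)
    return total


def ngg_protospacers(seq: str, length: int = 20) -> List[str]:
    """Protospacers on either strand of seq lying immediately 5' of an NGG PAM."""
    found = []
    for strand in (seq, revcomp(seq)):
        for i in range(len(strand) - length - 2):
            if strand[i + length + 1:i + length + 3] == "GG":
                found.append(strand[i:i + length])
    return found


def unique_intron_guides(intron: str, genome: str, length: int = 20) -> List[str]:
    """Keep only intron protospacers that occur exactly once in the genome."""
    return [p for p in ngg_protospacers(intron, length)
            if count_both_strands(genome, p) == 1]


# Hypothetical toy genome: six native copies plus one intron-bearing copy.
NATIVE = "GATTACAGGCATCGAATTCCGGTTAACCGT"
INTRON = "TAACTATAACGGTCCTAAGG"
SITE = 14
engineered = NATIVE[:SITE] + INTRON + NATIVE[SITE:]
genome = "N".join([NATIVE] * 6 + [engineered])

print(unique_intron_guides(INTRON, genome, length=8))  # ['AACTATAA', 'CGGTCCTA']
print(count_both_strands(genome, NATIVE[:8]))          # 7 (shared sequence)
```

A guide drawn from the shared exon sequence would match all seven copies, which is the situation that prevents conventional CRISPR editing from distinguishing repetitive genetic elements.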
In this work, two parallel phenotypes with convergent functions in the ribosome were evolved simultaneously: the orthogonality of oRiboT and its antibiotic resistance. Such co-evolution strategies can be further used to evolve ribosomes and other genetic elements in vivo. The ability to evolve ribosomes with Filtered Editing can aid efforts to reengineer orthogonal ribosomes for the metabolic insulation of the translation of protein biopolymers, or enable the continuous in vivo evolution of next-generation ribosomes capable of accommodating nonstandard monomers and synthesizing sequence-defined biopolymers while insulated from the cell’s native translation apparatus.
For the challenge of editing and evolving repetitive genetic elements such as ribosomes to expand their functions, the previously available toolkit required a compromise in specificity, time, and/or depth of library complexity. It is believed that the filtered editing compositions and methods described herein, for the first time, allow genome editing technologies to be applied to precisely edit and evolve repetitive genetic elements in vivo.
Radford, et al., “Targeted editing and evolution of engineered ribosomes in vivo by filtered editing”, Nat Commun. 2022 Jan 10; 13(1): 180. doi: 10.1038/s41467-021-27836-x, including all supplemental materials associated therewith, is specifically incorporated by reference herein in its entirety.
Unless defined otherwise, all technical and scientific terms used herein have the same meanings as commonly understood by one of skill in the art to which the disclosed invention belongs. Publications cited herein and the materials for which they are cited are specifically incorporated by reference.
Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the invention described herein. Such equivalents are intended to be encompassed by the following claims.

Claims

We claim:
1. A polynucleotide comprising a nucleotide sequence encoding an intron flanked on one or both of the 5’ and 3’ ends by heterologous sequence(s).
2. The polynucleotide of claim 1, wherein the intron is a self-splicing intron.
3. The polynucleotide of claim 2, wherein the self-splicing intron is a Group I or Group II self-splicing intron.
4. The polynucleotide of claim 3, wherein the self-splicing intron is a Group I self-splicing intron.
5. The polynucleotide of any one of claims 2-4, wherein the self-splicing intron is derived from an organism selected from the group comprising Tetrahymena thermophila, Nostoc punctiforme, Bacillus anthracis, Tilletiopsis flava, Azoarcus, Pneumocystis carinii, Agrobacterium, T7-like bacteriophage, Bacteriophage T4, or combinations thereof.
6. The polynucleotide of any one of claims 2-5, wherein the self-splicing intron is a chimeric self-splicing intron, optionally derived from Tetrahymena thermophila and Tilletiopsis flava.
7. The polynucleotide of any one of claims 2-6, wherein the nucleotide sequence encoding the intron comprises the sequence of any one of SEQ ID NOS: 1-12 or a sequence comprising at least 85% identity to any one of SEQ ID NOS:1-12.
8. The polynucleotide of any one of claims 1-7, wherein the nucleotide sequence encoding the chimeric self-splicing intron comprises the sequence of SEQ ID NO:1 or a sequence comprising at least 85% identity to SEQ ID NO:1.
9. The polynucleotide of any one of claims 1-8, wherein the intron is a spliceosomal intron.
10. The polynucleotide of any one of claims 1-9, wherein a sequence to be targeted for mutation is 5’, 3’, or a combination thereof relative to the intron sequence, optionally wherein the sequence to be targeted for mutation is comprised in the heterologous sequence(s).
11. The polynucleotide of any one of claims 1-10, wherein the heterologous sequence is native or non-native to the cell into which it will be introduced.
12. The polynucleotide of any one of claims 1-11, wherein the heterologous sequence comprises a repetitive element.
13. The polynucleotide of claim 12, wherein the repetitive element is native or non-native to the cell into which it will be introduced.
14. The polynucleotide of claims 11 or 12, wherein the repetitive element is selected from a ribosomal RNA (rRNA) gene or portion thereof, a tRNA gene or portion thereof, a microsatellite, a minisatellite, an insertion sequence (IS), a transposable element, a pseudogene, a prophage, telomere, or a CRISPR array.
15. The polynucleotide of any one of claims 1-14 comprised in a plasmid or viral vector.
16. A cell comprising the polynucleotide of any one of claims 1-15.
17. The cell of claim 16, wherein the cell is a prokaryotic cell.
18. The cell of claim 17, wherein the cell is a bacterium.
19. The cell of claims 16 or 17, wherein the intron is not a spliceosomal intron.
20. The cell of claim 19, wherein the intron is derived from a Group I self-splicing intron.
21. The cell of claim 16, wherein the cell is a eukaryotic cell.
22. The cell of claim 16, wherein the intron is a spliceosomal or self-splicing intron.
23. The cell of any one of claims 16-22, wherein the polynucleotide is integrated into the genome of the cell.
24. The cell of any one of claims 16-23 comprising the intron in place of an endogenous intron of the cell.
25. An isolated cell comprising a polynucleotide encoding a self-splicing intron comprising the sequence of any one of SEQ ID NOS: 1-12 or a sequence comprising at least 85% identity to any one of SEQ ID NOS:1-12, optionally integrated into the genome of the cell.
26. The isolated cell of claim 25 comprising the sequence of SEQ ID NO:1 or a sequence comprising at least 85% identity to SEQ ID NO:1, optionally integrated into the genome of the cell.
27. A method of modifying the genome of a cell at one or more target sites comprising integrating a nucleotide sequence encoding an intron adjacent to each of the one or more target sites, and subsequently inducing incorporation of a donor oligonucleotide at each of the one or more target sites via a gene editing technology.
28. A method of modifying the genome of a cell comprising integrating one or more of the polynucleotides of any one of claims 1-15, and subsequently inducing incorporation of a donor oligonucleotide at or adjacent to the one or more sites of integration via a gene editing technology.
29. A method of modifying a nucleic acid in a cell comprising inducing incorporation of a donor oligonucleotide at or adjacent to the one or more sites of the polynucleotide in the cell of any one of claims 16-26 via a gene editing technology.
30. The method of any one of claims 27-29, wherein the donor oligonucleotide comprises one or more mutations relative to the target locus where it is incorporated.
31. The method of any one of claims 27-30, wherein at least a segment of the donor oligonucleotide is partially or completely homologous to the nucleotide sequence encoding the intron.
32. The method of any one of claims 27-31, wherein the donor oligonucleotide is ssDNA or dsDNA.
33. The method of any one of claims 27-32, wherein the intron is a self-splicing intron derived from one or more organisms selected from the group comprising Tetrahymena thermophila, Nostoc punctiforme, Bacillus anthracis, Tilletiopsis flava, Azoarcus, Pneumocystis carinii, Agrobacterium, T7-like bacteriophage, Bacteriophage T4, or combinations thereof.
34. The method of any one of claims 27-33, wherein the intron is a self-splicing intron that is a chimeric self-splicing intron, optionally derived from Tetrahymena thermophila and Tilletiopsis flava.
35. The method of claim 34, wherein the nucleotide sequence encoding the chimeric self-splicing intron comprises the sequence of any one of SEQ ID NOS: 1-12 or a sequence comprising at least 85% identity to any one of SEQ ID NOS:1-12.
36. The method of any one of claims 27-35, wherein the gene editing technology comprises one or more of a CRISPR system, multiplex automated genome engineering (MAGE), zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), triplex-forming oligonucleotides, pseudocomplementary oligonucleotides, small fragment homologous replacement, single-stranded oligodeoxynucleotide-mediated gene modification, and intron-encoded meganucleases.
37. The method of any one of claims 27-36, wherein the gene editing technology is selected from a CRISPR system, MAGE, or a combination thereof.
38. The method of claims 36 or 37, wherein the CRISPR system comprises one or more sgRNAs and a Cas protein, optionally wherein the Cas protein is Cas9.
39. The method of any one of claims 36-38, wherein the CRISPR system comprises prime editing or base editing.
40. The method of any one of claims 36-39, wherein the CRISPR system comprises a single-stranded DNA modifying enzyme.
41. The method of any one of claims 36-40, wherein the CRISPR system comprises an engineered reverse transcriptase fused to Cas9 nickase and a prime-editing guide RNA (pegRNA).
42. The method of any one of claims 27-41, wherein the one or more target sites is selected from a ribosomal RNA (rRNA) gene, a tRNA gene, a microsatellite, a minisatellite, an insertion sequence (IS), a transposable element, a pseudogene, a prophage, a telomere, a CRISPR array, or another native or non-native repetitive element.
43. The method of any one of claims 27-42 comprising modifying the genome of the cell at two, three, four, five, six, seven, eight, nine, ten, or more target sites.
44. The method of claim 43, wherein the intron integrated adjacent to a first target site is distinct from the intron integrated adjacent to a second target site, optionally wherein the intron at each site is different from all of the other introns.
45. The method of claims 43 or 44, wherein the donor oligonucleotide incorporated at one or more target sites is selected from a plurality of identical oligonucleotides and/or wherein the donor oligonucleotide incorporated at one or more target sites is selected from a plurality of distinct oligonucleotides.
46. The method of claim 45, wherein incorporation of the donor oligonucleotides at two or more sites relies on different gene editing technologies.
47. The method of claim 46, wherein incorporation of the donor oligonucleotide at one or more target sites is mediated by the CRISPR system and/or incorporation of the donor oligonucleotide at one or more target sites is mediated by MAGE.
48. The method of claim 47, wherein the CRISPR system and MAGE are used in parallel or tandem.
49. A method of screening for one or more mutations that confer a desirable phenotype comprising modifying the genome of a plurality of cells by the method of any one of claims 27-48 and selecting for a cell exhibiting the desirable phenotype.
50. The method of claim 49, wherein the desirable phenotype is antibiotic resistance, wherein selection comprises exposing the plurality of cells to an effective amount of one or more antibiotics.
51. The method of any one of claims 27-50, wherein a donor oligonucleotide introduces a mutation into one or more ribosomal genes.
52. The method of claim 51, wherein a ribosome comprising the rRNA and/or protein encoded by the one or more ribosomal genes can accommodate synthetic, abiological monomers, optionally non-L-alpha-amino acids, noncanonical L-alpha-amino acids and/or D-amino acids, and optionally polymerize polymers formed therefrom.
53. An engineered bacterial ribosome comprising one or more mutations at nucleotides 2030-2034 and/or 2057-2061 in its 23S rRNA, wherein the one or more mutations are encoded by a sequence selected from TCACC, CGCCG, TAGCA, GCCTG, CATTG, AAGGT, ACCCG, TCCCG, GTACA, ATTCT, AATGT, and ACCGT.
54. The engineered ribosome of claim 53, wherein the mutation confers resistance to one or more antibiotics selected from chloramphenicol, erythromycin, clindamycin, and lincomycin.
55. The engineered ribosome of claims 53 and 54 comprising a linker tethering the rRNA of the small subunit (16S rRNA) with the rRNA of the large subunit (23S rRNA).
56. A polynucleotide encoding the rRNAs of the engineered ribosome of any of claims 53-55, optionally wherein the polynucleotide is comprised in an expression vector.
57. A cell comprising the engineered ribosome of any of claims 53-55 and/or the polynucleotide of claim 56.
PCT/US2022/078446 2021-10-20 2022-10-20 Compositions and methods for targeted editing and evolution of repetitive genetic elements WO2023070043A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202163257961P 2021-10-20 2021-10-20
US63/257,961 2021-10-20

Publications (1)

Publication Number Publication Date
WO2023070043A1 true WO2023070043A1 (en) 2023-04-27

Family

ID=84369765

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2022/078446 WO2023070043A1 (en) 2021-10-20 2022-10-20 Compositions and methods for targeted editing and evolution of repetitive genetic elements

Country Status (1)

Country Link
WO (1) WO2023070043A1 (en)

Patent Citations (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5436A (en) 1848-02-08 Air-heating furnace
US150A (en) 1837-03-25 Island
US5356802A (en) 1992-04-03 1994-10-18 The Johns Hopkins University Functional domains in flavobacterium okeanokoites (FokI) restriction endonuclease
US5487994A (en) 1992-04-03 1996-01-30 The Johns Hopkins University Insertion and deletion mutants of FokI restriction endonuclease
WO1998053059A1 (en) 1997-05-23 1998-11-26 Medical Research Council Nucleic acid binding proteins
US6866997B1 (en) 1997-05-23 2005-03-15 Gendaq Limited Nucleic acid binding proteins
US6746838B1 (en) 1997-05-23 2004-06-08 Gendaq Limited Nucleic acid binding proteins
US6610512B1 (en) 1998-10-16 2003-08-26 The Scripps Research Institute Zinc finger binding domains for GNN
US6140081A (en) 1998-10-16 2000-10-31 The Scripps Research Institute Zinc finger binding domains for GNN
US6453242B1 (en) 1999-01-12 2002-09-17 Sangamo Biosciences, Inc. Selection of sites for targeting by zinc finger proteins and methods of designing zinc finger proteins to bind to preselected sites
US6534261B1 (en) 1999-01-12 2003-03-18 Sangamo Biosciences, Inc. Regulation of endogenous gene expression in cells using zinc finger proteins
US20020165356A1 (en) 2001-02-21 2002-11-07 The Scripps Research Institute Zinc finger binding domains for nucleotide sequence ANN
US7067617B2 (en) 2001-02-21 2006-06-27 The Scripps Research Institute Zinc finger binding domains for nucleotide sequence ANN
US20040197892A1 (en) 2001-04-04 2004-10-07 Michael Moore Composition binding polypeptides
WO2003016496A2 (en) 2001-08-20 2003-02-27 The Scripps Research Institute Zinc finger binding domains for cnn
US20050064474A1 (en) 2003-08-08 2005-03-24 Sangamo Biosciences, Inc. Methods and compositions for targeted cleavage and recombination
US20060188987A1 (en) 2003-08-08 2006-08-24 Dmitry Guschin Targeted deletion of cellular DNA sequences
WO2007014275A2 (en) 2005-07-26 2007-02-01 Sangamo Biosciences, Inc. Targeted integration and expression of exogenous nucleic acid sequences
US20070213269A1 (en) 2005-11-28 2007-09-13 The Scripps Research Institute Zinc finger binding domains for tnn
US20070154989A1 (en) 2006-01-03 2007-07-05 The Scripps Research Institute Zinc finger domains specifically binding agc
WO2007139898A2 (en) 2006-05-25 2007-12-06 Sangamo Biosciences, Inc. Variant foki cleavage half-domains
US20080131962A1 (en) 2006-05-25 2008-06-05 Sangamo Biosciences, Inc. Engineered cleavage half-domains
US8153432B2 (en) 2006-10-25 2012-04-10 President And Fellows Of Harvard College Multiplex automated genome engineering
US20110145940A1 (en) 2009-12-10 2011-06-16 Voytas Daniel F Tal effector-mediated dna modification
WO2011072246A2 (en) 2009-12-10 2011-06-16 Regents Of The University Of Minnesota Tal effector-mediated dna modification
WO2013176772A1 (en) 2012-05-25 2013-11-28 The Regents Of The University Of California Methods and compositions for rna-directed target dna modification and for rna-directed modulation of transcription
WO2014018423A2 (en) 2012-07-25 2014-01-30 The Broad Institute, Inc. Inducible dna binding proteins and genome perturbation tools and applications thereof
WO2015184283A1 (en) 2014-05-29 2015-12-03 Northwestern University Tethered ribosomes and methods of making and using thereof
WO2020219563A1 (en) * 2019-04-22 2020-10-29 TCR2 Therapeutics Inc. Compositions and methods for tcr reprogramming using fusion proteins

Non-Patent Citations (60)

* Cited by examiner, † Cited by third party
Title
AMIRAM ET AL., NATURE BIOTECHNOLOGY, vol. 33, 2015, pages 1272
ANZALONE ET AL., NATURE, vol. 576, 2019, pages 149 - 157
ARENZ S ET AL., COLD SPRING HARB PERSPECT MED., vol. 6, no. 9, 2016, pages a025361
BARBIERI ET AL., CELL, vol. 171, 2017, pages 1453 - 1467
BARTLETT JOANNE G. ET AL: "Intron-mediated enhancement as a method for increasing transgene expression levels in barley", PLANT BIOTECHNOLOGY JOURNAL, vol. 7, no. 9, 1 December 2009 (2009-12-01), GB, pages 856 - 866, XP093025552, ISSN: 1467-7644, DOI: 10.1111/j.1467-7652.2009.00448.x *
BEERLI ET AL., NATURE BIOTECHNOL., vol. 20, 2002, pages 135 - 141
CARLSON ERIK D. ET AL: "Engineered ribosomes with tethered subunits for expanding biological function", NATURE COMMUNICATIONS, vol. 10, no. 1, 1 December 2019 (2019-12-01), pages 1 - 13, XP055873368, Retrieved from the Internet <URL:https://www.nature.com/articles/s41467-019-11427-y.pdf> DOI: 10.1038/s41467-019-11427-y *
CARLSON ET AL., NATURE COMMUNICATIONS, vol. 10, 2019, pages 1 - 13
CARR ET AL., NUCLEIC ACIDS RESEARCH, vol. 40, no. 17, 2012, pages e132
CERMAK ET AL., NUCL. ACIDS RES, 2011, pages 1 - 11
CHOO ET AL., CURR. OPIN. STRUCT. BIOL., vol. 10, 2000, pages 411 - 416
COSTANTINO, PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES, vol. 100, 2003, pages 15748 - 15753
DE KONING ET AL., PLOS GENETICS, vol. 7, 2011
DEDKOVA ET AL., BIOCHEMISTRY, vol. 51, 2012, pages 401 - 415
DOUDNA, SZOSTAK, MOLECULAR AND CELLULAR BIOLOGY, vol. 9, 1989, pages 5480 - 5483
DOUTHWAITE, JOURNAL OF BACTERIOLOGY, vol. 174, 1992, pages 1333 - 1338
ESVELT ET AL., NATURE, vol. 472, 2011, pages 499 - 503
ETTAYEBI ET AL., JOURNAL OF BACTERIOLOGY, vol. 162, 1985, pages 551 - 557
GAJ ET AL., TRENDS IN BIOTECHNOLOGY, vol. 31, 2013, pages 397 - 405
GAUDELLI ET AL., NATURE, vol. 551, 2017, pages 464 - 471
GREGG ET AL., NUCLEIC ACIDS RESEARCH, vol. 42, no. 7, 2014, pages 4779 - 90
GUO, CECH, RNA, vol. 8, 2002, pages 647 - 658
HALPERIN ET AL., NATURE, vol. 564, 2018, pages 444 - 448
HEDBERG A. ET AL., MOB DNA, vol. 4, no. 1, 2013, pages 17
HSU ET AL., CELL, vol. 157, 2014, pages 1262 - 1278
HUI, BOER, PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES, vol. 84, 1987, pages 4762 - 4766
ISALAN ET AL., NATURE BIOTECHNOL., vol. 19, 2001, pages 656 - 660
JINEK ET AL., SCIENCE, vol. 337, no. 6096, 2012, pages 816 - 21
KANTOR ET AL., INT J MOL SCI., vol. 21, no. 17, September 2020 (2020-09-01), pages 6240
KIM ET AL., J. BIOL. CHEM., vol. 269, 1994, pages 978 - 31
KIM ET AL., PROC. NATL. ACAD. SCI. USA., vol. 91, 1994, pages 883 - 887
KIM, KIM, NATURE REVIEWS GENETICS, vol. 15, 2014, pages 321 - 334
KRUGER ET AL., CELL, vol. 31, 1982, pages 147 - 157
LAJOIE ET AL., SCIENCE, vol. 342, no. 6121, 2013, pages 357 - 360
LAMBOWITZ AM ET AL., COLD SPRING HARB PERSPECT BIOL., vol. 3, no. 8, 2011, pages a003616
LI ET AL., PROC. NATL. ACAD. SCI. USA, vol. 90, 1993, pages 2764 - 2768
LI ET AL., PROC. NATL. ACAD. SCI. USA, vol. 89, 1992, pages 4275 - 4279
MAINI ET AL., JOURNAL OF THE AMERICAN CHEMICAL SOCIETY, vol. 137, 2015, pages 11206 - 11209
MELO CZEKSTER ET AL., JOURNAL OF THE AMERICAN CHEMICAL SOCIETY, vol. 138, 2016, pages 1962 - 1969
MERUTKA, STELLWAGEN, BIOCHEMISTRY, vol. 30, 1991, pages 4245 - 8
MILLER ET AL., NATURE BIOTECHNOL, vol. 29, 2011, pages 143
OHTA ET AL., CURRENT OPINION IN CHEMICAL BIOLOGY, vol. 12, 2008, pages 159 - 167
ORELLE, C. ET AL., NATURE, vol. 524, 2015, pages 119 - 124
PABO ET AL., ANN. REV. BIOCHEM., vol. 70, 2001, pages 313 - 340
POSFAI ET AL., SCIENCE, vol. 312, 2006, pages 1044 - 1046
POVERENNAYA, ROYTBERG: "Spliceosomal Introns: Features, Functions, and Evolution", BIOCHEMISTRY (MOSCOW), vol. 85, 2020, pages 725 - 734, XP037190760, DOI: 10.1134/S0006297920070019
RACKHAM, CHIN, NAT CHEM BIOL, vol. 1, 2005, pages 159 - 166
RADFORD ET AL.: "Targeted editing and evolution of engineered ribosomes in vivo by filtered editing", NAT COMMUN., vol. 13, no. 1, 10 January 2022 (2022-01-10), pages 180
RADFORD FELIX ET AL: "Targeted editing and evolution of engineered ribosomes in vivo by filtered editing", NATURE COMMUNICATIONS, vol. 13, no. 1, 10 January 2022 (2022-01-10), XP093025604, Retrieved from the Internet <URL:https://www.nature.com/articles/s41467-021-27836-x.pdf> DOI: 10.1038/s41467-021-27836-x *
ROBERTS ET AL., NUCLEIC ACIDS RES., vol. 31, 2003, pages 418 - 420
SAPRANAUSKAS ET AL., NUCLEIC ACIDS RESEARCH, vol. 39, 2011, pages 9275 - 9282
SEGAL ET AL., CURR. OPIN. BIOTECHNOL., vol. 12, 2001, pages 632 - 637
ST-PIERRE, F.O. ET AL., ACS SYNTHETIC BIOLOGY, vol. 2, 2013, pages 537 - 541
WAH ET AL., PROC. NATL. ACAD. SCI. USA, vol. 95, 1998, pages 10564 - 10569
WANG, H.H. ET AL., NATURE, vol. 460, 2009, pages 894 - 898
YAN ET AL., BIOCHEMISTRY, vol. 46, 2007, pages 8517 - 24
YOUNG, SCHULTZ, ACS CHEMICAL BIOLOGY, vol. 13, 2018, pages 854 - 870
ZARRINKAR ET AL., NUCLEIC ACIDS RESEARCH, vol. 24, 1996, pages 854 - 858
ZHANG ET AL., RNA, vol. 1, 1995, pages 284

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22817477

Country of ref document: EP

Kind code of ref document: A1