WO2022272293A1 - Compositions and methods for efficient retron production and genetic editing - Google Patents

Compositions and methods for efficient retron production and genetic editing

Info

Publication number
WO2022272293A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
nucleic acid
ribozyme
retron
stabilizing
Prior art date
Application number
PCT/US2022/073129
Other languages
French (fr)
Inventor
Kevin R. ROY
Justin D. Smith
Robert P. St. Onge
Lars M. Steinmetz
Original Assignee
The Board Of Trustees Of The Leland Stanford Junior University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by The Board Of Trustees Of The Leland Stanford Junior University
Publication of WO2022272293A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/102 Mutagenizing nucleic acids
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/87 Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90 Stable introduction of foreign DNA into chromosome
    • C12N15/902 Stable introduction of foreign DNA into chromosome using homologous recombination
    • C12N15/907 Stable introduction of foreign DNA into chromosome using homologous recombination in mammalian cells
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases RNAses, DNAses
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/10 Type of nucleic acid
    • C12N2310/20 Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/30 Chemical structure
    • C12N2310/35 Nature of the modification
    • C12N2310/351 Conjugate
    • C12N2310/3519 Fusion with another nucleic acid

Definitions

  • Retrons are two-component systems found in the genomes of many prokaryotic species and consist of a reverse transcriptase and a unique single-stranded DNA/RNA hybrid called multicopy single-stranded DNA (msDNA).
  • msDNA multicopy single-stranded DNA
  • Previous work in bacteria, yeast, and mammalian cells has shown that retron systems can function outside of their native prokaryotic hosts and that they can be reprogrammed to produce single-stranded deoxyribonucleic acid (ssDNA) of any desired sequence.
  • This sequence can be inserted into a loop region of the msd component of the retron non-coding ribonucleic acid (RNA) region, which is converted into ssDNA by the retron reverse transcriptase.
  • HHR hammerhead ribozyme
  • HDR Homology-directed repair
  • Retron donor-guide constructs can be purchased in pooled format from array synthesis providers (e.g., Twist Bioscience, South San Francisco, CA; Agilent Technologies, Inc., Santa Clara, CA), where each donor-guide pair costs more than 3 orders of magnitude less than the Megamer ssDNA equivalent.
  • array synthesis providers e.g. Twist Bioscience, South San Francisco, CA; Agilent Technologies, Inc., Santa Clara, CA
  • delivering ssDNA into the cell requires it to first pass through the cytoplasm. It is known that in many cell types, excessive cytoplasmic DNA can trigger a viral immune response and lead to apoptosis. Delivery of the retron as a vector would be less prone to cause this response, as the msDNA would only be produced once the retron vector has entered the nucleus.
  • retron complementary DNA also known as the msd or reverse transcribed region of the msDNA
  • cDNA retron complementary DNA
  • RNA-binding proteins
  • other RNAs such as spliceosomal RNAs or microRNAs
  • Retrons could be used for single-cell lineage tracing experiments, where the msd loop region in the retron could be mutagenized by some random process such as CRISPR-AID (Hess et al., Nat Methods, 2016; PMCID: PMC5557288). The retron would amplify the barcode levels in each cell through accumulation of many cDNA copies, which could be assayed by amplicon polymerase chain reaction (PCR) at the single-cell or bulk level and coupled to next generation sequencing (NGS) for barcode counting.
  • CRISPR-AID Hess et al., Nat Methods, 2016; PMCID: PMC5557288
  • the instant disclosure provides efficient approaches to generate substantial levels of engineered retron msDNA in eukaryotic cells, and for increasing the efficiency of genetic editing in cells.
  • nucleic acids encoding a retron that include (i) a stabilizing 5' ribozyme sequence, (ii) an msr sequence, (iii) an msd sequence, (iv) a subject expression sequence within the msd sequence, and (v) a first inverted repeat sequence and a second inverted repeat sequence, wherein said nucleic acid does not include a guide RNA region.
  • the nucleic acids include a donor sequence for homology directed repair (HDR).
  • the nucleic acids include a 3' ribozyme sequence
  • the 5' ribozyme sequence or sequences are selected from the group consisting of Hammerhead Ribozyme, HDV ribozyme, RiboJ, CPEB3, Agam_1_1, Agam_2_2, Pmar_1, Bflo_1, Bflo_2, Spur_1, Spur_2, Spur_3, Spur_4, Ppac_1, Cjap_1, Fpra_1, CIV_1, Dpap_1, Tatr_1, CPEB3, G HDV, A HDV, Canis familiaris/i/3 73, Felis catus domestic cat/ 1/3 74,
  • PongoAbelii_SumatranOrangutan//l 66 MicrocebusMurinus_MouseLemur/l/l 66, TupaiaBelangeri_NorthemTreesh/l 66, Rabbit/84/4 75, Human .Chr 10/290/4 75, Chimp PanTroglodytes/49/4 75, Rhesus/23/4 75, MacacajMulatta/1/1 70, SorexAraneus_CommonShrew/l/l 66, Mouse .chr 19_CPEB3/70/4 75, Rat.
  • the 3' ribozyme is a Hammerhead ribozyme, HDV, RiboJ, or CPEB3.
  • the nucleic acid further comprises a stem-loop sequence located between the stabilizing 5' ribozyme sequence and the msr sequence.
  • nucleic acids encoding a retron that include (i) a stem-loop stabilizing 5' ribozyme sequence, (ii) an msr sequence, (iii) an msd sequence, (iv) a subject expression sequence within the msd sequence, and (v) a first inverted repeat sequence and a second inverted repeat sequence, and a guide RNA region.
  • the nucleic acids include a donor sequence for homology directed repair (HDR).
  • the subject expression sequence comprises a donor sequence for homology directed repair (HDR).
  • the stem loop-stabilizing 5' ribozyme sequence is HDV or RiboJ.
  • the nucleic acids include a 3' ribozyme sequence.
  • the 3' ribozyme sequence is HDV or RiboJ.
  • nucleic acids encoding a retron that include (i) a stabilizing 5'-end sequence-specific RNA cleavage site sequence, (ii) an msr sequence, (iii) an msd sequence, (iv) a subject expression sequence within the msd sequence, and (v) a first inverted repeat sequence and a second inverted repeat sequence. In some embodiments, the first inverted repeat sequence is located at the 5' end of the retron and the second inverted repeat sequence is located at the 3' end of the retron.
  • the entire retron is flanked by the first and second inverted repeat sequences.
  • the nucleic acid does not comprise a guide RNA coding region.
  • the nucleic acids include a donor sequence for homology directed repair (HDR).
  • the nucleic acids include a 3' ribozyme sequence.
  • the nucleic acids comprise a 3' stabilizing stem-loop structure or 3' ribozyme sequence which leaves behind a stabilizing RNA structure.
  • the stabilizing 5'-end sequence-specific RNA cleavage site sequence is an RNase III target motif.
  • cleavage of the stabilizing 5'-end sequence-specific RNA cleavage site sequence results in a stabilizing structure 3' of the cleavage site that is attached to the 5' end of the msr sequence.
  • any of the above nucleic acids can further include a structure-forming nucleic acid within the msd sequence.
  • sequence specific endonuclease is a CRISPR-associated endonuclease, a Zinc-finger nuclease, a Transcription activator-like effector nuclease (TALEN), or a meganuclease
  • the sequence specific endonuclease cuts both strands of a target DNA sequence, thereby generating a double-strand break in the DNA sequence.
  • the sequence specific endonuclease cuts a single strand of a target DNA sequence, thereby generating a nick in one strand of the DNA sequence.
  • the method further comprises contacting the cell with a guide RNA or a nucleic acid encoding the same, and the sequence specific endonuclease is a CRISPR-associated endonuclease.
  • the sequence specific endonuclease is a CRISPR-associated endonuclease, a Zinc-finger nuclease, a Transcription activator-like effector nuclease (TALEN), or a meganuclease
  • the method further comprises administering a guide RNA or a nucleic acid encoding the same to the subject, and the sequence specific endonuclease is a CRISPR-associated endonuclease.
  • the CRISPR-associated endonuclease can be Cas1, Cas1B, Cas2, C2c1, Cas3, Cas4, Cas5, Cas5e (CasD), Cas6, Cas6e, Cas6f, Cas7, Cas8a1, Cas8a2, Cas8b, Cas8c, Cas9 (Csn1 or Csx12), SpCas9, FokI-dCas9, Cas10, Cas10d, Cas12, Cas12a, Mad7™, CasX, CasY, CasΦ, CasF, CasG, CasH, Csy1, Csy2, Csy3, Cse1 (CasA), Cse2 (CasB), Cse3 (CasE), Cse4 (CasC), Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Cs
  • nucleic acid of the disclosure for use as a medicament in treating a genetic disease in a subject. Also provided is a nucleic acid of the disclosure for use in the treatment of a genetic disease in a subject.
  • nucleic acids encoding (a) a retron comprising: (i) a stabilizing 5' sequence; (ii) an msr sequence; (iii) an msd sequence; (iv) a subject expression sequence within the msd sequence; and (v) a first inverted repeat sequence and a second inverted repeat sequence; and (b) optionally, a guide RNA region.
  • the subject expression sequence comprises a donor sequence for homology directed repair (HDR).
  • the stabilizing 5' sequence comprises a stable RNA structure or a G-quadruplex.
  • the nucleic acid further comprises a 3' ribozyme sequence, a 3' stabilizing stem-loop structure or 3' ribozyme sequence which leaves behind a stabilizing RNA structure.
  • the 3' ribozyme sequence is HDV or RiboJ.
  • the nucleic acid does not comprise a guide RNA region.
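The element lists recited above can be pictured as an ordered set of parts. The following Python sketch is illustrative only: the class name, field names, and the toy sequences in the usage note are hypothetical, and the 5'-to-3' order used here (5' stabilizing ribozyme, first inverted repeat, msr, msd carrying the donor, second inverted repeat, optional 3' ribozyme) is an assumption based on the embodiments in which the retron is flanked by the inverted repeats and no guide RNA region is present.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RetronCassette:
    """Hypothetical container for the recited retron elements (no guide RNA region)."""
    five_prime_ribozyme: str
    inverted_repeat_1: str
    msr: str
    msd_left: str            # msd sequence upstream of the inserted donor
    donor: str               # subject expression sequence placed within the msd
    msd_right: str           # msd sequence downstream of the inserted donor
    inverted_repeat_2: str
    three_prime_ribozyme: Optional[str] = None

    def elements(self) -> List[Tuple[str, str]]:
        """Return (label, sequence) pairs in the assumed 5'-to-3' order."""
        parts = [
            ("5' ribozyme", self.five_prime_ribozyme),
            ("inverted repeat 1", self.inverted_repeat_1),
            ("msr", self.msr),
            ("msd (5' part)", self.msd_left),
            ("donor", self.donor),
            ("msd (3' part)", self.msd_right),
            ("inverted repeat 2", self.inverted_repeat_2),
        ]
        if self.three_prime_ribozyme:
            parts.append(("3' ribozyme", self.three_prime_ribozyme))
        return parts

    def sequence(self) -> str:
        """Concatenate all elements into one string."""
        return "".join(seq for _, seq in self.elements())
```

For example, `RetronCassette("AAA", "CC", "GGG", "TT", "ACGT", "TT", "CC").sequence()` concatenates the toy parts in the assumed order; real element sequences would come from the sequence listing, not from this sketch.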
  • a panel of ribozymes on the 5' and 3' side of the retron donor-guide cassette were tested, including no ribozyme (none), the hammerhead ribozyme (HHR), the hepatitis delta virus (HDV) ribozyme, the anti-genomic HDV (agHDV) ribozyme, the U5 small nuclear ribonucleic acid (snRNA) stem-loop cleaved by the yeast RNase III enzyme Rnt1p (U5 Rnt1p SL), and the RiboJ ribozyme, which is a HHR-related ribozyme from the satellite RNA of tobacco ringspot virus (sTRSV) followed by a 23 nucleotide synthetic stem loop (SL).
  • sTRSV satellite RNA of tobacco ringspot virus
  • SL 23 nucleotide synthetic stem loop
  • FIG. 1B shows the location of ribozymes in relation to the 5' cap and 3' polyadenylic acid (polyA) tail.
  • polyA polyadenylic acid
  • the retron RNA is reverse transcribed by the reverse transcriptase (RT) at a conserved guanosine (G) to create an unusual 2'-5' RNA-DNA branched structure.
  • RT reverse transcriptase
  • G conserved guanosine
  • the RNA in the RNA-DNA hybrid generated by the RT is degraded by host-cell RNase H, leaving a looped-out single-stranded donor deoxyribonucleic acid (DNA) as a template for homology directed repair.
  • FIGS. 2A-2D show the impact of 5' and 3' ribozymes on editing efficiency and retron complementary DNA (cDNA) production.
  • FIG. 2A shows editing efficiency for all combinations of HDV, HHR, and no ribozyme in the 5' and 3' positions of a retron donor-guide cassette targeting the ADE2 locus.
  • the guide is an 18-mer with designed mismatches at positions 19 and 20 from the protospacer adjacent motif (PAM).
  • PAM protospacer adjacent motif
  • the retron donor introduces a CC-to-TG mutation which results in a premature termination codon.
  • the y-axis indicates the editing efficiency quantified as the % of genomic reads mapping to the designed donor sequence.
  • the x-axis is the number of generations in galactose, which induces Cas9, the RT, and the retron-donor guide transcript.
  • FIG. 2B shows editing efficiency for all possible ribozymes in the 5' position paired with the HDV fixed in the 3' position.
  • FIG. 2C shows a diagram illustrating next-generation sequencing (NGS)-based quantification of retron donor cDNA levels in the absence of Cas9. Primers are designed to amplify both the single-stranded donor template and the genomic target.
  • NGS next-generation sequencing
  • The donor encodes a CC-to-TG mutation in the middle (asterisk), so the ratio of reads containing the donor mutation relative to the wild type (WT) genomic locus is proportional to the ratio of donor cDNA to genome copies.
  • FIG. 2D shows the ribozyme combinations sorted left to right by greatest to least retron cDNA produced in galactose.
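The read-ratio readout described for FIGS. 2A and 2C can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the disclosure: the function names, the assumption that reads are already aligned so the edited dinucleotide sits at a known offset (`site`), and the toy reads in the usage note are all hypothetical. Only the CC-to-TG change and the two readouts (donor:WT ratio, and percent of reads matching the donor) come from the text.

```python
from collections import Counter


def count_alleles(reads, site, wt="CC", donor="TG"):
    """Classify each read by the dinucleotide found at the (assumed known) edit site."""
    counts = Counter()
    for read in reads:
        allele = read[site:site + len(wt)]
        if allele == donor:
            counts["donor"] += 1
        elif allele == wt:
            counts["wt"] += 1
        else:
            counts["other"] += 1   # sequencing errors, indels, truncated reads
    return counts


def donor_to_wt_ratio(counts):
    """Ratio of donor reads to wild-type reads (proxy for donor cDNA per genome copy)."""
    if counts["wt"] == 0:
        raise ValueError("no wild-type reads; ratio undefined")
    return counts["donor"] / counts["wt"]


def editing_efficiency_percent(counts):
    """Percent of all classified reads that carry the designed donor sequence."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads")
    return 100.0 * counts["donor"] / total
```

With three toy wild-type reads `"AACCGG"` and one donor read `"AATGGG"` at `site=2`, the donor:WT ratio is 1/3 and the editing efficiency is 25%.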
  • FIGS. 3A-3C show the impact of donor strand and RNA structures on retron msDNA levels and the HDR efficacy of the retron donor. In FIG. 3A, the location of different RNA structure elements is shown relative to the terminal inverted repeats (triangles), msr, msd, and donor elements of the retron (left column), with how these structures fold up in the retron RNA (middle column) and msDNA (right column).
  • FIG. 3B shows the impact of each of these structures on retron cDNA levels quantified by NGS.
  • cDNA levels were tested with the RT under the control of the GAL10 promoter and its no RT control (top two panels), or the ADH1 promoter and its no RT control (bottom two panels), in either galactose (left) or glucose (right).
  • the retron donor was expressed from the SNR52-tRNA(Tyr) hybrid promoter with a 5' HDV ribozyme.
  • In FIG. 3C, the panel of LexA RNA and DNA structures was tested in a multiplexed competition experiment, where barcodes were inserted at the CAN1 locus adjacent to two guide RNA target sites denoted as “+” and “-” according to the strandedness of the guide RNAs.
  • Plasmids harboring each LexA structure variant along with either Cas9 only (left) or Cas9 + RT (right) were transformed into distinct barcoded strains such that the barcode uniquely identifies the combination of donor and Cas9 plasmid.
  • This approach allowed all strains to be pooled together equally prior to transforming a third plasmid containing only the guide RNA. Both strands of donor were tested and are indicated by (+) for donors which bind to the bottom genomic strand (same as the strand to which the “+” guide binds) or (-) for donors binding to the top strand (same as the strand to which the “-” guide binds).
  • both the barcode and edit site are sequenced, enabling quantitation of both editing efficiency (ratio of donor to WT on y-axis) and editing survival (total abundance on y-axis).
  • different lengths of 20 nt or 17 nt were used for the guide RNA from the constitutive small nucleolar RNA (SNR52) promoter.
  • SNR52 small nucleolar RNA
  • the RNase P Ribonucleoprotein (RPR1) promoter generates a stable leader sequence on the guide RNA while the SNR52 leader is efficiently processed by yeast RNase III Rnt1p. The fact that the RPR1 18mer shows lower efficiency than the SNR52 17mer clearly demonstrates the negative impact of extraneous 5' sequence on guide activity.
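The barcode-based readout described for FIG. 3C, in which each barcode identifies one donor/Cas9 plasmid combination and both the barcode and the edit site are sequenced, can be sketched as a simple tally. This is a hedged illustration only: the function name, the input format (one `(barcode, allele)` pair per read pair, with alleles pre-classified as `"donor"` or `"wt"`), and the barcode labels in the usage note are hypothetical and not taken from the disclosure.

```python
from collections import defaultdict


def summarize_barcodes(observations):
    """Tally donor and WT reads per barcode.

    Returns, per barcode, the donor:WT ratio (editing efficiency readout)
    and the fraction of all classified reads carrying that barcode
    (total abundance, used as the survival readout).
    """
    counts = defaultdict(lambda: {"donor": 0, "wt": 0})
    for barcode, allele in observations:
        if allele in ("donor", "wt"):      # ignore unclassified reads
            counts[barcode][allele] += 1

    total = sum(c["donor"] + c["wt"] for c in counts.values())
    summary = {}
    for barcode, c in counts.items():
        n = c["donor"] + c["wt"]
        summary[barcode] = {
            "donor_to_wt": c["donor"] / c["wt"] if c["wt"] else float("inf"),
            "abundance": n / total,
        }
    return summary
```

For instance, three donor reads and one WT read for a hypothetical barcode `"BC1"`, pooled with four WT reads for `"BC2"`, give `"BC1"` a donor:WT ratio of 3.0 and each barcode an abundance of 0.5.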
  • FIGS. 4A-4C show retron production in human cells.
  • FIG. 4A shows transfections of HEK293 cells with 250 ng of retron donor/RT plasmid.
  • a donor DNA sequence to introduce an XbaI edit in the CACNA1D gene was inserted into the Ec86 (Eco1) retron msd loop and expressed with the Cytomegalovirus (CMV) promoter, a 5' HDV ribozyme, a 3' processing element from the metastasis-associated lung adenocarcinoma transcript 1 (MALAT1), and the polyadenylation signal from the bovine growth hormone (bGH) gene.
  • CMV Cytomegalovirus
  • MALAT1 metastasis-associated lung adenocarcinoma transcript 1
  • bGH bovine growth hormone
  • Human-codon optimized Ec86 (Eco1) reverse transcriptase from the Ec86 retron was expressed from the human elongation factor 1 alpha-encoding gene (EF-1 alpha) promoter with its first intron along with puromycin resistance gene (PuroR) and enhanced green fluorescent protein (GFP) genes separated by P2A and T2A peptide cleavage sequences, respectively.
  • EF-1 alpha human elongation factor 1 alpha-encoding gene
  • PuroR puromycin resistance gene
  • GFP enhanced green fluorescent protein
  • XbaI, like most restriction enzymes, will only cleave double-stranded DNA, so even though the retron cDNA has an XbaI site it is single-stranded and will not be cleaved, and neither will the genomic DNA reference template. Primers flanking the XbaI site were used to quantify the template levels by two different methods, first with qPCR and then with NGS.
  • FIG. 4B shows quantification of retron cDNA levels by qPCR.
  • DNA was extracted from HEK293 cells expressing the indicated retron construct with or without RT. Each DNA sample was normalized by Qubit dsDNA assay and analyzed by qPCR with a primer set amplifying either the CACNA1D donor (top) or the plasmid backbone (bottom) as a control.
  • a CACNA1D retron donor with no RT, as well as a retron donor for a different gene (HEK3), were used, in which case the primers amplify only the genomic CACNA1D.
  • Both the plasmid backbone primers and HEK3 irrelevant donor showed no change +/- XbaI, as expected.
  • XbaI treatment removed the variability due to residual plasmid observed in the mock treatment between replicates.
  • each cell has at least 4 strands of template for CACNA1D, as the HEK293 cells are at least diploid for all chromosomes with some chromosomes being triploid, and with each allele having 2 strands.
  • the retron donor is single-stranded, so subtracting the donor:genome ratio of -RT from the +RT gives 20, which must then be multiplied by 4 strands for the genomic alleles to yield approximately 80 retron cDNA copies per cell.
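The copy-number estimate above is a short piece of arithmetic, written out here as a Python sketch. The function name and the two individual ratio values in the usage note are hypothetical; only their difference of 20 and the factor of 4 genomic strands per cell come from the text.

```python
def retron_cdna_copies_per_cell(ratio_plus_rt, ratio_minus_rt, genomic_strands_per_cell=4):
    """Estimate single-stranded retron cDNA copies per cell.

    ratio_plus_rt / ratio_minus_rt: donor:genome read ratios with and without RT.
    genomic_strands_per_cell: template strands at the locus (at least
    2 alleles x 2 strands = 4 in HEK293 cells, per the text).
    """
    background_corrected = ratio_plus_rt - ratio_minus_rt
    return background_corrected * genomic_strands_per_cell


# Hypothetical ratios chosen so that the difference is 20, as stated in the text.
copies = retron_cdna_copies_per_cell(ratio_plus_rt=21.0, ratio_minus_rt=1.0)
print(copies)  # 80.0
```

Because some HEK293 chromosomes are triploid, 4 strands is a lower bound, so the same calculation with a larger `genomic_strands_per_cell` would give a proportionally higher estimate.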
  • the present invention provides compositions and methods for high-throughput genome editing and screening.
  • the invention provides methods comprising the use of retrons and retron-guide RNA cassettes, vectors comprising said cassettes, and retron donor DNA-guide molecules of the present invention to modify nucleic acids of interest at target loci of interest, and to screen genetic loci of interest, in the genomes of host cells.
  • the present invention also provides compositions and methods for preventing or treating genetic diseases by enhancing precise genome editing to correct a mutation in target genes associated with the diseases.
  • Kits for genome editing and screening are also provided.
  • the present invention can be used with any cell type and at any gene locus that is amenable to nuclease-mediated genome editing technology.
  • nucleic acid sizes are given in either kilobases (kb), base pairs (bp), or nucleotides (nt). Sizes of single-stranded DNA and/or RNA can be given in nucleotides. These are estimates derived from agarose or acrylamide gel electrophoresis, from sequenced nucleic acids, or from published DNA sequences. For proteins, sizes are given in kilodaltons (kDa) or amino acid residue numbers. Protein sizes are estimated from gel electrophoresis, from sequenced proteins, from derived amino acid sequences, or from published protein sequences.
  • Oligonucleotides that are not commercially available can be chemically synthesized, e.g., according to the solid phase phosphoramidite triester method first described by Beaucage and Caruthers, Tetrahedron Lett. 22:1859-1862 (1981), using an automated synthesizer, as described in Van Devanter et al., Nucleic Acids Res. 12:6159-6168 (1984). Purification of oligonucleotides is performed using any art-recognized strategy, e.g., native acrylamide gel electrophoresis or anion-exchange high performance liquid chromatography (HPLC) as described in Pearson and Reanier, J. Chrom. 255:137-149 (1983).
  • HPLC high performance liquid chromatography
  • the term “about” means a range of values including the specified value, which a person of ordinary skill in the art would consider reasonably similar to the specified value. In embodiments, about means within a standard deviation using measurements generally acceptable in the art. In embodiments, about means a range extending to +/- 10% of the specified value (e.g., +/- 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, or 10% of the specified value). In embodiments, about means the specified value.
  • the term “genome editing” refers to a type of genetic engineering in which DNA is inserted, replaced, or removed from a target DNA (e.g., the genome of a cell) using one or more nucleases and/or nickases.
  • The nucleases create specific double-strand breaks (DSBs) at desired locations in the genome and harness the cell's endogenous mechanisms to repair the induced break by homology-directed repair (HDR) (e.g., homologous recombination) or by nonhomologous end joining (NHEJ).
  • HDR homology-directed repair
  • NHEJ nonhomologous end joining
  • two nickases can be used to create two single-strand breaks on opposite strands of a target DNA, thereby generating a blunt or a sticky end.
  • Any suitable DNA nuclease can be introduced into a cell to induce genome editing of a target DNA sequence.
  • the term “retron” is used in accordance with its plain ordinary meaning and refers to a DNA sequence found in the genome of many bacterial species that codes for reverse transcriptase and a unique single-stranded DNA/RNA hybrid called multicopy single-stranded DNA (msDNA).
  • the Retron msr-msd RNA is the non-coding RNA produced by retron elements and is the immediate precursor to the synthesis of msDNA. The retron msr RNA folds into a characteristic secondary structure that contains a conserved guanosine residue at the end of a stem loop.
  • msDNA is a DNA/RNA chimera which is composed of small single-stranded DNA linked to small single-stranded RNA.
  • the RNA strand is joined to the 5' end of the DNA chain via a 2'-5' phosphodiester linkage that occurs from the 2' position of the conserved internal guanosine residue.
  • the retron operon carries a promoter sequence P that controls the synthesis of an RNA transcript carrying three loci: msr, msd, and ret. The ret gene product, a reverse transcriptase, processes the msd/msr portion of the RNA transcript into msDNA.
  • Retron elements are about 2 kb long. They contain a single operon controlling the synthesis of an RNA transcript carrying three loci, msr, msd, and ret, that are involved in msDNA synthesis.
  • the DNA portion of msDNA is encoded by the msd region
  • the RNA portion is encoded by the msr region
  • the product of the ret open reading frame is a reverse transcriptase similar to the RTs produced by retroviruses and other types of retroelements.
  • the retron RT contains seven regions of conserved amino acids, including a highly conserved tyr-ala-asp-asp (YADD) sequence associated with the catalytic core.
  • the ret gene product is responsible for processing the msd/msr portion of the RNA transcript into msDNA.
  • reverse transcriptase refers to its plain and ordinary meaning as an enzyme used to generate complementary DNA (cDNA) from an RNA template, a process termed reverse transcription.
  • the terms “complementary” or “complementarity” refer to polynucleotides that are able to form base pairs with one another. Base pairs are typically formed by hydrogen bonds between nucleotide units in an anti-parallel orientation between polynucleotide strands. Complementary polynucleotide strands can base pair in a Watson-Crick manner (e.g., A to T, A to U, C to G), or in any other manner that allows for the formation of duplexes. As persons skilled in the art are aware, when using RNA as opposed to DNA, uracil (U) rather than thymine (T) is the base that is considered to be complementary to adenosine.
  • uracil when a uracil is denoted in the context of the present disclosure, the ability to substitute a thymine is implied, unless otherwise stated.
  • “Complementarity” may exist between two RNA strands, two DNA strands, or between an RNA strand and a DNA strand. It is generally understood that two or more polynucleotides may be “complementary” and able to form a duplex despite having less than perfect or less than 100% complementarity.
  • Two sequences are “perfectly complementary” or “100% complementary” if at least a contiguous portion of each polynucleotide sequence, comprising a region of complementarity, perfectly base pairs with the other polynucleotide without any mismatches or interruptions within such region.
  • Two or more sequences are considered “perfectly complementary” or “100% complementary” even if either or both polynucleotides contain additional non-complementary sequences as long as the contiguous region of complementarity within each polynucleotide is able to perfectly hybridize with the other.
  • "Less than perfect" complementarity refers to situations where less than all of the contiguous nucleotides within such region of complementarity are able to base pair with each other.
  • a gRNA may comprise a sequence "complementary" to a target sequence (e.g., major or minor allele), capable of sufficient base-pairing to form a duplex (i.e., the gRNA hybridizes with the target sequence). Additionally, the gRNA may comprise a sequence complementary to a sequence adjacent to a PAM sequence, wherein the gRNA also hybridizes with the sequence adjacent to a PAM sequence in a target DNA.
  • hybridize and “hybridization” refer to the formation of complexes between nucleotide sequences which are sufficiently complementary to form duplexes via Watson-Crick base pairing.
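The definitions of complementarity above (anti-parallel Watson-Crick pairing, U treated as equivalent to T, and "perfect" versus "less than perfect" complementarity over a region) can be illustrated with a short Python sketch. The function names are hypothetical, only canonical A/T(U) and G/C pairs are scored (the definition also allows other duplex-forming pairs, which this sketch ignores), and the two strands are assumed to be equal-length regions written 5' to 3'.

```python
# Canonical Watson-Crick partners, expressed in the DNA alphabet; U pairs like T.
_COMPLEMENT = {"A": "T", "T": "A", "U": "A", "C": "G", "G": "C"}


def reverse_complement(seq, rna=False):
    """Return the reverse complement of seq (5'->3'); emit U instead of T if rna=True."""
    comp = "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))
    return comp.replace("T", "U") if rna else comp


def percent_complementarity(strand_a, strand_b):
    """Percent of positions that pair when two equal-length strands align anti-parallel."""
    if len(strand_a) != len(strand_b) or not strand_a:
        raise ValueError("strands must be non-empty and of equal length")
    b_antiparallel = strand_b.upper()[::-1]
    paired = sum(
        _COMPLEMENT[x] == y.replace("U", "T")
        for x, y in zip(strand_a.upper(), b_antiparallel)
    )
    return 100.0 * paired / len(strand_a)


def perfectly_complementary(strand_a, strand_b):
    """True when every position in the region pairs without mismatch."""
    return percent_complementarity(strand_a, strand_b) == 100.0
```

Under these assumptions, `"ATGC"` and `"GCAT"` are perfectly complementary, as are the RNA strand `"AUGC"` and the DNA strand `"GCAT"`, while `"ATGC"` and `"GCAA"` pair at three of four positions.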
  • DNA nuclease refers to an enzyme capable of cleaving the phosphodiester bonds between the nucleotide subunits of DNA and may be an endonuclease or an exonuclease.
  • the DNA nuclease may be an engineered (e.g., programmable or targetable) DNA nuclease which can be used to induce genome editing of a target DNA sequence.
  • Any suitable DNA nuclease can be used including, but not limited to, CRISPR-associated protein (Cas) nucleases, other endo- or exo-nucleases, variants thereof, fragments thereof, and combinations thereof.
  • Cas CRISPR-associated protein
  • double-strand break or “double-strand cut” refers to the severing or cleavage of both strands of the DNA double helix.
  • the DSB may result in cleavage of both strands at the same position leading to “blunt ends” or staggered cleavage resulting in a region of single-stranded DNA at the end of each DNA fragment, or “sticky ends”.
  • a DSB may arise from the action of one or more DNA nucleases.
  • nonhomologous end joining or “NHEJ” refers to a pathway that repairs double-strand DNA breaks in which the break ends are directly ligated without the need for a homologous template.
  • HDR homologous recombination
  • nucleic acid refers to deoxyribonucleic acids (DNA), ribonucleic acids (RNA) and polymers thereof in either single-, double- or multi-stranded form.
  • the term includes, but is not limited to, single-, double- or multi-stranded DNA or RNA, genomic DNA, cDNA, DNA-RNA hybrids, or a polymer comprising purine and/or pyrimidine bases or other natural, chemically modified, biochemically modified, non-natural, synthetic or derivatized nucleotide bases. In some embodiments, a nucleic acid can comprise a mixture of DNA, RNA and analogs thereof.
  • nucleic acids containing known analogs of natural nucleotides that have similar binding properties as the reference nucleic acid and are metabolized in a manner similar to naturally occurring nucleotides. Unless otherwise indicated, a particular nucleic acid sequence also implicitly encompasses conservatively modified variants thereof (e.g., degenerate codon substitutions), alleles, orthologs, single nucleotide polymorphisms (SNPs), and complementary sequences as well as the sequence explicitly indicated.
  • degenerate codon substitutions may be achieved by generating sequences in which the third position of one or more selected (or all) codons is substituted with mixed-base and/or deoxyinosine residues (Batzer et al., Nucleic Acid Res. 19:5081 (1991); Ohtsuka et al., J. Biol. Chem. 260:2605-2608 (1985); and Rossolini et al., Mol. Cell. Probes 8:91-98 (1994)).
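The third-position substitution described above can be shown with a small Python sketch. It is illustrative only: the function name and the example sequence are hypothetical, `"N"` is used as the mixed-base symbol and `"I"` as a stand-in for deoxyinosine, and the sketch shows only the substitution pattern (it does not check that the encoded amino acid is preserved at every codon).

```python
def degenerate_third_positions(coding_seq, codon_indices=None, symbol="N"):
    """Replace the third base of selected codons (or of all codons) with `symbol`.

    coding_seq: in-frame coding sequence whose length is a multiple of 3.
    codon_indices: 0-based codon positions to modify; None means all codons.
    symbol: "N" for a mixed base, or "I" as a placeholder for deoxyinosine.
    """
    if len(coding_seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    n_codons = len(coding_seq) // 3
    selected = set(range(n_codons)) if codon_indices is None else set(codon_indices)

    out = []
    for i in range(n_codons):
        codon = coding_seq[3 * i:3 * i + 3]
        out.append(codon[:2] + symbol if i in selected else codon)
    return "".join(out)
```

For the toy sequence `"ATGGCTAAA"`, substituting every codon yields `"ATNGCNAAN"`, while `codon_indices=[1], symbol="I"` yields `"ATGGCIAAA"`.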
  • single nucleotide polymorphism or “SNP” refers to a change of a single nucleotide within a polynucleotide, including within an allele. This can include the replacement of one nucleotide by another, as well as the deletion or insertion of a single nucleotide. Most typically, SNPs are biallelic markers although tri- and tetra-allelic markers can also exist. By way of non-limiting example, a nucleic acid molecule comprising SNP A\C may include a C or A at the polymorphic position.
  • the term “gene” means the segment of DNA involved in producing a ribonucleic acid polymer, which in the case of protein coding genes can then be translated into a polypeptide chain.
  • the DNA segment may include regions preceding and following the coding region (leader and trailer) involved in the transcription/translation of the gene product and the regulation of the transcription/translation, as well as intervening sequences (introns) between individual coding segments (exons).
  • cassette refers to a combination of genetic sequence elements that may be introduced as a single element and may function together to achieve a desired result.
  • a cassette typically comprises polynucleotides in combinations that are not found in nature.
  • operably linked refers to two or more genetic elements, such as a polynucleotide coding sequence and a promoter, placed in relative positions that permit the proper biological functioning of the elements, such as the promoter directing transcription of the coding sequence.
  • inducible promoter refers to a promoter that responds to environmental factors and/or external stimuli that can be artificially controlled in order to modify the expression of, or the level of expression of, a polynucleotide sequence or refers to a combination of elements, for example an exogenous promoter and an additional element such as a trans-activator operably linked to a separate promoter.
  • An inducible promoter may respond to abiotic factors such as oxygen levels or to chemical or biological molecules. In some embodiments, the chemical or biological molecules may be molecules not naturally present in humans.
  • vector and “expression vector” refer to a nucleic acid construct, generated recombinantly or synthetically, with a series of specified nucleic acid elements that permit transcription of a particular polynucleotide sequence in a host cell.
  • An expression vector may be part of a plasmid, viral genome, or nucleic acid fragment.
  • an expression vector includes a polynucleotide to be transcribed, operably linked to a promoter.
  • promoter is used herein to refer to an array of nucleic acid control sequences that direct transcription of a nucleic acid.
  • a promoter includes necessary nucleic acid sequences near the start site of transcription, such as, in the case of a polymerase II type promoter, a TATA element.
  • a promoter also optionally includes distal enhancer or repressor elements, which can be located as much as several thousand base pairs from the start site of transcription.
  • Other elements that may be present in an expression vector include those that enhance transcription (e.g., enhancers) and terminate transcription (e.g., terminators).
  • Recombinant refers to a genetically modified polynucleotide, polypeptide, cell, tissue, or organism.
  • a recombinant polynucleotide or a copy or complement of a recombinant polynucleotide is one that has been manipulated using well known methods.
  • a recombinant expression cassette comprising a promoter operably linked to a second polynucleotide can include a promoter that is heterologous to the second polynucleotide as the result of human manipulation (e.g., by methods described in Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y., (1989) or Current Protocols in Molecular Biology Volumes 1-3, John Wiley & Sons, Inc. (1994-1998)).
  • a recombinant expression cassette (or expression vector) typically comprises polynucleotides in combinations that are not found in nature.
  • recombinant protein is one that is expressed from a recombinant polynucleotide.
  • recombinant cells, tissues, and organisms are those that comprise recombinant sequences (polynucleotide and/or polypeptide).
  • heterologous refers to biological material that is introduced, inserted, or incorporated into a recipient (e.g., host) organism that originates from another organism.
  • the heterologous material that is introduced into the recipient organism e.g., a host cell
  • Heterologous material can include, but is not limited to, nucleic acids, amino acids, peptides, proteins, and structural elements such as genes, promoters, and cassettes.
  • a host cell can be, but is not limited to, a bacterium, a yeast cell, a mammalian cell, or a plant cell.
  • The introduction of heterologous material into a host cell or organism can result, in some instances, in the expression of additional heterologous material in or by the host cell or organism.
  • the transformation of a yeast host cell with an expression vector that contains DNA sequences encoding a bacterial protein may result in the expression of the bacterial protein by the yeast cell.
  • the incorporation of heterologous material may be permanent or transient.
  • the expression of heterologous material may be permanent or transient.
  • reporter and “selectable marker” can be used interchangeably and refer to a gene product that permits a cell expressing that gene product to be identified and/or isolated from a mixed population of cells. Such isolation might be achieved through the selective killing of cells not expressing the selectable marker, which may be, as a non-limiting example, an antibiotic resistance gene.
  • the selectable marker may permit identification and/or subsequent isolation of cells expressing the marker as a result of the expression of a fluorescent protein such as GFP or the expression of a cell surface marker which permits isolation of cells by fluorescence-activated cell sorting (FACS), magnetic-activated cell sorting (MACS), or analogous methods.
  • Suitable cell surface markers include CD8, CD19, and truncated CD19.
  • cell surface markers used for isolating desired cells are non-signaling molecules, such as subunit or truncated forms of CD8, CD19, or CD20. Suitable markers and techniques are known in the art.
  • culture when referring to cell culture itself or the process of culturing, can be used interchangeably to mean that a cell (e.g., yeast cell) is maintained outside its normal environment under controlled conditions, e.g., under conditions suitable for survival.
  • Cultured cells are allowed to survive, and culturing can result in cell growth, stasis, differentiation or division. The term does not imply that all cells in the culture survive, grow, or divide, as some may naturally die or senesce.
  • Cells are typically cultured in media, which can be changed during the course of the culture.
  • the terms “subject,” “individual,” and “patient” are used interchangeably herein to refer to a vertebrate, preferably a mammal, more preferably a human. Mammals include, but are not limited to, murines, simians, humans, farm animals, sport animals, and pets. Tissues, cells and their progeny of a biological entity obtained in vivo or cultured in vitro are also encompassed.
  • administering includes oral administration, topical contact, administration as a suppository, intravenous, intraperitoneal, intramuscular, intralesional, intrathecal, intranasal, or subcutaneous administration to a subject. Administration is by any route, including parenteral and transmucosal (e.g., buccal, sublingual, palatal, gingival, nasal, vaginal, rectal, or transdermal). Parenteral administration includes, e.g., intravenous, intramuscular, intra-arteriole, intradermal, subcutaneous, intraperitoneal, intraventricular, and intracranial.
  • Administering also refers to delivery of material, including biological material such as nucleic acids and/or proteins, into cells by transformation, transfection, transduction, ballistic methods and/or electroporation.
  • treating refers to an approach for obtaining beneficial or desired results including, but not limited to, a therapeutic benefit and/or a prophylactic benefit.
  • therapeutic benefit is meant any therapeutically relevant improvement in or effect on one or more diseases, conditions, or symptoms under treatment.
  • compositions may be administered to a subject at risk of developing a particular disease, condition, or symptom, or to a subject reporting one or more of the physiological symptoms of a disease, even though the disease, condition, or symptom may not have yet been manifested.
  • the term “effective amount” or “sufficient amount” refers to the amount of an agent that is sufficient to effect beneficial or desired results.
  • the therapeutically effective amount may vary depending upon one or more of: the subject and disease condition being treated, the weight and age of the subject, the severity of the disease condition, the manner of administration and the like, which can readily be determined by one of ordinary skill in the art.
  • the specific amount may vary depending on one or more of: the particular agent chosen, the host cell type, the location of the host cell in the subject, the dosing regimen to be followed, whether it is administered in combination with other compounds, timing of administration, and the physical delivery system in which it is carried.
  • pharmaceutically acceptable carrier refers to a substance that aids the administration of an active agent to a cell, an organism, or a subject.
  • “Pharmaceutically acceptable carrier” refers to a carrier or excipient that can be included in the compositions of the invention and that causes no significant adverse toxicological effect on the patient.
  • Non-limiting examples of pharmaceutically acceptable carrier include water, NaCl, normal saline solutions, lactated Ringer's, normal sucrose, normal glucose, cell culture media, and the like.
  • a stabilizing 5' sequence specific RNA cleavage site sequence refers to a nucleic acid sequence 5' to a retron that, upon expression as an RNA, can be cleaved from the RNA and leaves a stabilizing sequence on the remaining retron.
  • a stabilizing sequence can be the cleavage product of the Hepatitis Delta Virus (HDV) ribozyme, a stabilizing stem loop structure, a stem loop with a highly stable tetraloop, such as a GNRA or UNCG tetraloop, or a pseudoknot.
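  • By way of non-limiting illustration only, the GNRA and UNCG tetraloop families named above can be recognized by expanding the IUPAC symbols (N is any base, R is a purine). The following Python sketch is an assumption-laden example: the name `classify_tetraloop` and the test loops are hypothetical, and the sketch checks only the four loop nucleotides, not the surrounding stem.

```python
import re
from typing import Optional

# IUPAC expansion: N = A/C/G/U, R = A/G (purine).
TETRALOOP_PATTERNS = {
    "GNRA": re.compile(r"^G[ACGU][AG]A$"),
    "UNCG": re.compile(r"^U[ACGU]CG$"),
}


def classify_tetraloop(loop: str) -> Optional[str]:
    """Return "GNRA" or "UNCG" if the 4-nt loop fits that family, else None."""
    rna = loop.upper().replace("T", "U")   # accept DNA-style input as well
    for family, pattern in TETRALOOP_PATTERNS.items():
        if pattern.match(rna):
            return family
    return None


for candidate in ("GAAA", "UUCG", "AAAA"):
    print(candidate, classify_tetraloop(candidate))
```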
  • a stabilizing 5' ribozyme sequence refers to a ribozyme 5' to a retron that, upon expression as an RNA, cleaves itself from the RNA and leaves a stabilizing sequence on the remaining retron.
  • a stabilizing sequence can be the cleavage product of the Hepatitis Delta Virus (HDV) ribozyme, a stabilizing stem loop structure, a stem loop with a highly stable tetraloop, such as a GNRA or UNCG tetraloop, or a pseudoknot.
  • a stem loop-stabilizing 5' ribozyme sequence refers to a ribozyme 5' to a retron that, upon expression as an RNA, cleaves itself from the RNA and leaves a stabilizing sequence such as a stabilizing stem loop structure.
  • a 3' ribozyme sequence refers to a ribozyme 3' to a retron.
  • Non-limiting examples include a Hammerhead ribozyme, HDV, RiboJ, or CPEB3.
  • The term “ribozyme” refers to an RNA molecule that is capable of catalyzing a biochemical reaction. In some instances, ribozymes function in protein synthesis, catalyzing the linking of amino acids in the ribosome. In other instances, ribozymes participate in various other RNA processing functions, such as splicing, viral replication, and tRNA biosynthesis. In some instances, ribozymes can be self-cleaving.
  • Non-limiting examples of ribozymes include the HDV ribozyme, the Lariat capping ribozyme (formerly called GIR1 branching ribozyme), the glmS ribozyme, group I and group II self-splicing introns, the hairpin ribozyme, the hammerhead ribozyme, various rRNA molecules, RNase P, the twister ribozyme, the VS ribozyme, the pistol ribozyme, and the hatchet ribozyme.
  • further examples include the self-cleaving ribozyme-containing R2 elements, the L1Tc retrotransposon found in Trypanosoma cruzi, short interspersed nuclear elements (SINEs) in Schistosomes, Penelope-like elements and retrozymes.
  • for further description of ribozymes, see, e.g., Doherty, et al. Ann. Rev. Biophys. Biomol. Struct. 30: 457-475 (2001) and Weinberg, et al., Nucleic Acids Research, 47(18): 9480-9494 (2019); incorporated herein by reference in their entirety for all purposes.
  • a structure-forming nucleic acid within the msd sequence refers to an exogenous nucleic acid sequence inserted within the loop-forming structure of the msd sequence that is able to form a structured region of nucleic acid when expressed as a retron ncRNA.
  • The exogenous nucleic acid sequence can be placed adjacent to the programmed ssDNA sequence (i.e., donor) in the same loop of the msd region. In the retron ncRNA form, the structure resides 3' of the donor or programmed ssDNA sequence. In the retron msDNA form, this becomes the 5' end.
  • This structure can also be placed on the other side of the programmed ssDNA sequence. While not wishing to be held by theory, this structure may aid the proper folding of the msr/msd structure in the retron ncRNA to enhance reverse transcription, or may enhance the stability of the msDNA and protect it from cellular nucleases.
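  • By way of non-limiting illustration only, the change in orientation noted above (an element lying 3' of the donor in the ncRNA ends up at the 5' end of the reverse-transcribed DNA) follows from the DNA copy being the reverse complement of its RNA template. The Python sketch below uses arbitrary placeholder sequences; the names `reverse_transcribe`, `donor_rna`, and `structure_rna` are hypothetical, and the sketch ignores the msr/msd folding and priming details of a real retron.

```python
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna_template: str) -> str:
    """Return the DNA copy (written 5'->3') of an RNA template (written 5'->3')."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(rna_template.upper()))


# Arbitrary placeholder sequences, not taken from the disclosure.
donor_rna = "AUGGCC"
structure_rna = "GGGAAACCC"

# In the RNA template the structure-forming element sits 3' of the donor.
template = donor_rna + structure_rna
cdna = reverse_transcribe(template)

# In the DNA copy the complement of that element now leads the molecule.
print(cdna)
print(cdna.startswith(reverse_transcribe(structure_rna)))
```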
  • Percent similarity in the context of polynucleotide or peptide sequences, is determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the sequence (e.g., an msr locus sequence) in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence which does not comprise additions or deletions, for optimal alignment of the two sequences.
  • the percentage is calculated by determining the number of positions at which the identical nucleotide or amino acid occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison and multiplying the result by 100 to yield the percentage of similarity (e.g., sequence similarity).
  • a polynucleotide or peptide has at least about 70% similarity (e.g., sequence similarity), preferably at least about 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% similarity, to a reference sequence, when compared and aligned for maximum correspondence over a comparison window, or designated region as measured using one of the following sequence comparison algorithms or by manual alignment and visual inspection, such sequences are then said to be “substantially similar.”
  • this definition also refers to the complement of a test sequence.
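  • By way of non-limiting illustration only, the percentage calculation described above (matched positions divided by the total positions in the comparison window, multiplied by 100) can be sketched in Python for two sequences that have already been aligned. The function names, the `-` gap symbol, and the example alignment are assumptions of this sketch; producing the optimal alignment itself is outside its scope.

```python
def percent_similarity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of window positions holding the identical residue in both sequences.

    Both inputs must already be aligned (equal length, gaps written as `gap`).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    a, b = aligned_a.upper(), aligned_b.upper()
    matched = sum(1 for x, y in zip(a, b) if x == y and x != gap)
    return 100.0 * matched / len(a)


def substantially_similar(aligned_a: str, aligned_b: str, threshold: float = 70.0) -> bool:
    """Apply the 'at least about 70%' floor; the exact cut-off is a parameter."""
    return percent_similarity(aligned_a, aligned_b) >= threshold


print(percent_similarity("ACGT-ACGTA", "ACGTTACGAA"))  # 8 of 10 positions match
```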
  • sequence comparison typically one sequence acts as a reference sequence, to which test sequences are compared.
  • test and reference sequences are entered into a computer, subsequence coordinates are designated, if necessary, and sequence algorithm program parameters are designated. Default program parameters can be used, or alternative parameters can be designated.
  • sequence comparison algorithm then calculates the percent sequence similarities for the test sequences relative to the reference sequence, based on the program parameters. For sequence comparison of nucleic acids and proteins, the BLAST and BLAST 2.0 algorithms and the default parameters discussed below are used.
  • BLAST and BLAST 2.0 algorithms are described in Altschul et al., (1990) J. Mol. Biol. 215: 403-410 and Altschul et al. (1997) Nucleic Acids Res. 25: 3389-3402, respectively.
  • Software for performing BLAST analyses is publicly available at the National Center for Biotechnology Information website, ncbi.nlm.nih.gov.
  • the algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence, which either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence.
  • T is referred to as the neighborhood word score threshold (Altschul et al., supra). These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are then extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always >0) and N (penalty score for mismatching residues; always <0).
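  • By way of non-limiting illustration only, the seed-and-extend procedure described above can be sketched in Python. This is a simplified sketch and not the BLAST implementation: it finds only exact word matches (the threshold T for neighborhood words is not modeled), extends without gaps, and stops when the running score falls more than `x_drop` below the best score seen, which is a stopping rule assumed for this example. All function names, the default values of the reward M and penalty N, and the example sequences are hypothetical.

```python
from typing import List, Tuple


def find_exact_word_hits(query: str, subject: str, word_len: int) -> List[Tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where a word of length W matches exactly."""
    index = {}
    for j in range(len(subject) - word_len + 1):
        index.setdefault(subject[j:j + word_len], []).append(j)
    hits = []
    for i in range(len(query) - word_len + 1):
        for j in index.get(query[i:i + word_len], []):
            hits.append((i, j))
    return hits


def _extend(query, subject, i, j, step, reward, penalty, x_drop):
    """Walk in one direction; return (best cumulative score, length at that score)."""
    best = running = 0
    best_len = length = 0
    while 0 <= i < len(query) and 0 <= j < len(subject):
        running += reward if query[i] == subject[j] else penalty
        length += 1
        if running > best:
            best, best_len = running, length
        elif best - running > x_drop:
            break  # score has fallen too far below the best seen so far
        i += step
        j += step
    return best, best_len


def extend_seed(query, subject, q_pos, s_pos, word_len, reward=1, penalty=-3, x_drop=5):
    """Extend an exact word hit in both directions without gaps.

    Returns (score, query_start, query_end) with query_end exclusive.
    """
    seed_score = reward * word_len
    right, right_len = _extend(query, subject, q_pos + word_len, s_pos + word_len,
                               1, reward, penalty, x_drop)
    left, left_len = _extend(query, subject, q_pos - 1, s_pos - 1,
                             -1, reward, penalty, x_drop)
    return seed_score + left + right, q_pos - left_len, q_pos + word_len + right_len


query = "CCACGTTGCAGG"
subject = "TTACGTTGCATT"
score, start, end = extend_seed(query, subject, 4, 4, 3)
print(score, query[start:end])
```

  • In this example the three-letter seed at position 4 grows into the shared eight-letter segment, with each match contributing the reward M and the flanking mismatches contributing the penalty N until extension stops.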
  • the BLAST algorithm also performs a statistical analysis of the similarity between two sequences (see, e.g., Karlin and Altschul, Proc. Natl. Acad. Sci. USA, 90:5873-5877 (1993)).
  • One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance.
  • a nucleic acid is considered similar to a reference sequence if the smallest sum probability in a comparison of the test nucleic acid to the reference nucleic acid is less than about 0.2, more preferably less than about 0.01, and most preferably less than about 0.001.
  • Another method of establishing percent identity in the context of the present disclosure is to use the MPSRCH package of programs copyrighted by the University of Edinburgh, developed by John F. Collins and Shane S. Sturrok, and distributed by IntelliGenetics, Inc. (Mountain View, CA). From this suite of packages, the Smith-Waterman algorithm can be employed where default parameters are used for the scoring table (for example, gap open penalty of 12, gap extension penalty of one, and a gap of six). From the data generated the “Match” value reflects “sequence identity.”
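  • By way of non-limiting illustration only, a Smith-Waterman local alignment score with separate gap open and gap extension penalties can be sketched in Python as follows. This is not the MPSRCH program: the match and mismatch values stand in for a scoring table and are assumptions, as is the convention that a gap of length one costs exactly the gap open penalty. The function name and example sequences are hypothetical.

```python
def smith_waterman_score(a: str, b: str, match: int = 5, mismatch: int = -4,
                         gap_open: int = 12, gap_extend: int = 1):
    """Best local alignment score with affine gaps (score only, no traceback)."""
    neg_inf = float("-inf")
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]        # best score ending at (i, j)
    E = [[neg_inf] * cols for _ in range(rows)]  # alignment ends with a gap in `a`
    F = [[neg_inf] * cols for _ in range(rows)]  # alignment ends with a gap in `b`
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            # Either extend an existing gap or open a new one.
            E[i][j] = max(E[i][j - 1] - gap_extend, H[i][j - 1] - gap_open)
            F[i][j] = max(F[i - 1][j] - gap_extend, H[i - 1][j] - gap_open)
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, E[i][j], F[i][j])  # 0 restarts the local alignment
            best = max(best, H[i][j])
    return best


# Eight matches (8 x 5) minus one opened gap (12) under the assumed scores.
print(smith_waterman_score("ACGTACGT", "ACGTTACGT"))
```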
  • Other suitable programs for calculating the percent identity or similarity between sequences are generally known in the art, for example, another alignment program is BLAST, used with default parameters.
  • homology can be determined by hybridization of polynucleotides under conditions which form stable duplexes between homologous regions, followed by digestion with single stranded specific nuclease(s), and size determination of the digested fragments.
  • DNA sequences that are substantially homologous can be identified in a Southern hybridization experiment under, for example, stringent conditions, as defined for that particular system. Defining appropriate hybridization conditions is within the skill of the art. See, e.g., Sambrook et al., supra; DNA Cloning, supra; Nucleic Acid Hybridization, supra.
  • homologous region refers to a region of a nucleic acid with homology to another nucleic acid region. Thus, whether a “homologous region” is present in a nucleic acid molecule is determined with reference to another nucleic acid region in the same or a different molecule. Further, since a nucleic acid is often double-stranded, the term “homologous region,” as used herein, refers to the ability of nucleic acid molecules to hybridize to each other. For example, a single-stranded nucleic acid molecule can have two homologous regions which are capable of hybridizing to each other. Thus, the term “homologous region” includes nucleic acid segments with complementary sequences.
  • Homologous regions may vary in length but will typically be between 4 and 500 nucleotides (e.g., from about 4 to about 40, from about 40 to about 80, from about 80 to about 120, from about 120 to about 160, from about 160 to about 200, from about 200 to about 240, from about 240 to about 280, from about 280 to about 320, from about 320 to about 360, from about 360 to about 400, from about 400 to about 440, etc.).
  • Genetic disease refers to a disease, partially or completely, directly or indirectly, caused by one or more abnormalities in the genome, especially a condition that is present from birth.
  • the abnormality may be a mutation, an insertion or a deletion.
  • the abnormality may affect the coding sequence of the gene or its regulatory sequence.
  • the genetic disease may be selected from the group consisting of an inherited muscle disease (e.g., congenital myopathy or a muscular dystrophy), a lysosomal storage disease, a heritable disorder of connective tissue, a neurodegenerative disorder, and a skeletal dysplasia.
  • the genetic disease may be, but is not limited to, Duchenne muscular dystrophy (DMD), Becker's muscular dystrophy, Limb-girdle muscular dystrophy, dysferlinopathy, dystroglycanopathy, aspartylglucosaminuria, Batten disease, cystinosis, Fabry disease, Gaucher disease, Pompe disease, Tay Sachs disease, Sandhoff disease, metachromatic leukodystrophy, mucolipidosis, mucopolysaccharide storage diseases, Niemann-Pick disease, Schindler disease, Krabbe disease, Ehlers-Danlos syndrome, epidermolysis bullosa, Marfan syndrome, neurofibromatosis, spinal muscular atrophy, amyotrophic lateral sclerosis, progressive muscular atrophy, fragile X syndrome, Charcot-Marie-Tooth disease, osteogenesis imperfecta, achondroplasia, or osteopetrosis.
  • Other genetic diseases include hemophilia, cystic fibrosis, Huntington's chorea, familial hypercholesterolemia (LDL receptor defect), hepatoblastoma, Wilson's disease, congenital hepatic porphyria, inherited disorders of hepatic metabolism, Lesch Nyhan syndrome, sickle cell anemia, thalassaemias, xeroderma pigmentosum, Fanconi's anemia, retinitis pigmentosa, ataxia telangiectasia, Bloom's syndrome, retinoblastoma, and Tay-Sachs disease.
  • transfection is used to refer to the uptake of foreign DNA by a cell.
  • a cell has been “transfected” when exogenous nucleic acids have been introduced inside the cell membrane.
  • transfection techniques are generally known in the art. See, e.g., Graham et al. (1973) Virology, 52:456, Sambrook et al. (2001) Molecular Cloning, a laboratory manual, 3rd edition, Cold Spring Harbor Laboratories, New York, Davis et al. (1995) Basic Methods in Molecular Biology, 2nd edition, McGraw-Hill, and Chu et al. (1981) Gene 13:197.
  • Such techniques can be used to introduce one or more exogenous nucleic acid moieties into suitable host cells.
  • the term refers to both stable and transient uptake of the genetic material and includes uptake of peptide- or antibody-linked nucleic acids.
  • donor polynucleotide or “donor sequence” refers to a polynucleotide that provides a sequence of an intended edit to be integrated into the genome at a target locus by HDR.
II. Compositions
  • nucleic acids encoding a retron that include (a) a stabilizing 5' ribozyme sequence, (b) an msr sequence, (c) an msd sequence, (d) a subject expression sequence within the msd sequence, and (e) a first inverted repeat sequence and a second inverted repeat sequence, wherein the nucleic acid does not include a guide RNA coding region.
  • compositions lead to significantly higher gene editing efficiency when employed with gene editing technology. Such effects were not seen in prior-described constructs wherein the nucleic acid includes a guide RNA region.
  • the subject expression sequence is a donor sequence for homology directed repair (HDR).
  • the nucleic acid includes a 3' ribozyme sequence. Any 5' ribozyme can be used as long as it leaves a stabilizing sequence when cleaved from the retron.
  • the 5' ribozyme sequence or sequences can be a Hammerhead Ribozyme, HDV ribozyme, RiboJ, CPEB3, Agam_1_1, Agam_2_2, Pmar_1, Bflo_1, Bflo_2, Spur_1, Spur_2, Spur_3, Spur_4, Ppac_1, Cjap_1, Fpra_1, CIV_1, Dpap_1, Tatr_1, CPEB3, G_HDV, A_HDV,
  • PongoAbclii SumatranOrangutan//! 66 MicrocebusMurmus MouseLemur/i/l 66, TupaiaBelangeri__NorthemTreesh/l 66, Rabbit/84/4 75, Human. ChrlO/290/4 75,
  • SorexArane us CommonShrew,/ 1 / 1 66, Mouse .cirri 9__ CPEB3/70/4 75, Rat. Chr 1/411/4 74, EquusCaballus/I/I 69, Lamajpacos__Alpaca/T/l 70, Opossum/55/4 75, Macropus__eugenii_Tammar_Wahab/l 72, Monodelphis__domestica__Grey_Sho/l 71, CaviaPorcellus GuineaPig/1/ 1 66, OchotonaPrinceps AmericanPike/71 66,
  • nucleic acid encoding the retron further comprises a stem-loop sequence located between (a) the stabilizing 5' ribozyme sequence and (b) the msr sequence.
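  • By way of non-limiting illustration only, the elements recited for this embodiment can be tracked with a small Python record that checks two stated constraints: the subject expression (donor) sequence lies within the msd sequence, and no guide RNA coding region is present. The class name `RetronCassette`, its field names, the 5'-to-3' ordering used by `assemble`, and every sequence shown are hypothetical placeholders chosen for this sketch and are not taken from Table 1.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RetronCassette:
    five_prime_ribozyme: str
    inverted_repeat_1: str
    msr: str
    msd: str                       # msd sequence with the donor already inserted
    donor: str                     # subject expression sequence
    inverted_repeat_2: str
    guide_rna: Optional[str] = None

    def validate(self, allow_guide: bool = False) -> None:
        """Raise ValueError if the recited constraints are not met."""
        if self.donor not in self.msd:
            raise ValueError("donor must lie within the msd sequence")
        if self.guide_rna is not None and not allow_guide:
            raise ValueError("this embodiment excludes a guide RNA coding region")

    def assemble(self) -> str:
        """Concatenate elements in an assumed 5'-to-3' order (illustrative only)."""
        return "".join([self.five_prime_ribozyme, self.inverted_repeat_1,
                        self.msr, self.msd, self.inverted_repeat_2])


# Placeholder sequences only; real elements are far longer.
cassette = RetronCassette(
    five_prime_ribozyme="GGCC",
    inverted_repeat_1="TGCGCA",
    msr="ATGCAT",
    msd="CCAAGGTTCC",
    donor="AGGT",
    inverted_repeat_2="TGCGCA",
)
cassette.validate()
print(cassette.assemble())
```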
  • nucleic acids encoding (a) a retron that includes (i) a stem loop-stabilizing 5' ribozyme sequence, (ii) an msr sequence, (iii) an msd sequence, (iv) a subject expression sequence within the msd sequence, and (v) a first inverted repeat sequence and a second inverted repeat sequence, and (b) a guide RNA coding region.
  • the nucleic acid includes an HDV ribozyme.
  • the subject expression sequence is a donor sequence for homology directed repair (HDR).
  • the nucleic acid includes a 3' ribozyme sequence. Any 5' ribozyme can be used as long as it leaves a stem loop sequence or pseudoknot when cleaved from the retron.
  • An example of a 3' ribozyme includes, but is not limited to, RiboJ.
  • nucleic acids encoding a retron that include (i) a stabilizing 5' sequence specific RNA cleavage site sequence, (ii) an msr sequence, (iii) an msd sequence, (iv) a subject expression sequence within the msd sequence, and (v) a first inverted repeat sequence and a second inverted repeat sequence, and a guide RNA coding region.
  • the nucleic acids include a donor sequence for homology directed repair (HDR).
  • the nucleic acids include a 3' ribozyme sequence. Many site-specific RNAse cleavage sites are known in the art.
  • RNAse which the host cell already has
  • RNAse which the host cell doesn't have, in which case one also would supply the RNAse (for example, the Pumilio-RNase fusion or the Csy4 nuclease below).
  • The advantage of the first method is that the RNase does not have to be expressed separately.
  • the second approach is more generalizable.
  • RNase III target motifs can be used.
  • Rnt1p is the RNase III enzyme
  • stem-loops with NGNN tetraloops and at least 10-14 bp of stem, which can include imperfect complementarity, can be targeted for cleavage.
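  • By way of non-limiting illustration only, candidate stem-loops of the kind described above (an NGNN tetraloop closed by a stem of at least 10 bp, with imperfect complementarity tolerated) can be located with a simple Python scan. The function name, the treatment of G-U wobble pairs as paired, the default of one tolerated mismatch, and the example sequence are all assumptions of this sketch; it is not a model of actual substrate recognition by the enzyme and it does not consider bulges.

```python
from typing import List, Tuple

# Watson-Crick pairs plus G-U wobble (the wobble allowance is an assumption).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_ngnn_stem_loops(rna: str, min_stem: int = 10,
                         max_mismatches: int = 1) -> List[Tuple[int, int]]:
    """Return (loop_start, paired_count) for NGNN tetraloops with a long enough stem."""
    seq = rna.upper().replace("T", "U")
    hits = []
    for loop_start in range(1, len(seq) - 4):
        if seq[loop_start + 1] != "G":       # second loop position must be G
            continue
        left, right = loop_start - 1, loop_start + 4
        paired = mismatches = 0
        while left >= 0 and right < len(seq):
            if (seq[left], seq[right]) in PAIRS:
                paired += 1
            else:
                mismatches += 1
                if mismatches > max_mismatches:
                    break
            left -= 1
            right += 1
        if paired >= min_stem:
            hits.append((loop_start, paired))
    return hits


# Placeholder hairpin: a 10-nt arm, an AGUC loop, and the complementary arm.
complement = {"A": "U", "U": "A", "G": "C", "C": "G"}
arm = "GCGCAUAUGC"
hairpin = arm + "AGUC" + "".join(complement[b] for b in reversed(arm))
print(find_ngnn_stem_loops(hairpin))
```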
  • Drosha is the primary nuclear RNase III enzyme. Drosha will process primary miRNAs (pri-miRNAs), which contain one or more characteristic hairpin structures.
  • miRNA hairpins are recognized and cleaved by the nuclear Microprocessor complex, a heterotrimeric complex consisting of one molecule of DROSHA, an RNase III enzyme, and two molecules of DGCR8, a double-stranded RNA (dsRNA)-binding protein, to release ~60-80 nt precursor miRNAs (pre-miRNAs).
  • Pumilio domains can be fused to a general RNA cleavage domain to recognize specific targets for cleavage or, for example, one can utilize a specific stem-loop (5'-GUUCACUGCCGUAUAGGCAGCU-3') targeted by the CRISPR Csy4 nuclease.
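  • By way of non-limiting illustration only, a transcript can be checked for the Csy4 stem-loop recited above with a plain substring search. The function name `find_csy4_sites` and the flanking placeholder bases are hypothetical; the sketch only reports where the recited sequence occurs and makes no claim about the cleavage position.

```python
from typing import List

# Stem-loop sequence as recited in the text, written 5'->3' as RNA.
CSY4_STEM_LOOP = "GUUCACUGCCGUAUAGGCAGCU"


def find_csy4_sites(transcript: str) -> List[int]:
    """Return 0-based start positions of the Csy4 stem-loop in a transcript."""
    rna = transcript.upper().replace("T", "U")   # accept DNA-style input as well
    sites = []
    start = rna.find(CSY4_STEM_LOOP)
    while start != -1:
        sites.append(start)
        start = rna.find(CSY4_STEM_LOOP, start + 1)
    return sites


print(find_csy4_sites("AAA" + CSY4_STEM_LOOP + "CCC"))
```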
  • the nucleic acid can further include a structure-forming nucleic acid within the msd sequence.
  • nucleic acids encoding a retron can include other stabilizing sequences or structures at the 5' end, such as G-quadruplexes.
  • nucleic acids encoding (a) a retron that includes (i) a 5’ stem loop stabilizing sequence, (ii) an msr sequence, (iii) an msd sequence, (iv) a subject expression sequence within the msd sequence, and (v) a first inverted repeat sequence and a second inverted repeat sequence, and (b) a guide RNA coding region.
  • the 5' stem loop stabilizing sequence comprises a G-quadruplex.
  • retrons comprising msr, msd, and inverted repeat sequences that can be used in the nucleic acids of the disclosure are provided in Table 1.
  • the retrons in Table 1 also express reverse transcriptases that can be used in the methods of the disclosure.
  • the retron encoded by the nucleic acids described herein is a Retron-Eco1 (Ec86) retron.
IV. Methods of use
  • nucleic acid compositions described above encoding a retron that includes (i) a stabilizing 5' ribozyme sequence, (ii) an msr sequence, (iii) an msd sequence, (iv) a subject expression sequence within the msd sequence, and (v) a first inverted repeat sequence and a second inverted repeat sequence, wherein the nucleic acid does not comprise a guide RNA and (b) a reverse transcriptase or a nucleic acid encoding the same.
  • the subject expression sequence includes a donor sequence for homology directed repair (HDR).
  • the nucleic acid further includes a 3' ribozyme sequence.
  • the subject expression sequence includes a donor sequence for homology directed repair (HDR).
  • the nucleic acid further includes a 3' ribozyme sequence.
  • Also provided herein are methods of editing DNA in a cell comprising contacting the cell with (a) any of the compositions described above encoding a retron that includes (i) a stabilizing 5' ribozyme sequence, (ii) an msr sequence, (iii) an msd sequence, (iv) a subject expression sequence within the msd sequence, and (v) a first inverted repeat sequence and a second inverted repeat sequence, wherein said nucleic acid does not comprise a guide RNA coding region, (b) a reverse transcriptase or a nucleic acid encoding the same, and (c) a sequence specific endonuclease or a nucleic acid encoding the same, thereby editing the DNA of the cell.
  • Also provided herein are methods of editing DNA in a cell comprising contacting the cell with (a) any of the compositions described above encoding a retron that includes (i) a stem loop-stabilizing 5' ribozyme sequence, (ii) an msr sequence, (iii) an msd sequence, (iv) a subject expression sequence within the msd sequence, and (v) a first inverted repeat sequence and a second inverted repeat sequence, (b) a reverse transcriptase or a nucleic acid encoding the same, and (c) a sequence specific endonuclease or a nucleic acid encoding the same, thereby editing the DNA of the cell.
  • sequence specific endonuclease is a CRISPR-associated endonuclease, a Zinc-finger nuclease, a Transcription activator-like effector nuclease (TALEN), or a meganuclease.
  • sequence specific endonuclease is a CRISPR-associated endonuclease, and wherein the method comprises administering to the subject one or more guide RNAs (gRNAs), or one or more nucleic acids encoding the same.
  • nucleic acid compositions described above encoding a retron that includes (i) a stabilizing 5' ribozyme sequence, (ii) an msr sequence, (iii) an msd sequence, (iv) a subject expression sequence within the msd sequence, and (v) a first inverted repeat sequence and a second inverted repeat sequence, wherein the nucleic acid does not comprise a guide RNA and (b) a reverse transcriptase or a nucleic acid encoding the same, and (c) a sequence specific endonuclease or a nucleic acid encoding the same, thereby editing the DNA of the cell.
  • Also provided herein are methods of treating a genetic disease in a subject in need comprising administering to the subject (a) any of the nucleic acid compositions described above encoding a retron that includes (i) a stem loop-stabilizing 5' ribozyme sequence, (ii) an msr sequence, (iii) an msd sequence, (iv) a subject expression sequence within the msd sequence, and (v) a first inverted repeat sequence and a second inverted repeat sequence, (b) a reverse transcriptase or a nucleic acid encoding the same, and (c) a sequence specific endonuclease or a nucleic acid encoding the same, thereby editing the DNA of the cell.
  • sequence specific endonuclease is a CRISPR-associated endonuclease, a Zinc-finger nuclease, a Transcription activator-like effector nuclease (TALEN), or a meganuclease
  • the sequence specific endonuclease is a CRISPR-associated endonuclease
  • the method comprises administering to the subject one or more guide RNAs (gRNAs), or one or more nucleic acids encoding the same.
  • Genome editing may be performed on a single cell or a population of cells of interest and can be performed on any type of cell, including any cell from a prokaryotic, eukaryotic, or archaeon organism, including bacteria, archaea, fungi, protists, plants, and animals.
  • Cells from tissues, organs, and biopsies, as well as recombinant cells, genetically modified cells, cells from cell lines cultured in vitro, and artificial cells (e.g., nanoparticles, liposomes, polymersomes, or microcapsules encapsulating nucleic acids) may all be used in the practice of the present disclosure.
  • the methods of the disclosure are also applicable to editing of nucleic acids in cellular fragments, cell components, or organelles comprising nucleic acids (e.g., mitochondria in animal and plant cells, plastids (e.g., chloroplasts) in plant cells and algae).
  • Cells may be cultured or expanded prior to or after performing genome editing as described herein.
  • the cells are yeast cells.
  • the cells are mammalian cells.
  • RNA-guided nuclease can be targeted to a particular genomic sequence (i.e., genomic target sequence to be modified) by altering its guide RNA sequence.
  • a target-specific guide RNA comprises a nucleotide sequence that is complementary to a genomic target sequence, and thereby mediates binding of the nuclease-gRNA complex by hybridization at the target site.
  • the gRNA can be designed with a sequence complementary to the sequence of a minor allele to target the nuclease-gRNA complex to the site of a mutation.
  • the mutation may comprise an insertion, a deletion, or a substitution.
  • the mutation may include a single nucleotide variation, gene fusion, translocation, inversion, duplication, frame shift, missense, nonsense, or other mutation associated with a phenotype or disease of interest.
  • the targeted minor allele may be a common genetic variant or a rare genetic variant.
  • the gRNA is designed to selectively bind to a minor allele with single base-pair discrimination, for example, to allow binding of the nuclease-gRNA complex to a single nucleotide polymorphism (SNP).
  • the gRNA may be designed to target disease-relevant mutations of interest for the purpose of genome editing to remove the mutation from a gene.
  • the gRNA can be designed with a sequence complementary to the sequence of a major or wild-type allele to target the nuclease-gRNA complex to the allele for the purpose of genome editing to introduce a mutation into a gene in the genomic DNA of the cell, such as an insertion, deletion, or substitution.
  • Such genetically modified cells can be used, for example, to alter phenotype, confer new properties, or produce disease models for drug screening.
  • the RNA-guided nuclease used for genome modification is a clustered regularly interspaced short palindromic repeats (CRISPR) system Cas nuclease.
  • Any RNA-guided Cas nuclease capable of catalyzing site-directed cleavage of DNA to allow integration of donor polynucleotides by the HDR mechanism can be used in genome editing, including CRISPR system type I, type II, or type III Cas nucleases.
  • Examples of Cas proteins include Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas5e (CasD), Cas6, Cas6e, Cas6f, Cas7, Cas8a1, Cas8a2, Cas8b, Cas8c, Cas9 (Csn1 or Csx12), Cas10, Cas10d, Cas12, Cas12a (e.g., Cpf1, Mad7), CasX, CasY, CasΦ, CasF, CasG, CasH, Csy1, Csy2, Csy3, Cse1 (CasA), Cse2 (CasB), Cse3 (CasE), Cse4 (CasC), Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, and Cu1966, and homologs or modified versions thereof.
  • a type II CRISPR system Cas9 endonuclease is used.
  • Cas9 nucleases from any species, or biologically active fragments, variants, analogs, or derivatives thereof that retain Cas9 endonuclease activity i.e., catalyze site-directed cleavage of DNA to generate double-strand breaks
  • the Cas9 need not be physically derived from an organism but may be synthetically or recombinantly produced. Cas9 sequences from a number of bacterial species are well known in the art and listed in the National Center for Biotechnology Information (NCBI) database.
  • YP_002342100 all of which sequences (as entered by the date of filing of this application) are herein incorporated by reference. Any of these sequences or a variant thereof comprising a sequence having at least about 70-100% sequence identity thereto, including any percent identity within this range, such as 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, or 99% sequence identity thereto, can be used for genome editing, as described herein. See also Fonfara et al. (2014) Nucleic Acids Res.
  • the bacterial type II CRISPR system uses the endonuclease, Cas9, which forms a complex with a guide RNA (gRNA) that specifically hybridizes to a complementary genomic target sequence, where the Cas9 endonuclease catalyzes cleavage to produce a double-stranded break.
  • the genomic target site will typically comprise a nucleotide sequence that is complementary to the gRNA and may further comprise a protospacer adjacent motif (PAM).
  • the target site comprises 20-30 base pairs in addition to a 3 base pair PAM.
  • the first nucleotide of a PAM can be any nucleotide, while the two other nucleotides will depend on the specific Cas9 protein that is chosen.
  • Exemplary PAM sequences are known to those of skill in the art and include, without limitation, NNG, NGN, NAG, and NGG, wherein N represents any nucleotide. In certain embodiments, the allele targeted by a gRNA comprises a mutation that creates a PAM within the allele, wherein the PAM promotes binding of the Cas9-gRNA complex to the allele.
  • the gRNA is 5-50 nucleotides, 10-30 nucleotides, 15-25 nucleotides, 18-22 nucleotides, or 19-21 nucleotides in length, or any length between the stated ranges, including, for example, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, or 35 nucleotides in length.
  • the guide RNA may be a single guide RNA comprising crRNA and tracrRNA sequences in a single RNA molecule, or the guide RNA may comprise two RNA molecules with crRNA and tracrRNA sequences residing in separate RNA molecules.
  • Cpf1 is another class II CRISPR/Cas system RNA-guided nuclease with similarities to Cas9 and may be used analogously. Unlike Cas9, Cpf1 does not require a tracrRNA and only depends on a crRNA in its guide RNA, which provides the advantage that shorter guide RNAs can be used with Cpf1 for targeting than Cas9. Cpf1 is capable of cleaving either DNA or RNA.
  • the PAM sites recognized by Cpf1 have the sequences 5'-YTN-3' (where "Y" is a pyrimidine and "N" is any nucleobase) or 5'-TTN-3', in contrast to the G-rich PAM site recognized by Cas9.
  • Cpf1 cleavage of DNA produces double-stranded breaks with sticky ends having a 4 or 5 nucleotide overhang.
  • C2c1 is another class II CRISPR/Cas system RNA-guided nuclease that may be used.
  • C2c1, similarly to Cas9, depends on both a crRNA and tracrRNA for guidance to target sites.
  • RNA-guided FokI nucleases comprise fusions of inactive Cas9 (dCas9) and the FokI endonuclease (FokI-dCas9), wherein the dCas9 portion confers guide RNA-dependent targeting on FokI.
  • engineered RNA-guided FokI nucleases see, e.g., Havlicek et al. (2017) Mol. Ther. 25(2): 342-355, Pan et al. (2016) Sci Rep. 6:35794, Tsai et al. (2014) Nat Biotechnol. 32(6):569-576; herein incorporated by reference.
  • the RNA-guided nuclease can be provided in the form of a protein, such as the nuclease complexed with a gRNA, or provided by a nucleic acid encoding the RNA-guided nuclease, such as an RNA (e.g., messenger RNA) or DNA (expression vector). Codon usage may be optimized to improve production of an RNA-guided nuclease in a particular cell or organism.
  • a nucleic acid encoding an RNA-guided nuclease can be modified to substitute codons having a higher frequency of usage in a yeast cell, a bacterial cell, a human cell, a non-human cell, a mammalian cell, a rodent cell, a mouse cell, a rat cell, or any other host cell of interest, as compared to the naturally occurring polynucleotide sequence.
  • the protein can be transiently, conditionally, or constitutively expressed in the cell.
  • Donor polynucleotides and gRNAs are readily synthesized by standard techniques, e.g., solid phase synthesis via phosphoramidite chemistry, as disclosed in U.S. Patent Nos. 4,458,066 and 4,415,732, incorporated herein by reference; Beaucage et al., Tetrahedron (1992) 48:2223-2311; and Applied Biosystems User Bulletin No. 13 (1 April 1987). Other chemical synthesis methods include, for example, the phosphotriester method described by Narang et al., Meth. Enzymol. (1979) 68:90 and the phosphodiester method disclosed by Brown et al., Meth.
  • gRNA-donor polynucleotide cassettes can be produced by standard oligonucleotide synthesis techniques and subsequently ligated into vectors. Moreover, libraries of gRNA-donor polynucleotide cassettes directed against thousands of genomic targets can be readily created using highly parallel array-based oligonucleotide library synthesis methods (see, e.g., Cleary et al. (2004) Nature Methods 1 : 241-248, Svensen et al. (2011) PLoS One 6(9):e24906).
  • adapter sequences can be added to oligonucleotides to facilitate high-throughput amplification or sequencing.
  • a pair of adapter sequences can be added at the 5' and 3' ends of an oligonucleotide to allow amplification or sequencing of multiple oligonucleotides simultaneously by the same set of primers.
  • restriction sites can be incorporated into oligonucleotides to facilitate cloning of oligonucleotides into vectors.
  • oligonucleotides comprising gRNA-donor polynucleotide cassettes can be designed with a common 5' restriction site and a common 3' restriction site to facilitate ligation into the genome modification vectors.
  • a restriction digest that selectively cleaves each oligonucleotide at the common 5' restriction site and the common 3' restriction site is performed to produce restriction fragments that can be cloned into vectors (e.g., plasmids or viral vectors), followed by transformation of cells with the vectors comprising the gRNA-donor polynucleotide cassettes.
  • Amplification of polynucleotides encoding gRNA-donor polynucleotide cassettes may be performed, for example, before ligation into genome modification vectors or before sequencing after barcoding. Any method for amplifying oligonucleotides may be used, including, but not limited to polymerase chain reaction (PCR), isothermal amplification, nucleic acid sequence-based amplification (NASBA), transcription mediated amplification (TMA), strand displacement amplification (SDA), and ligase chain reaction (LCR).
  • the genome editing cassettes comprise common 5' and 3' priming sites to allow amplification of the gRNA-donor polynucleotide sequences in parallel with a set of universal primers.
  • a set of selective primers is used to selectively amplify a subset of the gRNA-donor polynucleotides from a pooled mixture.
  • Cells that are transformed with recombinant polynucleotides comprising the genome editing cassettes may be prokaryotic cells or eukaryotic cells, and are preferably designed for high-efficiency incorporation of gRNA-donor polynucleotide libraries by transformation.
  • Methods of introducing nucleic acids into a host cell are well known in the art. Commonly used methods of transformation include chemically induced transformation, typically using divalent cations (e.g., CaCl2), and electroporation. See, e.g., Sambrook et al. (2001) Molecular Cloning, a laboratory manual, 3rd edition, Cold Spring Harbor Laboratories, New York; Davis et al. (1995) Basic Methods in Molecular Biology, 2nd edition, McGraw-Hill; and Chu et al. (1981) Gene 13:197; herein incorporated by reference in their entireties.
  • Retrons are bacterial genes which encode a reverse transcriptase (RT), a non-coding ribonucleic acid (ncRNA) which is converted into multi-copy single-stranded deoxyribonucleic acid (msDNA) by the reverse transcriptase (RT) activity, and an effector protein which together function in defense against phages (Millman et al., Cell, 2020). Recent work had established that retrons contained a loop region which could be expanded to accommodate large insertions of desired sequence for generating single-stranded donor DNA (Farzadfard et al.
  • ribozymes on the 5' and 3' side of the retron donor-guide cassette were tested, including no ribozyme (none), the hammerhead ribozyme (HHR), the hepatitis delta virus (HDV) ribozyme, the anti-genomic HDV (agHDV) ribozyme, the U5 small nuclear ribonucleic acid (snRNA) stem-loop cleaved by the yeast RNase III enzyme Rnt1p (U5 Rnt1p SL), and RiboJ ribozyme, which is a HHR-related ribozyme from the satellite RNA of tobacco ringspot virus (sTRSV) followed by a 23 nucleotide synthetic stem loop (SL).
  • the superscript scissors denote the location of cleavage sites relative to the ribozymes (or Rnt1p stem loop for U5).
  • the transcript is then processed by ribozymes on the 5' and 3' ends to remove the 5' 7-methylguanosine cap and 3' poly A tail, respectively, in order to prevent nuclear export and subsequent translation (FIG. 1B).
  • the retron-specific reverse transcriptase (RT) binds to the msr region of the retron and copies the msd region of the retron RNA into DNA, which is engineered to include the donor sequence.
  • The host cell Ribonuclease H (RNase H) degrades the RNA in the resulting RNA-DNA hybrid to generate single-stranded donor DNA (FIG. 1B).
  • FIG. 1B shows the location of ribozymes in relation to the 5' cap and 3' polyadenylic acid (poly A) tail.
  • their RNA structures can stay bound to the processed retron transcript or be released, exposing the 5' end of the retron or 3' end of the guide RNA.
  • the retron RNA is reverse transcribed by the reverse transcriptase (RT) at a conserved guanosine (G) to create an unusual 2'-5' RNA-DNA branched structure.
  • RNA in the RNA-DNA hybrid generated by the RT is degraded by host-cell RNAse H, leaving a looped-out single-stranded donor deoxyribonucleic acid (DNA) as a template for homology directed repair.
  • HHR-HDV ribozyme combination was previously published (Sharon et al., Cell, 2018), where the hammerhead ribozyme (HHR) is situated on the 5' side, and the hepatitis delta virus (HDV) ribozyme is located on the 3' side. Ribozyme identity could be a major factor in efficiency and therefore all combinations of the HHR, HDV, or no ribozyme at the 5' and 3' ends were analyzed (see FIG. 1A).
  • FIG. 2A shows the editing efficiency for all combinations of HDV, HHR, and no ribozyme in the 5' and 3' positions of a retron donor-guide cassette targeting the ADE2 locus.
  • the guide is an 18-mer with designed mismatches at positions 19 and 20 from the protospacer adjacent motif (PAM).
  • the retron donor introduces a CC-to-TG mutation which results in a premature termination codon.
  • This retron donor-guide cassette with the HHR-HDV ribozyme was previously published (Sharon et al., Cell, 2018).
  • the y-axis indicates the editing efficiency quantified as the % of genomic reads mapping to the designed donor sequence.
  • the x-axis is the number of generations in galactose, which induces Cas9, the RT, and the retron-donor guide transcript.
  • FIG. 2B shows the editing efficiency for all possible ribozymes in the 5' position paired with the HDV fixed in the 3' position.
  • RiboJ-HDV is the only ribozyme combination to reach >90% editing efficiency after 18 generations of editing.
  • the published HHR-HDV (CRISPEY) system only reaches 18% editing efficiency.
  • the ribozyme combinations are sorted left to right by greatest to least retron cDNA produced in galactose.
  • In glucose, both the RT and retron donor transcription are repressed, yet the most efficient cassettes still produce detectable retron cDNA.
  • the primers also amplify the double-stranded donor on the retron guide cassette, which resides on a low copy centromeric vector. To control for this, the same retron cassettes were transformed into cells lacking an RT, and the donor:genome ratio in such cells grown in galactose was first subtracted from the levels observed in the cells with the RT.
  • the genome has two strands which can bind both primers in the first round of polymerase chain reaction (PCR), while the donor has only one strand, so the donor:genome ratio is multiplied by 2 to obtain the values on the y-axis, "cDNA copies per genome equivalent."
  • the ribozyme constructs had a substantial impact on cDNA levels, with effects spanning close to 3 orders of magnitude (FIG. 2D). With no ribozyme in the 3' position, the retron cDNA abundances were on the order of 750 to 900 copies per cell for 5' HDV, 50 copies for no 5' ribozyme, and 1 copy for 5' HHR (FIG. 2D).
  • Retron cDNA levels were quantified with both galactose-inducible RT and constitutive alcohol/acetaldehyde dehydrogenase (ADH1-RT) in glucose, with retron non-coding RNA expressed from a hybrid SNR52-tRNA(Tyr) polymerase (Pol) III promoter with the 5' HDV (FIG. 3B).
  • results showed that separate expression of the retron donor and the guide enabled optimal processing of each component.
  • the higher retron cDNA levels achieved with the 5' HDV and the absence of a 3' HHR, and the higher guide efficacy achieved with the 5' HHR and 3' HDV (in the context of RNA polymerase II promoter) or with the RNA polymerase III promoter outweighed the benefits from guide-mediated recruitment of the retron donor to the target site when they were physically linked.
  • these results emphasize that the retron cDNA production is enhanced when expressed by Pol II with appropriate ribozymes flanking the retron construct (5'HDV with 3 'HHR or 5 'HDV with no ribozyme in the 3' position).
  • RNA polymerase II transcription is important for protecting the retron RNA from cellular 5'-3' and 3'-5' exonucleases.
  • the HDV ribozyme and RiboJ ribozyme both of which leave structured RNA to protect the 5' side of the retron, enhanced retron activity.
  • Other ribozymes which self-cleave on their 5' ends or cleave internally and leave behind a protective structure (like RiboJ) can be used in this manner.
  • ribozymes which self-cleave on their 3' ends (like HHR) could be used to enhance retron RNA 3'-end stability by placing them on the 3' end of the retron.
  • Additional RNA structures such as RNA stem loops could also be protective on the 3' end of the retron, or could assist in the proper folding of the msr and msd regions of the retron non-coding RNA, for example by insulating the programmed, inserted sequence (e.g. donor sequence) from interfering with the structured regions of the ribozyme through competing for base-pairing.
  • stabilizing retrons with structured RNA domains on either side of the retron RNA increases retron expression levels.
  • Retron production in human cells is shown in FIGS. 4A-4C.
  • FIG. 4A shows transfections of HEK293 cells with 250 ng of retron donor plasmid.
  • FIG. 4B shows quantification of retron cDNA levels by qPCR.
  • DNA was extracted from HEK293 cells expressing the indicated retron construct with or without RT. After XbaI treatment, the CACNA1D with RT consistently gives lower Ct values than the CACNA1D without RT.
  • FIG. 4C the CACNA1D amplicons from XbaI-treated gDNA extractions were subjected to NGS, and the ratio of reads harboring an XbaI site to those containing the genomic sequence was calculated.
  • the data in FIG. 4 show that expressing the retron donor with 5' HDV produces detectable retron cDNA in human HEK293 cells.
  • a nucleic acid encoding a retron comprising: a stabilizing 5' ribozyme sequence; an msr sequence; an msd sequence; a subject expression sequence within the msd sequence; and a first inverted repeat sequence and a second inverted repeat sequence; wherein said nucleic acid does not comprise a guide RNA region.
  • nucleic acid of embodiment 1, wherein the subject expression sequence comprises a donor sequence for homology directed repair (HDR).
  • nucleic acid of embodiment 1 further comprising a 3' ribozyme sequence.
  • nucleic acid of any one of embodiments 1-3 wherein the 5' ribozyme sequence or sequences are selected from the group consisting of Hammerhead Ribozyme, HDV ribozyme, RiboJ, CPEB3, Agam_1_1, Agam_2_2, Pmar_1, Bflo_1, Bflo_2, Spur_1, Spur_2, Spur_3, Spur_4, Ppac_1, Cjap_1, Fpra_1, CIV_1, Dpap_1, Tatr_1, CPEB3, G HDV, A HDV, Canis familiaris/1/3 73, Felis_catus_domestic_cat/1/3 74,
  • PongoAbelii_SumatranOrangutan//1 66, MicrocebusMurinus_MouseLemur/1/1 66, TupaiaBelangeri_NorthernTreesh/1 66, Rabbit/84/4 75, Human.Chr10/290/4 75,
  • Macropus eugenii Tammar Wallab/1 72, Monodelphis domestica Grey Sho/1 71, CaviaPorcellus_GuineaPig/1/1 66, OchotonaPrinceps_AmericanPike//1 66, Dasypus_novemcinctus_Nine_Band/2 73, Choloepus_hoffmanni_Hofmanns_t/4 75,
  • nucleic acid of embodiment 3 or 4 wherein the 3' ribozyme is a Hammerhead ribozyme, HDV, RiboJ, or CPEB3.
  • nucleic acid of embodiment 7 or 8 wherein the stem loop-stabilizing 5' ribozyme sequence is HDV or RiboJ.
  • nucleic acid of embodiment 10, wherein the 3' ribozyme sequence is HDV or RiboJ.
  • a nucleic acid encoding a retron comprising:
  • nucleic acid does not comprise a guide RNA coding region.
  • nucleic acid of embodiment 12, wherein the subject expression sequence comprises a donor sequence for homology directed repair (HDR).
  • a method of generating retron nucleic acid in a cell comprising contacting the cell with a nucleic acid of any one of embodiments 1 to 16 and a reverse transcriptase or a nucleic acid encoding the same.
  • a method of editing a nucleic acid of a cell comprising contacting the cell with a nucleic acid of any one of embodiments 1 to 16, a reverse transcriptase or a nucleic acid encoding the same, and a sequence specific endonuclease or a nucleic acid encoding the same, thereby altering the genomic sequence of the cell.
  • sequence specific endonuclease is a CRISPR-associated endonuclease, a Zinc-finger nuclease, a Transcription activator-like effector nuclease (TALEN), or a meganuclease.
  • CRISPR-associated endonuclease is selected from Cas1, Cas1B, Cas2, C2c1, Cas3, Cas4, Cas5, Cas5e (CasD), Cas6, Cas6e, Cas6f, Cas7, Cas8a1, Cas8a2, Cas8b, Cas8c, Cas9 (Csn1 or Csx12), SpCas9, FokI-dCas9, Cas10, Cas10d, Cas12, Cas12a, Mad7™, CasX, CasY, CasΦ, CasF, CasG, CasH, Csy1, Csy2, Csy3, Cse1 (CasA), Cse2 (CasB), Cse3 (CasE), Cse4 (CasC), Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm
  • a method of treating a genetic disease in a subject in need comprising administering to the subject an effective amount of a nucleic acid of any one of embodiments 1 to 16, a reverse transcriptase or a nucleic acid encoding the same, and a sequence-specific endonuclease or a nucleic acid encoding the same.
  • sequence-specific endonuclease is a CRISPR-associated endonuclease, a Zinc-finger nuclease, a Transcription activator-like effector nuclease (TALEN), or a meganuclease.
  • (b) optionally, a guide RNA region.
  • nucleic acid of embodiment 27, wherein the subject expression sequence comprises a donor sequence for homology directed repair (HDR).
  • The nucleic acid of embodiments 27 or 28, wherein the stabilizing 5' sequence comprises a stable RNA structure or a G-quadruplex.
  • RiboJ. 32 The nucleic acid of any one of embodiments 27 to 31, wherein said nucleic acid does not comprise a guide RNA region.
  • Felis_catus_domestic_cat/1/3 74
  • MicrocebusMurinus_MouseLemur/1/1 66
  • Macaca mulatta/1/1 70
  • EquusCaballus/1/1 69
  • Opossum/55/4 75 AGGGGGCCATAGCAGAAGCGTTCACGTCGCGGCCCCTGTCAGATTCATAGGAAT
  • CTGCGAATT CTGCTGCAC SEQ ID NO: 18
  • CTGCGAATT CTGCTGCAC SEQ ID NO: 19
  • Callithrix_jacchus_Common_marm/1 69 GGGGGGCACAGCAGAAGCATTCACTTCGTGGCCCCTGTCAGATTCTAGTGAATCT GCGAATT CTGCTGT (SEQ ID NO:27)
  • AAAC AUGGC UAAAUU GAGAGGG (SEQ ID NO: 35)
  • A HDV:
  • HDV GATGGCCGGCATGGTCCCAGCCTCCTCGCTGGCGCCGGCTGGGCAACACCTTCGG
  • TCTGAGTTACTGTCTGTTTTCCT (SEQ ID NO:56) (first fragment), programmable loop, AGGAAACCCGTTTCTTCTGACGTAAGGGTGCGCA (SEQ ID NO:57) (second fragment with inverted repeat).
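
The retron cDNA quantification described above (subtracting the donor:genome ratio measured in cells lacking an RT, then multiplying by 2 because the genome contributes two primer-binding strands while the single-stranded donor contributes one) can be illustrated with a minimal sketch. The function name, the input values, and the clamping of negative results to zero are assumptions made for illustration only and are not taken from the disclosure.

```python
def cdna_copies_per_genome(ratio_with_rt, ratio_without_rt):
    """Estimate retron cDNA copies per genome equivalent.

    ratio_with_rt:    donor:genome ratio measured in cells expressing the RT
    ratio_without_rt: donor:genome ratio in cells lacking the RT, i.e. the
                      background signal from the double-stranded donor on
                      the plasmid
    """
    # Step 1: subtract the plasmid-derived background (no-RT control).
    corrected = ratio_with_rt - ratio_without_rt
    # Assumption (not from the source): treat negative values as zero.
    corrected = max(corrected, 0.0)
    # Step 2: multiply by 2, since the genome offers two template strands
    # in the first PCR cycle and the single-stranded donor only one.
    return 2.0 * corrected


# Hypothetical ratios, chosen only to show the arithmetic.
print(cdna_copies_per_genome(5.5, 0.5))
```

With the hypothetical inputs shown, the background-corrected ratio is 5.0, which doubles to 10 cDNA copies per genome equivalent.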

Abstract

Provided herein, inter alia, are compositions and methods for nucleic acids encoding a retron, comprising (i) a stabilizing 5' ribozyme sequence, (ii) an msr sequence, (iii) an msd sequence, (iv) a subject expression sequence within the msd sequence, and (v) a first inverted repeat sequence and a second inverted repeat sequence, and may or may not include a guide RNA region. In aspects, the compositions and methods are used for genetic editing.

Description

COMPOSITIONS AND METHODS FOR EFFICIENT RETRON PRODUCTION
AND GENETIC EDITING
[0001] This application claims the benefit of priority to U.S. Provisional Application No. 63/214,197, filed June 23, 2021, the disclosure of which is hereby incorporated by reference in its entirety for all purposes.
STATEMENT AS TO RIGHTS TO INVENTIONS MADE UNDER
FEDERALLY SPONSORED RESEARCH AND DEVELOPMENT
[0002] This invention was made with Government support under Grant Nos. P01 HG000205, R01GM121932, and U01GM110706 awarded by the National Institutes of Health. The Government has certain rights in the invention.
BACKGROUND
[0003] Retrons are two-component systems found in the genomes of many prokaryotic species and consist of a reverse transcriptase and a unique single-stranded DNA/RNA hybrid called multicopy single-stranded DNA (msDNA). Previous work in bacteria, yeast, and mammalian cells has shown that retron systems can function outside of their native prokaryotic hosts and that they can be reprogrammed to produce single-stranded deoxyribonucleic acid (ssDNA) of any desired sequence. This sequence can be inserted into a loop region of the msd component of the retron non-coding ribonucleic acid (RNA) region, which is converted into ssDNA by the retron reverse transcriptase. Previous studies employing the retron in eukaryotic cells have used the hammerhead ribozyme (HHR) to release the retron RNA from the 5' end of the primary RNA polymerase II transcript. 5' processing with HHR had been shown to enhance the function of other non-coding RNAs transcribed by RNA polymerase II, such as clustered regularly interspaced short palindromic repeats guide RNAs (CRISPR-gRNAs), but the importance of 5' ribozymes on retron function has not been systematically explored.
[0004] Homology-directed repair (HDR) efficiencies are a major bottleneck for genome editing in many organisms and cell lines in both basic and translational research. While specialized methods to generate small variants such as base editing and prime editing exist, there is a need for efficient approaches to introduce genome edits of arbitrary size, for which HDR-based approaches are uniquely suited. Single-stranded donor DNA has proven to be more effective for HDR than double-stranded DNA (dsDNA) in a wide range of organisms, but long single-stranded DNA (ssDNA) is expensive to prepare. For example, a single Megamer® ssDNA Fragment (201-500 bases) from IDT costs $450.00 US (Integrated DNA Technologies, Coralville, IA). Furthermore, long ssDNA is not amenable to multiplexed editing approaches due to both the high cost and the need to have each donor be paired with a specific guide. On the other hand, multiplexed approaches pairing retron donors with guides are straightforward to implement. Retron donor-guide constructs can be purchased in pooled format from array synthesis providers (e.g., Twist Bioscience, South San Francisco, CA; Agilent Technologies, Inc., Santa Clara, CA), where each donor-guide pair costs >3 orders of magnitude less than the Megamer ssDNA equivalent. Furthermore, delivering ssDNA into the cell requires it to first pass through the cytoplasm. It is known that in many cell types, excessive cytoplasmic DNA can trigger a viral immune response and lead to apoptosis. Delivery of the retron as a vector would be less prone to cause this response, as the msDNA would only be produced once the retron vector has entered the nucleus.
[0005] Beyond genome editing, the expression of retrons in eukaryotic cells may have other uses for basic and applied science and medicine. For example, the retron complementary DNA (cDNA, also known as the msd or reverse transcribed region of the msDNA), could be targeted to bind to mRNAs at specific regions to compete with RNA-binding proteins or other RNAs, such as spliceosomal RNAs or microRNAs, to modulate alternative splicing and/or modify RNA stability. Retrons could be used for single-cell lineage tracing experiments, where the msd loop region in the retron could be mutagenized by some random process such as CRISPR-AID (Hess et al., Nat Methods, 2016 PMCID: PMC5557288). The retron would amplify the barcode levels in each cell through accumulation of many cDNA copies, which could be assayed by amplicon polymerase chain reaction (PCR) at the single-cell or bulk level and coupled to next generation sequencing (NGS) for barcode counting.
[0006] The instant disclosure provides efficient approaches to generate substantial levels of engineered retron msDNA in eukaryotic cells, and for increasing the efficiency of genetic editing in cells.
BRIEF SUMMARY
[0007] Provided herein are nucleic acids encoding a retron that include (i) a stabilizing 5' ribozyme sequence, (ii) an msr sequence, (iii) an msd sequence, (iv) a subject expression sequence within the msd sequence, and (v) a first inverted repeat sequence and a second inverted repeat sequence, wherein said nucleic acid does not include a guide RNA region. In aspects, the nucleic acids include a donor sequence for homology directed repair (HDR). In aspects, the nucleic acids include a 3' ribozyme sequence, in some embodiments, the 5' ribozyme sequence or sequences are selected from the group consisting of Hammerhead Ribozyme, HDV ribozyme, RiboJ, CPEB3, Agam_1__1, Agam_2_2, Pmar_1, Bflo_1, Bflo_2, 8pur__11 Spur_2, Spur_3, Spur_4, Ppac_1, Cjap_1, Fpra_1, CIV_J , Dpap__1, Tatr_1, CPEB3, G HDV, A HDV, Canis familiaris/i/3 73, Felis catus domestic cat/ 1/3 74,
Ailuropoda melanoleuca Giant p/3 73, Elephant/ 113/4 75,
PongoAbelii_SumatranOrangutan//1 66, MicrocebusMurinus_MouseLemur/1/1 66, TupaiaBelangeri_NorthernTreesh/1 66, Rabbit/84/4 75, Human.Chr 10/290/4 75, Chimp PanTroglodytes/49/4 75, Rhesus/23/4 75, Macaca_Mulatta/1/1 70, SorexAraneus_CommonShrew/1/1 66, Mouse.chr 19_CPEB3/70/4 75, Rat.Chr 1/411/4 74, EquusCaballus/1/1 69, Lama_pacos_Alpaca/1/1 70, Opossum/55/4 75, Macropus eugenii Tammar Wallab/1 72, Monodelphis domestica Grey Sho/1 71, CaviaPorcellus GuineaPig/1/1 66, OchotonaPrinceps AmericanPika/71 66, Dasypus_novemcinctus_Nine_Band/2 73, Choloepus_hoffmanni_Hofmanns_t/4 75, MyotisLucifugus_BrownBat/1/1 60, Cow/112/4 75, Callithrix_jacchus_Common_marm/1 69, Tursiops truncatus Bottlenose /1 70, EchinopsTelfairi HedgehogTenre/272, Sus scrofa/1/2 71, Dipodomys_ordsii_Ords_Kangaroo/3 72, Pteropus_vampyrus_Malayan_Flyi/1 69, Gorilla_gorilla, self-cleaving ribozyme-containing R2 elements, the L1Tc retrotransposon found in Trypanosoma cruzi, short interspersed nuclear elements (SINEs) in Schistosomes, Penelope-like elements, and retrozymes. In some embodiments, the 3' ribozyme is a Hammerhead ribozyme, HDV, RiboJ, or CPEB3. In some embodiments, the nucleic acid further comprises a stem-loop sequence located between the stabilizing 5' ribozyme sequence and the msr sequence.
[0008] Also provided herein are nucleic acids encoding a retron that include (i) a stem-loop stabilizing 5' ribozyme sequence, (ii) an msr sequence, (iii) an msd sequence, (iv) a subject expression sequence within the msd sequence, and (v) a first inverted repeat sequence and a second inverted repeat sequence, and a guide RNA region. In aspects, the nucleic acids include a donor sequence for homology directed repair (HDR). In some embodiments, the subject expression sequence comprises a donor sequence for homology directed repair (HDR). In some embodiments, the stem loop-stabilizing 5' ribozyme sequence is HDV or RiboJ. In aspects, the nucleic acids include a 3' ribozyme sequence. In some embodiments, the 3' ribozyme sequence is HDV or RiboJ.
[0009] Also provided herein are nucleic acids encoding a retron that include (i) a stabilizing 5'-end sequence-specific RNA cleavage site sequence, (ii) an msr sequence, (iii) an msd sequence, (iv) a subject expression sequence within the msd sequence, and (v) a first inverted repeat sequence and a second inverted repeat sequence. In some embodiments, the first inverted repeat sequence is located at the 5' end of the retron and the second inverted repeat sequence is located at the 3' end of the retron. In some embodiments, the entire retron is flanked by the first and second inverted repeat sequences. In some embodiments, the nucleic acid does not comprise a guide RNA coding region. In aspects, the nucleic acids include a donor sequence for homology directed repair (HDR). In aspects, the nucleic acids include a 3' ribozyme sequence. In some embodiments, the nucleic acids comprise a 3' stabilizing stem-loop structure or 3' ribozyme sequence which leaves behind a stabilizing RNA structure. In some embodiments, the stabilizing 5'-end sequence-specific RNA cleavage site sequence is an RNase III target motif. In some embodiments, cleavage of the stabilizing 5'-end sequence-specific RNA cleavage site sequence results in a stabilizing structure 3' of the cleavage site that is attached to the 5' end of the msr sequence.
[0010] In aspects, any of the above nucleic acids can further include a structure-forming nucleic acid within the msd sequence.
[0011] Also provided are methods of generating retron nucleic acid in a cell by contacting the cell with any of the above nucleic acids and a reverse transcriptase or a nucleic acid encoding the same.
[0012] Also provided are methods of editing a nucleic acid of a cell by contacting the cell with any of the above nucleic acids, a reverse transcriptase or a nucleic acid encoding the same, and a sequence specific endonuclease or a nucleic acid encoding the same, thereby altering the genomic sequence of the cell. In some embodiments, the sequence specific endonuclease is a CRISPR-associated endonuclease, a Zinc-finger nuclease, a Transcription activator-like effector nuclease (TALEN), or a meganuclease. In some embodiments, the sequence specific endonuclease cuts both strands of a target DNA sequence, thereby generating a double-strand break in the DNA sequence. In some embodiments, the sequence specific endonuclease cuts a single strand of a target DNA sequence, thereby generating a nick in one strand of the DNA sequence. In some embodiments, the method further comprises contacting the cell with a guide RNA or a nucleic acid encoding the same, and the sequence specific endonuclease is a CRISPR-associated endonuclease.
[0013] Also provided are methods of treating a genetic disease in a subject in need, the method comprising administering to the subject an effective amount of any of the above nucleic acids, a reverse transcriptase or a nucleic acid encoding the same, and a sequence-specific endonuclease or a nucleic acid encoding the same. In some embodiments, the sequence specific endonuclease is a CRISPR-associated endonuclease, a Zinc-finger nuclease, a Transcription activator-like effector nuclease (TALEN), or a meganuclease. In some embodiments, the method further comprises administering a guide RNA or a nucleic acid encoding the same to the subject, and the sequence specific endonuclease is a CRISPR-associated endonuclease.
[0014] In any of the aspects or embodiments described herein, the CRISPR-associated endonuclease can be Cas1, Cas1B, Cas2, C2c1, Cas3, Cas4, Cas5, Cas5e (CasD), Cas6, Cas6e, Cas6f, Cas7, Cas8a1, Cas8a2, Cas8b, Cas8c, Cas9 (Csn1 or Csx12), SpCas9, FokI-dCas9, Cas10, Cas10d, Cas12, Cas12a, Mad7™, CasX, CasY, CasΦ, CasF, CasG, CasH, Csy1, Csy2, Csy3, Cse1 (CasA), Cse2 (CasB), Cse3 (CasE), Cse4 (CasC), Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, or Cu1966, or homologs or modified versions thereof.
[0015] Also provided is a nucleic acid of the disclosure for use as a medicament in treating a genetic disease in a subject. Also provided is a nucleic acid of the disclosure for use in the treatment of a genetic disease in a subject.
[0016] Also provided herein are nucleic acids encoding (a) a retron comprising: (i) a stabilizing 5' sequence; (ii) an msr sequence; (iii) an msd sequence; (iv) a subject expression sequence within the msd sequence; and (v) a first inverted repeat sequence and a second inverted repeat sequence; and (b) optionally, a guide RNA region. In some embodiments, the subject expression sequence comprises a donor sequence for homology directed repair (HDR). In some embodiments, the stabilizing 5' sequence comprises a stable RNA structure or a G-quadruplex. In some embodiments, the nucleic acid further comprises a 3' ribozyme sequence, a 3' stabilizing stem-loop structure or 3' ribozyme sequence which leaves behind a stabilizing RNA structure. In some embodiments, the 3' ribozyme sequence is HDV or RiboJ. In some embodiments, the nucleic acid does not comprise a guide RNA region.
BRIEF DESCRIPTION OF THE DRAWINGS
[0017] FIGS. 1A-1B show an overview of the retron donor-guide cassette and the role of 5' and 3' ribozymes. FIG. 1A. A panel of ribozymes on the 5' and 3' side of the retron donor-guide cassette were tested, including no ribozyme (none), the hammerhead ribozyme (HHR), the hepatitis delta virus (HDV) ribozyme, the anti-genomic HDV (agHDV) ribozyme, the U5 small nuclear ribonucleic acid (snRNA) stem-loop cleaved by the yeast RNase III enzyme Rnt1p (U5 Rnt1p SL), and the RiboJ ribozyme, which is a HHR-related ribozyme from the satellite RNA of tobacco ringspot virus (sTRSV) followed by a 23 nucleotide synthetic stem loop (SL). The superscript scissors denote the location of cleavage sites relative to the ribozymes (or Rnt1p stem loop for U5, which is cleaved on both sides). FIG. 1B. The location of ribozymes in relation to the 5' cap and 3' polyadenylic acid (polyA) tail. Depending on whether the ribozymes cleave on their 5' end or 3' end, their RNA structures can stay bound to the processed retron transcript or be released, exposing the 5' end of the retron or 3' end of the guide RNA. After ribozyme cleavage, the retron RNA is reverse transcribed by the reverse transcriptase (RT) at a conserved guanosine (G) to create an unusual 2'-5' RNA-DNA branched structure. The RNA in the RNA-DNA hybrid generated by the RT is degraded by host-cell RNase H, leaving a looped-out single-stranded donor deoxyribonucleic acid (DNA) as a template for homology directed repair.
[0018] FIGS. 2A-2D show the impact of 5' and 3' ribozymes on editing efficiency and retron complementary DNA (cDNA) production. FIG. 2A shows editing efficiency for all combinations of HDV, HHR, and no ribozyme in the 5' and 3' positions of a retron donor-guide cassette targeting the ADE2 locus. The guide is an 18-mer with designed mismatches at positions 19 and 20 from the protospacer adjacent motif (PAM). The retron donor introduces a CC-to-TG mutation which results in a premature termination codon. The y-axis indicates the editing efficiency quantified as the % of genomic reads mapping to the designed donor sequence. The x-axis is the number of generations in galactose, which induces Cas9, the RT, and the retron-donor guide transcript. FIG. 2B shows editing efficiency for all possible ribozymes in the 5' position paired with the HDV fixed in the 3' position. FIG. 2C shows a diagram illustrating next-generation sequencing (NGS)-based quantification of retron donor cDNA levels in the absence of Cas9. Primers are designed to amplify both the single stranded donor template and the genomic target. The donor encodes a CC-to-TG mutation in the middle (asterisk), so the ratio of reads containing the donor mutation relative to the wild type (WT) genomic locus is proportional to the ratio of donor cDNA to genome copies. FIG. 2D shows the ribozyme combinations sorted left to right by greatest to least retron cDNA produced in galactose.
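The read-based quantities described for FIGS. 2A and 2C can be illustrated with a minimal sketch. It assumes each amplicon read is classified by simple substring matching at the edit site; the site sequences, the reads, and the function names below are hypothetical placeholders chosen for illustration and are not taken from the disclosure.

```python
from collections import Counter

def classify_reads(reads, wt_site, donor_site):
    """Count reads carrying the donor edit, the wild-type site, or neither.

    Substring matching is a simplifying assumption; a real pipeline would
    align reads to the reference amplicon first.
    """
    counts = Counter(donor=0, wt=0, other=0)
    for read in reads:
        if donor_site in read:
            counts["donor"] += 1
        elif wt_site in read:
            counts["wt"] += 1
        else:
            counts["other"] += 1
    return counts

def editing_efficiency_percent(counts):
    """Percent of all reads that map to the designed donor sequence (FIG. 2A y-axis)."""
    total = sum(counts.values())
    return 100.0 * counts["donor"] / total

def donor_to_wt_ratio(counts):
    """Ratio of donor reads to wild-type reads (the quantity described for FIG. 2C)."""
    return counts["donor"] / counts["wt"]

# Hypothetical example: a CC-to-TG edit within a short made-up site.
WT_SITE = "GACCAT"
DONOR_SITE = "GATGAT"
example_reads = ["TTGACCATGG"] * 6 + ["TTGATGATGG"] * 3 + ["TTGAATGG"]

example_counts = classify_reads(example_reads, WT_SITE, DONOR_SITE)
efficiency = editing_efficiency_percent(example_counts)
ratio = donor_to_wt_ratio(example_counts)
```

In the absence of Cas9, the donor-to-WT ratio serves as a proxy for donor cDNA copies per genome copy, whereas in an editing experiment the percentage of donor reads reports editing efficiency.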
[0019] FIGS. 3A-3C show the impact of donor strand and RNA structures on retron msDNA levels and the HDR efficacy of the retron donor. In FIG. 3A, the location of different RNA structure elements is shown relative to the terminal inverted repeats (triangles), msr, msd, and donor elements of the retron (left column), with how these structures fold up in the retron RNA (middle column) and msDNA (right column). FIG. 3B shows the impact of each of these structures on retron cDNA levels quantified by NGS. cDNA levels were tested with the RT under the control of the GAL10 promoter and its no RT control (top two panels), or the ADH1 promoter and its no RT control (bottom two panels), in either galactose (left) or glucose (right). The retron donor was expressed from the SNR52-tRNA(Tyr) hybrid promoter with a 5' HDV ribozyme. In FIG. 3C, the panel of LexA RNA and DNA structures was tested in a multiplexed competition experiment, where barcodes were inserted at the CAN1 locus adjacent to two guide RNA target sites denoted as “+” and “-” according to the strandedness of the guide RNAs. Plasmids harboring each LexA structure variant along with either Cas9 only (left) or Cas9 + RT (right) were transformed into distinct barcoded strains such that the barcode uniquely identifies the combination of donor and Cas9 plasmid. This approach allowed all strains to be pooled together equally prior to transforming a third plasmid containing only the guide RNA. Both strands of donor were tested and indicated by (+) for donors which bind to the bottom genomic strand (same as the strand to which the “+” guide binds) or (-) for donors binding to the top strand (same as the strand to which the “-” guide binds). In each NGS read, both the barcode and edit site are sequenced, enabling quantitation of both editing efficiency (ratio of donor to WT on y-axis) and editing survival (total abundance on y-axis). Cas9 and RT were expressed from the constitutive TEF1 and ADH1 promoters, respectively. To simulate weaker guides, different lengths of 20 nt or 17 nt were used for the guide RNA from the constitutive small nucleolar RNA (SNR52) promoter. The RNase P ribonucleoprotein (RPR1) promoter generates a stable leader sequence on the guide RNA while the SNR52 leader is efficiently processed by yeast RNase III Rnt1p. The fact that the RPR1 18mer shows lower efficiency than the SNR52 17mer demonstrates the negative impact of extraneous 5' sequence on guide activity.
[0020] FIGS. 4A-4C show retron production in human cells. FIG. 4A shows transfections of HEK293 cells with 250 ng of retron donor/RT plasmid. A donor DNA sequence to introduce an XbaI edit in the CACNA1D gene was inserted into the Ec86 (Eco1) retron msd loop and expressed with the Cytomegalovirus (CMV) promoter, a 5' HDV ribozyme, a 3' processing element from the metastasis-associated lung adenocarcinoma transcript 1 (MALAT1), and the poly-adenylation signal from the bovine growth hormone (bGH) gene. Human-codon optimized Ec86 (Eco1) reverse transcriptase from the Ec86 retron was expressed from the human elongation factor 1 alpha-encoding gene (EF-1 alpha) promoter with its first intron along with Puromycin resistance gene (PuroR) and enhanced green fluorescent protein (GFP) genes separated by P2A and T2A peptide cleavage sequences, respectively. After gDNA extraction of the cells in each transfection well, there is residual untransfected plasmid which causes a substantial background signal when quantifying the retron levels by PCR-based approaches. To reduce this background plasmid, the samples were treated with the XbaI restriction enzyme, as the retron donor encodes an XbaI site as the edit. XbaI, like most restriction enzymes, will only cleave double-stranded DNA, so even though the retron cDNA has an XbaI site it is single-stranded and will not be cleaved, and neither will the genomic DNA reference template. Primers flanking the XbaI site were used to quantify the template levels by two different methods, first with qPCR and then with NGS. FIG. 4B shows quantification of retron cDNA levels by qPCR. DNA was extracted from HEK293 cells expressing the indicated retron construct with or without RT. Each DNA sample was normalized by Qubit dsDNA assay and analyzed by qPCR with a primer set amplifying either the CACNA1D donor (top) or the plasmid backbone (bottom) as a control. As biological controls, a CACNA1D retron donor with no RT and a retron donor for a different gene (HEK3) were used; in the latter case the primers amplify only the genomic CACNA1D. Both the plasmid backbone primers and HEK3 irrelevant donor showed no change +/- XbaI, as expected. XbaI treatment removed the variability due to residual plasmid observed in the mock treatment between replicates. With XbaI treatment, the CACNA1D with RT consistently gave lower Ct values than the CACNA1D without RT. There was a Ct value of 27 for genome only (2 templates per genome). Thus, a Ct of 22 for -RT (genome+plasmid) means there are 2^5, or 32, genome equivalents of plasmid left over (64 plasmids). There is a Ct value of 21 for +RT (genome+plasmid+cDNA), indicating an additional 32 genome equivalents, which translates to roughly 128 retron copies per cell, because 1 genome equivalent has two chromosomes each with 2 strands (4 total strands) in a human diploid cell line. In FIG. 4C, the CACNA1D amplicons from XbaI-treated gDNA extractions were subjected to NGS, and the ratio of reads harboring an XbaI site to those containing the genomic sequence was calculated. Each cell has at least 4 strands of template for CACNA1D, as the HEK293 cells are at least diploid for all chromosomes with some chromosomes being triploid, and with each allele having 2 strands. The retron donor is single-stranded, so subtracting the donor:genome ratio of -RT from the +RT gives 20, which must then be multiplied by 4 strands for the genomic alleles to yield approximately 80 retron cDNA copies per cell.
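The copy-number arithmetic described for FIGS. 4B and 4C can be retraced with a short sketch. It assumes ideal amplification (template doubling every cycle) and, following the text, treats the fold excess over the genome-only Ct as genome equivalents; the function and variable names are illustrative and are not part of the disclosure.

```python
STRANDS_PER_GENOME_EQUIVALENT = 4  # 2 chromosomes x 2 strands in a diploid cell

def fold_over_reference(ct_reference, ct_sample):
    """Fold excess of template over a reference, assuming perfect doubling per cycle."""
    return 2.0 ** (ct_reference - ct_sample)

def copies_from_ratio_difference(ratio_difference, strands=STRANDS_PER_GENOME_EQUIVALENT):
    """Convert a donor:genome read-ratio difference (+RT minus -RT) to copies per cell."""
    return ratio_difference * strands

# Ct values quoted in the text for the XbaI-treated samples.
CT_GENOME_ONLY = 27
CT_MINUS_RT = 22   # genome + residual plasmid
CT_PLUS_RT = 21    # genome + residual plasmid + retron cDNA

plasmid_equivalents = fold_over_reference(CT_GENOME_ONLY, CT_MINUS_RT)  # 2^5
total_equivalents = fold_over_reference(CT_GENOME_ONLY, CT_PLUS_RT)     # 2^6
cdna_equivalents = total_equivalents - plasmid_equivalents

# qPCR estimate: single-stranded cDNA copies per cell.
qpcr_copies_per_cell = cdna_equivalents * STRANDS_PER_GENOME_EQUIVALENT

# NGS estimate: the text reports a +RT minus -RT ratio difference of 20.
ngs_copies_per_cell = copies_from_ratio_difference(20)
```

Under these assumptions the qPCR route gives 128 copies per cell and the NGS route gives 80, matching the approximate figures stated above; real amplification efficiencies below 100% would shift the qPCR estimate.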
DETAILED DESCRIPTION
[0021] Provided herein are compositions and methods for generating expressed programmed msDNA from retron constructs in a cell at a surprisingly greater level over prior art methods and with substantially improved kinetics of genome editing by homology-directed repair (HDR) over prior art methods. The present invention provides compositions and methods for high-throughput genome editing and screening. The invention provides methods comprising the use of retrons and retron-guide RNA cassettes, vectors comprising said cassettes, and retron donor DNA-guide molecules of the present invention to modify nucleic acids of interest at target loci of interest, and to screen genetic loci of interest, in the genomes of host cells. The present invention also provides compositions and methods for preventing or treating genetic diseases by enhancing precise genome editing to correct a mutation in target genes associated with the diseases. Kits for genome editing and screening are also provided. The present invention can be used with any cell type and at any gene locus that is amenable to nuclease-mediated genome editing technology.
[0022] The practice of the present invention employs, unless otherwise indicated, conventional techniques of immunology, biochemistry, chemistry, molecular biology, microbiology, cell biology, genomics and recombinant DNA, which are within the skill of the art. See Sambrook, Fritsch and Maniatis, Molecular Cloning: A Laboratory Manual, 2nd edition (1989), Current Protocols in Molecular Biology (F. M. Ausubel, et al. eds., (1987)), the series Methods in Enzymology (Academic Press, Inc.): PCR 2: A Practical Approach (M. J. MacPherson, B. D. Hames and G. R. Taylor eds. (1995)), Harlow and Lane, eds. (1988) Antibodies, A Laboratory Manual, and Animal Cell Culture (R. I. Freshney, ed. (1987)).
[0023] For nucleic acids, sizes are given in either kilobases (kb), base pairs (bp), or nucleotides (nt). Sizes of single-stranded DNA and/or RNA can be given in nucleotides. These are estimates derived from agarose or acrylamide gel electrophoresis, from sequenced nucleic acids, or from published DNA sequences. For proteins, sizes are given in kilodaltons (kDa) or amino acid residue numbers. Protein sizes are estimated from gel electrophoresis, from sequenced proteins, from derived amino acid sequences, or from published protein sequences.
[0024] Oligonucleotides that are not commercially available can be chemically synthesized, e.g., according to the solid phase phosphoramidite triester method first described by Beaucage and Caruthers, Tetrahedron Lett. 22:1859-1862 (1981), using an automated synthesizer, as described in Van Devanter et al., Nucleic Acids Res. 12:6159-6168 (1984). Purification of oligonucleotides is performed using any art-recognized strategy, e.g., native acrylamide gel electrophoresis or anion-exchange high performance liquid chromatography (HPLC) as described in Pearson and Regnier, J. Chrom. 255:137-149 (1983).
I. Definitions
[0025] Before the present invention is further described, it is to be understood that this invention is not strictly limited to particular embodiments described, as such may of course vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting, since the scope of the present invention will be limited only by the claims.
[0026] It must be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise. It should further be understood that as used herein, the term “a” entity or “an” entity refers to one or more of that entity. For example, a nucleic acid molecule refers to one or more nucleic acid molecules. As such, the terms “a”, “an”, “one or more” and “at least one” can be used interchangeably. Similarly, the terms “comprising”, “including” and “having” can be used interchangeably.
[0027] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the present invention, the preferred methods and materials are now described. All publications mentioned herein are incorporated herein by reference to disclose and describe the methods and/or materials in connection with which the publications are cited. The publications discussed herein are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that the present invention is not entitled to antedate such publication by virtue of prior invention. Further, the dates of publication provided may be different from the actual publication dates, which may need to be independently confirmed.
[0028] It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination. All combinations of the embodiments are specifically embraced by the present invention and are disclosed herein just as if each and every combination were individually and explicitly disclosed. In addition, all sub-combinations are also specifically embraced by the present invention and are disclosed herein just as if each and every such sub-combination were individually and explicitly disclosed herein.
[0029] It is further noted that the claims may be drafted to exclude any optional element. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as “solely,” “only” and the like in connection with the recitation of claim elements, or use of a “negative” limitation.
[0030] As used herein, the term “about” means a range of values including the specified value, which a person of ordinary skill in the art would consider reasonably similar to the specified value. In embodiments, about means within a standard deviation using measurements generally acceptable in the art. In embodiments, about means a range extending to +/- 10% of the specified value (e.g., +/- 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, or 10% of the specified value). In embodiments, about means the specified value.
[0031] The term “genome editing” refers to a type of genetic engineering in which DNA is inserted, replaced, or removed from a target DNA (e.g., the genome of a cell) using one or more nucleases and/or nickases. The nucleases create specific double-strand breaks (DSBs) at desired locations in the genome and harness the cell's endogenous mechanisms to repair the induced break by homology-directed repair (HDR) (e.g., homologous recombination) or by nonhomologous end joining (NHEJ). The nickases create specific single-strand breaks at desired locations in the genome. In one non-limiting example, two nickases can be used to create two single-strand breaks on opposite strands of a target DNA, thereby generating a blunt or a sticky end. Any suitable DNA nuclease can be introduced into a cell to induce genome editing of a target DNA sequence.
[0032] As used herein, the term “retron” is used in accordance with its plain ordinary meaning and refers to a DNA sequence found in the genome of many bacteria species that codes for reverse transcriptase and a unique single-stranded DNA/RNA hybrid called multicopy single-stranded DNA (msDNA). The retron msr-msd RNA is the non-coding RNA produced by retron elements and is the immediate precursor to the synthesis of msDNA. The retron msr RNA folds into a characteristic secondary structure that contains a conserved guanosine residue at the end of a stem loop. Synthesis of DNA by the retron-encoded reverse transcriptase (RT) results in a DNA/RNA chimera which is composed of small single-stranded DNA linked to small single-stranded RNA. The RNA strand is joined to the 5' end of the DNA chain via a 2'-5' phosphodiester linkage that occurs from the 2' position of the conserved internal guanosine residue. The retron operon carries a promoter sequence P that controls the synthesis of an RNA transcript carrying three loci: msr, msd, and ret. The ret gene product, a reverse transcriptase, processes the msd/msr portion of the RNA transcript into msDNA. Retron elements are about 2 kb long. They contain a single operon controlling the synthesis of an RNA transcript carrying three loci, msr, msd, and ret, that are involved in msDNA synthesis. The DNA portion of msDNA is encoded by the msd region, the RNA portion is encoded by the msr region, while the product of the ret open-reading frame is a reverse transcriptase similar to the RTs produced by retroviruses and other types of retroelements. Like other reverse transcriptases, the retron RT contains seven regions of conserved amino acids, including a highly conserved tyr-ala-asp-asp (YADD) sequence associated with the catalytic core. The ret gene product is responsible for processing the msd/msr portion of the RNA transcript into msDNA.
[0033] As used herein, the term “reverse transcriptase” refers to its plain and ordinary meaning as an enzyme used to generate complementary DNA (cDNA) from an RNA template, a process termed reverse transcription.
[0034] As used herein, the terms "complementary" or "complementarity" refer to polynucleotides that are able to form base pairs with one another. Base pairs are typically formed by hydrogen bonds between nucleotide units in an anti-parallel orientation between polynucleotide strands. Complementary polynucleotide strands can base pair in a Watson-Crick manner (e.g., A to T, A to U, C to G), or in any other manner that allows for the formation of duplexes. As persons skilled in the art are aware, when using RNA as opposed to DNA, uracil (U) rather than thymine (T) is the base that is considered to be complementary to adenosine. However, when a uracil is denoted in the context of the present disclosure, the ability to substitute a thymine is implied, unless otherwise stated. "Complementarity" may exist between two RNA strands, two DNA strands, or between an RNA strand and a DNA strand. It is generally understood that two or more polynucleotides may be "complementary" and able to form a duplex despite having less than perfect or less than 100% complementarity. Two sequences are "perfectly complementary" or "100% complementary" if at least a contiguous portion of each polynucleotide sequence, comprising a region of complementarity, perfectly base pairs with the other polynucleotide without any mismatches or interruptions within such region. Two or more sequences are considered "perfectly complementary" or "100% complementary" even if either or both polynucleotides contain additional non-complementary sequences as long as the contiguous region of complementarity within each polynucleotide is able to perfectly hybridize with the other. "Less than perfect" complementarity refers to situations where less than all of the contiguous nucleotides within such region of complementarity are able to base pair with each other. Determining the percentage of complementarity between two polynucleotide sequences is a matter of ordinary skill in the art.
For purposes of Cas9 targeting, a gRNA may comprise a sequence "complementary" to a target sequence (e.g., major or minor allele), capable of sufficient base-pairing to form a duplex (i.e., the gRNA hybridizes with the target sequence). Additionally, the gRNA may comprise a sequence complementary to a sequence adjacent to a PAM sequence, wherein the gRNA also hybridizes with the sequence adjacent to a PAM sequence in a target DNA.
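One way to compute a percentage of complementarity between two strands is sketched below. It is a minimal illustration that assumes both sequences are written 5' to 3', have equal length, pair in anti-parallel orientation without gaps, and form only Watson-Crick pairs (A-T, A-U, C-G); wobble pairs and other duplex-forming interactions are deliberately left out, and the function name and example sequences are hypothetical.

```python
# Watson-Crick pairs only; T and U are both accepted opposite A.
WATSON_CRICK_PAIRS = {
    ("A", "T"), ("T", "A"),
    ("A", "U"), ("U", "A"),
    ("C", "G"), ("G", "C"),
}

def percent_complementarity(strand_a, strand_b):
    """Percent of positions that base pair when two 5'->3' strands align anti-parallel.

    Assumes equal-length, ungapped strands; this is a simplification of how
    complementarity might be assessed in practice.
    """
    if len(strand_a) != len(strand_b):
        raise ValueError("strands must have equal length")
    if not strand_a:
        raise ValueError("strands must not be empty")
    a = strand_a.upper()
    b_reversed = strand_b.upper()[::-1]  # read strand_b 3'->5' to pair with strand_a
    paired = sum((x, y) in WATSON_CRICK_PAIRS for x, y in zip(a, b_reversed))
    return 100.0 * paired / len(a)

# Hypothetical sequences for illustration.
perfect_dna = percent_complementarity("GATTACA", "TGTAATC")
perfect_rna_dna = percent_complementarity("GAUUACA", "TGTAATC")
one_mismatch = percent_complementarity("GATTACA", "TGTAATG")
```

Here the first two pairs are 100% complementary (the second being an RNA strand against a DNA strand), while the third has one mismatch in seven positions and therefore falls under "less than perfect" complementarity.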
[0035] The terms "hybridize" and "hybridization" refer to the formation of complexes between nucleotide sequences which are sufficiently complementary to form duplexes via Watson-Crick base pairing.
[0036] The term “DNA nuclease” refers to an enzyme capable of cleaving the phosphodiester bonds between the nucleotide subunits of DNA and may be an endonuclease or an exonuclease. According to the present invention, the DNA nuclease may be an engineered (e.g., programmable or targetable) DNA nuclease which can be used to induce genome editing of a target DNA sequence. Any suitable DNA nuclease can be used including, but not limited to, CRISPR-associated protein (Cas) nucleases, other endo- or exo-nucleases, variants thereof, fragments thereof, and combinations thereof.
[0037] The term “double-strand break” or “double-strand cut” refers to the severing or cleavage of both strands of the DNA double helix. The DSB may result in cleavage of both strands at the same position leading to “blunt ends” or staggered cleavage resulting in a region of single-stranded DNA at the end of each DNA fragment, or “sticky ends”. A DSB may arise from the action of one or more DNA nucleases.
[0038] The term “nonhomologous end joining” or “NHEJ” refers to a pathway that repairs double-strand DNA breaks in which the break ends are directly ligated without the need for a homologous template.
[0039] The term “homology-directed repair” or “HDR” refers to a mechanism in cells to accurately and precisely repair double-strand DNA breaks using a homologous template to guide repair. The most common form of HDR is homologous recombination (HR), a type of genetic recombination in which nucleotide sequences are exchanged between two similar or identical molecules of DNA.
[0040] The term “nucleic acid,” “nucleotide,” or “polynucleotide” refers to deoxyribonucleic acids (DNA), ribonucleic acids (RNA) and polymers thereof in either single-, double- or multi-stranded form. The term includes, but is not limited to, single-, double- or multi-stranded DNA or RNA, genomic DNA, cDNA, DNA-RNA hybrids, or a polymer comprising purine and/or pyrimidine bases or other natural, chemically modified, biochemically modified, non-natural, synthetic or derivatized nucleotide bases. In some embodiments, a nucleic acid can comprise a mixture of DNA, RNA and analogs thereof. Unless specifically limited, the term encompasses nucleic acids containing known analogs of natural nucleotides that have similar binding properties as the reference nucleic acid and are metabolized in a manner similar to naturally occurring nucleotides. Unless otherwise indicated, a particular nucleic acid sequence also implicitly encompasses conservatively modified variants thereof (e.g., degenerate codon substitutions), alleles, orthologs, single nucleotide polymorphisms (SNPs), and complementary sequences as well as the sequence explicitly indicated. Specifically, degenerate codon substitutions may be achieved by generating sequences in which the third position of one or more selected (or all) codons is substituted with mixed-base and/or deoxyinosine residues (Batzer et al., Nucleic Acid Res. 19:5081 (1991); Ohtsuka et al., J. Biol. Chem. 260:2605-2608 (1985); and Rossolini et al., Mol. Cell. Probes 8:91-98 (1994)).
[0041] The term “single nucleotide polymorphism” or “SNP” refers to a change of a single nucleotide within a polynucleotide, including within an allele. This can include the replacement of one nucleotide by another, as well as the deletion or insertion of a single nucleotide. Most typically, SNPs are biallelic markers although tri- and tetra-allelic markers can also exist. By way of non-limiting example, a nucleic acid molecule comprising SNP A/C may include a C or A at the polymorphic position.
[0042] The term “gene” means the segment of DNA involved in producing a ribonucleic acid polymer, which in the case of protein coding genes can then be translated into a polypeptide chain. The DNA segment may include regions preceding and following the coding region (leader and trailer) involved in the transcription/translation of the gene product and the regulation of the transcription/translation, as well as intervening sequences (introns) between individual coding segments (exons).
[0043] The term “cassette” refers to a combination of genetic sequence elements that may be introduced as a single element and may function together to achieve a desired result. A cassette typically comprises polynucleotides in combinations that are not found in nature.
[0044] The term “operably linked” refers to two or more genetic elements, such as a polynucleotide coding sequence and a promoter, placed in relative positions that permit the proper biological functioning of the elements, such as the promoter directing transcription of the coding sequence.
[0045] The term “inducible promoter” refers to a promoter that responds to environmental factors and/or external stimuli that can be artificially controlled in order to modify the expression of, or the level of expression of, a polynucleotide sequence, or refers to a combination of elements, for example an exogenous promoter and an additional element such as a trans-activator operably linked to a separate promoter. An inducible promoter may respond to abiotic factors such as oxygen levels or to chemical or biological molecules. In some embodiments, the chemical or biological molecules may be molecules not naturally present in humans.
[0046] The terms “vector” and “expression vector” refer to a nucleic acid construct, generated recombinantly or synthetically, with a series of specified nucleic acid elements that permit transcription of a particular polynucleotide sequence in a host cell. An expression vector may be part of a plasmid, viral genome, or nucleic acid fragment. Typically, an expression vector includes a polynucleotide to be transcribed, operably linked to a promoter. The term “promoter” is used herein to refer to an array of nucleic acid control sequences that direct transcription of a nucleic acid. As used herein, a promoter includes necessary nucleic acid sequences near the start site of transcription, such as, in the case of a polymerase II type promoter, a TATA element. A promoter also optionally includes distal enhancer or repressor elements, which can be located as much as several thousand base pairs from the start site of transcription. Other elements that may be present in an expression vector include those that enhance transcription (e.g., enhancers) and terminate transcription (e.g., terminators).
[0047] “Recombinant” refers to a genetically modified polynucleotide, polypeptide, cell, tissue, or organism. For example, a recombinant polynucleotide (or a copy or complement of a recombinant polynucleotide) is one that has been manipulated using well known methods. A recombinant expression cassette comprising a promoter operably linked to a second polynucleotide (e.g., a coding sequence) can include a promoter that is heterologous to the second polynucleotide as the result of human manipulation (e.g., by methods described in Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y., (1989) or Current Protocols in Molecular Biology Volumes 1-3, John Wiley & Sons, Inc. (1994-1998)). A recombinant expression cassette (or expression vector) typically comprises polynucleotides in combinations that are not found in nature. For instance, human manipulated restriction sites or plasmid vector sequences can flank or separate the promoter from other sequences. A recombinant protein is one that is expressed from a recombinant polynucleotide, and recombinant cells, tissues, and organisms are those that comprise recombinant sequences (polynucleotide and/or polypeptide).
[0048] As used herein, the term “heterologous” refers to biological material that is introduced, inserted, or incorporated into a recipient (e.g., host) organism that originates from another organism. Typically, the heterologous material that is introduced into the recipient organism (e.g., a host cell) is not normally found in that organism. Heterologous material can include, but is not limited to, nucleic acids, amino acids, peptides, proteins, and structural elements such as genes, promoters, and cassettes. A host cell can be, but is not limited to, a bacterium, a yeast cell, a mammalian cell, or a plant cell. The introduction of heterologous material into a host cell or organism can result, in some instances, in the expression of additional heterologous material in or by the host cell or organism. As a non-limiting example, the transformation of a yeast host cell with an expression vector that contains DNA sequences encoding a bacterial protein may result in the expression of the bacterial protein by the yeast cell. The incorporation of heterologous material may be permanent or transient. Also, the expression of heterologous material may be permanent or transient.
[0049] The terms “reporter” and “selectable marker” can be used interchangeably and refer to a gene product that permits a cell expressing that gene product to be identified and/or isolated from a mixed population of cells. Such isolation might be achieved through the selective killing of cells not expressing the selectable marker, which may be, as a non-limiting example, an antibiotic resistance gene. Alternatively, the selectable marker may permit identification and/or subsequent isolation of cells expressing the marker as a result of the expression of a fluorescent protein such as GFP or the expression of a cell surface marker which permits isolation of cells by fluorescence-activated cell sorting (FACS), magnetic-activated cell sorting (MACS), or analogous methods. Suitable cell surface markers include CD8, CD19, and truncated CD19. Preferably, cell surface markers used for isolating desired cells are non-signaling molecules, such as subunit or truncated forms of CD8, CD19, or CD20. Suitable markers and techniques are known in the art.
[0050] The terms “culture,” “culturing,” “grow,” “growing,” “maintain,” “maintaining,” “expand,” “expanding,” etc., when referring to cell culture itself or the process of culturing, can be used interchangeably to mean that a cell (e.g., yeast cell) is maintained outside its normal environment under controlled conditions, e.g., under conditions suitable for survival. Cultured cells are allowed to survive, and culturing can result in cell growth, stasis, differentiation or division. The term does not imply that all cells in the culture survive, grow, or divide, as some may naturally die or senesce. Cells are typically cultured in media, which can be changed during the course of the culture.
[0051] The terms “subject,” “individual,” and “patient” are used interchangeably herein to refer to a vertebrate, preferably a mammal, more preferably a human. Mammals include, but are not limited to, murines, simians, humans, farm animals, sport animals, and pets. Tissues, cells and their progeny of a biological entity obtained in vivo or cultured in vitro are also encompassed.
[0052] As used herein, the term “administering” includes oral administration, topical contact, administration as a suppository, intravenous, intraperitoneal, intramuscular, intralesional, intrathecal, intranasal, or subcutaneous administration to a subject. Administration is by any route, including parenteral and transmucosal (e.g., buccal, sublingual, palatal, gingival, nasal, vaginal, rectal, or transdermal). Parenteral administration includes, e.g., intravenous, intramuscular, intra-arteriole, intradermal, subcutaneous, intraperitoneal, intraventricular, and intracranial. Other modes of delivery include, but are not limited to, the use of liposomal formulations, intravenous infusion, transdermal patches, etc. Administering also refers to delivery of material, including biological material such as nucleic acids and/or proteins, into cells by transformation, transfection, transduction, ballistic methods and/or electroporation.
[0053] The term “treating” refers to an approach for obtaining beneficial or desired results including, but not limited to, a therapeutic benefit and/or a prophylactic benefit. By therapeutic benefit is meant any therapeutically relevant improvement in or effect on one or more diseases, conditions, or symptoms under treatment. For prophylactic benefit, the compositions may be administered to a subject at risk of developing a particular disease, condition, or symptom, or to a subject reporting one or more of the physiological symptoms of a disease, even though the disease, condition, or symptom may not have yet been manifested.
[0054] The term “effective amount” or “sufficient amount” refers to the amount of an agent that is sufficient to effect beneficial or desired results. The therapeutically effective amount may vary depending upon one or more of: the subject and disease condition being treated, the weight and age of the subject, the severity of the disease condition, the manner of administration and the like, which can readily be determined by one of ordinary skill in the art. The specific amount may vary depending on one or more of: the particular agent chosen, the host cell type, the location of the host cell in the subject, the dosing regimen to be followed, whether it is administered in combination with other compounds, timing of administration, and the physical delivery system in which it is carried.
[0055] The term “pharmaceutically acceptable carrier” refers to a substance that aids the administration of an active agent to a cell, an organism, or a subject. “Pharmaceutically acceptable carrier” refers to a carrier or excipient that can be included in the compositions of the invention and that causes no significant adverse toxicological effect on the patient. Non-limiting examples of pharmaceutically acceptable carriers include water, NaCl, normal saline solutions, lactated Ringer's, normal sucrose, normal glucose, cell culture media, and the like. One of skill in the art will recognize that other pharmaceutical carriers are useful in the present invention.
[0056] As used herein, the term “a stabilizing 5' sequence specific RNA cleavage site sequence” refers to a nucleic acid sequence 5' to a retron that, upon expression as an RNA, can be cleaved from the RNA and leaves a stabilizing sequence on the remaining retron. Nonlimiting examples of a stabilizing sequence can be the cleavage product of the Hepatitis Delta Virus (HDV) ribozyme, a stabilizing stem loop structure, a stem loop with a highly stable tetraloop, such as a GNRA or UNCG tetraloop, or a pseudoknot.
[0057] As used herein, the term “a stabilizing 5' ribozyme sequence” refers to a ribozyme 5' to a retron that, upon expression as an RNA, cleaves itself from the RNA and leaves a stabilizing sequence on the remaining retron. Non-limiting examples of a stabilizing sequence can be the cleavage product of the Hepatitis Delta Virus (HDV) ribozyme, a stabilizing stem loop structure, a stem loop with a highly stable tetraloop, such as a GNRA or UNCG tetraloop, or a pseudoknot.
[0058] As used herein, the term “a stem loop-stabilizing 5' ribozyme sequence” refers to a ribozyme 5' to a retron that, upon expression as an RNA, cleaves itself from the RNA and leaves a stabilizing sequence such as a stabilizing stem loop structure. A non-limiting example is RiboJ.
[0059] As used herein, the term “a 3' ribozyme sequence” refers to a ribozyme 3' to a retron. Non-limiting examples include a Hammerhead ribozyme, HDV, RiboJ, or CPEB3.
[0060] The term “ribozyme” refers to an RNA molecule that is capable of catalyzing a biochemical reaction. In some instances, ribozymes function in protein synthesis, catalyzing the linking of amino acids in the ribosome. In other instances, ribozymes participate in various other RNA processing functions, such as splicing, viral replication, and tRNA biosynthesis. In some instances, ribozymes can be self-cleaving. Non-limiting examples of ribozymes include the HDV ribozyme, the Lariat capping ribozyme (formerly called GIR1 branching ribozyme), the glmS ribozyme, group I and group II self-splicing introns, the hairpin ribozyme, the hammerhead ribozyme, various rRNA molecules, RNase P, the twister ribozyme, the VS ribozyme, the pistol ribozyme, and the hatchet ribozyme. Other examples include the self-cleaving ribozyme-containing R2 elements, the L1Tc retrotransposon found in Trypanosoma cruzi, short interspersed nuclear elements (SINEs) in Schistosomes, Penelope-like elements and retrozymes. For more information regarding ribozymes, see, e.g., Doherty, et al. Ann. Rev. Biophys. Biomol. Struct. 30: 457-475 (2001) and Weinberg, et al., Nucleic Acids Research 47(18): 9480-9494 (2019); each incorporated herein by reference in its entirety for all purposes.
[0061] As used herein, the term “a structure-forming nucleic acid within the msd sequence” refers to an exogenous nucleic acid sequence inserted within the loop-forming structure of the msd sequence that is able to form a structured region of nucleic acid when expressed as a retron ncRNA. The exogenous nucleic acid sequence can be placed adjacent to the programmed ssDNA sequence (i.e., donor) in the same loop of the msd region. In the retron ncRNA form, the structure resides 3' of the donor or programmed ssDNA sequence. In the retron msDNA form, this becomes the 5' end. This structure can also be placed on the other side of the programmed ssDNA sequence. While not wishing to be held by theory, this structure may aid the proper folding of the msr/msd structure in the retron ncRNA to enhance reverse transcription, or may enhance the stability of the msDNA and protect it from cellular nucleases.
[0062] “Percent similarity,” or “percent identity” in the context of polynucleotide or peptide sequences, is determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the sequence (e.g., an msr locus sequence) in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence which does not comprise additions or deletions, for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleotide or amino acid occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison and multiplying the result by 100 to yield the percentage of similarity (e.g., sequence similarity).
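The calculation described in paragraph [0062] can be illustrated with a minimal Python sketch. It assumes the two sequences have already been optimally aligned by some other tool and are supplied as equal-length strings, with "-" marking a gap; the function name `percent_identity` and the example sequences are hypothetical and are not part of the disclosure.

```python
def percent_identity(aligned_test: str, aligned_ref: str, gap: str = "-") -> float:
    """Percent identity over a comparison window of two pre-aligned sequences.

    Matched positions are counted, divided by the total number of
    positions in the window, and multiplied by 100.
    """
    if len(aligned_test) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_test:
        raise ValueError("comparison window is empty")
    matches = sum(
        1
        for a, b in zip(aligned_test.upper(), aligned_ref.upper())
        if a == b and a != gap  # a gap never counts as a matched position
    )
    return 100.0 * matches / len(aligned_test)


# Hypothetical 9-position window: the test sequence carries one gap and
# one mismatch relative to the reference.
example = percent_identity("ACGT-ACGT", "ACGTTACGA")
```

In this sketch the window is taken to be the full length of the alignment; a different choice of window or a different treatment of gapped positions would change the denominator and therefore the reported percentage.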
[0063] When a polynucleotide or peptide has at least about 70% similarity (e.g., sequence similarity), preferably at least about 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% similarity, to a reference sequence, when compared and aligned for maximum correspondence over a comparison window, or designated region as measured using one of the following sequence comparison algorithms or by manual alignment and visual inspection, such sequences are then said to be “substantially similar.” With regard to polynucleotide sequences, this definition also refers to the complement of a test sequence.
[0064] For sequence comparison, typically one sequence acts as a reference sequence, to which test sequences are compared. When using a sequence comparison algorithm, test and reference sequences are entered into a computer, subsequence coordinates are designated, if necessary, and sequence algorithm program parameters are designated. Default program parameters can be used, or alternative parameters can be designated. The sequence comparison algorithm then calculates the percent sequence similarities for the test sequences relative to the reference sequence, based on the program parameters. For sequence comparison of nucleic acids and proteins, the BLAST and BLAST 2.0 algorithms and the default parameters discussed below are used.
[0065] Methods of alignment of sequences for comparison are well-known in the art. Optimal alignment of sequences for comparison can be conducted, e.g., by the local homology algorithm of Smith & Waterman, Adv. Appl. Math. 2:482 (1981), by the homology alignment algorithm of Needleman & Wunsch, J. Mol. Biol. 48:443 (1970), by the search for similarity method of Pearson & Lipman, Proc. Nat'l. Acad. Sci. USA 85:2444 (1988), by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr., Madison, Wis.), or by manual alignment and visual inspection (see, e.g., Current Protocols in Molecular Biology (Ausubel et al., eds. 1995 supplement)).
[0066] Additional examples of algorithms that are suitable for determining percent sequence similarity are the BLAST and BLAST 2.0 algorithms, which are described in Altschul et al. (1990) J. Mol. Biol. 215: 403-410 and Altschul et al. (1997) Nucleic Acids Res. 25: 3389-3402, respectively. Software for performing BLAST analyses is publicly available at the National Center for Biotechnology Information website, ncbi.nlm.nih.gov. The algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence, which either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold (Altschul et al., supra). These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are then extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always >0) and N (penalty score for mismatching residues; always <0). The BLASTN program (for nucleotide sequences) uses as defaults a word size (W) of 28, an expectation (E) of 10, M=1, N=-2, and a comparison of both strands. For amino acid sequences, the BLASTP program uses as defaults a word size (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix (see, e.g., Henikoff and Henikoff, Proc. Natl. Acad. Sci. USA 89: 10915 (1992)).
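The seed-and-extend idea in paragraph [0066] can be sketched in Python as follows. This is a simplified illustration and not the BLAST implementation: it uses exact word matches only (no neighborhood threshold T), no gaps, and no statistics. The word size of 4 is chosen only so that short toy sequences produce seeds, and the drop-off parameter `x_drop` is an assumption introduced here to stop an extension once the cumulative score has fallen well below its best value. The function names are hypothetical.

```python
def extend(query, subject, qi, si, step, match=1, mismatch=-2, x_drop=5):
    """Extend without gaps from (qi, si) in direction step (+1 or -1).

    Returns (best_score, best_length): the highest cumulative score
    reached and the number of positions needed to reach it.
    """
    score = best = 0
    best_len = 0
    n = 0
    while 0 <= qi < len(query) and 0 <= si < len(subject):
        score += match if query[qi] == subject[si] else mismatch
        n += 1
        if score > best:
            best, best_len = score, n
        elif best - score > x_drop:  # assumed drop-off rule, see lead-in
            break
        qi += step
        si += step
    return best, best_len


def seed_and_extend(query, subject, word_size=4, match=1, mismatch=-2):
    """Return the best ungapped HSP as (score, q_start, s_start, length), or None."""
    best_hsp = None
    for qi in range(len(query) - word_size + 1):
        word = query[qi:qi + word_size]
        si = subject.find(word)
        while si != -1:  # every exact occurrence of the word is a seed
            r_score, r_len = extend(query, subject, qi + word_size,
                                    si + word_size, +1, match, mismatch)
            l_score, l_len = extend(query, subject, qi - 1, si - 1, -1,
                                    match, mismatch)
            hsp = (word_size * match + r_score + l_score,
                   qi - l_len, si - l_len, word_size + l_len + r_len)
            if best_hsp is None or hsp[0] > best_hsp[0]:
                best_hsp = hsp
            si = subject.find(word, si + 1)
    return best_hsp


# Toy sequences sharing an 8-nucleotide identical segment.
hsp = seed_and_extend("ACGTACGTAA", "TTACGTACGTCC")
```

With M=1 and N=-2 as in the BLASTN defaults quoted above, each matching residue raises the cumulative score by one and each mismatch lowers it by two, so an extension is kept only while matches outweigh mismatches.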
[0067] The BLAST algorithm also performs a statistical analysis of the similarity between two sequences (see, e.g., Karlin and Altschul, Proc. Natl. Acad. Sci. USA, 90:5873-5877 (1993)). One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance. For example, a nucleic acid is considered similar to a reference sequence if the smallest sum probability in a comparison of the test nucleic acid to the reference nucleic acid is less than about 0.2, more preferably less than about 0.01, and most preferably less than about 0.001.
[0068] Another method of establishing percent identity in the context of the present disclosure is to use the MPSRCH package of programs copyrighted by the University of Edinburgh, developed by John F. Collins and Shane S. Sturrok, and distributed by IntelliGenetics, Inc. (Mountain View, CA). From this suite of packages, the Smith Waterman algorithm can be employed where default parameters are used for the scoring table (for example, gap open penalty of 12, gap extension penalty of one, and a gap of six). From the data generated the "Match" value reflects "sequence identity." Other suitable programs for calculating the percent identity or similarity between sequences are generally known in the art, for example, another alignment program is BLAST, used with default parameters. For example, BLASTN and BLASTP can be used with the following default parameters: genetic code = standard; filter = none; strand = both; cutoff = 60; expect = 10; Matrix = BLOSUM62; Descriptions = 50 sequences; sort by = HIGH SCORE; Databases = non-redundant, GenBank + EMBL + DDBJ + PDB + GenBank CDS translations + Swiss protein + Spupdate + PIR. Details of these programs are readily available.
[0069] Alternatively, homology can be determined by hybridization of polynucleotides under conditions which form stable duplexes between homologous regions, followed by digestion with single stranded specific nuclease(s), and size determination of the digested fragments. DNA sequences that are substantially homologous can be identified in a Southern hybridization experiment under, for example, stringent conditions, as defined for that particular system. Defining appropriate hybridization conditions is within the skill of the art. See, e.g., Sambrook et al., supra; DNA Cloning, supra; Nucleic Acid Hybridization, supra.
[0070] The term "homologous region" refers to a region of a nucleic acid with homology to another nucleic acid region. Thus, whether a "homologous region" is present in a nucleic acid molecule is determined with reference to another nucleic acid region in the same or a different molecule. Further, since a nucleic acid is often double-stranded, the term "homologous region," as used herein, refers to the ability of nucleic acid molecules to hybridize to each other. For example, a single-stranded nucleic acid molecule can have two homologous regions which are capable of hybridizing to each other. Thus, the term "homologous region" includes nucleic acid segments with complementary sequences. Homologous regions may vary in length but will typically be between 4 and 500 nucleotides (e.g., from about 4 to about 40, from about 40 to about 80, from about 80 to about 120, from about 120 to about 160, from about 160 to about 200, from about 200 to about 240, from about 240 to about 280, from about 280 to about 320, from about 320 to about 360, from about 360 to about 400, from about 400 to about 440, etc.).
[0071] “Genetic disease” as used herein refers to a disease, partially or completely, directly or indirectly, caused by one or more abnormalities in the genome, especially a condition that is present from birth. The abnormality may be a mutation, an insertion or a deletion. The abnormality may affect the coding sequence of the gene or its regulatory sequence. The genetic disease may be selected from the group consisting of an inherited muscle disease (e.g., congenital myopathy or a muscular dystrophy), a lysosomal storage disease, a heritable disorder of connective tissue, a neurodegenerative disorder, and a skeletal dysplasia. For example, the genetic disease may be, but is not limited to, Duchenne muscular dystrophy (DMD), Becker's muscular dystrophy, Limb-girdle muscular dystrophy, dysferlinopathy, dystroglycanopathy, aspartylglucosaminuria, Batten disease, cystinosis, Fabry disease, Gaucher disease, Pompe disease, Tay Sachs disease, Sandhoff disease, metachromatic leukodystrophy, mucolipidosis, mucopolysaccharide storage diseases, Niemann-Pick disease, Schindler disease, Krabbe disease, Ehlers-Danlos syndrome, epidermolysis bullosa, Marfan syndrome, neurofibromatosis, spinal muscular atrophy, amyotrophic lateral sclerosis, progressive muscular atrophy, fragile X syndrome, Charcot-Marie-Tooth disease, osteogenesis imperfecta, achondroplasia, or osteopetrosis. Other genetic diseases include hemophilia, cystic fibrosis, Huntington's chorea, familial hypercholesterolemia (LDL receptor defect), hepatoblastoma, Wilson's disease, congenital hepatic porphyria, inherited disorders of hepatic metabolism, Lesch Nyhan syndrome, sickle cell anemia, thalassaemias, xeroderma pigmentosum, Fanconi's anemia, retinitis pigmentosa, ataxia telangiectasia, Bloom's syndrome, retinoblastoma, and Tay-Sachs disease.
[0072] The term "transfection" is used to refer to the uptake of foreign DNA by a cell. A cell has been "transfected" when exogenous nucleic acids have been introduced inside the cell membrane. A number of transfection techniques are generally known in the art. See, e.g., Graham et al. (1973) Virology, 52:456, Sambrook et al. (2001) Molecular Cloning, a laboratory manual, 3rd edition, Cold Spring Harbor Laboratories, New York, Davis et al. (1995) Basic Methods in Molecular Biology, 2nd edition, McGraw-Hill, and Chu et al. (1981) Gene 13: 197. Such techniques can be used to introduce one or more exogenous nucleic acid moieties into suitable host cells. The term refers to both stable and transient uptake of the genetic material and includes uptake of peptide- or antibody-linked nucleic acids.
[0073] The term "donor polynucleotide" or “donor sequence” refers to a polynucleotide that provides a sequence of an intended edit to be integrated into the genome at a target locus by HDR.
II. Compositions
[0074] Provided herein, in an aspect, are nucleic acids encoding a retron that include (a) a stabilizing 5' ribozyme sequence, (b) an msr sequence, (c) an msd sequence, (d) a subject expression sequence within the msd sequence, and (e) a first inverted repeat sequence and a second inverted repeat sequence, wherein the nucleic acid does not include a guide RNA coding region. Applicants surprisingly found that such nucleic acids, when expressed in a cell, allowed for greater retron production. Applicants also found that such compositions lead to significantly higher gene editing efficiency when employed with gene editing technology. Such effects were not seen in prior-described constructs wherein the nucleic acid includes a guide RNA region.
[0075] In aspects, the subject expression sequence is a donor sequence for homology directed repair (HDR). In aspects, the nucleic acid includes a 3' ribozyme sequence. Any 5' ribozyme can be used as long as it leaves a stabilizing sequence when cleaved from the retron. The 5' ribozyme sequence or sequences can be a Hammerhead Ribozyme, HDV ribozyme, RiboJ, CPEB3, Agam 1 1, Agam 2 2, Pmar 1, Bflo 1, Bflo 2, Spur 1, Spur 2, Spur 3, Spur 4, Ppae_L Cjap_l, FpraJ, CJVJ, Dpap_L Tatr _1, CPEB3, G HDY. AJHDV,
Canis__familiaris/l/3 73, Felis_catus_domestic_cat/l/3 74,
Ailuropoda melanoleuca Giant p/3 73, Elephant'' 113/4 75,
PongoAbclii SumatranOrangutan//! 66, MicrocebusMurmus MouseLemur/i/l 66, TupaiaBelangeri__NorthemTreesh/l 66, Rabbit/84/4 75, Human. ChrlO/290/4 75,
Chimp PanTroglodytes/49/4 75, Rhesus/23/4 75, Macacajnulatta/1/1 70,
SorexArane us CommonShrew,/ 1 / 1 66, Mouse .cirri 9__ CPEB3/70/4 75, Rat. Chr 1/411/4 74, EquusCaballus/I/I 69, Lamajpacos__Alpaca/T/l 70, Opossum/55/4 75, Macropus__eugenii_Tammar_Wahab/l 72, Monodelphis__domestica__Grey_Sho/l 71, CaviaPorcellus GuineaPig/1/ 1 66, OchotonaPrinceps AmericanPike/71 66,
Dasypus novemcinctus Nine Band/2 73, Choloepus hoffmanni Hofmanns t/4 75, MyotisLucifugiis_BrownBat/I/i 60, Cow/112/4 75, Callitbrix_jacclius__Cominon__mann/i 69, Tursiops_truneatus_Botlenose_/l 70, EchinopsTelfairi__HedgehogTenre/2 72, Sus_Scrofa/l/2 71, Dipodomys ordsii Ords Kangaroo/3 72, or Pteropus vampyms Malayan Flyi/ 1 69, Gorilla__gorilla. Examples of a 3' ribozyme include, but are not limited to, Hammerhead ribozyme, HDV, RiboJ, or CPEB3. In some embodiments, the nucleic acid encoding the retron further comprises a stem-loop sequence located between (a) the stabilizing 5' ribozyme sequence and (b) the msr sequence.
[0076] In another aspect, provided herein are nucleic acids encoding (a) a retron that includes (i) a stem loop-stabilizing 5' ribozyme sequence, (ii) an msr sequence, (iii) an msd sequence, (iv) a subject expression sequence within the msd sequence, and (v) a first inverted repeat sequence and a second inverted repeat sequence, and (b) a guide RNA coding region. Applicants surprisingly found that such nucleic acids, when expressed in a cell, allowed for greater retron production. Applicants also found that such compositions lead to significantly higher gene editing efficiency when employed with gene editing technology. Such effects were not seen in prior-described constructs wherein the nucleic acid includes an HDV ribozyme. In aspects, the subject expression sequence is a donor sequence for homology directed repair (HDR). In aspects, the nucleic acid includes a 3' ribozyme sequence. Any 5' ribozyme can be used as long as it leaves a stem loop sequence or pseudoknot when cleaved from the retron. An example of a 3' ribozyme includes, but is not limited to, RiboJ.
[0077] Also provided herein are nucleic acids encoding a retron that include (i) a stabilizing 5' sequence specific RNA cleavage site sequence, (ii) an msr sequence, (iii) an msd sequence, (iv) a subject expression sequence within the msd sequence, and (v) a first inverted repeat sequence and a second inverted repeat sequence, and a guide RNA coding region. In aspects, the nucleic acids include a donor sequence for homology directed repair (HDR). In aspects, the nucleic acids include a 3' ribozyme sequence. Many site-specific RNase cleavage sites are known in the art. These fall into two general strategies: (1) include a motif for an RNase which the host cell already has, or (2) include a motif for an RNase which the host cell does not have, in which case one also would supply the RNase (for example, the Pumilio-RNase fusion or the Csy4 nuclease below). The advantage of the first method is that the RNase does not have to be expressed separately. The second approach is more generalizable.
[0078] For example, RNase III target motifs can be used. In S. cerevisiae, where Rnt1p is the RNase III enzyme, stem-loops with NGNN tetraloops and at least 10-14 bp of stem, which can include imperfect complementarity, can be targeted for cleavage. This includes targets from the 35S ribosomal RNA precursor, precursors for the small nuclear RNAs U1, U2, U4, and U5, the small nucleolar RNAs U3, U14, SNR52, SNR47, SNR48, and others, and targets within mRNAs, including BDF2, RPL18A, and RPS22B, among others. See, e.g., Roy KR, Chanfreau G. The diverse functions of fungal RNase III enzymes in RNA metabolism. The Enzymes. 2012. Volume 31 part A: 123-145. doi: 10.1016/B978-0-12-404740-2.00010-0. PMID: 27166447 (book chapter).
[0079] In humans, Drosha is the primary nuclear RNase III enzyme. Drosha will process primary miRNAs (pri-miRNAs), which contain one or more characteristic hairpin structures. These miRNA hairpins are recognized and cleaved by the nuclear Microprocessor complex, a heterotrimeric complex consisting of one molecule of DROSHA, an RNase III enzyme, and two molecules of DGCR8, a double-stranded RNA (dsRNA)-binding protein, to release ~60-80 nt precursor miRNAs (pre-miRNAs). See, e.g., RNA Biol. 2018; 15(2): 186-193. Published online 2017 Dec 12. doi: 10.1080/15476286.2017.1405210 PMCID: PMC5798959 PMID: 29171328.
[0080] In other aspects, Pumilio domains can be fused to a general RNA cleavage domain to recognize specific targets for cleavage or, for example, one can utilize a specific stem-loop (5'-GUUCACUGCCGUAUAGGCAGCU-3') targeted by the CRISPR Csy4 nuclease.
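As an illustration of why the sequence in paragraph [0080] is described as a stem-loop, the following Python sketch scans an RNA string for a perfect Watson-Crick hairpin. It is a minimal sketch under stated assumptions: the stem length of 5 and loop length of 5 are parameters chosen here for illustration, G-U wobble pairs are ignored, and the function names are hypothetical. It makes no claim about how Csy4 itself recognizes its target.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_hairpins(rna: str, stem_len: int = 5, loop_len: int = 5):
    """Return (start, stem5, loop, stem3) for each perfect hairpin found.

    A hairpin is reported when the 3' arm is the exact reverse
    complement of the 5' arm, separated by loop_len unpaired bases.
    """
    hits = []
    span = 2 * stem_len + loop_len
    for i in range(len(rna) - span + 1):
        stem5 = rna[i:i + stem_len]
        loop = rna[i + stem_len:i + stem_len + loop_len]
        stem3 = rna[i + stem_len + loop_len:i + span]
        if stem3 == reverse_complement(stem5):
            hits.append((i, stem5, loop, stem3))
    return hits


# The Csy4 target sequence from paragraph [0080], written without spacing.
CSY4_SITE = "GUUCACUGCCGUAUAGGCAGCU"
hairpins = find_hairpins(CSY4_SITE)
```

Under these assumptions the scan reports a single hairpin in the sequence, in which the arm CUGCC pairs with GGCAG around the loop GUAUA.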
[0081] Any of the nucleic acids disclosed above can further include a structure-forming nucleic acid within the msd sequence.
[0082] In addition to stabilizing 5' ribozyme sequences, it will be understood that the nucleic acids encoding a retron can include other stabilizing sequences or structures at the 5' end, such as G-quadruplexes. Thus, in another aspect, provided herein are nucleic acids encoding (a) a retron that includes (i) a 5' stem loop stabilizing sequence, (ii) an msr sequence, (iii) an msd sequence, (iv) a subject expression sequence within the msd sequence, and (v) a first inverted repeat sequence and a second inverted repeat sequence, and (b) a guide RNA coding region. In some embodiments, the 5' stem loop stabilizing sequence comprises a G-quadruplex.
III. Retrons
Exemplary retrons comprising msr, msd, and inverted repeat sequences that can be used in the nucleic acids of the disclosure are provided in Table 1. The retrons in Table 1 also express reverse transcriptases that can be used in the methods of the disclosure.
Table 1.
[Table 1 is reproduced as images in the original publication (imgf000028_0001, imgf000029_0001).]
Research, Volume 47, Issue 21, 02 December 2019, Pages 11007 -- 11019).
[0083] In some embodiments, the retron encoded by the nucleic acids described herein is a Retron-Eco1 (Ec86) retron.
IV. Methods of use
[0084] Provided herein are methods of generating retron nucleic acid in a cell by contacting the cell with (a) any of the nucleic acid compositions described above encoding a retron that includes (i) a stabilizing 5' ribozyme sequence, (ii) an msr sequence, (iii) an msd sequence, (iv) a subject expression sequence within the msd sequence, and (v) a first inverted repeat sequence and a second inverted repeat sequence, wherein the nucleic acid does not comprise a guide RNA, and (b) a reverse transcriptase or a nucleic acid encoding the same. In embodiments, the subject expression sequence includes a donor sequence for homology directed repair (HDR). In aspects, the nucleic acid further includes a 3' ribozyme sequence.
[0085] Also provided herein are methods of generating retron nucleic acid in a cell by contacting the cell with (a) any of the nucleic acid compositions described above encoding a retron that includes (i) a stem loop-stabilizing 5' ribozyme sequence, (ii) an msr sequence, (iii) an msd sequence, (iv) a subject expression sequence within the msd sequence, and (v) a first inverted repeat sequence and a second inverted repeat sequence, (b) a guide RNA coding region, and (c) a reverse transcriptase or a nucleic acid encoding the same. In embodiments, the subject expression sequence includes a donor sequence for homology directed repair (HDR). In aspects, the nucleic acid further includes a 3' ribozyme sequence.
[0086] Also provided herein are methods of editing DNA in a cell, comprising contacting the cell with (a) any of the compositions described above encoding a retron that includes (i) a stabilizing 5' ribozyme sequence, (ii) an msr sequence, (iii) an msd sequence, (iv) a subject expression sequence within the msd sequence, and (v) a first inverted repeat sequence and a second inverted repeat sequence, wherein said nucleic acid does not comprise a guide RNA coding region, (b) a reverse transcriptase or a nucleic acid encoding the same, and (c) a sequence specific endonuclease or a nucleic acid encoding the same, thereby editing the DNA of the cell.
[0087] Also provided herein are methods of editing DNA in a cell, comprising contacting the cell with (a) any of the compositions described above encoding a retron that includes (i) a stem loop-stabilizing 5' ribozyme sequence, (ii) an msr sequence, (iii) an msd sequence, (iv) a subject expression sequence within the msd sequence, and (v) a first inverted repeat sequence and a second inverted repeat sequence, (b) a reverse transcriptase or a nucleic acid encoding the same, and (c) a sequence specific endonuclease or a nucleic acid encoding the same, thereby editing the DNA of the cell.
[0088] In aspects, the sequence specific endonuclease is a CRISPR-associated endonuclease, a Zinc-finger nuclease, a Transcription activator-like effector nuclease (TALEN), or a meganuclease. In aspects, the sequence specific endonuclease is a CRISPR-associated endonuclease, and the method comprises administering to the subject one or more guide RNAs (gRNAs), or one or more nucleic acids encoding the same.
[0089] Provided herein are methods of treating a genetic disease in a subject in need comprising administering to the subject (a) any of the nucleic acid compositions described above encoding a retron that includes (i) a stabilizing 5' ribozyme sequence, (ii) an msr sequence, (iii) an msd sequence, (iv) a subject expression sequence within the msd sequence, and (v) a first inverted repeat sequence and a second inverted repeat sequence, wherein the nucleic acid does not comprise a guide RNA, (b) a reverse transcriptase or a nucleic acid encoding the same, and (c) a sequence specific endonuclease or a nucleic acid encoding the same, thereby editing the DNA of the cell.
[0090] Also provided herein are methods of treating a genetic disease in a subject in need comprising administering to the subject (a) any of the nucleic acid compositions described above encoding a retron that includes (i) a stem loop-stabilizing 5' ribozyme sequence, (ii) an msr sequence, (iii) an msd sequence, (iv) a subject expression sequence within the msd sequence, and (v) a first inverted repeat sequence and a second inverted repeat sequence, (b) a reverse transcriptase or a nucleic acid encoding the same, and (c) a sequence specific endonuclease or a nucleic acid encoding the same, thereby editing the DNA of the cell.
[0091] In aspects, the sequence specific endonuclease is a CRISPR-associated endonuclease, a Zinc-finger nuclease, a Transcription activator-like effector nuclease (TALEN), or a meganuclease. In aspects, the sequence specific endonuclease is a CRISPR-associated endonuclease, and the method comprises administering to the subject one or more guide RNAs (gRNAs), or one or more nucleic acids encoding the same.
[0092] Genome editing may be performed on a single cell or a population of cells of interest and can be performed on any type of cell, including any cell from a prokaryotic, eukaryotic, or archaeon organism, including bacteria, archaea, fungi, protists, plants, and animals. Cells from tissues, organs, and biopsies, as well as recombinant cells, genetically modified cells, cells from cell lines cultured in vitro, and artificial cells (e.g., nanoparticles, liposomes, polymersomes, or microcapsules encapsulating nucleic acids) may all be used in the practice of the present disclosure. The methods of the disclosure are also applicable to editing of nucleic acids in cellular fragments, cell components, or organelles comprising nucleic acids (e.g., mitochondria in animal and plant cells, plastids (e.g., chloroplasts) in plant cells and algae). Cells may be cultured or expanded prior to or after performing genome editing as described herein. In one embodiment, the cells are yeast cells. In another embodiment, the cells are mammalian cells.
[0093] An RNA-guided nuclease can be targeted to a particular genomic sequence (i.e., genomic target sequence to be modified) by altering its guide RNA sequence. A target-specific guide RNA comprises a nucleotide sequence that is complementary to a genomic target sequence, and thereby mediates binding of the nuclease-gRNA complex by hybridization at the target site. For example, the gRNA can be designed with a sequence complementary to the sequence of a minor allele to target the nuclease-gRNA complex to the site of a mutation. The mutation may comprise an insertion, a deletion, or a substitution. For example, the mutation may include a single nucleotide variation, gene fusion, translocation, inversion, duplication, frame shift, missense, nonsense, or other mutation associated with a phenotype or disease of interest. The targeted minor allele may be a common genetic variant or a rare genetic variant. In certain embodiments, the gRNA is designed to selectively bind to a minor allele with single base-pair discrimination, for example, to allow binding of the nuclease-gRNA complex to a single nucleotide polymorphism (SNP). In particular, the gRNA may be designed to target disease-relevant mutations of interest for the purpose of genome editing to remove the mutation from a gene. Alternatively, the gRNA can be designed with a sequence complementary to the sequence of a major or wild-type allele to target the nuclease-gRNA complex to the allele for the purpose of genome editing to introduce a mutation into a gene in the genomic DNA of the cell, such as an insertion, deletion, or substitution. Such genetically modified cells can be used, for example, to alter phenotype, confer new properties, or produce disease models for drug screening.
[0094] In certain embodiments, the RNA-guided nuclease used for genome modification is a clustered regularly interspaced short palindromic repeats (CRISPR) system Cas nuclease. Any RNA-guided Cas nuclease capable of catalyzing site-directed cleavage of DNA to allow integration of donor polynucleotides by the HDR mechanism can be used in genome editing, including CRISPR system type I, type II, or type III Cas nucleases. Examples of Cas proteins include Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas5e (CasD), Cas6, Cas6e, Cas6f, Cas7, Cas8a1, Cas8a2, Cas8b, Cas8c, Cas9 (Csn1 or Csx12), Cas10, Cas10d, Cas12, Cas12a (e.g., Cpf1, Mad7), CasX, CasY, CasO, CasF, CasG, CasH, Csy1, Csy2, Csy3, Cse1 (CasA), Cse2 (CasB), Cse3 (CasE), Cse4 (CasC), Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, and Cu1966, and homologs or modified versions thereof.
[0095] In certain embodiments, a type II CRISPR system Cas9 endonuclease is used. Cas9 nucleases from any species, or biologically active fragments, variants, analogs, or derivatives thereof that retain Cas9 endonuclease activity (i.e., catalyze site-directed cleavage of DNA to generate double-strand breaks) may be used to perform genome modification as described herein. The Cas9 need not be physically derived from an organism but may be synthetically or recombinantly produced. Cas9 sequences from a number of bacterial species are well known in the art and listed in the National Center for Biotechnology Information (NCBI) database. See, for example, NCBI entries for Cas9 from: Streptococcus pyogenes (WP_002989955, WP_038434062, WP_011528583); Campylobacter jejuni (WP_022552435, YP_002344900); Campylobacter coli (WP_060786116); Campylobacter fetus (WP_059434633); Corynebacterium ulcerans (NC_015683, NC_017317); Corynebacterium diphtheria (NC_016782, NC_016786); Enterococcus faecalis (WP_033919308); Spiroplasma syrphidicola (NC_021284); Prevotella intermedia (NC_017861); Spiroplasma taiwanense (NC_021846); Streptococcus iniae (NC_021314); Belliella baltica (NC_018010); Psychroflexus torquis (NC_018721); Streptococcus thermophilus (YP_820832); Streptococcus mutans (WP_061046374, WP_024786433); Listeria innocua (NP_472073); Listeria monocytogenes (WP_061665472); Legionella pneumophila (WP_062726656); Staphylococcus aureus (WP_001573634); Francisella tularensis (WP_032729892, WP_014548420); Enterococcus faecalis (WP_033919308); Lactobacillus rhamnosus (WP_048482595, WP_032965177); and Neisseria meningitidis (WP_061704949, YP_002342100); all of which sequences (as entered by the date of filing of this application) are herein incorporated by reference. Any of these sequences or a variant thereof comprising a sequence having at least about 70-100% sequence identity thereto, including any percent identity within this range, such as 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, or 99% sequence identity thereto, can be used for genome editing, as described herein. See also Fonfara et al. (2014) Nucleic Acids Res. 42(4):2577-90; Kapitonov et al. (2015) J. Bacteriol. 198(5):797-807; Shmakov et al. (2015) Mol. Cell. 60(3):385-397; and Chylinski et al. (2014) Nucleic Acids Res. 42(10):6091-6105, for sequence comparisons and a discussion of genetic diversity and phylogenetic analysis of Cas9.
[0096] The CRISPR-Cas system naturally occurs in bacteria and archaea where it plays a role in RNA-mediated adaptive immunity against foreign DNA. The bacterial type II CRISPR system uses the endonuclease, Cas9, which forms a complex with a guide RNA (gRNA) that specifically hybridizes to a complementary genomic target sequence, where the Cas9 endonuclease catalyzes cleavage to produce a double-stranded break. Targeting of Cas9 typically further relies on the presence of a 3' protospacer-adjacent motif (PAM) in the DNA directly downstream of the gRNA-binding site.
[0097] The genomic target site will typically comprise a nucleotide sequence that is complementary to the gRNA and may further comprise a protospacer adjacent motif (PAM). In certain embodiments, the target site comprises 20-30 base pairs in addition to a 3 base pair PAM. Typically, the first nucleotide of a PAM can be any nucleotide, while the two other nucleotides will depend on the specific Cas9 protein that is chosen. Exemplary PAM sequences are known to those of skill in the art and include, without limitation, NNG, NGN, NAG, and NGG, wherein N represents any nucleotide. In certain embodiments, the allele targeted by a gRNA comprises a mutation that creates a PAM within the allele, wherein the PAM promotes binding of the Cas9-gRNA complex to the allele.
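To illustrate the target-site layout described above (a protospacer immediately followed by a 3' PAM), the following is a minimal Python sketch that scans one strand of a DNA sequence for candidate sites ending in an NGG PAM. The function name `find_ngg_targets`, the default 20-nt protospacer length, and the restriction to a single strand are assumptions made for illustration only; they are not taken from the disclosure.

```python
import re


def find_ngg_targets(seq, protospacer_len=20):
    """Return (start, protospacer, pam) tuples for NGG PAM sites on one strand.

    Hypothetical helper: scans only the strand given, so the reverse
    complement must be scanned separately to cover both strands.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    # Made-up example sequence, not a real genomic locus.
    example = "ACGT" * 5 + "AGGCC"
    for start, protospacer, pam in find_ngg_targets(example):
        print(start, protospacer, pam)
```

Other PAMs listed above (for example NAG) could be handled by changing the PAM portion of the pattern.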
[0098] In certain embodiments, the gRNA is 5-50 nucleotides, 10-30 nucleotides, 15-25 nucleotides, 18-22 nucleotides, or 19-21 nucleotides in length, or any length between the stated ranges, including, for example, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, or 35 nucleotides in length. The guide RNA may be a single guide RNA comprising crRNA and tracrRNA sequences in a single RNA molecule, or the guide RNA may comprise two RNA molecules with crRNA and tracrRNA sequences residing in separate RNA molecules.
[0099] In another embodiment, the CRISPR nuclease from Prevotella and Francisella 1 (Cpf1) may be used. Cpf1 is another class II CRISPR/Cas system RNA-guided nuclease with similarities to Cas9 and may be used analogously. Unlike Cas9, Cpf1 does not require a tracrRNA and only depends on a crRNA in its guide RNA, which provides the advantage that shorter guide RNAs can be used with Cpf1 for targeting than Cas9. Cpf1 is capable of cleaving either DNA or RNA. The PAM sites recognized by Cpf1 have the sequences 5'-YTN-3' (where "Y" is a pyrimidine and "N" is any nucleobase) or 5'-TTN-3', in contrast to the G-rich PAM site recognized by Cas9. Cpf1 cleavage of DNA produces double-stranded breaks with sticky ends having a 4 or 5 nucleotide overhang. For a discussion of Cpf1, see, e.g., Ledford et al. (2015) Nature. 526 (7571): 17-17, Zetsche et al. (2015) Cell. 163 (3):759-771, Murovec et al. (2017) Plant Biotechnol. J. 15(8):917-926, Zhang et al. (2017) Front. Plant Sci. 8: 177, Fernandes et al. (2016) Postepy Biochem. 62(3):315-326; herein incorporated by reference.
[0100] C2c1 is another class II CRISPR/Cas system RNA-guided nuclease that may be used. C2c1, similarly to Cas9, depends on both a crRNA and tracrRNA for guidance to target sites. For a description of C2c1, see, e.g., Shmakov et al. (2015) Mol Cell. 60(3):385-397, Zhang et al. (2017) Front Plant Sci. 8: 177; herein incorporated by reference.
[0101] In yet another embodiment, an engineered RNA-guided FokI nuclease may be used. RNA-guided FokI nucleases comprise fusions of inactive Cas9 (dCas9) and the FokI endonuclease (FokI-dCas9), wherein the dCas9 portion confers guide RNA-dependent targeting on FokI. For a description of engineered RNA-guided FokI nucleases, see, e.g., Havlicek et al. (2017) Mol. Ther. 25(2): 342-355, Pan et al. (2016) Sci Rep. 6:35794, Tsai et al. (2014) Nat Biotechnol. 32(6):569-576; herein incorporated by reference.
[0102] The RNA-guided nuclease can be provided in the form of a protein, such as the nuclease complexed with a gRNA, or provided by a nucleic acid encoding the RNA-guided nuclease, such as an RNA (e.g., messenger RNA) or DNA (expression vector). Codon usage may be optimized to improve production of an RNA-guided nuclease in a particular cell or organism. For example, a nucleic acid encoding an RNA-guided nuclease can be modified to substitute codons having a higher frequency of usage in a yeast cell, a bacterial cell, a human cell, a non-human cell, a mammalian cell, a rodent cell, a mouse cell, a rat cell, or any other host cell of interest, as compared to the naturally occurring polynucleotide sequence. When a nucleic acid encoding the RNA-guided nuclease is introduced into cells, the protein can be transiently, conditionally, or constitutively expressed in the cell.
[0103] Donor polynucleotides and gRNAs are readily synthesized by standard techniques, e.g., solid phase synthesis via phosphoramidite chemistry, as disclosed in U.S. Patent Nos. 4,458,066 and 4,415,732, incorporated herein by reference; Beaucage et al., Tetrahedron (1992) 48:2223-2311; and Applied Biosystems User Bulletin No. 13 (1 April 1987). Other chemical synthesis methods include, for example, the phosphotriester method described by Narang et al., Meth. Enzymol. (1979) 68:90 and the phosphodiester method disclosed by Brown et al., Meth. Enzymol. (1979) 68:109. In view of the short lengths of gRNAs (typically about 20 nucleotides in length) and donor polynucleotides (typically about 100-150 nucleotides), gRNA-donor polynucleotide cassettes can be produced by standard oligonucleotide synthesis techniques and subsequently ligated into vectors. Moreover, libraries of gRNA-donor polynucleotide cassettes directed against thousands of genomic targets can be readily created using highly parallel array-based oligonucleotide library synthesis methods (see, e.g., Cleary et al. (2004) Nature Methods 1:241-248, Svensen et al. (2011) PLoS One 6(9):e24906).
[0104] In addition, adapter sequences can be added to oligonucleotides to facilitate high-throughput amplification or sequencing. For example, a pair of adapter sequences can be added at the 5' and 3' ends of an oligonucleotide to allow amplification or sequencing of multiple oligonucleotides simultaneously by the same set of primers. Additionally, restriction sites can be incorporated into oligonucleotides to facilitate cloning of oligonucleotides into vectors. For example, oligonucleotides comprising gRNA-donor polynucleotide cassettes can be designed with a common 5' restriction site and a common 3' restriction site to facilitate ligation into the genome modification vectors. A restriction digest that selectively cleaves each oligonucleotide at the common 5' restriction site and the common 3' restriction site is performed to produce restriction fragments that can be cloned into vectors (e.g., plasmids or viral vectors), followed by transformation of cells with the vectors comprising the gRNA-donor polynucleotide cassettes.
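The oligonucleotide layout described above (common adapters at both ends, with common 5' and 3' restriction sites flanking the cassette) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `build_oligo`, the element order, and the choice of the EcoRI (GAATTC) and BamHI (GGATCC) recognition sequences as defaults are hypothetical and are not specified in the disclosure. The check that each site occurs exactly once reflects the requirement that the digest cleave each oligonucleotide only at the common sites.

```python
def build_oligo(cassette, adapter5, adapter3, site5="GAATTC", site3="GGATCC"):
    """Assemble adapter5 + site5 + cassette + site3 + adapter3 (hypothetical layout).

    Raises ValueError if either restriction site occurs more than once in the
    final oligo, since an internal site would cut the cassette during cloning.
    The two sites are assumed to be distinct.
    """
    parts = [adapter5, site5, cassette, site3, adapter3]
    oligo = "".join(part.upper() for part in parts)
    for site in (site5.upper(), site3.upper()):
        if oligo.count(site) != 1:
            raise ValueError(f"restriction site {site} must occur exactly once")
    return oligo


if __name__ == "__main__":
    # Placeholder sequences for illustration only.
    print(build_oligo("ACGTACGT", adapter5="TTTT", adapter3="CCCC"))
```

In a pooled library, the same function would be applied to every cassette, and cassettes that raise an error would be redesigned or dropped.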
[0105] Amplification of polynucleotides encoding gRNA-donor polynucleotide cassettes may be performed, for example, before ligation into genome modification vectors or before sequencing after barcoding. Any method for amplifying oligonucleotides may be used, including, but not limited to polymerase chain reaction (PCR), isothermal amplification, nucleic acid sequence-based amplification (NASBA), transcription mediated amplification (TMA), strand displacement amplification (SDA), and ligase chain reaction (LCR). In one embodiment, the genome editing cassettes comprise common 5' and 3' priming sites to allow amplification of the gRNA-donor polynucleotide sequences in parallel with a set of universal primers. In another embodiment, a set of selective primers is used to selectively amplify a subset of the gRNA-donor polynucleotides from a pooled mixture.
[0106] Cells that are transformed with recombinant polynucleotides comprising the genome editing cassettes may be prokaryotic cells or eukaryotic cells, and are preferably designed for high-efficiency incorporation of gRNA-donor polynucleotide libraries by transformation. Methods of introducing nucleic acids into a host cell are well known in the art. Commonly used methods of transformation include chemically induced transformation, typically using divalent cations (e.g., CaCl2), and electroporation. See, e.g., Sambrook et al. (2001) Molecular Cloning, a laboratory manual, 3rd edition, Cold Spring Harbor Laboratories, New York; Davis et al. (1995) Basic Methods in Molecular Biology, 2nd edition, McGraw-Hill; and Chu et al. (1981) Gene 13:197; herein incorporated by reference in their entireties.
[0107] It is understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and scope of the appended claims. All publications, patents, and patent applications cited herein are hereby incorporated by reference in their entirety for all purposes.
EXAMPLES
Example 1: Ribozyme-enhanced production of single-stranded donor DNA by bacterial retrons
[0108] Retrons are bacterial genes which encode a reverse transcriptase (RT), a non-coding ribonucleic acid (ncRNA) which is converted into multi-copy single-stranded deoxyribonucleic acid (msDNA) by the RT activity, and an effector protein, which together function in defense against phages (Millman et al., Cell, 2020). Recent work had established that retrons contain a loop region which can be expanded to accommodate large insertions of desired sequence for generating single-stranded donor DNA (Farzadfard et al., Science, 2014 PMCID: PMC4266475; Sharon et al., Cell, 2018 PMCID: PMC6563827). Single-stranded donor DNA production by the retron system begins with transcription of the retron cassette by RNA polymerase II (FIG. 1A). As shown in FIG. 1A, a panel of ribozymes on the 5' and 3' side of the retron donor-guide cassette were tested, including no ribozyme (none), the hammerhead ribozyme (HHR), the hepatitis delta virus (HDV) ribozyme, the anti-genomic HDV (agHDV) ribozyme, the U5 small nuclear ribonucleic acid (snRNA) stem-loop cleaved by the yeast RNase III enzyme Rnt1p (U5 Rnt1p SL), and the RiboJ ribozyme, which is a HHR-related ribozyme from the satellite RNA of tobacco ringspot virus (sTRSV) followed by a 23 nucleotide synthetic stem loop (SL). The superscript scissors denote the location of cleavage sites relative to the ribozymes (or Rnt1p stem loop for U5).
[0109] The transcript is then processed by ribozymes on the 5' and 3' ends to remove the 5' 7-methylguanosine cap and 3' poly A tail, respectively, in order to prevent nuclear export and subsequent translation (FIG. 1B). The retron-specific reverse transcriptase (RT) binds to the msr region of the retron and copies the msd region of the retron RNA into DNA, which is engineered to include the donor sequence. The host cell Ribonuclease H (RNase H) degrades the RNA in the resulting RNA-DNA hybrid to generate single-stranded donor DNA (FIG. 1B). FIG. 1B shows the location of ribozymes in relation to the 5' cap and 3' polyadenylic acid (poly A) tail. Depending on whether the ribozymes cleave on their 5' end or 3' end, their RNA structures can stay bound to the processed retron transcript or be released, exposing the 5' end of the retron or 3' end of the guide RNA. After ribozyme cleavage, the retron RNA is reverse transcribed by the reverse transcriptase (RT) at a conserved guanosine (G) to create an unusual 2'-5' RNA-DNA branched structure. The RNA in the RNA-DNA hybrid generated by the RT is degraded by host-cell RNase H, leaving a looped-out single-stranded donor deoxyribonucleic acid (DNA) as a template for homology directed repair.
[0110] The HHR-HDV ribozyme combination was previously published (Sharon et al., Cell, 2018), where the hammerhead ribozyme (HHR) is situated on the 5' side, and the hepatitis delta virus (HDV) ribozyme is located on the 3' side. Ribozyme identity could be a major factor in efficiency and therefore all combinations of the HHR, HDV, or no ribozyme at the 5' and 3' ends were analyzed (see FIG. 1A). FIG. 2A shows the editing efficiency for all combinations of HDV, HHR, and no ribozyme in the 5' and 3' positions of a retron donor-guide cassette targeting the ADE2 locus. The guide is an 18-mer with designed mismatches at positions 19 and 20 from the protospacer adjacent motif (PAM). The retron donor introduces a CC-to-TG mutation which results in a premature termination codon. This retron donor-guide cassette with the HHR-HDV ribozyme was previously published (Sharon et al., Cell, 2018). The y-axis indicates the editing efficiency quantified as the % of genomic reads mapping to the designed donor sequence. The x-axis is the number of generations in galactose, which induces Cas9, the RT, and the retron-donor guide transcript. These tests revealed that the HDV in the 3' position resulted in detectable editing compared to cassettes that did not have HDV in the 3' position (FIG. 2A), suggesting that removing the polyadenylic acid (polyA) tail and sequence 3' of the guide RNA increased the editing efficiency of either the retron or the guide RNA, or both. The HDV ribozyme in the 5' position yielded a substantial increase in editing efficiency over the published HHR-HDV system (FIG. 2A).
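The editing-efficiency metric described above (the percentage of genomic reads that carry the designed donor sequence) can be sketched as a short Python function. This is a simplified illustration: the function name `editing_efficiency` and the use of exact substring matching to classify a read as edited are assumptions, and a real analysis would typically align reads to the locus first.

```python
def editing_efficiency(reads, donor_allele):
    """Return the percentage of reads containing the donor-specified allele.

    Hypothetical simplification: a read counts as edited if it contains the
    donor allele string exactly; no alignment or error tolerance is applied.
    """
    if not reads:
        raise ValueError("at least one read is required")
    edited = sum(1 for read in reads if donor_allele in read)
    return 100.0 * edited / len(reads)


if __name__ == "__main__":
    # Made-up reads: the edited allele carries TG where the wild type has CC.
    reads = ["AATGAA", "AACCAA", "AATGAA", "AACCAA"]
    print(editing_efficiency(reads, donor_allele="AATGAA"))
```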
[0111] The use of the HDV-HDV combination would be prohibitive for library-scale workflows due to the tendency for direct repeats to recombine during cloning and editing, leading to a deletion of the donor-guide region and no editing. Therefore, a series of alternative ribozymes and RNA processing elements in the 5' position was tested. FIG. 2B shows the editing efficiency for all possible ribozymes in the 5' position paired with the HDV fixed in the 3' position. RiboJ-HDV is the only ribozyme combination to reach >90% editing efficiency after 18 generations of editing. By contrast, the published HHR-HDV (CRISPEY) system only reaches 18% editing efficiency.
[0112] Editing efficiency could be impacted by both donor DNA production and Cas9 guide efficacy. For a more mechanistic understanding of retron-based editing, retron cDNA production was measured independently of editing efficiency. A next-generation sequencing (NGS)-based assay was developed for retron msDNA/cDNA quantification, where primers amplify both the genomic target as well as the donor in the retron cDNA (FIG. 2C). The donor encodes a CC-to-TG mutation in the middle (see asterisk in FIG. 2C), so the ratio of reads containing the donor mutation relative to the wild type (WT) genomic locus is proportional to the ratio of donor cDNA to genome copies. This assay was performed with cells not expressing Cas9, so that the genomic locus would remain unedited and be a baseline for one copy of polymerase chain reaction (PCR) template per cell.
[0113] In FIG. 2D, the ribozyme combinations are sorted left to right by greatest to least retron cDNA produced in galactose. In glucose, both the RT and retron donor transcription are repressed, yet the most efficient cassettes still produce detectable retron cDNA. The primers also amplify the double-stranded donor on the retron guide cassette, which resides on a low copy centromeric vector. To control for this, the same retron cassettes were transformed into cells lacking an RT, and the donor:genome ratio in such cells grown in galactose was first subtracted from the levels observed in the cells with the RT. The genome has two strands which can bind both primers in the first round of polymerase chain reaction (PCR), while the donor has only one strand, so the donor:genome ratio is multiplied by 2 to obtain the values on the y-axis, “cDNA copies per genome equivalent.” The ribozyme constructs had a substantial impact on cDNA levels, with effects spanning close to 3 orders of magnitude (FIG. 2D). With no ribozyme in the 3' position, the retron cDNA abundances were on the order of 750 to 900 copies per cell for 5' HDV, 50 copies for no 5' ribozyme, and 1 copy for 5' HHR (FIG. 2D). Surprisingly, when HDV was located in the 3' position, low amounts of donor cDNA were produced (FIG. 2D), whereas, in contrast, cassettes with HDV in the 3' position resulted in the highest levels of editing efficiency (FIGS. 2A and 2B). In particular, the RiboJ-HDV system, which had much higher levels of editing compared to all other systems, had lower cDNA levels than HDV-HDV, suggesting that ribozymes impact retron cDNA levels and guide efficacy differently, and that what was optimal for one could be detrimental for the other (FIG. 2D vs FIG. 2B).
[0114] As HHR cleaves on its 3' end and HDV cleaves on its 5' end, the 5' HHR-3' HDV arrangement leads to complete removal of sequence flanking the retron-guide (FIG. 1A). 5' HHR and 3' HDV each independently led to lower cDNA levels due to a lack of structured RNA to protect the retron at the 5' and 3' ends, respectively. By contrast, previous studies had shown that 5' HHR and 3' HDV increased the effectiveness of guide RNAs. As noted above, 3' HDV resulted in detectable editing; it can be inferred that this was solely due to the increase in guide efficacy by 3' HDV, as this actually lowers retron cDNA production (compare FIGS. 2A, 2D). These results support a model where Cas9:guide complexes benefit from having no extraneous sequence on either end, possibly to minimize steric hindrance as the complex scans the genomic DNA for the target. On the other hand, the RT binds the retron only transiently and does not protect the retron RNA (as Cas9 did for its guide), and therefore benefits from extra structured sequence on both ends. As RiboJ has HHR-like cleavage properties, RiboJ represents a compromise between what is optimal for the guide RNA (efficient processing with no 5' or 3' structures remaining) and what is optimal for the retron (structured 5' end to prevent degradation).
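The normalization used for the y-axis of FIG. 2D, as described above, subtracts the donor:genome read ratio measured without RT (plasmid background) from the ratio measured with RT, then multiplies by 2 to account for the single-stranded donor versus the double-stranded genome. A minimal Python sketch of that arithmetic follows; the function names and the example read counts are hypothetical and are not data from the disclosure.

```python
def donor_genome_ratio(donor_reads, genome_reads):
    """Ratio of reads carrying the donor mutation to wild-type genomic reads."""
    if genome_reads <= 0:
        raise ValueError("genome_reads must be positive")
    return donor_reads / genome_reads


def cdna_copies_per_genome(donor_rt, genome_rt, donor_no_rt, genome_no_rt):
    """Background-corrected cDNA copies per genome equivalent.

    Subtracts the no-RT ratio (signal from the plasmid-borne donor) and
    multiplies by 2 because the donor cDNA is single-stranded while the
    genomic template is double-stranded.
    """
    with_rt = donor_genome_ratio(donor_rt, genome_rt)
    without_rt = donor_genome_ratio(donor_no_rt, genome_no_rt)
    return 2 * (with_rt - without_rt)


if __name__ == "__main__":
    # Made-up read counts for illustration only.
    print(cdna_copies_per_genome(5000, 100, 100, 100))
```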
[0115] To extend these results and test the impact of RNA structures specifically, a long stem loop consisting of an inverted repeat of the LexA binding site sequence was inserted at different locations in the retron non-coding RNA (FIG. 3A (ii), (iii), (iv)). Additionally, shorter stems with ultra-stable tetraloop sequences of GAAA and UUCG were tested, proximally downstream of the donor so that these stem loops would be incorporated into the single-stranded DNA (ssDNA), and potentially provide stability against 5'-3' DNA exonucleases (FIG. 3A (ii)). The UUCG stem loop and MS2 RNA stem-loops downstream of the msd sequence were also tested, where these structures would mainly serve to protect against 3'-5' RNA exonucleases (FIG. 3A (v)). Retron cDNA levels were quantified with both galactose-inducible RT and constitutive alcohol/acetaldehyde dehydrogenase (ADH1-RT) in glucose, with retron non-coding RNA expressed from a hybrid SNR52-tRNA(Tyr) polymerase (Pol) III promoter with the 5' HDV (FIG. 3B). Only the LexA stem-loop structure when placed on the 3' end of the donor RNA just upstream of the 3' end of the retron led to a substantial increase in the levels of cDNA, to an estimated average of >30 cDNA copies per cell compared to ~1-2 per cell with other setups (FIG. 3A (ii), FIG. 3B). In this arrangement, the LexA stem-loop ended up on the 5' end of the donor DNA after reverse transcription into msDNA. An assay was then employed which simultaneously tested editing efficiency and survival with these different retron setups (FIG. 3C). The higher cDNA levels observed with the donor-LexA-LexA stem loop resulted in both substantially higher editing survival and editing efficiency (FIG. 3A (ii)). Extending the msd stem with inverted LexA repeats flanking the donor appeared to reduce editing efficiency (FIG. 3A (iii)), with LexA-LexA in front of the donor having minimal impact (FIG. 3A (iv)). While the cDNA levels were lower than that obtained with the polymerase (Pol) II promoter with the 5' HDV, these results clearly demonstrated the positive impact of RNA or DNA structures on retron cDNA accumulation.
[0116] Results showed that separate expression of the retron donor and the guide enabled optimal processing of each component. The higher retron cDNA levels achieved with the 5' HDV and the absence of a 3' HHR, and the higher guide efficacy achieved with the 5' HHR and 3' HDV (in the context of RNA polymerase II promoter ) or with the RNA polymerase III promoter outweighed the benefits from guide-mediated recruitment of the retron donor to the target site when they wrere physically linked. Furthermore, these results emphasize that the retron cDNA production is enhanced when expressed by Pol II with appropriate ribozymes flanking the retron construct (5'HDV with 3 'HHR or 5 'HDV with no ribozyme in the 3' position).
[0117] These data suggested that structured RNA on the 5' and 3' ends or a 3' end poly-adenylated tail deposited by RNA polymerase II transcription is important for protecting the retron RNA from cellular 5'-3' and 3'-5' exonucleases. The HDV ribozyme and RiboJ ribozyme, both of which leave structured RNA to protect the 5' side of the retron, enhanced retron activity. Other ribozymes which self-cleave on their 5' ends or cleave internally and leave behind a protective structure (like RiboJ) can be used in this manner. Conversely, ribozymes which self-cleave on their 3' ends (like HHR) could be used to enhance retron RNA 3'-end stability by placing them on the 3' end of the retron. Additional RNA structures such as RNA stem loops could also be protective on the 3' end of the retron, or could assist in the proper folding of the msr and msd regions of the retron non-coding RNA, for example by insulating the programmed, inserted sequence (e.g. donor sequence) from interfering with the structured regions of the ribozyme through competing for base-pairing. Generally, stabilizing retrons with structured RNA domains on either side of the retron RNA increases retron expression levels.
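The placement rule in the preceding paragraph can be written as a small lookup. The following Python sketch is a hedged illustration, not a design tool from this disclosure: the class, field values, and function name are hypothetical, and the cleavage annotations simply restate what the paragraph says about HDV, RiboJ, and HHR.

```python
from dataclasses import dataclass

# Hypothetical labels for where a ribozyme cuts relative to its own structure.
FIVE_PRIME_END = "5prime_end"            # self-cleaves at its 5' end (e.g. HDV)
THREE_PRIME_END = "3prime_end"           # self-cleaves at its 3' end (e.g. HHR)
INTERNAL_KEEPS_STRUCTURE = "internal"    # cleaves internally, leaves a structure (e.g. RiboJ)


@dataclass(frozen=True)
class Ribozyme:
    name: str
    cleavage: str


def suggested_side(ribozyme: Ribozyme) -> str:
    """Return the retron end on which the ribozyme would remain attached as a protective structure."""
    if ribozyme.cleavage in (FIVE_PRIME_END, INTERNAL_KEEPS_STRUCTURE):
        return "5'"
    if ribozyme.cleavage == THREE_PRIME_END:
        return "3'"
    raise ValueError(f"unknown cleavage annotation: {ribozyme.cleavage}")


if __name__ == "__main__":
    for rz in (Ribozyme("HDV", FIVE_PRIME_END),
               Ribozyme("RiboJ", INTERNAL_KEEPS_STRUCTURE),
               Ribozyme("HHR", THREE_PRIME_END)):
        print(rz.name, "->", suggested_side(rz), "end of the retron")
```

The rule follows from the observation that a ribozyme protects an end only if its folded structure stays on the retron transcript after cleavage.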
[0118] Retron production in human cells is shown in FIGS. 4A-4C. FIG. 4A shows transfections of HEK293 cells with 250 ng of retron donor plasmid. FIG. 4B shows quantification of retron cDNA levels by qPCR. DNA was extracted from HEK293 cells expressing the indicated retron construct with or without RT. After XbaI treatment, the CACNA1D with RT consistently gives lower Ct values than the CACNA1D without RT. In FIG. 4C, the CACNA1D amplicons from XbaI-treated gDNA extractions were subjected to NGS, and the ratio of reads harboring an XbaI site to those containing the genomic sequence was calculated. In summary, the data in FIG. 4 show that expressing the retron donor with 5' HDV produces detectable retron cDNA in human HEK293 cells.
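The two readouts described above can be sketched as follows. This is a minimal illustration under stated assumptions: the Ct conversion assumes ideal amplification (a doubling per cycle), the reads and the genomic allele are invented placeholders rather than CACNA1D data, and the function names are hypothetical. The only fixed fact used is the XbaI recognition sequence, TCTAGA.

```python
XBAI_SITE = "TCTAGA"  # XbaI recognition sequence


def fold_difference_from_ct(ct_without_rt: float, ct_with_rt: float,
                            efficiency: float = 2.0) -> float:
    """Convert a Ct difference into a fold difference, assuming `efficiency`-fold amplification per cycle."""
    return efficiency ** (ct_without_rt - ct_with_rt)


def xbai_to_genomic_ratio(reads, genomic_allele: str, xbai_site: str = XBAI_SITE) -> float:
    """Ratio of reads containing the XbaI site to reads containing the unedited genomic allele."""
    with_site = sum(1 for read in reads if xbai_site in read)
    genomic = sum(1 for read in reads if genomic_allele in read and xbai_site not in read)
    if genomic == 0:
        raise ValueError("no reads carry the genomic allele; ratio is undefined")
    return with_site / genomic


if __name__ == "__main__":
    # Placeholder reads: two carry the XbaI site, four carry a made-up genomic allele.
    example_reads = ["AAATCTAGAGGG", "AAACCTAGTGGG", "AAACCTAGTGGG",
                     "AAATCTAGAGGG", "AAACCTAGTGGG", "AAACCTAGTGGG"]
    print(fold_difference_from_ct(ct_without_rt=30.0, ct_with_rt=27.0))
    print(xbai_to_genomic_ratio(example_reads, genomic_allele="CCTAGT"))
```

A lower Ct with RT than without RT therefore corresponds to a fold difference greater than one, which is the direction reported for FIG. 4B.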
[0119] Approximately 1000-fold improvement in retron cDNA production and more than 10-fold improvement in editing efficiency kinetics over the current state-of-the-art (CRISPEY, Sharon et al., Cell, 2018 PMCID: PMC6563827) is demonstrated herein. Additionally, the ability to stabilize the retron alone has broad application beyond CRISPR-mediated genome editing (Simon et al. Nucleic Acids Research (2019) 47(21): 11007-11019). A major bottleneck for effective deployment of retron systems in eukaryotes has been achieving high expression levels. Consequently, the compositions and methods provided herein cure this deficiency.
Exemplary Embodiments
[0120] Exemplary embodiments provided in accordance with the presently disclosed subject matter include, but are not limited to, the claims and the following embodiments:
1. A nucleic acid encoding a retron comprising: a stabilizing 5' ribozyme sequence; an msr sequence; an msd sequence; a subject expression sequence within the msd sequence; and a first inverted repeat sequence and a second inverted repeat sequence; wherein said nucleic acid does not comprise a guide RNA region.
2. The nucleic acid of embodiment 1, wherein the subject expression sequence comprises a donor sequence for homology directed repair (HDR).
3. The nucleic acid of embodiment 1, further comprising a 3' ribozyme sequence.
4. The nucleic acid of any one of embodiments 1-3, wherein the 5' ribozyme sequence or sequences are selected from the group consisting of Hammerhead Ribozyme, HDV ribozyme, RiboJ, CPEB3, Agam_1_1, Agam_2_2, Pmar_1, Bflo_1, Bflo_2, Spur_1, Spur_2, Spur_3, Spur_4, Ppac_1, Cjap_1, Fpra_1, CIV_1, Dpap_1, Tatr_1, CPEB3, G_HDV, A_HDV, Canis_familiaris/1/3 73, Felis_catus_domestic_cat/1/3 74, Ailuropoda_melanoleuca_Giant_p/3 73, Elephant/113/4 75, PongoAbelii_SumatranOrangutan//1 66, MicrocebusMurinus_MouseLemur/1/1 66, TupaiaBelangeri_NorthernTreesh/1 66, Rabbit/84/4 75, Human.Chr10/290/4 75, Chimp_PanTroglodytes/49/4 75, Rhesus/23/4 75, Macaca_mulatta/1/1 70, SorexAraneus_CommonShrew/1/1 66, Mouse.chr19_CPEB3/70/4 75, Rat.Chr1/411/4 74, EquusCaballus/1/1 69, Lama_pacos_Alpaca/1/1 70, Opossum/55/4 75, Macropus_eugenii_Tammar_Wallab/1 72, Monodelphis_domestica_Grey_Sho/1 71, CaviaPorcellus_GuineaPig/1/1 66, OchotonaPrinceps_AmericanPike//1 66, Dasypus_novemcinctus_Nine_Band/2 73, Choloepus_hoffmanni_Hofmanns_t/4 75, MyotisLucifugus_BrownBat/1/1 60, Cow/112/4 75, Callithrix_jacchus_Common_marm/1 69, Tursiops_truncatus_Bottlenose_/1 70, EchinopsTelfairi_HedgehogTenre/2 72, Sus_Scrofa/1/2 71, Dipodomys_ordsii_Ords_Kangaroo/3 72, Pteropus_vampyrus_Malayan_Flyi/1 69, Gorilla_gorilla, self-cleaving ribozyme-containing R2 elements, the L1Tc retrotransposon found in Trypanosoma cruzi, short interspersed nuclear elements (SINEs) in Schistosomes, Penelope-like elements, and retrozymes.
5. The nucleic acid of embodiment 3 or 4, wherein the 3' ribozyme is a Hammerhead ribozyme, HDV, RiboJ, or CPEB3.
6. The nucleic acid of any one of embodiments 1 to 5, further comprising a stem-loop sequence located between the stabilizing 5' ribozyme sequence and the msr sequence.
7. A nucleic acid encoding
(a) a retron comprising:
(i) a stem loop-stabilizing 5' ribozyme sequence;
(ii) an msr sequence;
(iii) an msd sequence;
(iv) a subject expression sequence within the msd sequence;
(v) a first inverted repeat sequence and a second inverted repeat sequence; and
(b) a guide RNA region.
8. The nucleic acid of embodiment 7, wherein the subject expression sequence comprises a donor sequence for homology directed repair (HDR).
9. The nucleic acid of embodiment 7 or 8, wherein the stem loop-stabilizing 5' ribozyme sequence is HDV or RiboJ.
10. The nucleic acid of any one of embodiments 7 to 9, further comprising a 3’ ribozyme sequence, or a 3' stabilizing stem-loop structure, or a 3' ribozyme sequence which leaves behind a stabilizing RNA structure.
11. The nucleic acid of embodiment 10, wherein the 3' ribozyme sequence is HDV or RiboJ.
12. A nucleic acid encoding a retron comprising:
(i) a stabilizing 5'-end sequence specific RNA cleavage site sequence;
(ii) an msr sequence;
(iii) an msd sequence;
(iv) a subject expression sequence within the msd sequence; and
(v) a first inverted repeat sequence and a second inverted repeat sequence; wherein said nucleic acid does not comprise a guide RNA coding region.
13. The nucleic acid of embodiment 12, wherein the subject expression sequence comprises a donor sequence for homology directed repair (HDR).
14. The nucleic acid of embodiment 12 or 13, further comprising a 3' stabilizing stem-loop structure or 3' ribozyme sequence which leaves behind a stabilizing RNA structure.
15. The nucleic acid of any one of embodiments 12 to 14, wherein the stabilizing 5'-end sequence specific RNA cleavage site sequence is an RNase III target motif.
16. The nucleic acid of any one of embodiments 12 to 15, further comprising a structure-forming nucleic acid within the msd sequence.
17. A method of generating retron nucleic acid in a cell, the method comprising contacting the cell with a nucleic acid of any one of embodiments 1 to 16 and a reverse transcriptase or a nucleic acid encoding the same.
18. A method of editing a nucleic acid of a cell, the method comprising contacting the cell with a nucleic acid of any one of embodiments 1 to 16, a reverse transcriptase or a nucleic acid encoding the same, and a sequence specific endonuclease or a nucleic acid encoding the same, thereby altering the genomic sequence of the cell.
19. The method of embodiment 18, wherein the sequence specific endonuclease is a CRISPR-associated endonuclease, a Zinc-finger nuclease, a Transcription activator-like effector nuclease (TALEN), or a meganuclease.
20. The method of embodiment 18 or 19, further comprising contacting the cell with a guide RNA or a nucleic acid encoding the same, wherein the sequence specific endonuclease is a CRISPR-associated endonuclease.
21. The method of embodiment 19 or 20, wherein the CRISPR-associated endonuclease is selected from Cas1, Cas1B, Cas2, C2c1, Cas3, Cas4, Cas5, Cas5e (CasD), Cas6, Cas6e, Cas6f, Cas7, Cas8a1, Cas8a2, Cas8b, Cas8c, Cas9 (Csn1 or Csx12), SpCas9, FokI-dCas9, Cas10, Cas10d, Cas12, Cas12a, Mad7™, CasX, CasY, CasΦ, CasF, CasG, CasH, Csy1, Csy2, Csy3, Cse1 (CasA), Cse2 (CasB), Cse3 (CasE), Cse4 (CasC), Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, Cu1966, and homologs or modified versions thereof.
22. A method of treating a genetic disease in a subject in need, the method comprising administering to the subject an effective amount of a nucleic acid of any one of embodiments 1 to 16, a reverse transcriptase or a nucleic acid encoding the same, and a sequence-specific endonuclease or a nucleic acid encoding the same.
23. The method of embodiment 22, wherein said sequence-specific endonuclease is a CRISPR-associated endonuclease, a Zinc-finger nuclease, a Transcription activator-like effector nuclease (TALEN), or a meganuclease.
24. The method of embodiment 22 or 23, further comprising administering a guide RNA or a nucleic acid encoding the same to the subject, wherein the sequence specific endonuclease is a CRISPR-associated endonuclease.
25. The method of embodiment 23 or 24, wherein the CRISPR-associated endonuclease is Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas5e (CasD), Cas6, Cas6e, Cas6f, Cas7, Cas8a1, Cas8a2, Cas8b, Cas8c, Cas9 (Csn1 or Csx12), Cas10, Cas10d, Cas12, Cas12a, Mad7™, CasX, CasY, CasΦ, CasF, CasG, CasH, Csy1, Csy2, Csy3, Cse1 (CasA), Cse2 (CasB), Cse3 (CasE), Cse4 (CasC), Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, or Cu1966.
26. The nucleic acid of any one of embodiments 12 to 16, wherein cleavage of the stabilizing 5'-end sequence specific RNA cleavage site sequence results in a stabilizing structure 3’ of the cleavage site that is attached to the 5’ end of the msr sequence.
27. A nucleic acid encoding
(a) a retron comprising:
(i) a stabilizing 5' sequence;
(ii) an msr sequence;
(iii) an msd sequence;
(iv) a subject expression sequence within the msd sequence; and
(v) a first inverted repeat sequence and a second inverted repeat sequence; and
(b) optionally, a guide RNA region.
28. The nucleic acid of embodiment 27, wherein the subject expression sequence comprises a donor sequence for homology directed repair (HDR).
29. The nucleic acid of embodiment 27 or 28, wherein the stabilizing 5' sequence comprises a stable RNA structure or a G-quadruplex.
30. The nucleic acid of any one of embodiments 27 to 29, further comprising a 3' ribozyme sequence, a 3' stabilizing stem-loop structure or 3' ribozyme sequence which leaves behind a stabilizing RNA structure.
31. The nucleic acid of embodiment 30, wherein the 3' ribozyme sequence is HDV or RiboJ.
32. The nucleic acid of any one of embodiments 27 to 31, wherein said nucleic acid does not comprise a guide RNA region.
[0121] All publications and patent applications mentioned in this disclosure are herein incorporated by reference to the same extent as if each individual publication or patent application was specifically and individually indicated to be incorporated by reference.
[0122] No admission is made that any reference cited herein constitutes prior art. The discussion of the references states what their authors assert, and the Applicant reserves the right to challenge the accuracy and pertinence of the cited documents. It will be clearly understood that, although a number of information sources, including scientific journal articles, patent documents, and textbooks, may be referred to herein, this reference does not constitute an admission that any of these documents forms part of the common general knowledge in the art.
[0123] The discussion of the general methods given herein is intended for illustrative purposes only. Other alternative methods and alternatives will be apparent to those of skill in the art upon review of this disclosure and are to be included within the scope of this application.
[0124] While particular alternatives of the present disclosure have been disclosed, it is to be understood that various modifications and combinations are possible and are contemplated within the scope of the appended claims. There is no intention, therefore, of limitations to the exact abstract and disclosure herein presented.
INFORMAL SEQUENCE LISTING (ALL SEQUENCES LISTED 5' TO 3'):
[0125] Canis_familiaris/1/3 73:
CGGGGCCACAGCAAAAGTGTTCACGTCATGGCCCCTGTCAGATTCTGGTGAATCTGCAAATTCTGCTGTGT (SEQ ID NO:1)
[0126] Felis_catus_domestic_cat/1/3 74:
CAGGGACCACAGCAGAAGTTTCACATCGTGGCCCCTGTCAGATGCCAGTGAATCTGTAAATTTCTGCTGTGT (SEQ ID NO:2)
[0127] Ailuropoda_melanoleuca_Giant_p/3 73:
CGGGGCCACAGCAGAAGCGTTCACGTCGCGGCCCCTGTCAGATTCTGGTGAATCTGCAAATTCTGCTGTGT (SEQ ID NO:3)
[0128] Elephant/113/4 75:
AGGGGGCCACAGCAGAAGCGTTCACGTCGCGGCCCCTGTCAGATTCTGGTGAATCTGCGAATTCTGCTACGC (SEQ ID NO:4)
[0129] PongoAbelii_SumatranOrangutan//1 66:
GGGGGCCACAGCAGAAGCGTTCACGTCGCGGCCCCTGTCAGATTCTGGTGAATCTGCGAATTCTGC (SEQ ID NO:5)
[0130] MicrocebusMurinus_MouseLemur/1/1 66:
GGGGGCCACAGCAGAAGCGTTCACGTCGCGGCCCCTGTCAGATTCTGGTGAATCTGCGAATTCTGC (SEQ ID NO:6)
[0131] TupaiaBelangeri_NorthernTreesh/1 66:
GGGGGCCACAGCAGAAGCGTTCACGTCGCGGCCCCTGTCAGATTCTGGTGAATCTGCGAATTCTGC (SEQ ID NO:7)
[0132] Rabbit/84/4 75:
AGGGGGCCACAGCAGAAGCGTTCACGTCGCGGCCCCTGTCAGATTCTGGTGAATCTGCGAATTCTGCTGCAT (SEQ ID NO:8)
[0133] Human.Chr10/290/4 75:
AGGGGGCCACAGCAGAAGCGTTCACGTCGCAGCCCCTGTCAGATTCTGGTGAATCTGCGAATTCTGCTGTAT (SEQ ID NO:9)
[0134] Chimp_PanTroglodytes/49/4 75:
AGGGGGCCACAGCAGAAGCGTTCACGTCGCGGCCCCTGTCAGATTCTGGTGAATCTGCGAATTCTGCTGTAT (SEQ ID NO:10)
[0135] Rhesus/23/4 75:
AGGGGGCCACAGCAGAAGCGTTCACGTCGCGGCCCCTGTCAGATTCTGGTGAATCTGCGAATTCTGCTGTAT (SEQ ID NO:11)
[0136] Macaca_mulatta/1/1 70:
AGGGGGCCACAGCAGAAGCGTTCACGTCGCGGCCCCTGTCAGATTCTGGTGAATCTGCGAATTCTGCTGT (SEQ ID NO:12)
[0137] SorexAraneus_CommonShrew/1/1 66:
GGGGGCCACAGCAGAAGCGTTCACGTCGCGGCCCCTGTCAGATTCTGGTGAATCTGCGAATTCTGC (SEQ ID NO:13)
[0138] Mouse.chr19_CPEB3/70/4 75:
AGGGGGCCACAGCAGAAGCGTTCACGTCGCGGCCCCTGTCAGATTCTGGCGAATCTGCGAATTCTGCTGTAC (SEQ ID NO:14)
[0139] Rat.Chr1/411/4 74:
AGGGGCCACAGCAGAAGCGTTCACGTCGCGGCCCCTGTCAGATTCTGGTGAATCTGCGAATTCTGCTGTAC (SEQ ID NO:15)
[0140] EquusCaballus/1/1 69:
GGGGGCCACAGCAGAAGCGTTCACGTCGCGGCCCCTGTCAGATTCTGGTGAATCTGCGAATTCTGCTGT (SEQ ID NO:16)
[0141] Lama_pacos_Alpaca/1/1 70:
AGGGGGCCACAGCAGAAGCGTTCACGTCGCGGCCCCTGTCAGATTCTGGTGAATCTGCGAATTCTGCTGT (SEQ ID NO:17)
[0142] Opossum/55/4 75:
AGGGGGCCATAGCAGAAGCGTTCACGTCGCGGCCCCTGTCAGATTCATAGGAATCTGCGAATTCTGCTGCAC (SEQ ID NO:18)
[0143] Macropus_eugenii_Tammar_Wallab/1 72:
AGGGGGCCATAGCAGAAGCGTTCACGTCGCGGCCCCTGTCAGATTCATATGAATCTGCGAATTCTGCTGCAC (SEQ ID NO:19)
[0144] Monodelphis_domestica_Grey_Sho/1 71:
AGGGGGCCATAGCAGAAGCGTTCACGTCGCGGCCCCTGTCAGATTCATAGGAATCTGCGAATTCTGCTGAT (SEQ ID NO:20)
[0145] CaviaPorcellus_GuineaPig/1/1 66:
GGGGGCCACAGCAGAAGCGTTCACGTCGCGGCCCCTGTCAGATTCTGATGAATCTGCGAATTCTGC (SEQ ID NO:21)
[0146] OchotonaPrinceps_AmericanPike//1 66:
GGGGGCCACAGCAGAAGCGTTCACGTCGCGGCCCCTGTCAGATTCTGATGAATCTGCGAATTCTGC (SEQ ID NO:22)
[0147] Dasypus_novemcinctus_Nine_Band/2 73:
ATGGGGCCACAGCAGAAGCATTCATGTTGCAGCCCTTGTGAGATTCAAGTGAATCTGTGAATTCTGCTGCGT (SEQ ID NO:23)
[0148] Choloepus_hoffmanni_Hofmanns_t/4 75:
ACGGGGCCACAGCAGAAGCATTCATGTCGCAGCCCCTGTCAGATTCTGGTGAATCTGCGGATTCTGCTGTGT (SEQ ID NO:24)
[0149] MyotisLucifugus_BrownBat/1/1 60:
GGGGGCCTCAGCAGACACATCACAGTCCCCATCAGATTCTGGTGAATCCGTGAATTTTGC (SEQ ID NO:25)
[0150] Cow/112/4 75:
AAGGGGCCAGAGCAGAAGCATTCACGTCGCGGCCCCTGTCAGATTCTGGTGAATCTGCGAATTCTGCTGTCA (SEQ ID NO:26)
[0151] Callithrix_jacchus_Common_marm/1 69:
GGGGGGCACAGCAGAAGCATTCACTTCGTGGCCCCTGTCAGATTCTAGTGAATCTGCGAATTCTGCTGT (SEQ ID NO:27)
[0152] Tursiops_truncatus_Bottlenose_/1 70:
AAGGGGCTACAGCAGAAGCGTTCACATTGCAGCCCCTGTCAGATTCTGGTGAATCTGCGAATTCTGCTGT (SEQ ID NO:28)
[0153] EchinopsTelfairi_HedgehogTenre/2 72:
ACAGGGCCACCCCAAAGCGTTCACATTGTGGCCCCTGTCAGATTCTGGTAAATCTGCGAGTTCTGCTATAT (SEQ ID NO:29)
[0154] Sus_Scrofa/1/2 71:
TGGCAGCCACAGTAGAAGCATTCACATTGTGGTCCATGTCAGATTCTGGTGAATTTGCAAATTCTGCTGT (SEQ ID NO:30)
[0155] Dipodomys_ordsii_Ords_Kangaroo/3 72:
ACAGAGCCGTTACAGAAGTGTTCATATCATGGTCCCTGTCAGATTCTGGTGAATCTGAAAATTCTGCTGT (SEQ ID NO:31)
[0156] Pteropus_vampyrus_Malayan_Flyi/1 69:
AGGGGCCGCGACGTGAACGCTTCTGCTGTGGCCCCTGTTATCCTCTTGTATTTGAAAACTGAACAACAA (SEQ ID NO:32)
[0157] Gorilla_gorilla:
AGGGGCCGCGACGTGAACGCTTCTGCTGTGGCCCCCTGTTATCCCCTTG (SEQ ID NO:33)
[0158] Agam_1_1:
GGCUGACAAAAUCCUUUCCCAACCUCCACGUGGUGUCGGCUGGAUAAUGCAUUAGAAAUGUUGCAUUUACCAACUGGGAAGG (SEQ ID NO:34)
[0159] Agam_2_2:
GUUCUGUAAACGGGGUUGGAUCCGACUCUCAUAGGCUCUCCAACCCAACUCCUACUCAAUACGUCCUCGUCGUACAGAACGGUAACAUGUUUUCCGAACAUCCGCGCUUGGGUAUACGAGUAUACACCUUACCCAACCCUCGCCAACGGGGAGGAUGGAAAACAUGGCUAAAUUGAGAGGG (SEQ ID NO:35)
[0160] Pmar_1:
GGGGAAGAAGAUGACAACCUUCUCGCGGUCUUCCCUGUUACGUGUGUAGACACAACGAAUGUCAUCA (SEQ ID NO:36)
[0161] Bflo_1:
GACACCCAGGCAGAUGACCUCCUCGUUGGUGGGUGUCGGUAAUGUUGUCAGUCACAGGCAACAGCUAACGAUCUGCCAUGU (SEQ ID NO:37)
[0162] Bflo_2:
GGGGACCAUAGAAGGAGCGUUCUCGUCGCGGUCCCUGUCAGGCUCGUCCUGCGAAUCCUUCUACCACAU (SEQ ID NO:38)
[0163] Spur_1:
GGGGGCCAUUGAAGGAGCGUUCACGUCGCGGUCCCUGUCAGAUGAAAAUCUGCGAAUCCUUCAACUACACUA (SEQ ID NO:39)
[0164] Spur_2:
GGGGUUCAUGUUGUCGACCUUCACGUGGUGAGCCCUGUCAACUGACUGCUGUCAGGCUAACAGACAACCAU (SEQ ID NO:40)
[0165] Spur_3:
GACACUGAGUGAGAAACGUCCCCGUCGUAGUGUCGGUAAUGCGUUGUUUCAACGUAGCCAAUUCUCACAUUA (SEQ ID NO:41)
[0166] Spur_4:
GGGUUGCACAGGAGCAGGGUCCACGUCCCGCAACCUGGGUGUCAUGAUUUCUUGAAGCCAUGAUAGCUGAUGCUCCUACA (SEQ ID NO:42)
[0167] Ppac_1:
GACUCGCUUGACUGUUCACCUCCCCGUGGUGCGAGUUGGACACCCACCACUCGCAUUCUUCACCUAUUGUUUAAUUGUGCUUGUGGUGGGUGACUGAGAAACAGUCCCAAC (SEQ ID NO:43)
[0168] Cjap_1:
GGUUUGCUUGUUGGAUACCUCCUCGCGGUGCAAACUGGGCAAUGCUGUGCUCCUACGACAGGGGAAAUUGGACACCACUGUUAGGCACAGUAGCCAAAAUCCACUCUU (SEQ ID NO:44)
[0169] Fpra_1:
GCUGCUCAUAUAUGCUACCUCUCCGUGGUGAGCAGUAGGCAACGGAUCUCUAUCCGGCUAAAGCAUGUGAUUGUC (SEQ ID NO:45)
[0170] CIV_1:
GAGGUGCUUGUAGAUAACCUCCACGAUGGUGCACCUUGGGCAACACAAAAGUGGCAAAUCAUCUACAUUAA (SEQ ID NO:46)
[0171] Dpap_1:
GAGGGACAAGAAUCUGACCUGCACCUCCUCGUGGUGUCCCUCGGAAACGUGCUCAACGCGCGGCCGACGCAGGCAGUCUU (SEQ ID NO:47)
[0172] Tatr_1:
GGACAACGAAAUCGGCCUCUGCAACCUCCACGUGGUGUUGUCUGGGAACCUGAUCAAAACUACCGAGUUUGAUCAGGCCAAUGCAGAGAC (SEQ ID NO:48)
[0173] CPEB3:
GGGGGCCACAGCAGAAGCGTTCACGTCGCGGCCCCTGTCAGATTCTGGCGAATCTGCGAATTCTGCTGTACATTTT (SEQ ID NO:49)
[0174] G_HDV:
GGCCGGCATGGTCCCAGCCTCCTCGCTGGCGCCGGCTGGGCAACATTCCGAGGGGACCGTCCCCTCGGTAATGGCGAATGGGAC (SEQ ID NO:50)
[0175] A_HDV:
GGGTCGGCATGGCATCTCCACCTCCTCGCGGTCCGACCTGGGCATCCGAAGGAGGACGCACGTCCACTCGGATGGCTAAGGGAGAG (SEQ ID NO:51)
[0176] HDV:
GATGGCCGGCATGGTCCCAGCCTCCTCGCTGGCGCCGGCTGGGCAACACCTTCGGGTGGCGAATGGGACTT (SEQ ID NO:52)
[0177] Hammerhead Ribozyme:
NNNNNNCTGATGAGTCCGTGAGGACGAAACGAGTAAGCTCGTCNNNNNN (SEQ ID NO:53)
[0178] RiboJ:
agctgtcaccggatgtgctttccggtctgatgagtccgtgaggacgaaacagcctctacaaattttgtttaa (SEQ ID NO:54)
[0179] MSR:
ATGCGCACCCTTAGCGAGAGGTTTATCATTAAGGTCAACCTCTGGATGTTGTTTCGGCATCCTGCATTGAATCTGAGTTACT (SEQ ID NO:55)
[0180] MSD:
TCTGAGTTACTGTCTGTTTTCCT (SEQ ID NO:56) (first fragment), programmable loop, AGGAAACCCGTTTCTTCTGACGTAAGGGTGCGCA (SEQ ID NO:57) (second fragment with inverted repeat).
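The base-pairing relationships among the MSR and MSD sequences listed above can be checked with a short script. The sketch below is offered as an illustration under stated assumptions: the sequences are transcribed from SEQ ID NOs:55-57 with line-break spaces removed, the function names are hypothetical, and the offset of one base used for the MSR is an observation about these strings rather than a boundary defined in this disclosure.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Transcribed from the listing (SEQ ID NOs:55, 56, 57), internal spaces removed.
MSR = ("ATGCGCACCCTTAGCGAGAGGTTTATCATTAAGGTCAACCTCTGGATGTTGTTTC"
       "GGCATCCTGCATTGAATCTGAGTTACT")
MSD_FIRST = "TCTGAGTTACTGTCTGTTTTCCT"
MSD_SECOND = "AGGAAACCCGTTTCTTCTGACGTAAGGGTGCGCA"


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(seq.upper()))


def common_prefix_length(a: str, b: str) -> int:
    """Number of leading positions at which the two strings agree."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def paired_length(five_prime_arm: str, three_prime_arm: str) -> int:
    """Bases at the 5' end of one arm that are complementary, in antiparallel order, to the 3' end of the other."""
    return common_prefix_length(five_prime_arm, reverse_complement(three_prime_arm))


if __name__ == "__main__":
    # Inverted repeat between the start of the MSR (skipping its first base) and the end of the MSD.
    print("MSR / MSD terminal repeat:", paired_length(MSR[1:], MSD_SECOND))
    # Stem flanking the programmable loop: start of the second MSD fragment vs end of the first.
    print("MSD stem around the loop:", paired_length(MSD_SECOND, MSD_FIRST))
```

On the sequences as transcribed here, the first check finds 12 complementary bases and the second finds 6. A donor inserted at the programmable loop position would sit between the two MSD fragments, so a check like this could be rerun after insertion to confirm the donor does not extend or disrupt the stem.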

Claims

WHAT IS CLAIMED IS:
1. A nucleic acid encoding a retron comprising:
(i) a stabilizing 5' ribozyme sequence;
(ii) an msr sequence;
(iii) an msd sequence;
(iv) a subject expression sequence within the msd sequence; and
(v) a first inverted repeat sequence and a second inverted repeat sequence; wherein said nucleic acid does not comprise a guide RNA region.
2. The nucleic acid of claim 1, wherein the subject expression sequence comprises a donor sequence for homology directed repair (HDR).
3. The nucleic acid of claim 1, further comprising a 3' ribozyme sequence, a 3' stabilizing stem-loop structure or 3' ribozyme sequence which leaves behind a stabilizing RNA structure.
4. The nucleic acid of claim 1, wherein the 5' ribozyme sequence or sequences are selected from the group consisting of Hammerhead Ribozyme, HDV ribozyme, RiboJ, CPEB3, Agam_1_1, Agam_2_2, Pmar_1, Bflo_1, Bflo_2, Spur_1, Spur_2, Spur_3, Spur_4, Ppac_1, Cjap_1, Fpra_1, CIV_1, Dpap_1, Tatr_1, CPEB3, G_HDV, A_HDV, Canis_familiaris/1/3 73, Felis_catus_domestic_cat/1/3 74, Ailuropoda_melanoleuca_Giant_p/3 73, Elephant/113/4 75, PongoAbelii_SumatranOrangutan//1 66, MicrocebusMurinus_MouseLemur/1/1 66, TupaiaBelangeri_NorthernTreesh/1 66, Rabbit/84/4 75, Human.Chr10/290/4 75, Chimp_PanTroglodytes/49/4 75, Rhesus/23/4 75, Macaca_mulatta/1/1 70, SorexAraneus_CommonShrew/1/1 66, Mouse.chr19_CPEB3/70/4 75, Rat.Chr1/411/4 74, EquusCaballus/1/1 69, Lama_pacos_Alpaca/1/1 70, Opossum/55/4 75, Macropus_eugenii_Tammar_Wallab/1 72, Monodelphis_domestica_Grey_Sho/1 71, CaviaPorcellus_GuineaPig/1/1 66, OchotonaPrinceps_AmericanPike//1 66, Dasypus_novemcinctus_Nine_Band/2 73, Choloepus_hoffmanni_Hofmanns_t/4 75, MyotisLucifugus_BrownBat/1/1 60, Cow/112/4 75, Callithrix_jacchus_Common_marm/1 69, Tursiops_truncatus_Bottlenose_/1 70, EchinopsTelfairi_HedgehogTenre/2 72, Sus_Scrofa/1/2 71, Dipodomys_ordsii_Ords_Kangaroo/3 72, Pteropus_vampyrus_Malayan_Flyi/1 69, Gorilla_gorilla, self-cleaving ribozyme-containing R2 elements, the L1Tc retrotransposon found in Trypanosoma cruzi, short interspersed nuclear elements (SINEs) in Schistosomes, Penelope-like elements, and retrozymes.
5. The nucleic acid of claim 3, wherein the 3' ribozyme is a Hammerhead ribozyme, HDV, RiboJ, or CPEB3.
6. The nucleic acid of claim 1, further comprising a stem-loop sequence located between the stabilizing 5' ribozyme sequence and the msr sequence.
7. A nucleic acid encoding
(a) a retron comprising:
(i) a stem loop-stabilizing 5' ribozyme sequence;
(ii) an msr sequence;
(iii) an msd sequence;
(iv) a subject expression sequence within the msd sequence; and
(v) a first inverted repeat sequence and a second inverted repeat sequence; and
(b) a guide RNA region.
8. The nucleic acid of claim 7, wherein the subject expression sequence comprises a donor sequence for homology directed repair (HDR).
9. The nucleic acid of claim 7, wherein the stem loop-stabilizing 5' ribozyme sequence is HDV or RiboJ.
10. The nucleic acid of claim 7, further comprising a 3' ribozyme sequence, a 3' stabilizing stem-loop structure or 3' ribozyme sequence which leaves behind a stabilizing RNA structure.
11. The nucleic acid of claim 10, wherein the 3' ribozyme sequence is HDV or RiboJ.
12. A nucleic acid encoding a retron comprising:
(i) a stabilizing 5'-end sequence specific RNA cleavage site sequence;
(ii) an msr sequence;
(iii) an msd sequence;
(iv) a subject expression sequence within the msd sequence; and
(v) a first inverted repeat sequence and a second inverted repeat sequence; wherein said nucleic acid does not comprise a guide RNA coding region.
13. The nucleic acid of claim 12, wherein the subject expression sequence comprises a donor sequence for homology directed repair (HDR).
14. The nucleic acid of claim 12, further comprising a 3' stabilizing stem-loop structure or 3' ribozyme sequence which leaves behind a stabilizing RNA structure.
15. The nucleic acid of claim 12, wherein the stabilizing 5'-end sequence specific RNA cleavage site sequence is an RNase III target motif.
16. The nucleic acid of claim 12, wherein cleavage of the stabilizing 5'-end sequence specific RNA cleavage site sequence results in a stabilizing structure 3’ of the cleavage site that is attached to the 5’ end of the msr sequence.
17. The nucleic acid of claim 12, further comprising a structure-forming nucleic acid within the msd sequence.
18. A method of generating retron nucleic acid in a cell, the method comprising contacting the cell with a nucleic acid of claim 1 and a reverse transcriptase or a nucleic acid encoding the same.
19. A method of editing a nucleic acid of a cell, the method comprising contacting the cell with a nucleic acid of claim 1, a reverse transcriptase or a nucleic acid encoding the same, and a sequence specific endonuclease or a nucleic acid encoding the same, thereby altering the genomic sequence of the cell.
20. The method of claim 19, wherein the sequence specific endonuclease is a CRISPR-associated endonuclease, a Zinc-finger nuclease, a Transcription activator-like effector nuclease (TALEN), or a meganuclease.
21. The method of claim 19, further comprising contacting the cell with a guide RNA or a nucleic acid encoding the same, wherein the sequence specific endonuclease is a CRISPR-associated endonuclease.
22. The method of claim 20, wherein the CRISPR-associated endonuclease is selected from Cas1, Cas1B, Cas2, C2c1, Cas3, Cas4, Cas5, Cas5e (CasD), Cas6, Cas6e, Cas6f, Cas7, Cas8a1, Cas8a2, Cas8b, Cas8c, Cas9 (Csn1 or Csx12), SpCas9, FokI-dCas9, Cas10, Cas10d, Cas12, Cas12a, Mad7™, CasX, CasY, CasΦ, CasF, CasG, CasH, Csy1, Csy2, Csy3, Cse1 (CasA), Cse2 (CasB), Cse3 (CasE), Cse4 (CasC), Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, Cu1966, and homologs or modified versions thereof.
23. A method of treating a genetic disease in a subject in need, the method comprising administering to the subject an effective amount of a nucleic acid of any of claims 1 to 17, a reverse transcriptase or a nucleic acid encoding the same, and a sequence-specific endonuclease or a nucleic acid encoding the same.
24. The method of claim 23 wherein said sequence-specific endonuclease is a CRISPR-associated endonuclease, a Zinc-finger nuclease, a Transcription activator-like effector nuclease (TALEN), or a meganuclease.
25. The method of claim 24, further comprising administering a guide RNA or a nucleic acid encoding the same to the subject, wherein the sequence specific endonuclease is a CRISPR-associated endonuclease.
26. The method of claim 24, wherein the CRISPR-associated endonuclease is Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas5e (CasD), Cas6, Cas6e, Cas6f, Cas7, Cas8a1, Cas8a2, Cas8b, Cas8c, Cas9 (Csn1 or Csx12), Cas10, Cas10d, Cas12, Cas12a, Mad7™, CasX, CasY, CasΦ, CasF, CasG, CasH, Csy1, Csy2, Csy3, Cse1 (CasA), Cse2 (CasB), Cse3 (CasE), Cse4 (CasC), Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, or Cu1966.
27. A nucleic acid encoding
(a) a retron comprising:
(i) a stabilizing 5' sequence;
(ii) an msr sequence;
(iii) an msd sequence;
(iv) a subject expression sequence within the msd sequence; and
(v) a first inverted repeat sequence and a second inverted repeat sequence; and
(b) optionally, a guide RNA region.
28. The nucleic acid of claim 27, wherein the subject expression sequence comprises a donor sequence for homology directed repair (HDR).
29. The nucleic acid of claim 27, wherein the stabilizing 5' sequence comprises a stable RNA structure or a G-quadruplex.
30. The nucleic acid of claim 27, further comprising a 3' ribozyme sequence, a 3' stabilizing stem-loop structure or 3' ribozyme sequence which leaves behind a stabilizing RNA structure.
31. The nucleic acid of claim 30, wherein the 3' ribozyme sequence is HDV or RiboJ.
32. The nucleic acid of claim 27, wherein said nucleic acid does not comprise a guide RNA region.
PCT/US2022/073129 2021-06-23 2022-06-23 Compositions and methods for efficient retron production and genetic editing WO2022272293A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202163214197P 2021-06-23 2021-06-23
US63/214,197 2021-06-23

Publications (1)

Publication Number Publication Date
WO2022272293A1 (en) 2022-12-29

Family

ID=84545854

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2022/073129 WO2022272293A1 (en) 2021-06-23 2022-06-23 Compositions and methods for efficient retron production and genetic editing

Country Status (1)

Country Link
WO (1) WO2022272293A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023081756A1 (en) * 2021-11-03 2023-05-11 The J. David Gladstone Institutes, A Testamentary Trust Established Under The Will Of J. David Gladstone Precise genome editing using retrons

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5405775A (en) * 1989-02-24 1995-04-11 The University Of Medicine And Dentistry Of New Jersey Retrons coding for hybrid DNA/RNA molecules
US20180312874A1 (en) * 2012-05-25 2018-11-01 The Regents Of The University Of California Methods and compositions for rna-directed target dna modification and for rna-directed modulation of transcription
US20190330619A1 (en) * 2016-09-09 2019-10-31 The Board Of Trustees Of The Leland Stanford Junior University High-throughput precision genome editing

Similar Documents

Publication Publication Date Title
US20230383290A1 (en) High-throughput precision genome editing
Sun et al. Engineering herbicide-resistant rice plants through CRISPR/Cas9-mediated homologous recombination of acetolactate synthase
JP6835726B2 (en) CRISPR hybrid DNA / RNA polynucleotide and usage
CN111836894B (en) Compositions for genome editing using CRISPR/Cpf1 systems and uses thereof
US20190161743A1 (en) Self-Targeting Guide RNAs in CRISPR System
EP4012037A1 (en) Crispr/cas9 gene editing system and application thereof
WO2017049129A2 (en) Methods of making guide rna
EP3546582A1 (en) Promoter activating elements
EP4180460A1 (en) System and method for editing nucleic acid
WO2022272293A1 (en) Compositions and methods for efficient retron production and genetic editing
KR102151064B1 (en) Gene editing composition comprising sgRNAs with matched 5&#39; nucleotide and gene editing method using the same
KR102515727B1 (en) Composition and method for inserting specific nucleic acid sequence into target nucleic acid using overlapping guide nucleic acid
WO2024044736A2 (en) Enhanced mammalian crispr editing with separated retron donor and nickases
EP4048803A1 (en) Modified double-stranded donor templates
US11884915B2 (en) Guide RNAs with chemical modification for prime editing
WO2023029492A1 (en) System and method for site-specific integration of exogenous genes
WO2022272294A1 (en) Compositions and methods for efficient retron recruitment to dna breaks
US20230088902A1 (en) Cell specific, self-inactivating genomic editing using crispr-cas systems having rnase and dnase activity
US20230340468A1 (en) Methods for using guide rnas with chemical modifications
EP4043574A1 (en) Synergistic promoter activation by combining cpe and cre modifications
US20230075913A1 (en) Codon-optimized cas9 endonuclease encoding polynucleotide
WO2024044767A2 (en) Recruitment of donor dna from in vivo assembled plasmids for saturation genome editing
CN118043465A (en) Methods for using guide RNAs with chemical modifications
US20210054367A1 (en) Methods and compositions for targeted editing of polynucleotides
WO2023019164A2 (en) High-throughput precision genome editing in human cells

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22829519

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE