WO2022272294A1 - Compositions and methods for efficient retron recruitment to DNA breaks - Google Patents

Info

Publication number
WO2022272294A1
Authority
WO
WIPO (PCT)
Application number
PCT/US2022/073130
Inventor
Kevin R. ROY
Justin D. Smith
Robert P. St. Onge
Lars M. Steinmetz
Original Assignee
The Board Of Trustees Of The Leland Stanford Junior University
Application filed by The Board Of Trustees Of The Leland Stanford Junior University
Publication of WO2022272294A1

Classifications

    • C CHEMISTRY; METALLURGY
      • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
        • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
          • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
            • C12N9/14 Hydrolases (3)
              • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
                • C12N9/22 Ribonucleases RNAses, DNAses
          • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
            • C12N15/09 Recombinant DNA-technology
              • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
              • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
                • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
                  • C12N15/85 Vectors or expression systems specially adapted for eukaryotic hosts for animal cells
              • C12N15/87 Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
                • C12N15/90 Stable introduction of foreign DNA into chromosome
                  • C12N15/902 Stable introduction of foreign DNA into chromosome using homologous recombination
                    • C12N15/907 Stable introduction of foreign DNA into chromosome using homologous recombination in mammalian cells
          • C12N2310/00 Structure or type of the nucleic acid
            • C12N2310/10 Type of nucleic acid
              • C12N2310/16 Aptamers
              • C12N2310/20 Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]
      • C07 ORGANIC CHEMISTRY
        • C07K PEPTIDES
          • C07K2319/00 Fusion polypeptide
            • C07K2319/80 Fusion polypeptide containing a DNA binding domain, e.g. LacI or Tet-repressor
            • C07K2319/85 Fusion polypeptide containing an RNA binding domain

Definitions

  • HDR Homology-directed repair
  • DSBs double-strand breaks
  • Low HDR efficiencies limit genome editing applications in many organisms and cell lines in both basic and translational research settings.
  • HDR-based approaches are unmatched among genome editing methods in enabling the introduction of genome edits of virtually any size.
  • HDR can be utilized to repair deleterious single-nucleotide polymorphisms (SNPs), to insert multiple genes encoding entire pathways into chromosomes, to make large, programmed deletions or translocations, and to build chromosome-sized DNA inside the cell for synthetic biology applications.
  • SNPs single-nucleotide polymorphisms
  • TALENs transcription activator-like effector nucleases
  • NHEJ non-homologous end joining
  • Even with tethering of donor DNA to the nuclease or near the double-stranded DNA cut with the LexA-Fkh1p system, HDR is still a limiting factor in enhancing editing.
  • many cell types prefer single-stranded DNA (ssDNA) over double-stranded DNA (dsDNA) for HDR.
  • ssDNA single-stranded DNA
  • dsDNA double-stranded DNA
  • the LexA-Fkh1p donor recruitment system, which utilizes the Forkhead-associated (FHA) domain of the yeast Fkh1p protein, was previously shown to work only with dsDNA.
  • Provided herein, inter alia, are solutions to these and other problems in the art.
  • This disclosure provides compositions and methods for recruiting single-stranded donor DNA directly to target edit sites to achieve higher HDR efficiency.
  • nucleic acids encoding a retron that include: (a) one or more RNA binding domain recognition sequences or one or more single stranded nucleic acid binding domain recognition sequences, (b) an msr sequence, (c) an msd sequence, (d) a subject expression sequence within the msd sequence, and (e) a first inverted repeat sequence and a second inverted repeat sequence.
  • the subject expression sequence comprises a donor sequence for homology-directed repair (HDR).
  • the RNA binding domain recognition sequence is an MS2 stem loop sequence, a Pumilio (PUF) recognition sequence, an RNA Recognition Motif (RRM) recognition sequence, a Double-Stranded RNA-Binding Domain (dsRBD) recognition sequence, a Zinc finger (ZF) domain recognition sequence, a G-quadruplex-forming sequence, a Z-alpha, arginine/glycine rich (RGG) domain recognition sequence, or a K Homology (KH) domain recognition sequence.
  • PUF Pumilio
  • RRM RNA Recognition Motif
  • dsRBD Double-Stranded RNA-Binding Domain
  • ZF Zinc finger
  • RGG arginine/glycine rich domain
  • the single stranded nucleic acid binding domain recognition sequence is a sequence recognized by a single stranded nucleic acid binding domain such as those found in a CRISPR associated endonuclease such as Cas9 or Cas12a, POT1, TEBP, CspB, K homology (KH) domain, far upstream element (FUSE)-binding protein (FBP), poly(C)-binding protein, a G-quadruplex binding domain including nucleolin, hnRNP, serine/arginine-rich splicing factors (SRSF) 1 and 9, splicing factor U2AF, TRF2, FRM2, and the RNA helicase associated with AU-rich element (RHAU) proteins, FBP-interacting repressor (FIR), hnRNP A1, hnRNP D, or a Whirly domain.
  • chimeric constructs that include an RNA hybridized to a DNA, such as that formed in the cell by reverse transcription of an engineered retron non-coding RNA, wherein the RNA comprises one or more RNA binding domain recognition sequences and an msr sequence, and wherein the DNA comprises an msd sequence and a subject expression sequence within the msd sequence. In some embodiments, the subject expression sequence comprises a donor sequence for homology-directed repair (HDR).
  • the RNA binding domain is an RNA binding domain of the MS2 coat protein (MCP) polypeptide that binds to an MS2 stem loop sequence, a Pumilio (PUF) recognition sequence, an RNA Recognition Motif (RRM) recognition sequence, a Double-Stranded RNA-Binding Domain (dsRBD) recognition sequence, a Zinc finger (ZF) domain recognition sequence, a G-quadruplex-forming sequence, a Z-alpha, arginine/glycine rich (RGG) domain recognition sequence, or a K Homology (KH) domain recognition sequence.
  • MCP MS2 coat protein
  • the single stranded nucleic acid binding domain is a single stranded nucleic acid binding domain of a CRISPR associated endonuclease, POT1, TEBP, CspB, a K homology (KH) domain, a far upstream element (FUSE)-binding protein (FBP), a poly(C)-binding protein, a G-quadruplex binding domain including nucleolin, hnRNP, serine/arginine-rich splicing factors (SRSF) 1 and 9, splicing factor U2AF, TRF2, FRM2, and the RNA helicase associated with AU-rich element (RHAU) proteins, an FBP-interacting repressor (FIR), hnRNP A1, hnRNP D, or a Whirly domain.
  • KH K homology
  • FUSE far upstream element
  • the DNA break site localizing domain is a DNA break site localizing domain of a polypeptide listed in any of Tables 1 to 5.
  • the RNA binding domain comprises an RNA binding domain of MS2 coat protein (MCP) and the DNA break site localizing domain comprises a forkhead-associated (FHA) domain.
  • the polypeptide further comprises a LexA domain located between the RNA binding domain of MCP and the FHA domain.
  • the LexA domain is from the LexA repressor protein (UniProtKB - P0A7C2).
  • complexes that include the chimeric constructs above non-covalently bound to a polypeptide that includes an RNA binding domain or single stranded nucleic acid binding protein covalently or non-covalently bound to a DNA break site localizing domain.
  • the sequence specific endonuclease is a CRISPR associated (Cas) nuclease, a Zinc-finger nuclease, a Transcription activator-like effector nuclease (TALEN), or a meganuclease.
  • the method comprises contacting the cell with one or more guide RNAs (gRNAs), or one or more nucleic acids encoding the same.
  • the Cas nuclease is Cas9, Streptococcus pyogenes Cas9 (SpCas9), Cpf1 (Cas12a), Mad7™, C2c1, or FokI-dCas9.
  • the Cas nuclease is selected from the group consisting of Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas5e (CasD), Cas6, Cas6e, Cas6f, Cas7, Cas8a1, Cas8a2, Cas8b, Cas8c, Cas9 (Csn1 or Csx12), SpCas9, FokI-dCas9, Cas10, Cas10d, Cas12a/Cpf1, Mad7™, CasF, CasG, CasH, Csy1, Csy2, Csy3, Cse1 (CasA), Cse2 (CasB), Cse3 (CasE), Cse4 (CasC), Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4,
  • (1) a retron that includes (a) one or more RNA binding domain recognition sequences or one or more single stranded nucleic acid binding domain recognition sequences, (b) an msr sequence, (c) an msd sequence, (d) a subject expression sequence within the msd sequence, and (e) a first inverted repeat sequence and a second inverted repeat sequence, (2) a polypeptide of an RNA binding domain or a single stranded DNA binding domain covalently bound to a DNA break site localizing domain or its encoding nucleic acid, (3) a reverse transcriptase or a nucleic acid encoding the same, and (4) a sequence specific endonuclease or a nucleic acid encoding the same, thereby treating the disease.
  • FIGS. 1A-1D show an overview of the dual retron amplification-donor recruitment system.
  • FIGS. 2A and 2B show the levels of retron cDNA produced by the different editing cassettes from FIG. 1.
  • FIGS. 3A-3D show a multiplexed editing assay to introduce all possible single nucleotide variants (SNVs) across two genomic regions using either the retron alone, donor recruitment alone, both operating simultaneously and independently, or with the dual retron amplification-donor recruitment system with either Streptococcus pyogenes Cas9 (SpCas9) or Lachnospiraceae bacterium Cas12a (LhCas12a; also known as Cpf1).
  • SpCas9 Streptococcus pyogenes Cas9
  • LhCas12a Lachnospiraceae bacterium Cas12a
  • compositions and methods to increase efficiency for retron production and its recruitment to the site of DNA breaks in a cell are provided herein.
  • the term “about” means a range of values including the specified value, which a person of ordinary skill in the art would consider reasonably similar to the specified value. In embodiments, about means within a standard deviation using measurements generally acceptable in the art. In embodiments, about means a range extending to +/- 10% of the specified value (e.g., +/- 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, or 10% of the specified value). In embodiments, about means the specified value.
  • the term “genome editing” refers to a type of genetic engineering in which DNA is inserted, replaced, or removed from a target DNA (e.g., the genome of a cell) using one or more nucleases and/or nickases.
  • the nucleases create specific double-strand breaks (DSBs) at desired locations in the genome and harness the cell's endogenous mechanisms to repair the induced break by homology-directed repair (HDR) (e.g., homologous recombination) or by nonhomologous end joining (NHEJ).
  • two nickases can be used to create two single-strand breaks on opposite strands of a target DNA, thereby generating a blunt or a sticky end.
  • Any suitable DNA nuclease can be introduced into a cell to induce genome editing of a target DNA sequence.
  • DNA nuclease refers to an enzyme capable of cleaving the phosphodiester bonds between the nucleotide subunits of DNA and may be an endonuclease or an exonuclease.
  • the DNA nuclease may be an engineered (e.g., programmable or targetable) DNA nuclease which can be used to induce genome editing of a target DNA sequence.
  • Any suitable DNA nuclease can be used including, but not limited to, CRISPR-associated protein (Cas) nucleases, other endo- or exo-nucleases, variants thereof, fragments thereof, and combinations thereof.
  • Cas CRISPR-associated protein
  • “double-strand break” or “double-strand cut” refers to the severing or cleavage of both strands of the DNA double helix.
  • the DSB may result from cleavage of both strands at the same position, leading to “blunt ends,” or from staggered cleavage that leaves a region of single-stranded DNA at the end of each DNA fragment, or “sticky ends.”
  • a DSB may arise from the action of one or more DNA nucleases.
  • the term “retron” is used in accordance with its plain ordinary meaning and refers to a DNA sequence found in the genome of many bacterial species that codes for a reverse transcriptase and a unique single-stranded DNA/RNA hybrid called multicopy single-stranded DNA (msDNA).
  • the retron msr-msd RNA is the non-coding RNA produced by retron elements and is the immediate precursor to the synthesis of msDNA.
  • the retron msr RNA folds into a characteristic secondary structure that contains a conserved guanosine residue at the end of a stem loop.
  • msDNA is a DNA/RNA chimera composed of a small single-stranded DNA linked to a small single-stranded RNA.
  • The RNA strand is joined to the 5' end of the DNA chain via a 2'-5' phosphodiester linkage that occurs from the 2' position of the conserved internal guanosine residue.
  • the retron operon carries a promoter sequence P that controls the synthesis of an RNA transcript carrying three loci: msr, msd, and ret.
  • the ret gene product, a reverse transcriptase, processes the msd/msr portion of the RNA transcript into msDNA.
  • Retron elements are about 2 kb long. They contain a single operon controlling the synthesis of an RNA transcript carrying three loci, msr, msd, and ret, that are involved in msDNA synthesis. The DNA portion of msDNA is encoded by the msd region, the RNA portion is encoded by the msr region, while the product of the ret open reading frame is a reverse transcriptase similar to the RTs produced by retroviruses and other types of retroelements. Like other reverse transcriptases, the retron RT contains seven regions of conserved amino acids, including a highly conserved tyr-ala-asp-asp (YADD) sequence associated with the catalytic core. The ret gene product is responsible for processing the msd/msr portion of the RNA transcript into msDNA.
  • YADD highly conserved tyr-ala-asp-asp
  • reverse transcriptase refers to its plain and ordinary meaning as an enzyme used to generate complementary DNA (cDNA) from an RNA template, a process termed reverse transcription.
  • polypeptide and protein refer to a polymer of amino acid residues and are not limited to a minimum length. Thus, peptides, oligopeptides, dimers, multimers, and the like, are included within the definition. Both full length proteins and fragments thereof are encompassed by the definition.
  • the terms also include post-expression modifications of the polypeptide, for example, glycosylation, acetylation, phosphorylation, hydroxylation, and the like.
  • a "polypeptide" refers to a protein which includes modifications, such as deletions, additions and substitutions to the native sequence, so long as the protein maintains the desired activity. These modifications may be deliberate, as through site-directed mutagenesis, or may be accidental, such as through mutations of hosts which produce the proteins or errors due to PCR amplification.
  • single stranded nucleic acid binding domain refers to a polypeptide or aptamer that preferentially binds to specific sequences of single stranded DNA or single stranded RNA.
  • Single stranded nucleic acid binding domains of polypeptides include, but are not limited to, CRISPR associated endonucleases such as Cas13 or Cas14; oligonucleotide/oligosaccharide/oligopeptide-binding (OB) folds, such as in human POT1, Schizosaccharomyces pombe Pot1, Sterkiella nova TEBP, and the CspB protein from Bacillus caldolyticus and Bacillus subtilis; K homology (KH) domains, such as in KH domain-containing proteins including heterogeneous nuclear ribonucleoprotein K (hnRNP K), far upstream element (FUSE)-binding protein (FBP), and poly(C)-binding proteins (PCBP) 1 and 2; RNA recognition motifs (RRMs) that bind DNA, such as in FBP-interacting repressor (FIR), hnRNP A1, and hnRNP D (also known
  • RNA binding domain refers to a polypeptide or aptamer that preferentially binds to specific sequences of a single stranded or double stranded RNA which, in the case of a polypeptide, can include the entire protein or a functional portion thereof.
  • RNA binding domains include an MS2 coat protein (MCP), Pumilio (PUF), RNA Recognition Motif (RRM), Double-Stranded RNA-Binding Domain (dsRBD), Zinc finger (ZF) domains (CCHH zinc fingers, CCCH zinc fingers, CCHC zinc knuckles, and RanBP2-type ZFs), Z-alpha, arginine/glycine rich (RGG) domains, K Homology (KH) domains, and Poly(A) Binding Proteins.
  • DNA break localizing domain refers to a polypeptide that preferentially binds to regions of DNA damage and/or DNA repair proteins which can include the entire protein or a functional portion thereof.
  • Non-limiting examples of DNA break localizing domains include 14-3-3 proteins, WW domains, Polo-box domains (in PLK1), WD40 repeats (including those in the E3 ligase SCFβ-TrCP), BRCT domains (including those in BRCA1), and FHA domains (such as in Fkh1p, CHK2, and MDC1). Other examples are provided in Tables 1-5 (see below).
  • sequence specific endonuclease refers to an enzyme that cleaves at a specific sequence within a polynucleotide sequence. In some aspects, the nuclease activity can be partially or completely inhibited, so that only one of the two strands or neither strand is cleaved.
  • sequence specific endonucleases include CRISPR associated (Cas) nuclease, a Zinc-finger nuclease, a Transcription activator-like effector nuclease (TALEN), or a meganuclease.
  • Cas9 encompasses type II clustered regularly interspaced short palindromic repeats (CRISPR) system Cas9 endonucleases from any species, and also includes biologically active fragments, variants, analogs, and derivatives thereof that retain Cas9 endonuclease activity (i.e., catalyze site-directed cleavage of DNA to generate double-strand breaks).
  • CRISPR clustered regularly interspaced short palindromic repeats
  • a Cas9 endonuclease binds to and cleaves DNA at a site comprising a sequence complementary to its bound guide RNA (gRNA).
  • gRNA guide RNA
  • a Cas9 polynucleotide, nucleic acid, oligonucleotide, protein, polypeptide, or peptide refers to a molecule derived from any source. The molecule need not be physically derived from an organism but may be synthetically or recombinantly produced. Cas9 sequences from a number of bacterial species are well known in the art and listed in the National Center for Biotechnology Information (NCBI) database.
  • NCBI National Center for Biotechnology Information
  • such sequences, or a variant thereof comprising a sequence having at least about 70-100% sequence identity thereto, including any percent identity within this range, such as 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, or 99% sequence identity thereto, can be used for genome editing, as described herein, wherein the variant retains biological activity, such as Cas9 site-directed endonuclease activity. See also Fonfara et al. (2014) Nucleic Acids Res.
  • By “derivative” is intended any suitable modification of the native polypeptide of interest, of a fragment of the native polypeptide, or of their respective analogs, such as glycosylation, phosphorylation, polymer conjugation (such as with polyethylene glycol), or other addition of foreign moieties, as long as the desired biological activity of the native polypeptide is retained.
  • Methods for making polypeptide fragments, analogs, and derivatives are generally available in the art.
  • By “fragment” is intended a molecule consisting of only a part of the intact full-length sequence and structure.
  • the fragment can include a C-terminal deletion, an N-terminal deletion, and/or an internal deletion of the polypeptide.
  • Active fragments of a particular protein or polypeptide will generally include at least about 5-10 contiguous amino acid residues of the full length molecule, preferably at least about 15-25 contiguous amino acid residues of the full length molecule, and most preferably at least about 20-50 or more contiguous amino acid residues of the full length molecule, or any integer between 5 amino acids and the full length sequence, provided that the fragment in question retains biological activity, such as Cas9 site-directed endonuclease activity.
  • substantially purified generally refers to isolation of a substance (compound, polynucleotide, nucleic acid, protein, polypeptide, polypeptide composition) such that the substance comprises the majority percent of the sample in which it resides.
  • a substantially purified component comprises 50%, preferably 80%-85%, more preferably 90-95% of the sample.
  • Techniques for purifying polynucleotides and polypeptides of interest are well-known in the art and include, for example, ion-exchange chromatography, affinity chromatography and sedimentation according to density.
  • By “isolated” is meant, when referring to a polypeptide, that the indicated molecule is separate and discrete from the whole organism with which the molecule is found in nature or is present in the substantial absence of other biological macromolecules of the same type.
  • isolated with respect to a polynucleotide is a nucleic acid molecule devoid, in whole or part, of sequences normally associated with it in nature; or a sequence, as it exists in nature, but having heterologous sequences in association therewith; or a molecule disassociated from the chromosome.
  • polynucleotide, oligonucleotide, nucleic acid, and nucleic acid molecule refer to a polymeric form of nucleotides of any length, either ribonucleotides or deoxyribonucleotides. These terms refer only to the primary structure of the molecule. Thus, they include triple-, double-, and single-stranded DNA, as well as triple-, double-, and single-stranded RNA. They also include modifications, such as by methylation and/or by capping, and unmodified forms of the polynucleotide.
  • Examples of polynucleotides include polydeoxyribonucleotides (containing 2-deoxy-D-ribose), polyribonucleotides (containing D-ribose), any other type of polynucleotide which is an N- or C-glycoside of a purine or pyrimidine base, and other polymers containing nonnucleotidic backbones, for example, polyamide (e.g., peptide nucleic acids (PNAs)) and polymorpholino (commercially available from Anti-Virals, Inc., Corvallis, Oreg., as Neugene) polymers, and other synthetic sequence-specific nucleic acid polymers, provided that the polymers contain nucleobases in a configuration which allows for base pairing and base stacking, such as is found in DNA and RNA.
  • PNAs peptide nucleic acids
  • these terms include, for example, 3'-deoxy-2',5'-DNA, oligodeoxyribonucleotide N3'→P5' phosphoramidates, 2'-O-alkyl-substituted RNA, double- and single-stranded DNA, as well as double- and single-stranded RNA, microRNA, DNA:RNA hybrids, and hybrids between PNAs and DNA or RNA, and also include known types of modifications, for example, labels which are known in the art, methylation, "caps," substitution of one or more of the naturally occurring nucleotides with an analog (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyl
  • the term also includes locked nucleic acids (e.g., comprising a ribonucleotide that has a methylene bridge between the 2'-oxygen atom and the 4'-carbon atom).
  • The terms “hybridize” and “hybridization” refer to the formation of complexes between nucleotide sequences which are sufficiently complementary to form duplexes via Watson-Crick base pairing.
  • identity refers to an exact nucleotide to nucleotide or amino acid to amino acid correspondence of two polynucleotides or polypeptide sequences, respectively. Percent identity can be determined by a direct comparison of the sequence information between two molecules by aligning the sequences, counting the exact number of matches between the two aligned sequences, dividing by the length of the shorter sequence, and multiplying the result by 100. Readily available computer programs can be used to aid in the analysis, such as ALIGN, Dayhoff, M.O. in Atlas of Protein Sequence and Structure M.O. Dayhoff ed., 5 Suppl.
  • Programs for determining nucleotide sequence identity are available in the Wisconsin Sequence Analysis Package, Version 8 (available from Genetics Computer Group, Madison, WI), for example, the BESTFIT, FASTA, and GAP programs, which also rely on the Smith and Waterman algorithm. These programs are readily utilized with the default parameters recommended by the manufacturer and described in the Wisconsin Sequence Analysis Package referred to above. For example, percent identity of a particular nucleotide sequence to a reference sequence can be determined using the homology algorithm of Smith and Waterman with a default scoring table and a gap penalty of six nucleotide positions.
  • Another method of establishing percent identity in the context of the present disclosure is to use the MPSRCH package of programs copyrighted by the University of Edinburgh, developed by John F. Collins and Shane S. Sturrok, and distributed by IntelliGenetics, Inc. (Mountain View, CA). From this suite of packages, the Smith-Waterman algorithm can be employed where default parameters are used for the scoring table (for example, gap open penalty of 12, gap extension penalty of one, and a gap of six). From the data generated, the "Match" value reflects "sequence identity."
  • Other suitable programs for calculating the percent identity or similarity between sequences are generally known in the art, for example, another alignment program is BLAST, used with default parameters.
  • homology can be determined by hybridization of polynucleotides under conditions which form stable duplexes between homologous regions, followed by- digestion with single stranded specific nuclease(s), and size determination of the digested fragments.
  • DNA sequences that are substantially homologous can be identified in a Southern hybridization experiment under, for example, stringent conditions, as defined for that particular system. Defining appropriate hybridization conditions is within the skill of the art. See, e.g., Sambrook et ak, supra. ⁇ , DNA Cloning, supra ; Nucleic Acid Hybridization, supra.
  • A "homologous region" refers to a region of a nucleic acid with homology to another nucleic acid region. Thus, whether a "homologous region" is present in a nucleic acid molecule is determined with reference to another nucleic acid region in the same or a different molecule. Further, since a nucleic acid is often double-stranded, the term "homologous region," as used herein, refers to the ability of nucleic acid molecules to hybridize to each other. For example, a single-stranded nucleic acid molecule can have two homologous regions which are capable of hybridizing to each other. Thus, the term "homologous region" includes nucleic acid segments with complementary sequences.
  • Homologous regions may vary in length but will typically be between 4 and 500 nucleotides (e.g., from about 4 to about 40, from about 40 to about 80, from about 80 to about 120, from about 120 to about 160, from about 160 to about 200, from about 200 to about 240, from about 240 to about 280, from about 280 to about 320, from about 320 to about 360, from about 360 to about 400, from about 400 to about 440, etc.).
  • "Complementary" refers to polynucleotides that are able to form base pairs with one another. Base pairs are typically formed by hydrogen bonds between nucleotide units in an anti-parallel orientation between polynucleotide strands. Complementary polynucleotide strands can base pair in a Watson-Crick manner (e.g., A to T, A to U, C to G), or in any other manner that allows for the formation of duplexes. As persons skilled in the art are aware, when using RNA as opposed to DNA, uracil (U) rather than thymine (T) is the base that is considered to be complementary to adenine (A).
  • When a uracil is denoted in the context of the present disclosure, the ability to substitute a thymine is implied, unless otherwise stated.
  • "Complementarity" may exist between two RNA strands, two DNA strands, or between an RNA strand and a DNA strand. It is generally understood that two or more polynucleotides may be "complementary" and able to form a duplex despite having less than perfect or less than 100% complementarity.
  • Two sequences are "perfectly complementary" or "100% complementary" if at least a contiguous portion of each polynucleotide sequence, comprising a region of complementarity, perfectly base pairs with the other polynucleotide without any mismatches or interruptions within such region.
  • Two or more sequences are considered "perfectly complementary" or "100% complementary" even if either or both polynucleotides contain additional non-complementary sequences, as long as the contiguous region of complementarity within each polynucleotide is able to perfectly hybridize with the other.
  • "Less than perfect" complementarity refers to situations where less than all of the contiguous nucleotides within such region of complementarity are able to base pair with each other.
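The anti-parallel pairing rules above can be made concrete with a small Python sketch that scores two strands of equal length. The sketch makes these assumptions:

- only Watson-Crick pairs (A-T, A-U, C-G) are counted;
- U and T are treated interchangeably as the partner of A;
- the full length of both strands is compared as the region of complementarity.

Wobble pairs and other non-canonical pairing, which the definition also allows, are deliberately left out.

```python
# Watson-Crick pairs only; U stands in for T as the partner of A.
WATSON_CRICK = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def fraction_paired(top: str, bottom: str) -> float:
    """Fraction of positions that base pair when two equal-length strands,
    both written 5'->3', are aligned anti-parallel over their full length."""
    if not top or len(top) != len(bottom):
        raise ValueError("strands must be non-empty and of equal length")
    # Anti-parallel: the 5' end of `top` faces the 3' end of `bottom`.
    pairs = zip(top.upper(), reversed(bottom.upper()))
    paired = sum(1 for x, y in pairs if (x, y) in WATSON_CRICK)
    return paired / len(top)


def is_perfectly_complementary(top: str, bottom: str) -> bool:
    """True when every position in the compared region base pairs (100% complementary)."""
    return fraction_paired(top, bottom) == 1.0
```

Because U pairs with A, an RNA strand such as `AUGC` is scored as perfectly complementary to the DNA strand `GCAT`, while `AAAA` against `TTTA` gives less than perfect complementarity (0.75).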
  • a gRNA may comprise a sequence "complementary" to a target sequence (e.g., major or minor allele), capable of sufficient base-pairing to form a duplex (i.e., the gRNA hybridizes with the target sequence). Additionally, the gRNA may comprise a sequence complementary to a sequence adjacent to a PAM sequence, wherein the gRNA also hybridizes with the sequence adjacent to a PAM sequence in a target DNA.
  • a "target site” or “target sequence” is the nucleic acid sequence recognized (i.e., sufficiently complementary for hybridization) by a guide RNA (gRNA) or a homology arm of a donor polynucleotide.
  • the target site may be allele-specific (e.g., a major or minor allele).
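A target sequence adjacent to a PAM can be illustrated by scanning one strand for protospacers followed by an NGG PAM, the PAM recognized by Streptococcus pyogenes Cas9. This is a minimal sketch with these assumptions:

- a 20-nt protospacer;
- a scan of only the strand given (a practical design tool would also scan the reverse complement and assess off-target sites);
- the function name is hypothetical.

The gRNA spacer corresponds to the reported protospacer and hybridizes to the opposite strand.

```python
import re
from typing import List, Tuple


def find_spcas9_targets(dna: str, protospacer_len: int = 20) -> List[Tuple[int, str, str]]:
    """Return (start, protospacer, PAM) for every NGG PAM on the given strand
    that has at least `protospacer_len` nucleotides upstream of it."""
    dna = dna.upper()
    hits = []
    # Zero-width lookahead so overlapping PAMs (e.g. within "AGGG") are all reported.
    for m in re.finditer(r"(?=([ACGT]GG))", dna):
        pam_start = m.start()
        start = pam_start - protospacer_len
        if start >= 0:
            hits.append((start, dna[start:pam_start], m.group(1)))
    return hits
```

An allele-specific design would keep only protospacers (or PAMs) that overlap the variant position, so that the gRNA hybridizes preferentially to one allele.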
  • the term "subject expression sequence" refers to any polynucleotide of any length and any sequence that can be transcribed into RNA.
  • the subject expression sequence is a polynucleotide inserted within the msd region of the retron non-coding RNA (ncRNA), which is converted to complementary DNA (cDNA) during reverse transcription.
  • the subject expression sequence is a donor polynucleotide.
  • "donor polynucleotide" or "donor sequence" refers to a polynucleotide that provides a sequence of an intended edit to be integrated into the genome at a target locus by homology-directed repair (HDR).
  • By "homology arm" is meant a portion of a donor polynucleotide that is responsible for targeting the donor polynucleotide to the genomic sequence to be edited in a cell.
  • the donor polynucleotide typically comprises a 5' homology arm that hybridizes to a 5' genomic target sequence and a 3' homology arm that hybridizes to a 3' genomic target sequence flanking a nucleotide sequence comprising the intended edit to the genomic DNA, with the positive or plus strand of the double helix (also called Watson strand) used arbitrarily as the reference.
  • the homology arms are referred to herein as 5' and 3' (i.e., upstream and downstream) homology arms, which refers to the relative position of the homology arms to the nucleotide sequence comprising the intended edit within the donor polynucleotide.
  • the 5' and 3' homology arms hybridize to regions within the target locus in the genomic DNA to be modified, which are referred to herein as the "5' target sequence" and "3' target sequence," respectively.
  • the nucleotide sequence comprising the intended edit is integrated into the genomic DNA by HDR at the genomic target locus recognized (i.e., sufficiently complementary for hybridization) by the 5' and 3' homology arms.
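The relationship between the intended edit, the homology arms, and the 5' and 3' target sequences can be sketched as a function that assembles a donor from a reference sequence. The following are assumptions for illustration; the disclosure does not specify arm lengths here:

- the function name `design_donor` is hypothetical;
- coordinates are 0-based and end-exclusive, with the plus strand as reference;
- the default arm length is 40 nt.

```python
def design_donor(genome: str, edit_start: int, edit_end: int,
                 new_seq: str, arm_len: int = 40) -> str:
    """Build a donor as 5' arm + intended edit + 3' arm on the plus (Watson) strand.

    genome[edit_start:edit_end] is the reference segment being replaced;
    edit_start == edit_end models a pure insertion.
    """
    if not 0 <= edit_start <= edit_end <= len(genome):
        raise ValueError("invalid edit coordinates")
    if edit_start < arm_len or edit_end + arm_len > len(genome):
        raise ValueError("not enough flanking sequence for the homology arms")
    # Identical to the plus-strand 5' target sequence immediately upstream of the edit.
    five_prime_arm = genome[edit_start - arm_len:edit_start]
    # Identical to the plus-strand 3' target sequence immediately downstream of the edit.
    three_prime_arm = genome[edit_end:edit_end + arm_len]
    return five_prime_arm + new_seq + three_prime_arm
```

Setting `edit_start == edit_end` yields an insertion donor, and an empty `new_seq` yields a deletion donor.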
  • administering a nucleic acid, such as a retron, a nucleic acid encoding a fusion of an RNA binding domain or single-stranded nucleic acid binding domain and a DNA break localizing domain, a guide RNA, or a Cas9 expression system, to a cell comprises transforming, transducing, transfecting, electroporating, translocating, fusing, phagocytosing, shooting or ballistic methods, etc., i.e., any means by which a nucleic acid can be transported across a cell membrane.
  • a gRNA will bind to a substantially complementary sequence and not to unrelated sequences.
  • a gRNA that "selectively binds" to a particular allele such as a particular mutant allele (e.g., allele comprising a substitution, insertion, or deletion), denotes a gRNA that binds preferentially to the particular target allele, but to a lesser extent to a wild-type allele or other sequences.
  • a gRNA that selectively binds to a particular target DNA sequence will selectively direct binding of an RNA-guided nuclease (e.g., Cas9) to a substantially complementary sequence at the target site and not to unrelated sequences.
  • recombination target site denotes a region of a nucleic acid molecule comprising a binding site or sequence-specific motif recognized by a site-specific recombinase that binds at the target site and catalyzes recombination of specific sequences of DNA at the target site.
  • Site-specific recombinases catalyze recombination between two such target sites. The location and relative orientation of the target sites determine the outcome of recombination. For example, translocation occurs if the recombination target sites are on separate DNA molecules.
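The dependence of the outcome on site location and orientation can be summarized as a small decision function. This is a sketch, not a model of any particular recombinase:

- the separate-molecule case follows the text above;
- the same-molecule cases (direct repeats give excision, inverted repeats give inversion) reflect the well-known behavior of systems such as Cre/loxP and are added here for illustration;
- if one of two separate molecules is circular, the product is more accurately described as integration.

```python
from typing import Optional


def recombination_outcome(same_molecule: bool,
                          same_orientation: Optional[bool] = None) -> str:
    """Classify the product of site-specific recombination between two target sites."""
    if not same_molecule:
        # Sites on separate DNA molecules exchange flanking DNA.
        return "translocation"
    if same_orientation is None:
        raise ValueError("orientation is required for sites on the same molecule")
    # Direct repeats: the intervening segment is excised.
    # Inverted repeats: the intervening segment is flipped in place.
    return "excision" if same_orientation else "inversion"
```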
  • "label" and "detectable label" refer to a molecule capable of detection, including, but not limited to, radioactive isotopes, fluorescers, chemiluminescers, chromophores, enzymes, enzyme substrates, enzyme cofactors, enzyme inhibitors, semiconductor nanoparticles, dyes, metal ions, metal sols, ligands (e.g., biotin, streptavidin or haptens) and the like.
  • fluorescer refers to a substance or a portion thereof which is capable of exhibiting fluorescence in the detectable range.
  • Recombinant as used herein to describe a nucleic acid molecule means a polynucleotide of genomic, cDNA, viral, semisynthetic, or synthetic origin which, by virtue of its origin or manipulation, is not associated with all or a portion of the polynucleotide with which it is associated in nature.
  • the term "recombinant” as used with respect to a protein or polypeptide means a polypeptide produced by expression of a recombinant polynucleotide.
  • the gene of interest is cloned and then expressed in transformed organisms, as described further below. The host organism expresses the foreign gene to produce the protein under expression conditions.
  • transformation refers to the insertion of an exogenous polynucleotide into a host cell, irrespective of the method used for the insertion. For example, direct uptake, transduction, or F-mating are included.
  • The exogenous polynucleotide may be maintained as a non-integrated vector, for example, a plasmid, or alternatively, may be integrated into the host genome.
  • Recombinant host cells refer to cells which can be, or have been, used as recipients for recombinant vector or other transferred DNA, and include the original progeny of the original cell which has been transfected.
  • a "coding sequence" or a sequence which "encodes" a selected polypeptide is a nucleic acid molecule which is transcribed (in the case of DNA) and translated (in the case of mRNA) into a polypeptide in vivo when placed under the control of appropriate regulatory sequences (or "control elements"). The boundaries of the coding sequence can be determined by a start codon at the 5' (amino) terminus and a translation stop codon at the 3' (carboxy) terminus.
  • a coding sequence can include, but is not limited to, cDNA from viral, prokaryotic or eukaryotic mRNA, genomic DNA sequences from viral or prokaryotic DNA, and even synthetic DNA sequences.
  • a transcription termination sequence may be located 3' to the coding sequence.
  • the coding sequence may be interrupted by introns, which can be self-splicing group I or group II introns or those which are spliced out by the host cell splicing machinery.
  • control elements include, but are not limited to, transcription promoters, transcription enhancer elements, introns (located anywhere in the transcript), transcription termination signals, polyadenylation sequences (located 3' to the translation stop codon), sequences for optimization of initiation of translation (located 5' to the coding sequence), and translation termination sequences.
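The start-codon and stop-codon boundaries of a coding sequence can be illustrated with a simple open-reading-frame scan. This sketch assumes the standard genetic code (stop codons TAA, TAG, TGA) and scans only the strand given. It also ignores introns, so it would not handle the interrupted coding sequences mentioned above. The function name is hypothetical.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_coding_sequence(dna: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the ORF beginning at the most 5' ATG that has an
    in-frame stop codon; `end` is exclusive and includes the stop codon.
    Returns None if no complete ORF exists on this strand."""
    dna = dna.upper()
    start = dna.find("ATG")
    while start != -1:
        # Walk codon by codon in the reading frame set by this start codon.
        for i in range(start, len(dna) - 2, 3):
            if dna[i:i + 3] in STOP_CODONS:
                return start, i + 3
        start = dna.find("ATG", start + 1)
    return None
```

For example, in `CCATGAAATAGCC` the coding sequence runs from the ATG at index 2 through the TAG stop codon, i.e. `(2, 11)`.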
  • operably linked refers to an arrangement of elements wherein the components so described are configured so as to perform their usual function.
  • a given promoter operably linked to a coding sequence is capable of effecting the expression of the coding sequence when the proper enzymes are present.
  • the promoter need not be contiguous with the coding sequence, so long as it functions to direct the expression thereof.
  • intervening untranslated yet transcribed sequences can be present between the promoter sequence and the coding sequence and the promoter sequence can still be considered “operably linked" to the coding sequence.
  • Expression cassette or "expression construct” refers to an assembly which is capable of directing the expression of the sequence(s) or gene(s) of interest.
  • An expression cassette generally includes control elements, as described above, such as a promoter which is operably linked to (so as to direct transcription of) the sequence(s) or gene(s) of interest, and often includes a polyadenylation sequence as well.
  • the expression cassette described herein may be contained within a plasmid or viral vector construct (e.g., a vector for genome modification comprising a genome editing cassette comprising a promoter operably linked to a polynucleotide encoding a guide RNA and a donor polynucleotide).
  • the construct may also include one or more selectable markers, a signal which allows the construct to exist as single-stranded DNA (e.g., an M13 origin of replication), at least one multiple cloning site, and a "mammalian" origin of replication (e.g., an SV40 or adenovirus origin of replication) or "yeast" origin of replication (e.g., a 2-micron vector or centromeric vector with an autonomously replicating sequence (ARS)).
  • "Purified polynucleotide" refers to a polynucleotide of interest or fragment thereof which is essentially free, e.g., contains less than about 50%, preferably less than about 70%, and more preferably less than about at least 90%, of the protein with which the polynucleotide is naturally associated.
  • Techniques for purifying polynucleotides of interest include, for example, disruption of the cell containing the polynucleotide with a chaotropic agent and separation of the polynucleotide(s) and proteins by ion-exchange chromatography, affinity chromatography and sedimentation according to density.
  • transfection is used to refer to the uptake of foreign DNA by a cell.
  • a cell has been "transfected" when exogenous nucleic acids have been introduced inside the cell membrane.
  • transfection techniques are generally known in the art. See, e.g., Graham et al. (1973) Virology, 52:456, Sambrook et al. (2001) Molecular Cloning, a laboratory manual, 3rd edition, Cold Spring Harbor Laboratories, New York, Davis et al. (1995) Basic Methods in Molecular Biology, 2nd edition, McGraw-Hill, and Chu et al. (1981) Gene 13:197.
  • Such techniques can be used to introduce one or more exogenous nucleic acid moieties into suitable host cells.
  • the term refers to both stable and transient uptake of the genetic material and includes uptake of peptide- or antibody-linked nucleic acids.
  • a “vector” is capable of transferring nucleic acid sequences to target cells (e.g., viral vectors, non-viral vectors, particulate carriers, and liposomes).
  • the term includes cloning and expression vehicles, as well as plasmid and viral vectors.
  • variant refers to biologically active derivatives of the reference molecule that retain desired activity, such as site-directed Cas9 endonuclease activity.
  • analog refers to compounds having a native polypeptide sequence and structure with one or more amino acid additions, substitutions (generally conservative in nature) and/or deletions, relative to the native molecule, so long as the modifications do not destroy biological activity and which are "substantially homologous" to the reference molecule as defined below.
  • amino acid sequences of such analogs will have a high degree of sequence homology to the reference sequence, e.g., amino acid sequence homology of more than 50%, generally more than 60%-70%, even more particularly 80%-85% or more, such as at least 90%-95% or more, when the two sequences are aligned.
  • the analogs will include the same number of amino acids but will include substitutions, as explained herein.
  • mutant further includes polypeptides having one or more amino acid-like molecules including but not limited to compounds comprising only amino and/or imino molecules, polypeptides containing one or more analogs of an amino acid (including, for example, unnatural amino acids, etc.), polypeptides with substituted linkages, as well as other modifications known in the art, both naturally occurring and non-naturally occurring (e.g., synthetic), cyclized, branched molecules and the like.
  • the term also includes molecules comprising one or more N-substituted glycine residues (a "peptoid") and other synthetic amino acids or peptides. (See, e.g., U.S. Patent Nos.
  • analogs generally include substitutions that are conservative in nature, i.e., those substitutions that take place within a family of amino acids that are related in their side chains.
  • amino acids are generally divided into four families: (1) acidic: aspartate and glutamate; (2) basic: lysine, arginine, histidine; (3) non-polar: alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan; and (4) uncharged polar: glycine, asparagine, glutamine, cysteine, serine, threonine, and tyrosine.
  • Phenylalanine, tryptophan, and tyrosine are sometimes classified as aromatic amino acids.
  • the polypeptide of interest may include up to about 5-10 conservative or non-conservative amino acid substitutions, or even up to about 15-25 conservative or non-conservative amino acid substitutions, or any integer between 5-25, so long as the desired function of the molecule remains intact.
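The four-family grouping and the substitution ranges above can be applied mechanically to two aligned protein sequences. This sketch rests on these assumptions:

- alignments are gap-free and of equal length;
- families are assigned exactly as listed above, so glycine counts as uncharged polar;
- any residue outside the 20 standard amino acids is treated as non-conservative;
- the `max_substitutions` default of 25 is taken from the upper bound mentioned above;
- the function names are hypothetical.

```python
from typing import Tuple

# Family assignments exactly as listed in the text (one-letter codes).
FAMILIES = {
    "acidic": "DE",
    "basic": "KRH",
    "non-polar": "AVLIPFMW",
    "uncharged polar": "GNQCSTY",
}
FAMILY_OF = {aa: name for name, members in FAMILIES.items() for aa in members}


def classify_substitutions(reference: str, variant: str) -> Tuple[int, int]:
    """Return (conservative, non_conservative) substitution counts."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned without gaps")
    conservative = non_conservative = 0
    for r, v in zip(reference.upper(), variant.upper()):
        if r == v:
            continue
        family = FAMILY_OF.get(r)
        if family is not None and family == FAMILY_OF.get(v):
            conservative += 1
        else:
            non_conservative += 1
    return conservative, non_conservative


def within_substitution_budget(reference: str, variant: str,
                               max_substitutions: int = 25) -> bool:
    """True if the total number of substitutions does not exceed the budget."""
    conservative, non_conservative = classify_substitutions(reference, variant)
    return conservative + non_conservative <= max_substitutions
```

For example, D to E counts as conservative (both acidic), whereas D to A counts as non-conservative. Whether function "remains intact" still has to be established experimentally.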
  • One of skill in the art may readily determine regions of the molecule of interest that can tolerate change by reference to Hopp/Woods and Kyte-Doolittle plots, well known in the art.
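A Kyte-Doolittle plot is a sliding-window average of per-residue hydropathy values, and a minimal Python version is sketched below. Its scope and assumptions:

- only the Kyte-Doolittle scale is implemented; Hopp-Woods uses a different scale and is not reproduced here;
- the default window of 9 residues is an assumed, commonly used choice;
- the function name is hypothetical.

```python
from typing import List

# Kyte-Doolittle hydropathy index (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 9) -> List[float]:
    """Mean hydropathy of each sliding window; entry k covers residues k..k+window-1."""
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and the sequence length")
    return [sum(values[k:k + window]) / window for k in range(len(values) - window + 1)]
```

Positive averages mark hydrophobic stretches and negative averages mark hydrophilic ones. The plot informs, but does not by itself decide, which regions are likely to tolerate change.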
  • "Gene transfer" or "gene delivery" refers to methods or systems for reliably inserting DNA or RNA of interest into a host cell. Such methods can result in transient expression of non-integrated transferred DNA, extrachromosomal replication and expression of transferred replicons (e.g., episomes), or integration of transferred genetic material into the genomic DNA of host cells.
  • Gene delivery expression vectors include, but are not limited to, vectors derived from bacterial plasmid vectors, viral vectors, non-viral vectors, adenoviruses, retroviruses, alphaviruses, pox viruses, and vaccinia viruses.
  • the term "derived from" is used herein to identify the original source of a molecule but is not meant to limit the method by which the molecule is made, which can be, for example, by chemical synthesis or recombinant means.
  • a polynucleotide "derived from" a designated sequence refers to a polynucleotide sequence which comprises a contiguous sequence of approximately at least about 6 nucleotides, preferably at least about 8 nucleotides, more preferably at least about 10-12 nucleotides, and even more preferably at least about 15-20 nucleotides corresponding, i.e., identical or complementary to, a region of the designated nucleotide sequence.
  • the derived polynucleotide will not necessarily be derived physically from the nucleotide sequence of interest, but may be generated in any manner, including, but not limited to, chemical synthesis, replication, reverse transcription or transcription, which is based on the information provided by the sequence of bases in the region(s) from which the polynucleotide is derived. As such, it may represent either a sense or an antisense orientation of the original polynucleotide.
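The "derived from" criterion (a contiguous run of a minimum length that is identical or complementary to the designated sequence) can be checked mechanically. This sketch makes these assumptions:

- "complementary" means matching the reverse complement of the designated sequence, i.e. the antisense orientation;
- U is treated as T;
- the default minimum of 15 nt is one hypothetical choice from the ranges given above;
- the function names are illustrative.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def _normalize(seq: str) -> str:
    # Treat U as T so RNA and DNA sequences can be compared directly.
    return seq.upper().replace("U", "T")


def reverse_complement(seq: str) -> str:
    return _normalize(seq).translate(_COMPLEMENT)[::-1]


def is_derived_from(candidate: str, designated: str, min_len: int = 15) -> bool:
    """True if `candidate` contains a contiguous run of at least `min_len` nt that is
    identical to `designated` (sense) or to its reverse complement (antisense)."""
    cand, ref = _normalize(candidate), _normalize(designated)
    ref_rc = reverse_complement(ref)
    for k in range(len(cand) - min_len + 1):
        window = cand[k:k + min_len]
        if window in ref or window in ref_rc:
            return True
    return False
```

As the text notes, this is a check on sequence information only. It says nothing about how the candidate was physically made.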
  • subject includes both vertebrates and invertebrates, including, without limitation, mammals, including human and non-human mammals such as non-human primates, including chimpanzees and other apes and monkey species; laboratory animals such as mice, rats, rabbits, hamsters, guinea pigs, and chinchillas; domestic animals such as dogs and cats; farm animals such as sheep, goats, pigs, horses and cows; and birds such as domestic, wild and game birds, including chickens, turkeys and other gallinaceous birds, ducks, geese, and the like.
  • the methods of the present disclosure find use in experimental animals, in veterinary applications, and in the development of animal models for disease, including, but not limited to, rodents including mice, rats, and hamsters; primates; and transgenic animals.
  • the subject is preferably a mammal, more preferably a human.
  • Mammals include, but are not limited to, murines, simians, humans, farm animals, sport animals, and pets. Tissues, cells and their progeny of a biological entity obtained in vivo or cultured in vitro are also encompassed.
  • Genetic disease refers to a disease, partially or completely, directly or indirectly, caused by one or more abnormalities in the genome, especially a condition that is present from birth.
  • the abnormality may be a mutation, an insertion or a deletion.
  • the abnormality may affect the coding sequence of the gene or its regulatory sequence.
  • the genetic disease may be selected from the group consisting of an inherited muscle disease (e.g., congenital myopathy or a muscular dystrophy), a lysosomal storage disease, a heritable disorder of connective tissue, a neurodegenerative disorder, and a skeletal dysplasia.
  • the genetic disease may be, but is not limited to, Duchenne muscular dystrophy (DMD), Becker's muscular dystrophy, limb-girdle muscular dystrophy, dysferlinopathy, dystroglycanopathy, aspartylglucosaminuria, Batten disease, cystinosis, Fabry disease, Gaucher disease, Pompe disease, Tay-Sachs disease, Sandhoff disease, metachromatic leukodystrophy, mucolipidosis, mucopolysaccharide storage diseases, Niemann-Pick disease, Schindler disease, Krabbe disease, Ehlers-Danlos syndrome, epidermolysis bullosa, Marfan syndrome, neurofibromatosis, spinal muscular atrophy, amyotrophic lateral sclerosis, progressive muscular atrophy, fragile X syndrome, Charcot-Marie-Tooth disease, osteogenesis imperfecta, achondroplasia, or osteopetrosis.
  • ribozyme refers to an RNA molecule that is capable of catalyzing a biochemical reaction.
  • ribozymes function in protein synthesis, catalyzing the linking of amino acids in the ribosome.
  • ribozymes participate in various other RNA processing functions, such as splicing, viral replication, and tRNA biosynthesis.
  • ribozymes can be self-cleaving.
  • Non-limiting examples of ribozymes include the HDV ribozyme, the Lariat capping ribozyme (formerly called GIR1 branching ribozyme), the glmS ribozyme, group I and group II self-splicing introns, the hairpin ribozyme, the hammerhead ribozyme, various rRNA molecules, RNase P, the twister ribozyme, the VS ribozyme, the pistol ribozyme, and the hatchet ribozyme.
  • examples of ribozyme-containing genetic elements include the self-cleaving ribozyme-containing R2 elements, the L1Tc retrotransposon found in Trypanosoma cruzi, short interspersed nuclear elements (SINEs) in schistosomes, Penelope-like elements, and retrozymes.
  • For ribozymes, see, e.g., Doherty et al., Ann. Rev. Biophys. Biomol. Struct. 30: 457-475 (2001) and Weinberg et al., Nucleic Acids Research 47(18): 9480-9494 (2019); incorporated herein by reference in their entireties for all purposes.
  • administering includes oral administration, topical contact, administration as a suppository, intravenous, intraperitoneal, intramuscular, intralesional, intrathecal, intranasal, or subcutaneous administration to a subject. Administration is by any route, including parenteral and transmucosal (e.g., buccal, sublingual, palatal, gingival, nasal, vaginal, rectal, or transdermal). Parenteral administration includes, e.g., intravenous, intramuscular, intra-arteriole, intradermal, subcutaneous, intraperitoneal, intraventricular, and intracranial.
  • Administering also refers to delivery of material, including biological material such as nucleic acids and/or proteins, into cells by transformation, transfection, transduction, ballistic methods and/or electroporation.
  • treating refers to an approach for obtaining beneficial or desired results including, but not limited to, a therapeutic benefit and/or a prophylactic benefit.
  • therapeutic benefit is meant any therapeutically relevant improvement in or effect on one or more diseases, conditions, or symptoms under treatment.
  • the compositions may be administered to a subject at risk of developing a particular disease, condition, or symptom, or to a subject reporting one or more of the physiological symptoms of a disease, even though the disease, condition, or symptom may not have yet been manifested.
  • effective amount or “sufficient amount” refers to the amount of an agent that is sufficient to effect beneficial or desired results.
  • the therapeutically effective amount may vary depending upon one or more of: the subject and disease condition being treated, the weight and age of the subject, the severity of the disease condition, the manner of administration and the like, which can readily be determined by one of ordinary skill in the art.
  • the specific amount may vary depending on one or more of: the particular agent chosen, the host cell type, the location of the host cell in the subject, the dosing regimen to be followed, whether it is administered in combination with other compounds, timing of administration, and the physical delivery system in which it is carried.
  • pharmaceutically acceptable carrier refers to a substance that aids the administration of an active agent to a cell, an organism, or a subject.
  • “Pharmaceutically acceptable carrier” refers to a carrier or excipient that can be included in the compositions of the invention and that causes no significant adverse toxicological effect on the patient.
  • Non-limiting examples of pharmaceutically acceptable carriers include water, NaCl, normal saline solutions, lactated Ringer's, normal sucrose, normal glucose, cell culture media, and the like.
  • heterologous refers to biological material that is introduced, inserted, or incorporated into a recipient (e.g., host) organism that originates from another organism.
  • Heterologous material can include, but is not limited to, nucleic acids, amino acids, peptides, proteins, and structural elements such as genes, promoters, and cassettes.
  • a host cell can be, but is not limited to, a bacterium, a yeast cell, a mammalian cell, or a plant cell.
  • the introduction of heterologous material into a host cell or organism can result, in some instances, in the expression of additional heterologous material in or by the host cell or organism.
  • the transformation of a yeast host cell with an expression vector that contains DNA sequences encoding a bacterial protein may result in the expression of the bacterial protein by the yeast cell.
  • the incorporation of heterologous material may be permanent or transient.
  • the expression of heterologous material may be permanent or transient.
  • nucleic acid encoding a retron that includes (a) one or more RNA binding domain recognition sequences or one or more single stranded nucleic acid binding domain recognition sequences, (b) an msr sequence, (c) an msd sequence, and (d) a subject expression sequence within the msd sequence.
  • the subject expression sequence comprises a donor sequence for homology-directed repair (HDR).
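The arrangement of parts can be made concrete by modeling a retron design as a small data structure. Everything in this sketch is hypothetical:

- the class and field names;
- the placeholder sequences used with it;
- the `insert_at` parameter;
- the 5'-to-3' order in which `ncrna()` concatenates the parts, chosen only for illustration and not taken from the disclosure or from any particular retron.

The one rule the sketch enforces comes from the text: the subject expression sequence lies within the msd sequence.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RetronDesign:
    recognition_seqs: List[str]  # RNA binding domain recognition sequences (placeholders)
    msr: str
    msd: str
    subject_seq: str             # e.g. a donor sequence for HDR
    insert_at: int               # hypothetical insertion index inside the msd

    def msd_with_insert(self) -> str:
        """Place the subject expression sequence strictly inside the msd."""
        if not 0 < self.insert_at < len(self.msd):
            raise ValueError("subject expression sequence must lie within the msd")
        return self.msd[:self.insert_at] + self.subject_seq + self.msd[self.insert_at:]

    def ncrna(self) -> str:
        # Illustrative order only: recognition sequences, then msr, then the edited msd.
        return "".join(self.recognition_seqs) + self.msr + self.msd_with_insert()
```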
  • RNA binding domain recognition sequence is an RNA sequence specifically bound by an RNA binding domain of a polypeptide or an aptamer.
  • RNA binding domain recognition sequences that bind polypeptide RNA binding domains include, but are not limited to, an MS2 stem loop sequence, which binds to the MS2 coat protein (MCP), a Pumilio (PUF) recognition sequence, an RNA Recognition Motif (RRM) recognition sequence, and a Double-Stranded RNA-Binding Domain (dsRBD) recognition sequence.
  • An exemplary MS2 coat protein is a bacteriophage MS2 coat protein (see, for example, UniProtKB - J9QBW2 (J9QBW2_BPMS2) and UniProtKB - P03612 (CAPSD_BPMS2)).
  • the single-stranded nucleic acid binding domain recognition sequence is a single-stranded DNA or RNA sequence specifically bound by a single-stranded nucleic acid binding domain of a polypeptide or an aptamer.
  • Non-limiting examples of single-stranded nucleic acid binding domain recognition sequences are described in Dickey et al., "Single-Stranded DNA-Binding Proteins: Multiple Domains for Multiple Functions," Structure 21(7), pgs 1074-1084, July 2, 2013, and references cited therein.
  • As described in Dickey et al., single-stranded DNA-binding proteins have a wide range of structures and functions, but many of them contain small autonomous domains whose recognition of ssDNA has been well studied. These domains include four structural topologies that have been structurally characterized with ssDNA: oligonucleotide/oligosaccharide/oligopeptide-binding (OB) folds, K homology (KH) domains, RNA recognition motifs (RRMs), and whirly domains.
  • OB folds are formed from a five-stranded β-barrel with interspersed loop and helical elements, show significant structural divergence, and are capable of binding a variety of ligands in addition to ssDNA and ssRNA (Theobald et al., 2003).
  • OB folds can bind ssDNA with high sequence specificity.
  • telomere-end protection (TEP) proteins utilize OB folds to sequence-specifically bind the GT-rich 3' ssDNA overhang constitutively found at the end of eukaryotic telomeres (reviewed in Horvath, 2011; Lewis and Wuttke, 2012).
  • KH domains are small domains (approximately 70 aa) characterized by three α-helices packed against a three-stranded β-sheet (reviewed in Valverde et al., 2008), and KH domains from proteins structurally characterized in complex with ssDNA include heterogeneous nuclear ribonucleoprotein K (hnRNP K), far upstream element (FUSE)-binding protein (FBP), and poly(C)-binding proteins (PCBP) 1 and 2.
  • RRMs most often bind RNA, but have also been shown to bind ssDNA (reviewed in Cléry et al., 2008).
  • RRMs are typically about 90 aa in length and form a relatively large β-sheet surface (more similar to OB folds than to KH domains) packed against two α-helices.
  • the majority of RRMs contain two conserved sequence motifs (RNPs) on strands 1 and 3 that form the primary nucleic acid-binding surface. Residues found elsewhere in the sheet (sometimes including an additional strand) and intervening loops also contribute to nucleic acid binding.
  • Whirly domains are large (approximately 180 aa) domains that contain two roughly parallel four-stranded β-sheets with interspersed helical elements. Individual domains form tetramers through interaction of the helices, and these tetramers further interact to form hexamers of tetramers (Cappadocia et al., 2010, 2012). See Dickey et al., "Single-Stranded DNA-Binding Proteins: Multiple Domains for Multiple Functions," Structure 21(7), pgs 1074-1084, July 2, 2013, and references cited therein.
  • the one or more single stranded nucleic acid binding domain recognition sequences include, but are not limited to, oligoiiucleotide/oligosaceharide/oligopeptide-bindmg (OB) folds, such as in human POTl, Schizosaccharomyces pombe Potl, Sterkiella nova TEPB, CdcI3, CspB protein from Bacillus caldolyticus and Bacillus subtilis ; K homology (KH) domains, such as in KH domain- containing proteins heterogeneous ribonucleoprotein K (hnRNP K), far upstream element (FUSE)-bmding protein (FBP), and poly(C) ⁇ binding proteins (PCBP) 1 and 2; RNA recognition motifs (RRMs) which bind DNA such as in FBP-interacting repressor (FIR), hnRNP Al, and hnRNP D (also known as Aufl); and whi
  • the single stranded nucleic acid binding domain is a single stranded nucleic acid binding domain of a a G-quadruplex binding domain including nucleolin, hnRNP, serine/arginine-rich splicing factors (SRSF) 1 and 9, splicing factor U2AF, TRF2, FRM2, and the RNA helicase associated with AU-rich element (RHAU) proteins (see V. Brazda et al,, DNA and RNA quadruplex -binding proteins. Int I Mol Sci. 2014; 15(10): 17493-17517. doi : 10.3390/ijms 151017493 ) .
  • SRSF serine/arginine-rich splicing factors
  • chimeric constructs encoding a retron multicopy single-stranded DNA (msDNA), which comprises an msr RNA covalently attached to an msd DNA, wherein the RNA comprises one or more RNA binding domain recognition sequences and an msr sequence, and wherein the DNA comprises an msd sequence and a subject expression sequence within the msd sequence.
  • the subject expression sequence comprises a donor sequence for homology-directed repair (HDR).
  • the RNA binding domain is an RNA binding domain of a polypeptide that binds to an MS2 stem loop sequence (which binds to the MS2 coat protein (MCP)), a Pumilio (PUF) recognition sequence, an RNA Recognition Motif (RRM) recognition sequence, a Double-Stranded RNA-Binding Domain (dsRBD) recognition sequence, a Zinc finger (ZF) Domain recognition sequence, a Z-alpha, arginine/glycine rich (RGG) domain recognition sequence, a K Homology (KH) Domain recognition sequence, or a Poly(A) tail.
  • MCP MS2 coat protein
  • PUF Pumilio
  • RRM RNA Recognition Motif
  • dsRBD Double-Stranded RNA-Binding Domain
  • ZF Zinc finger
  • RGG arginine/glycine rich domain recognition sequence
  • KH K Homology domain recognition sequence
  • Poly(A) tail
  • the single stranded nucleic acid binding domain is a single stranded nucleic acid binding domain of a polypeptide that binds to a specific sequence of a single stranded DNA or RNA.
  • Single stranded nucleic acid binding domain recognition domains of polypeptides include, but are not limited to, oligonucleotide/oligosaccharide/oligopeptide-binding (OB) folds, such as in human POT1, Schizosaccharomyces pombe Pot1, Sterkiella nova TEPB, CspB protein from Bacillus caldolyticus and Bacillus subtilis; K homology (KH) domains, such as in the KH domain-containing proteins heterogeneous ribonucleoprotein K (hnRNP K), far upstream element (FUSE)-binding protein (FBP), and poly(C)-binding proteins (PCBP) 1 and 2; RNA recognition motifs (RRMs)
  • RNA binding proteins with well-characterized motifs can be utilized for recruiting the retron msDNA.
  • an inverted LexA-LexA repeat with an intervening loop sequence could be inserted into the reverse-transcribed portion of the retron donor as described in FIG. 1B. Upon reverse transcription, these inverted repeats would fold back on one another, creating a highly stable stem loop structure and enabling the LexA DNA binding domain to be utilized.
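The fold-back behavior of an inverted repeat can be illustrated with a minimal Python sketch. It assumes a hypothetical placeholder binding site and loop (not the actual LexA operator sequence of the disclosure). It only checks that the 3' arm of the designed single strand is the reverse complement of the 5' arm, which is what allows a stem to form after reverse transcription. It does not model folding energetics.

```python
# Minimal sketch: design an inverted repeat that can fold into a stem-loop.
# The site and loop below are hypothetical placeholders, not the LexA operator.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def inverted_repeat(site: str, loop: str) -> str:
    """Concatenate site + loop + reverse complement of site (5'->3')."""
    return site + loop + reverse_complement(site)


def stem_can_pair(hairpin: str, stem_len: int) -> bool:
    """True if the first and last `stem_len` bases are Watson-Crick complementary."""
    return hairpin[:stem_len] == reverse_complement(hairpin[-stem_len:])


if __name__ == "__main__":
    hp = inverted_repeat("GATTACA", "TTTT")  # hypothetical site and loop
    print(hp, stem_can_pair(hp, 7))
```

Because the 3' arm is generated as the reverse complement of the site, any site sequence yields a pairable stem by construction. The check function is useful mainly for validating sequences assembled by other means.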
  • the FHA domain could be replaced with other domains known to bind to double-strand breaks, or the MCP could be fused directly to Cas9 to have retron donor present at the cut site when Cas9 cleavage occurs.
  • RNA binding domains and aptamers could be used in place of the MS2 system, such as the programmable RNA-binding domains of Pumilio/fem-3 mRNA binding factors (PUF domains) (Zhao et al., Nucleic Acids Research, 2018 PMCID: PMC5961129), or CRISPR-Cas systems could be used, where the scaffold for a deactivated Cas nuclease could be introduced in place of MS2 loops, and the deactivated Cas enzyme fused to the FHA domain.
  • PUF domains Pumilio/fem-3 mRNA binding factors
  • the DNA break site localizing domain is a DNA break site localizing domain of a polypeptide listed in Tables 1-5 below.
  • msDNA retron multicopy single-stranded DNA
  • msDNA, which comprises an msr RNA covalently attached to an msd DNA, includes complexes of a chimera of an RNA hybridized to a DNA, wherein the RNA comprises one or more RNA binding domain recognition sequences and an msr sequence; wherein the DNA comprises an msd sequence and a subject expression sequence within the msd sequence; and wherein the chimera is non-covalently bound to a polypeptide that includes an RNA binding domain or single stranded nucleic acid binding domain bound to a DNA break site localizing domain.
  • retrons comprising msr, msd, and inverted repeat sequences that can be used in the nucleic acids of the disclosure are provided in Table 6.
  • the retrons in Table 6 also express reverse transcriptases that can be used in the methods of the disclosure.
  • the retron encoded by the nucleic acids described herein is a Retron-Eco1 (Ec86) retron and reverse transcriptase system.

II. Methods of use
  • the sequence specific endonuclease is a Cas9 endonuclease, a Cas12a endonuclease, a Zinc-finger nuclease, a Transcription activator-like effector nuclease (TALEN), or a meganuclease.
  • the sequence specific endonuclease is a Cas9 endonuclease or a Cas12a endonuclease, and the method comprises administering to the subject one or more guide RNAs (gRNAs), or one or more nucleic acids encoding the same.
  • gRNAs guide RNAs
  • RNA binding domain recognition sequences or one or more single stranded nucleic acid binding domain recognition sequences comprising administering to the subject an effective amount of (1) any of the compositions described above encoding a retron that includes (a) one or more RNA binding domain recognition sequences or one or more single stranded nucleic acid binding domain recognition sequences, (b) an msr sequence, (c) an msd sequence, and (d) a subject expression sequence within the msd sequence, (2) a polypeptide comprising an RNA binding domain or single stranded nucleic acid binding domain covalently bound to a DNA break site localizing domain, or its encoding nucleic acid, (3) a reverse transcriptase or a nucleic acid encoding the same, and (4) a sequence specific endonuclease or a nucleic acid encoding the same, thereby treating the disease.
  • the sequence specific endonuclease is a CRISPR-associated nuclease, such as a Cas9 endonuclease or a Cpf1 (also known as Cas12a) endonuclease, a Zinc-finger nuclease, a Transcription activator-like effector nuclease (TALEN), or a meganuclease.
  • the sequence specific endonuclease is a Cas9, Cpf1 (also known as Cas12a), C2c1, FokI-dCas9, dCas13, or dCas14 endonuclease, and the method comprises administering to the subject one or more guide RNAs (gRNAs), or one or more nucleic acids encoding the same.
  • Genome editing may be performed on a single cell or a population of cells of interest and can be performed on any type of cell, including any cell from a prokaryotic, eukaryotic, or archaeon organism, including bacteria, archaea, fungi, protists, plants, and animals.
  • Cells from tissues, organs, and biopsies, as well as recombinant cells, genetically modified cells, cells from cell lines cultured in vitro, and artificial cells (e.g., nanoparticles, liposomes, polymersomes, or microcapsules encapsulating nucleic acids) may all be used in the practice of the present disclosure.
  • the methods of the disclosure are also applicable to editing of nucleic acids in cellular fragments, cell components, or organelles comprising nucleic acids (e.g., mitochondria in animal and plant cells, plastids (e.g., chloroplasts) in plant cells and algae).
  • Cells may be cultured or expanded prior to or after performing genome editing as described herein. In one embodiment, the cells are yeast cells.
  • the RNA-guided nuclease can be targeted to a particular genomic sequence (i.e., the genomic target sequence to be modified) by altering its guide RNA sequence.
  • a target-specific guide RNA comprises a nucleotide sequence that is complementary to a genomic target sequence, and thereby mediates binding of the nuclease-gRNA complex by hybridization at the target site.
  • the gRNA can be designed with a sequence complementary to the sequence of a minor allele to target the nuclease-gRNA complex to the site of a mutation.
  • the mutation may comprise an insertion, a deletion, or a substitution.
  • the mutation may include a single nucleotide variation, gene fusion, translocation, inversion, duplication, frame shift, missense, nonsense, or other mutation associated with a phenotype or disease of interest.
  • the targeted minor allele may be a common genetic variant or a rare genetic variant.
  • the gRNA is designed to selectively bind to a minor allele with single base-pair discrimination, for example, to allow binding of the nuclease-gRNA complex to a single nucleotide polymorphism (SNP).
  • SNP single nucleotide polymorphism
  • the gRNA may be designed to target disease-relevant mutations of interest for the purpose of genome editing to remove the mutation from a gene.
  • the gRNA can be designed with a sequence complementary to the sequence of a major or wild-type allele to target the nuclease-gRNA complex to the allele for the purpose of genome editing to introduce a mutation into a gene in the genomic DNA of the cell, such as an insertion, deletion, or substitution.
  • Such genetically modified cells can be used, for example, to alter phenotype, confer new properties, or produce disease models for drug screening.
  • the RNA-guided nuclease used for genome modification is a clustered regularly interspaced short palindromic repeats (CRISPR) system Cas nuclease.
  • CRISPR clustered regularly interspaced short palindromic repeats
  • Any RNA-guided Cas nuclease capable of catalyzing site-directed cleavage of DNA to allow integration of donor polynucleotides by the HDR mechanism can be used in genome editing, including CRISPR system type I, type II, or type III Cas nucleases.
  • Cas proteins include Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas5e (CasD), Cas6, Cas6e, Cas6f, Cas7, Cas8a1, Cas8a2, Cas8b, Cas8c, Cas9 (Csn1 or Csx12), Cas10, Cas10d, Cas12a/Cpf1, MAD7™ (INSCRIPTA®), CasF, CasG, CasH, Csy1, Csy2, Csy3, Cse1 (CasA), Cse2 (CasB), Cse3 (CasE), Cse4 (CasC), Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Cs
  • a type II CRISPR system such as a Cas9 endonuclease
  • Cas9 nucleases from any species, or biologically active fragments, variants, analogs, or derivatives thereof that retain Cas9 endonuclease activity (i.e., catalyze site-directed cleavage of DNA to generate double-strand breaks) may be used to perform genome modification as described herein.
  • the Cas9 need not be physically derived from an organism, but may be synthetically or recombinantly produced. Cas9 sequences from a number of bacterial species are well known in the art and listed in the National Center for Biotechnology Information (NCBI) database.
  • NCBI National Center for Biotechnology Information
  • sequences or a variant thereof comprising a sequence having at least about 70-100% sequence identity thereto, including any percent identity within this range, such as 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, or 99% sequence identity thereto, can be used for genome editing, as described herein. See also Fonfara et al. (2014) Nucleic Acids Res. 42(4):2577-90; Kapitonov et al. (2015) J. Bacteriol.
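The percent-identity thresholds above can be made concrete with a simplified calculation. The sketch assumes the two sequences have already been aligned to equal length, for example by a separate alignment tool. It counts identical, non-gap columns over the alignment length. Alignment tools may normalize identity differently, so `percent_identity`, `is_variant`, and the `min_identity` parameter are hypothetical illustrations only. The peptide fragments are made up and are not Cas9 sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns where both sequences share the same residue.

    Assumes both strings come from one pairwise alignment (same length, '-' for gaps).
    Gap columns count against identity here; other conventions exist.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("expected two non-empty aligned sequences of equal length")
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * identical / len(aligned_a)


def is_variant(aligned_a: str, aligned_b: str, min_identity: float = 70.0) -> bool:
    """Hypothetical helper: does the pair meet a minimum percent-identity threshold?"""
    return percent_identity(aligned_a, aligned_b) >= min_identity


# Usage with short hypothetical peptide fragments (not real Cas9 sequences):
print(percent_identity("MKTAYIAK", "MKTAYLAK"))  # 7 of 8 columns identical
print(is_variant("MKTAYIAK", "MKTAYLAK", min_identity=90.0))
```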
  • the CRISPR-Cas system naturally occurs in bacteria and archaea, where it plays a role in RNA-mediated adaptive immunity against foreign DNA.
  • the bacterial type II CRISPR system uses the endonuclease Cas9, which forms a complex with a guide RNA (gRNA) that specifically hybridizes to a complementary genomic target sequence, where the Cas9 endonuclease catalyzes cleavage to produce a double-stranded break.
  • gRNA guide RNA
  • Targeting of Cas9 typically further relies on the presence of a 3' protospacer-adjacent motif (PAM) in the DNA directly downstream of the gRNA-binding site.
  • PAM protospacer-adjacent motif
  • the genomic target site will typically comprise a nucleotide sequence that is complementary to the gRNA and may further comprise a protospacer adjacent motif (PAM).
  • the target site comprises 20-30 base pairs in addition to a 3 base pair PAM.
  • the first nucleotide of a PAM can be any nucleotide, while the two other nucleotides will depend on the specific Cas9 protein that is chosen.
  • Exemplary PAM sequences are known to those of skill in the art and include, without limitation, NNG, NGN, NAG, and NGG, wherein N represents any nucleotide.
  • the allele targeted by a gRNA comprises a mutation that creates a PAM within the allele, wherein the PAM promotes binding of the Cas9-gRNA complex to the allele.
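The Cas9 targeting rules above can be sketched as a simple sequence scan: a protospacer of about 20 nucleotides lies immediately 5' of a short PAM such as NGG or NAG. This Python example is illustrative only. It assumes a fixed 20-nt spacer, scans one strand, and ignores off-target considerations. The function names are hypothetical.

```python
import re

# IUPAC nucleotide codes needed for common PAM patterns.
_IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
          "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}


def pam_pattern(pam: str) -> re.Pattern:
    """Compile a lookahead regex so overlapping PAM occurrences are all reported."""
    body = "".join(_IUPAC[base] for base in pam.upper())
    return re.compile(f"(?=({body}))")


def find_cas9_sites(seq: str, pam: str = "NGG", spacer_len: int = 20):
    """Return (start, protospacer, pam) tuples on the given (top) strand only.

    The PAM is taken to lie immediately 3' of a protospacer of `spacer_len` nt,
    as described for Cas9; scan the reverse complement for the other strand.
    """
    seq = seq.upper()
    sites = []
    for match in pam_pattern(pam).finditer(seq):
        pam_start = match.start()
        if pam_start >= spacer_len:  # need a full-length protospacer upstream
            start = pam_start - spacer_len
            sites.append((start, seq[start:pam_start], match.group(1)))
    return sites


print(find_cas9_sites("A" * 20 + "TGG" + "C" * 5))
print(find_cas9_sites("C" * 20 + "AAG", pam="NAG"))
```

The zero-width lookahead is used so that PAMs sharing bases (for example, GGG containing two NGG matches) are each reported.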
  • the gRNA is 5-50 nucleotides, 10-30 nucleotides, 15-25 nucleotides, 18-22 nucleotides, or 19-21 nucleotides in length, or any length between the stated ranges, including, for example, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, or 35 nucleotides in length.
  • the guide RNA may be a single guide RNA comprising crRNA and tracrRNA sequences in a single RNA molecule, or the guide RNA may comprise two RNA molecules with crRNA and tracrRNA sequences residing in separate RNA molecules.
  • Cpf1 the CRISPR nuclease from Prevotella and Francisella 1
  • Cpf1 also known as Cas12a
  • Cas12a is another class II CRISPR/Cas system RNA-guided nuclease with similarities to Cas9 and may be used analogously.
  • Cpf1 does not require a tracrRNA and depends only on a crRNA in its guide RNA, which provides the advantage that shorter guide RNAs can be used with Cpf1 for targeting than with Cas9.
  • Cpf1 is capable of cleaving either DNA or RNA.
  • the PAM sites recognized by Cpf1 have the sequences 5'-YTN-3' (where “Y” is a pyrimidine and “N” is any nucleobase) or 5'-TPU-3' and are located 5' to the gRNA binding site, in contrast to the G-rich PAM site recognized by Cas9, which is located 3' to the gRNA binding site.
  • Cpf1/Cas12a cleavage of DNA produces double-stranded breaks with sticky ends having a 4 or 5 nucleotide overhang.
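The same scanning idea can be adapted to Cpf1/Cas12a, whose PAM is described above as lying 5' of the gRNA binding site. The sketch uses the 5'-YTN-3' pattern given in the text. The 20-nt spacer length is an assumed placeholder, since Cas12a spacer lengths vary. The function scans only the top strand, and its name is hypothetical.

```python
import re

# IUPAC codes used by the 5'-YTN-3' PAM described in the text.
_IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "[CT]", "N": "[ACGT]"}


def find_cas12a_sites(seq: str, pam: str = "YTN", spacer_len: int = 20):
    """Return (pam_start, pam, protospacer) with the protospacer 3' of the PAM."""
    seq = seq.upper()
    body = "".join(_IUPAC[base] for base in pam.upper())
    sites = []
    for match in re.finditer(f"(?=({body}))", seq):  # lookahead keeps overlaps
        spacer_start = match.start() + len(pam)
        spacer_end = spacer_start + spacer_len
        if spacer_end <= len(seq):  # skip PAMs without a full-length protospacer
            sites.append((match.start(), match.group(1), seq[spacer_start:spacer_end]))
    return sites


print(find_cas12a_sites("CTA" + "G" * 20))
```

Compared with the Cas9 orientation, only the side of the PAM from which the protospacer is read changes.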
  • Cpf1: see, e.g., Ledford et al. (2015) Nature 526(7571):17; Zetsche et al. (2015) Cell.
  • a class 2 type V-A CRISPR-Cas (Cas12a/Cpf1) nuclease can be used, such as MAD7™.
  • MAD7™ is an engineered class 2 type V-A CRISPR-Cas (Cas12a/Cpf1) system isolated from Eubacterium rectale. It is an RNA-guided nuclease with demonstrated gene editing activity in Escherichia coli, yeast, human, mouse, and rat cells. See Liu Z et al., CRISPR J. 2020 Apr;3(2):97-108.
  • C2c1 is another class II CRISPR/Cas system RNA-guided nuclease that may be used.
  • C2c1, similarly to Cas9, depends on both a crRNA and a tracrRNA for guidance to target sites.
  • an RNA-guided FokI nuclease may be used.
  • RNA-guided FokI nucleases comprise fusions of inactive Cas9 (dCas9) and the FokI endonuclease (FokI-dCas9), wherein the dCas9 portion confers guide RNA-dependent targeting on FokI.
  • dCas9 inactive Cas9
  • the RNA-guided nuclease can be provided in the form of a protein, such as the nuclease complexed with a gRNA, or provided by a nucleic acid encoding the RNA-guided nuclease, such as an RNA (e.g., messenger RNA) or DNA (expression vector). Codon usage may be optimized to improve production of an RNA-guided nuclease in a particular cell or organism.
  • a nucleic acid encoding an RNA-guided nuclease can be modified to substitute codons having a higher frequency of usage in a yeast cell, a bacterial cell, a human cell, a non-human cell, a mammalian cell, a rodent cell, a mouse cell, a rat cell, or any other host cell of interest, as compared to the naturally occurring polynucleotide sequence.
  • the protein can be transiently, conditionally, or constitutively expressed in the cell.
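One simple way to "substitute codons having a higher frequency of usage" is to back-translate each residue to the most frequent codon for that amino acid in the target host. The Python sketch below uses a small, entirely hypothetical codon-usage table. Real codon optimization would use a measured table for the host, such as yeast, and typically also weighs factors like GC content and repeated sequences.

```python
# Hypothetical relative codon frequencies for an illustrative host (not real data).
HYPOTHETICAL_USAGE = {
    "M": {"ATG": 1.0},
    "K": {"AAA": 0.6, "AAG": 0.4},
    "L": {"TTA": 0.5, "CTG": 0.3, "TTG": 0.2},
    "*": {"TAA": 0.5, "TGA": 0.3, "TAG": 0.2},
}


def most_frequent_codon(amino_acid: str, usage: dict) -> str:
    """Pick the codon with the highest relative frequency for one residue."""
    choices = usage[amino_acid]
    return max(choices, key=choices.get)


def codon_optimize(protein: str, usage: dict) -> str:
    """Back-translate a protein (one-letter codes, '*' = stop) codon by codon."""
    return "".join(most_frequent_codon(aa, usage) for aa in protein)


print(codon_optimize("MKL*", HYPOTHETICAL_USAGE))
```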
  • Donor polynucleotides and gRNAs are readily synthesized by standard techniques, e.g., solid phase synthesis via phosphoramidite chemistry, as disclosed in U.S. Patent Nos. 4,458,066 and 4,415,732, incorporated herein by reference; Beaucage et al., Tetrahedron (1992) 48:2223-2311; and Applied Biosystems User Bulletin No. 13 (1 April 1987).
  • Other chemical synthesis methods include, for example, the phosphotriester method described by Narang et al., Meth. Enzymol. (1979) 68:90 and the phosphodiester method disclosed by Brown et al., Meth. Enzymol. (1979) 68:109.
  • gRNA-donor polynucleotide cassettes can be produced by standard oligonucleotide synthesis techniques and subsequently ligated into vectors. Moreover, libraries of gRNA-donor polynucleotide cassettes directed against thousands of genomic targets can be readily created using highly parallel array-based oligonucleotide library synthesis methods (see, e.g., Cleary et al. (2004) Nature Methods 1:241-248, Svensen et al. (2011) PLoS One 6(9):e24906).
  • adapter sequences can be added to oligonucleotides to facilitate high-throughput amplification or sequencing.
  • a pair of adapter sequences can be added at the 5' and 3' ends of an oligonucleotide to allow amplification or sequencing of multiple oligonucleotides simultaneously by the same set of primers.
  • restriction sites can be incorporated into oligonucleotides to facilitate cloning of oligonucleotides into vectors.
  • oligonucleotides comprising gRNA-donor polynucleotide cassettes can be designed with a common 5' restriction site and a common 3' restriction site to facilitate ligation into the genome modification vectors.
  • a restriction digest that selectively cleaves each oligonucleotide at the common 5' restriction site and the common 3' restriction site is performed to produce restriction fragments that can be cloned into vectors (e.g., plasmids or viral vectors), followed by transformation of cells with the vectors comprising the gRNA-donor polynucleotide cassettes.
  • a restriction site can also be added in between the gRNA and donor polynucleotide sequences to enable a second cloning step for the introduction of a guide RNA scaffold sequence or other constructs into the vector.
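The cassette layout above can be checked programmatically before an oligonucleotide library is ordered. The layout consists of common 5' and 3' restriction sites plus an optional site between the gRNA and donor. A recognition site that also occurs inside a gRNA or donor, or that is created at a junction, would cause that oligonucleotide to be cut at an unintended position during the restriction digest. The sketch below is a hypothetical design helper. EcoRI (GAATTC), HindIII (AAGCTT), and BamHI (GGATCC) serve only as familiar example sites, and the adapters, gRNA, and donor are placeholders.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_site(seq: str, site: str) -> int:
    """Count positions where a recognition site occurs on either strand."""
    targets = {site, reverse_complement(site)}  # a set avoids double-counting palindromes
    width = len(site)
    return sum(1 for i in range(len(seq) - width + 1) if seq[i:i + width] in targets)


def build_cassette(adapter5, site5, grna, mid_site, donor, site3, adapter3):
    """Assemble adapter-site-gRNA-site-donor-site-adapter (5'->3')."""
    return adapter5 + site5 + grna + mid_site + donor + site3 + adapter3


def problem_sites(cassette: str, sites: list) -> list:
    """Return the designed sites that do not occur exactly once in the cassette."""
    return [site for site in sites if count_site(cassette, site) != 1]


sites = ["GAATTC", "AAGCTT", "GGATCC"]  # example enzymes, not from the disclosure
good = build_cassette("ACGTACGTAC", "GAATTC", "GTCAGTCAGTCAGTCAGTCA",
                      "AAGCTT", "TTTTCCCCAAAAGGGGTTTT", "GGATCC", "CATGCATGCA")
print(problem_sites(good, sites))
```

Checking the fully assembled cassette, rather than each part separately, also catches sites created across part junctions.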
  • Amplification of polynucleotides encoding gRNA-donor polynucleotide cassettes may be performed, for example, before ligation into genome modification vectors or before sequencing and after barcoding. Any method for amplifying oligonucleotides may be used, including, but not limited to, polymerase chain reaction (PCR), isothermal amplification, nucleic acid sequence-based amplification (NASBA), transcription mediated amplification (TMA), strand displacement amplification (SDA), and ligase chain reaction (LCR).
  • PCR polymerase chain reaction
  • NASBA nucleic acid sequence-based amplification
  • TMA transcription mediated amplification
  • SDA strand displacement amplification
  • LCR ligase chain reaction
  • the genome editing cassettes comprise common 5' and 3' priming sites to allow amplification of the gRNA-donor polynucleotide sequences in parallel with a set of universal primers.
  • a set of selective primers is used to selectively amplify a subset of the gRNA-donor polynucleotides from a pooled mixture.
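Universal versus selective priming sites can be illustrated with a toy model of a pooled library. In this model, an oligonucleotide is treated as amplifiable only if it begins with the forward primer sequence and ends with the reverse complement of the reverse primer. This exact-match rule is a simplifying assumption, since real PCR tolerates some mismatches and depends on reaction conditions. The sequences and function names are hypothetical placeholders.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_amplifiable(oligo: str, fwd_primer: str, rev_primer: str) -> bool:
    """Exact-match model: forward primer at the 5' end, reverse primer site at the 3' end."""
    return oligo.startswith(fwd_primer) and oligo.endswith(reverse_complement(rev_primer))


def select_subpool(pool: list, fwd_primer: str, rev_primer: str) -> list:
    """Keep only oligos that the given primer pair would amplify under this model."""
    return [oligo for oligo in pool if is_amplifiable(oligo, fwd_primer, rev_primer)]


# Two hypothetical subpools share a payload but carry different 5' priming sites.
pool = ["ACGTACGT" + "AAAACCCC" + "TGAAATCC",
        "CCCCGGGG" + "AAAACCCC" + "TGAAATCC"]
print(select_subpool(pool, "ACGTACGT", "GGATTTCA"))
```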
  • Cells that are transformed with recombinant polynucleotides comprising the genome editing cassettes may be prokaryotic cells or eukaryotic cells and are preferably designed for high-efficiency incorporation of gRNA-donor polynucleotide libraries by transformation.
  • Methods of introducing nucleic acids into a host cell are well known in the art. Commonly used methods of transformation include chemically induced transformation, typically using divalent cations (e.g., CaCl2), and electroporation. See, e.g., Sambrook et al. (2001) Molecular Cloning, a laboratory manual, 3rd edition, Cold Spring Harbor Laboratories, New York, Davis et al.
  • the method for active donor recruitment comprises: a) introducing into a cell a fusion protein comprising a protein that selectively binds to the DNA break connected to a polypeptide comprising a nucleic acid binding domain; and b) introducing into the cell a donor polynucleotide comprising i) a nucleotide sequence sufficiently complementary to hybridize to a sequence adjacent to the DNA break, and ii) a nucleotide sequence comprising a binding site recognized by the nucleic acid binding domain of the fusion protein, wherein the nucleic acid binding domain selectively binds to the binding site on the donor polynucleotide to produce a complex between the donor polynucleotide and the fusion protein, thereby recruiting the donor polynucleotide to the DNA break and promoting HDR.
  • the DNA break may be created by a site-specific nuclease, such as, but not limited to, a Cas nuclease (e.g., Cas9, Cpf1, or C2c1), an engineered RNA-guided FokI nuclease, a zinc finger nuclease (ZFN), a transcription activator-like effector-based nuclease (TALEN), a restriction endonuclease, a meganuclease, a homing endonuclease, and the like.
  • the DNA break may be a single-stranded (nick) or double-stranded DNA break. If the DNA break is a single-stranded DNA break, the fusion protein used comprises a protein that selectively binds to the single-stranded DNA break, whereas if the DNA break is a double-stranded DNA break, the fusion protein used comprises a protein that selectively binds to the double-stranded DNA break. The fusion protein can also recognize both single-stranded and double-stranded DNA breaks.
  • the protein that selectively binds to the DNA break can be, for example, an RNA-guided nuclease, such as a Cas nuclease (e.g., Cas9 or Cpf1) or an engineered RNA-guided FokI nuclease.
  • Donor polynucleotides may be single-stranded or double-stranded and may be composed of RNA or DNA.
  • a donor polynucleotide comprising DNA can be produced from a donor polynucleotide comprising RNA, if desired, by reverse transcription using a reverse transcriptase either in the cell (e.g., by a retron reverse transcriptase) or outside the cell (e.g., by a recombinant reverse transcriptase such as M-MLV).
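The donor requirements above can be sketched as a small design function. The donor carries a sequence able to hybridize adjacent to the break, the desired edit, and a binding site recognized by the fusion protein's nucleic acid binding domain. This is a hypothetical illustration. It assumes a simple substitution edit, symmetric homology arms of a chosen length, and a placeholder binding site appended at the 3' end, none of which are parameters specified by the disclosure.

```python
def design_donor(genome: str, edit_start: int, ref: str, alt: str,
                 arm_len: int, binding_site: str = "") -> str:
    """Return left arm + alt + right arm (+ optional binding site), 5'->3'.

    `genome` is the target-region sequence, `ref` the bases replaced at
    `edit_start`, and `alt` the replacement bases. `binding_site` is a
    hypothetical placeholder for a recruitment sequence.
    """
    if genome[edit_start:edit_start + len(ref)] != ref:
        raise ValueError("reference bases do not match the genome at edit_start")
    left_start = edit_start - arm_len
    right_end = edit_start + len(ref) + arm_len
    if left_start < 0 or right_end > len(genome):
        raise ValueError("homology arms extend beyond the supplied sequence")
    left_arm = genome[left_start:edit_start]
    right_arm = genome[edit_start + len(ref):right_end]
    return left_arm + alt + right_arm + binding_site


# Hypothetical 12-nt region with a CC-to-TG substitution and 3-nt arms.
print(design_donor("AAAAACCGGGGG", 5, "CC", "TG", 3, binding_site="TTTT"))
```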
  • the RNA binding domain may be any protein or domain from a protein that binds a known RNA sequence. Examples of each of these proteins are well known in the art. Nonlimiting examples of RNA binding domains include domains of proteins that bind to an MS2 stem loop sequence, a Pumilio (PUF) recognition sequence, an RNA Recognition Motif (RRM) recognition sequence, a Double-Stranded RNA-Binding Domain (dsRBD) recognition sequence, a Zinc finger (ZF) Domain recognition sequence, a Z-alpha, arginine/glycine rich (RGG) domain recognition sequence, a K Homology (KH) Domain recognition sequence, or a Poly(A) tail.
  • the single stranded nucleic acid binding domain may be any protein or domain from a protein that binds a known single stranded nucleic acid sequence. Examples of each of these proteins are well known in the art.
  • Single stranded nucleic acid binding domain recognition domains of polypeptides include, but are not limited to, oligonucleotide/oligosaccharide/oligopeptide-binding (OB) folds, such as in human POT1, Schizosaccharomyces pombe Pot1, Sterkiella nova TEPB, CspB protein from Bacillus caldolyticus and Bacillus subtilis; K homology (KH) domains, such as in the KH domain-containing proteins heterogeneous ribonucleoprotein K (hnRNP K), far upstream element (FUSE)-binding protein (FBP), and poly(C)-binding proteins (PCBP) 1 and 2; RNA recognition motifs (RRMs) which bind DNA
  • the fusion protein may comprise an FHA phosphothreonine-binding domain, wherein the donor polynucleotide is selectively recruited to a DNA break having a protein comprising a phosphorylated threonine residue located sufficiently close to the DNA break for the FHA phosphothreonine-binding domain to bind to the phosphorylated threonine residue.
  • the FHA phosphothreonine-binding domain may be combined with any RNA binding domain (e.g., fusion with MCP) or single stranded nucleic acid binding domain (e.g., an OB fold) for donor recruitment.
  • the donor recruitment protein includes a fusion of a polypeptide domain from any protein that has an RNA binding domain or single stranded nucleic acid binding domain with a polypeptide domain from any protein that has a DNA break localizing domain.
  • Non-limiting examples of DNA break localizing domains include domains of proteins that bind to areas of DNA damage and/or DNA repair proteins. Phospho-Ser/Thr-binding domains have emerged as crucial regulators of cell cycle progression and DNA damage signaling. Such domains include 14-3-3 proteins, WW domains, Polo-box domains (in PLK1), WD40 repeats (including those in the E3 ligase SCFβ-TrCP), BRCT domains (including those in BRCA1), and FHA domains (such as in CHK2 and MDC1). These domains all have the potential to be used in donor recruitment systems.
  • FHA domains are conserved between eukaryotes and bacteria and thus would also have utility in bacteria as well as eukaryotes for donor recruitment.
  • proteins or genes encoding such proteins are provided, without limitation, in Tables 1-5. Additional genes/proteins are known in the art and can be found, for example, by searching public gene or protein databases for genes or proteins known to have a role in DNA repair or binding of DNA damage (e.g., gene ontology term analysis). It is contemplated that proteins from any species can be used (e.g., eukaryotic proteins, proteins from yeast, mammalian cells, including human proteins, and/or from fungus). In embodiments, the donor recruitment protein comprises a polypeptide sequence from a DNA break-recruiting protein from the same kingdom, phylum or division, class, order, family, genus, and/or species as the cell to be genetically modified.
  • the fusion protein comprises an RNA binding domain of MS2 coat protein (MCP) joined to a forkhead-associated (FHA) domain.
  • the fusion protein comprises an RNA binding domain of MS2 coat protein (MCP) joined to an FHA phosphothreonine-binding domain.
  • the fusion protein comprises a LexA domain located between the RNA binding domain of MCP and the FHA domain.
  • the LexA domain is from the LexA repressor protein (UniProtKB - P0A7C2).
  • an inhibitor of the non-homologous end joining (NHEJ) pathway is used to further increase the frequency of cells genetically modified by HDR.
  • inhibitors of the NHEJ pathway include any compound (agent) that inhibits or blocks either expression or activity of any protein component in the NHEJ pathway.
  • Protein components of the NHEJ pathway include, but are not limited to, Ku70, Ku86, DNA protein kinase (DNA-PK), Rad50, MRE11, NBS1, DNA ligase IV, and XRCC4.
  • An exemplary inhibitor is wortmannin which inhibits at least one protein component (e.g., DNA-PK) of the NHEJ pathway.
  • RNA interference or CRISPR interference may also be used to block expression of a protein component of the NHEJ pathway (e.g., DNA-PK or DNA ligase IV).
  • siRNAs small interfering RNAs
  • hairpin RNAs and other RNA or RNA:DNA species which can be cleaved or dissociated in vivo to form siRNAs
  • dCas9 deactivated Cas9
  • sgRNAs single guide RNAs
  • an HDR enhancer such as RS-1 may be used to increase the frequency of HDR in cells (Song et al. (2016) Nat. Commun. 7:10548).
  • Example 1 Recruitment of retron-amplified donor DNA to double-strand breaks for enhanced homology-directed repair
  • HDR homology-directed repair
  • MAGESTIC was demonstrated, and it may be adapted for the retron system by introducing MS2 ribonucleic acid (RNA) stem-loops into the retron and fusing the forkhead-associated (FHA) donor recruitment domain to the MS2 coat protein (MCP), which binds to the MS2 RNA.
  • RNA ribonucleic acid
  • FHA forkhead-associated donor recruitment domain
  • MCP MS2 coat protein
  • a donor recruitment system, in which a LexA-FHA fusion protein consisting of the LexA DNA binding domain (DBD) and the forkhead-associated (FHA) domain of Fkh1p recruits donor plasmids to clustered regularly interspaced short palindromic repeats (CRISPR) double-strand breaks (DSBs), was described previously.
  • the LexA DBD binds to an array of LexA sites on the double-stranded DNA (dsDNA) donor plasmid.
  • the FHA domain binds to phosphothreonine-containing proteins which accumulate at DSBs.
  • ssDNA single-stranded DNA
  • it was sought to combine the FHA recruitment system with the ssDNA donor retron amplification system. Because the LexA DBD does not bind to single-stranded DNA, advantage was taken of the unique dual RNA-DNA structure found in the mature retron msDNA, and two MS2 stem-loops were inserted directly upstream of the 5' end of the retron.
  • FIG. 1A shows the expression locus for the retron donor (Triose-phosphate DeHydrogenase 3 (TDH3) promoter) and guide (small nucleolar RNA 52 (SNR52) promoter).
  • TDH3 Triose-phosphate DeHydrogenase 3
  • SNR52 small nucleolar RNA 52
  • Two MS2 stem-loop repeats are inserted in between the 5' Hepatitis Delta Virus (HDV) ribozyme and the 5' end of the retron.
  • the retron donor introduces a CC-to-TG mutation, which results in a premature termination codon.
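Reading the open reading frame codon by codon shows how a two-base substitution such as CC-to-TG can create a premature termination codon. The Python sketch below uses a short hypothetical ORF, not the ADE2 sequence targeted in the example, in which the edit converts a CCA codon into the TGA stop codon. The function names are illustrative.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon(orf: str) -> Optional[int]:
    """Return the 0-based codon index of the first in-frame stop codon, or None."""
    for index in range(0, len(orf) - 2, 3):
        if orf[index:index + 3] in STOP_CODONS:
            return index // 3
    return None


def substitute(seq: str, position: int, ref: str, alt: str) -> str:
    """Replace `ref` with `alt` at `position` after checking the reference bases."""
    if seq[position:position + len(ref)] != ref:
        raise ValueError("reference bases not found at position")
    return seq[:position] + alt + seq[position + len(ref):]


orf = "ATGCCAAAATAA"                     # hypothetical: ATG CCA AAA TAA
edited = substitute(orf, 3, "CC", "TG")  # -> ATG TGA AAA TAA
print(first_stop_codon(orf), first_stop_codon(edited))
```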
  • the guide-donor plasmid also harbors a tandem array of 4 LexA sites to enable comparison of the results directly with the previously demonstrated LexA-FHA donor recruitment system.
  • FIG. 1B shows the mature retron msDNA transcripts after the HDV ribozyme has cleaved off the 5' cap and after reverse transcription of the msd region and host cell RNase H activity has removed the msd RNA component.
  • the 3' inverted repeat is still shown as base-paired to the 5' inverted repeat, although that is likely removed along with the 3' polyA tail by host cell 3'-5' exonucleases.
  • the donor recruitment module consists of the MS2 coat protein (MCP) fused to the forkhead-associated (FHA) domain of fork head protein homolog 1 (Fkh1p), optionally containing a LexA DNA binding domain (DBD) in between to allow for simultaneous recruitment of double-stranded plasmid donor and single-stranded retron donor.
  • MCP binds to the MS2 stem loops linked to the retron donor via the branched G of the retron msr RNA through the unusual 2' ribonucleic acid (RNA)-5' deoxyribonucleic acid (DNA) linkage catalyzed by the retron reverse transcriptase (RT) during initiation of complementary DNA (cDNA) synthesis.
  • the FHA domain binds to phosphothreonine motifs on several proteins that localize to double-strand breaks, including Mutator Phenotype (Mph1p), Fdo1p, and other unidentified protein(s).
  • the middle drawing of FIG. 1B shows a control retron donor lacking the MS2 loops.
  • the right drawing of FIG. 1B shows an alternative method for recruiting the retron, based on two inverted repeats of the LexA sequence downstream of the donor, which would be bound by the LexA-FHA fusion protein.
  • the top panel of FIG. 1C shows an ADEnine requiring (ADE2) editing assay with a strain harboring Cas9 and a high-copy plasmid harboring a retron donor that introduces a premature translation termination codon in the ADE2 open reading frame (ORF), and an artificially weakened guide sequence harboring genomic mismatches at positions 20, 19, and 18 from the protospacer adjacent motif (PAM) (i.e., a 17-mer guide). From left to right are strains without RT, with the RT, and then with the LexA-FHA, MCP-FHA, or MCP-LexA-FHA fusion proteins.
  • FIG. 1C shows that the dual retron amplification-donor recruitment system enables editing with an even further weakened guide, with mismatches from 20 to 17 bp from the PAM (i.e., a 16-mer guide).
  • Recruitment via the LexA inverted repeats does not improve editing over the HDV-retron donor control, while the MCP-FHA or MCP-LexA-FHA systems both improve editing substantially.
  • prime editing efficiency at the same target site is shown with a full 20-mer guide and the same CC-to-TG edit encoded in the RT template of the prime editing guide RNA (pegRNA).
  • FIG. 1C shows that the effect of retron recruitment is more than simply the combination of donor recruitment and the retron operating separately, as the MCP-FHA and MCP-LexA-FHA constructs perform the best.
  • the data also show that retron-based editing is possible even when the guide is truncated to a 16-mer, which is expected to eliminate the cleavage capacity of Cas9, although Cas9 may retain nicking activity with such a guide.
  • the MS2 retron recruitment again shows improved editing over the retron alone or an alternative retron donor recruitment construct with an inverted LexA-LexA repeat. For comparison, all of these systems outperform prime editing in yeast.
  • FIG. 2A shows the levels of retron cDNA produced by the different editing cassettes from FIG. 1.
  • FIG. 2B shows next-generation sequencing (NGS)-based quantification of retron donor cDNA levels in the absence of Cas9 or donor recruitment proteins.
  • Primers were designed to amplify both the single stranded donor template and the genomic target.
  • the donor encodes a CC-to-TG mutation in the middle (asterisk), so the ratio of reads containing the donor mutation relative to the wild type (WT) genomic locus is proportional to the ratio of donor cDNA to genome copies.
  • the different cassettes are sorted left to right by greatest to least retron cDNA produced.
  • the primers also amplify the double-stranded donor on the retron-guide cassette, which resides on a high-copy 2-micron vector (except for the TDH3-HDV cassette labeled “on Cen/Ars”).
  • the same retron cassettes were transformed into cells lacking an RT, and the donor:genome ratio in such cells grown in glucose was subtracted from the levels observed in cells with the RT.
  • the genome has two strands, which together can bind both primers in the first round of polymerase chain reaction (PCR), while the donor has only one strand, so the donor:genome ratio is multiplied by 2 to obtain the values on the y-axis, cDNA copies per genome equivalent.
  • In addition to ssDNA being a superior donor template for HDR compared with dsDNA, the number of copies of retron donor can vastly exceed the highest levels of donor plasmids observed in cells. By expressing the retron from a Pol II promoter with the HDV ribozyme at the 5' end, >500 copies of ssDNA per cell were achieved. By contrast, high-copy two-micron plasmids in budding yeast accumulate to only ~20-30 copies per cell (Karim et al., FEMS Yeast Research, 2013, PMCID: PMC3546148).
  • the TDH3 promoter-driven 5'HDV-3'none retron donors produce cDNA levels (~800 copies per cell) similar to those observed with the GAL7 promoter.
  • the addition of the 2X-MS2 loops slightly reduces retron cDNA levels (~600 copies per cell).
  • this reduction is more than offset by the recruitment function of MCP-FHA. In other words, simply producing more retron is not as effective as recruiting the retron directly to the cut site.
  • Example 2 Comparison of different retron donor systems in multiplexed editing.
  • the edit fractions for each designed variant in SpCas9 set A (438 SNVs for 6 guides), SpCas9 set B (339 SNVs for 5 guides), LbCas12a set A (348 SNVs for 4 guides), and LbCas12a set B (348 SNVs for 4 guides) are plotted as box plots to show how the abundance distribution of individual variants varies for each donor DNA enhancement system and nuclease. Designed edits that were not observed are indicated by the numbers at the bottom of each box plot, and are visualized by adding a pseudo-fraction of 1e-05 to all variants. Note that the retron appears to benefit editing to a greater extent with LbCas12a than with SpCas9.
  • retron donor recruitment with the MCP fusion proteins shows a marked improvement over separate plasmid recruitment and retron expression (LexA-FHA + RT) for LbCas12a.
  • Direct comparison of overall editing efficiency between SpCas9 and LbCas12a is complicated by the fact that the SpCas9 and LbCas12a libraries were synthesized separately and exhibited different oligo error rates.
  • a nucleic acid encoding a retron comprising: a. one or more RNA binding domain recognition sequences or one or more single stranded nucleic acid binding domain recognition sequences; b. an msr sequence; c. an msd sequence; d. a subject expression sequence within the msd sequence; and, e. a first inverted repeat sequence and a second inverted repeat sequence.
  • nucleic acid of embodiment 1, wherein the subject expression sequence comprises a donor sequence for homologous directed repair (HDR).
  • RNA binding domain recognition sequence is a MS2 stem loop sequence, a Pumilio (PUF) recognition sequence, an RNA Recognition Motif (RRM) recognition sequence, a Double-Stranded RNA-Binding Domain (dsRBD) recognition sequence, a Zinc finger (ZF) Domain recognition sequence, a G-quadruplex-forming sequence, a Z-alpha, arginine/glycine rich (RGG) domain recognition sequence, or a K Homology (KH) Domain recognition sequence.
  • single stranded nucleic acid binding domain recognition sequence is a sequence recognized by the single stranded nucleic acid binding domain of a CRISPR associated endonuclease, POT1, TEPB, CspB, a K homology (KH) domain, a far upstream element (FUSE)-binding protein (FBP), a poly(C)-binding protein, a G-quadruplex binding domain including nucleolin, hnRNP, serine/arginine-rich splicing factors (SRSF) 1 and 9, splicing factor U2AF, TRF2, FRM2, and the RNA helicase associated with AU-rich element (RHAU) proteins, an FBP-interacting repressor (FIR), hnRNP A1, hnRNP D, or a whirly domain.
  • a chimeric construct comprising an RNA hybridized to a DNA, wherein the RNA comprises one or more RNA binding domain recognition sequences and an msr sequence; and wherein the DNA comprises an msd sequence and a subject expression sequence within the msd sequence.
  • a polypeptide comprising an RNA binding domain or single stranded nucleic acid binding domain bound to a DNA break site localizing domain.
  • RNA binding domain is an RNA binding domain of a polypeptide that binds to a MS2 stem loop sequence, a Pumilio (PUF) recognition sequence, an RNA Recognition Motif (RRM) recognition sequence, a Double-Stranded RNA-Binding Domain (dsRBD) recognition sequence, a Zinc finger (ZF) Domain recognition sequence, a Z-alpha, arginine/glycine rich (RGG) domain recognition sequence, or a K Homology (KH) Domain recognition sequence.
  • single stranded nucleic acid binding domain is a single stranded nucleic acid binding domain of a CRISPR associated endonuclease, POT1, TEPB, CspB, a K homology (KH) domain, a far upstream element (FUSE)-binding protein (FBP), a poly(C)-binding protein, an FBP-interacting repressor (FIR), hnRNP A1, hnRNP D, or a whirly domain.
  • RNA binding domain comprises an RNA binding domain of MS2 coat protein (MCP) and the DNA break site localizing domain comprises a forkhead-associated (FHA) domain.
  • polypeptide of embodiment 11, further comprising a LexA domain located between the RNA binding domain of MCP and the FHA domain.
  • a method of editing DNA in a cell comprising contacting the cell with the nucleic acid of any one of embodiments 1 to 4 and the polypeptide of any one of embodiments 7 to 12 or the nucleic acid of embodiment 13, a reverse transcriptase or a nucleic acid encoding the same, and a sequence specific endonuclease or a nucleic acid encoding the same, thereby editing the DNA of the cell.
  • sequence specific endonuclease is a CRISPR associated (Cas) nuclease, a Zinc-finger nuclease, a Transcription activator-like effector nuclease (TALEN), or a meganuclease.
  • a method of treating a genetic disease in a subject in need comprising administering to the subject the nucleic acid of any one of embodiments 1 to 4 and the polypeptide of any one of embodiments 7 to 12 or the nucleic acid of embodiment 13, a reverse transcriptase or a nucleic acid encoding the same, and a sequence specific endonuclease or a nucleic acid encoding the same, thereby editing the DNA.
  • sequence specific endonuclease is a Cas nuclease, a Zinc-finger nuclease, a Transcription activator-like effector nuclease (TALEN), or a meganuclease.
  • Cas nuclease is selected from the group consisting of Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas5e (CasD), Cas6, Cas6e, Cas6f, Cas7, Cas8a1, Cas8a2, Cas8b, Cas8c, Cas9 (Csn1 or Csx12), SpCas9, FokI-dCas9, Cas10, Cas10d, Cas12a/Cpf1, Mad7™, CasF, CasG, CasH, Csy1, Csy2, Csy3, Cse1 (CasA), Cse2 (CasB), Cse3 (CasE), Cse4 (CasC), Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Cs
  • TCTGAGTTACTGTCTGTnTCCT (SEQ ID NO:2). (first fragment), programmable loop, AGGAAACCCGTTTCTTCTGACGTAAGGGTGCGCA (SEQ ID NO:3) (second fragment with inverted repeat).
  • DGNPTPS AI A AN SG GU -C terminus (SEQ ID NO:5)
  • NLS-linker-MCP-linker-NLS-linker-FHA
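The read-count arithmetic described above for FIG. 2B (background subtraction using the matching no-RT strain, then a factor of 2 because the single-stranded donor offers only one primer-binding strand), together with the 1e-05 pseudo-fraction used for the FIG. 3 box plots, can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names and the read counts in the usage lines are hypothetical, and the log10 transform is an assumption about why a pseudo-fraction is added (a zero fraction cannot be placed on a log axis).

```python
import math


def cdna_copies_per_genome(donor_reads_rt: int, wt_reads_rt: int,
                           donor_reads_no_rt: int, wt_reads_no_rt: int) -> float:
    """Estimate retron cDNA copies per genome equivalent from amplicon read counts.

    donor_reads_*: reads carrying the donor-encoded CC-to-TG edit
    wt_reads_*:    reads matching the wild-type genomic locus
    The no-RT counts capture background from the plasmid-borne dsDNA donor.
    """
    ratio_with_rt = donor_reads_rt / wt_reads_rt
    background = donor_reads_no_rt / wt_reads_no_rt
    # Subtract plasmid background, then multiply by 2: each genome copy offers
    # two primer-binding strands, the single-stranded cDNA only one.
    return 2 * (ratio_with_rt - background)


def plotted_edit_fraction(fraction: float, pseudo: float = 1e-05) -> float:
    """Add a pseudo-fraction so unobserved variants (fraction 0) stay plottable on a log10 axis."""
    return math.log10(fraction + pseudo)


# Hypothetical counts, for illustration only.
print(cdna_copies_per_genome(4000, 10, 20, 10))  # 796.0
print(plotted_edit_fraction(0.0))  # about -5, i.e. log10(1e-05)
```

Keeping the background subtraction ahead of the doubling mirrors the order stated in the bullets; a real pipeline would also need to handle the case where the background exceeds the RT signal.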


Abstract

Provided herein, inter alia, are compositions and methods for increasing the efficiency of retron production and recruitment. In aspects, compositions and methods are provided for sequence editing in a cell or a subject in need.

Description

COMPOSITIONS AND METHODS FOR EFFICIENT RETRON RECRUITMENT TO DNA BREAKS
[0001] This application claims the benefit of priority to U.S. Provisional Application No. 63/214,196, filed June 23, 2021, the disclosure of which is hereby incorporated by reference in its entirety for all purposes.
STATEMENT AS TO RIGHTS TO INVENTIONS MADE UNDER FEDERALLY SPONSORED RESEARCH AND DEVELOPMENT
[0002] This invention was made with Government support under Grant Nos. P01HG000205, R01GM121932, and U01GM110706 awarded by the National Institutes of Health. The Government has certain rights in the invention.
BACKGROUND
[0003] Homology-directed repair (HDR) is a naturally occurring nucleic acid repair system and can be harnessed to modify genomes in many organisms, including humans. HDR is stimulated by the presence of double strand breaks (DSBs) in deoxyribonucleic acid (DNA). HDR efficiencies limit genome editing applications in many organisms and cell lines in both basic and translational research settings. However, HDR-based approaches are unmatched among genome editing methods in enabling the introduction of genome edits of virtually any size. For example, HDR can be utilized to repair deleterious single-nucleotide polymorphisms (SNPs), to insert multiple genes encoding entire pathways into chromosomes, to make large, programmed deletions or translocations, and to build chromosome-sized DNA inside the cell for synthetic biology applications. For HDR to be harnessed in genome editing, donor DNA must be incorporated at the targeted loci, most often initiated by double-strand breaks (DSBs) using clustered regularly interspaced short palindromic repeats (CRISPR)/Cas or protein-only nucleases such as zinc-finger nucleases and transcription activator-like effector nucleases (TALENs).
[0004] With the discovery and development of programmable nucleases over the past decade, HDR efficiencies are now a major bottleneck in fully harnessing the potential of genome editing. Even with previous enhancements to CRISPR-HDR editing systems, including the inhibition of competing repair pathways such as non-homologous end joining (NHEJ) or the tethering of donor DNA to the nuclease or near the double-stranded DNA cut with the LexA-Fkh1p system, HDR is still a limiting factor in enhancing editing. Furthermore, many cell types prefer single-stranded DNA (ssDNA) over double-stranded DNA (dsDNA) for HDR. The LexA-Fkh1p donor recruitment system, which utilizes the Forkhead-associated (FHA) domain of the yeast Fkh1p protein, was previously shown to work only with dsDNA.
[0005] Provided herein, inter alia, are solutions to these and other problems in the art. This disclosure provides compositions and methods for recruiting single-stranded donor DNA directly to target edit sites to achieve higher HDR efficiency.
BRIEF SUMMARY
[0006] Provided herein are nucleic acids encoding a retron that include: (a) one or more RNA binding domain recognition sequences or one or more single stranded nucleic acid binding domain recognition sequences, (b) an msr sequence, (c) an msd sequence, (d) a subject expression sequence within the msd sequence, and (e) a first inverted repeat sequence and a second inverted repeat sequence.
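To make elements (a) to (e) concrete, the sketch below assembles a hypothetical retron donor cassette as a single DNA string. The 5'-to-3' order used here (recruitment sites such as MS2 stem loops, first inverted repeat, msr, msd with the donor embedded inside it, second inverted repeat) is an assumption drawn from the FIG. 1A description, where the MS2 loops sit just upstream of the retron 5' end. All sequences are placeholders, the class name `RetronCassette` is illustrative, and requiring the second inverted repeat to be the exact reverse complement of the first is a simplification.

```python
from dataclasses import dataclass

_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_DNA_COMPLEMENT)[::-1]


@dataclass
class RetronCassette:
    """Hypothetical container for elements (a)-(e) of an engineered retron."""
    recruitment_sites: list[str]  # (a) e.g. MS2 stem loops (placeholders here)
    inverted_repeat_1: str        # (e) first inverted repeat
    msr: str                      # (b) msr sequence
    msd_left: str                 # (c) msd sequence 5' of the donor
    donor: str                    # (d) subject expression sequence
    msd_right: str                # (c) msd sequence 3' of the donor
    inverted_repeat_2: str        # (e) second inverted repeat

    def validate(self) -> None:
        # Simplification: require perfect pairing between the two inverted repeats.
        if self.inverted_repeat_2 != reverse_complement(self.inverted_repeat_1):
            raise ValueError("inverted repeats are not reverse complements")

    def assemble(self) -> str:
        """Concatenate the parts 5'->3' in the assumed order."""
        self.validate()
        return "".join([*self.recruitment_sites, self.inverted_repeat_1, self.msr,
                        self.msd_left, self.donor, self.msd_right,
                        self.inverted_repeat_2])


cassette = RetronCassette(["AAAA"], "GCGCAT", "CCCC", "TT", "GGTG", "TT", "ATGCGC")
print(cassette.assemble())  # AAAAGCGCATCCCCTTGGTGTTATGCGC
```

Splitting the msd into left and right flanks is one simple way to express "a subject expression sequence within the msd sequence"; other layouts would work equally well.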
[0007] In some embodiments, the subject expression sequence comprises a donor sequence for homologous directed repair (HDR). In some embodiments, the RNA binding domain recognition sequence is a MS2 stem loop sequence, a Pumilio (PUF) recognition sequence, an RNA Recognition Motif (RRM) recognition sequence, a Double-Stranded RNA-Binding Domain (dsRBD) recognition sequence, a Zinc finger (ZF) Domain recognition sequence, a G-quadruplex-forming sequence, a Z-alpha, arginine/glycine rich (RGG) domain recognition sequence, or a K Homology (KH) Domain recognition sequence. In some embodiments, the single stranded nucleic acid binding domain recognition sequence is a sequence recognized by a single stranded nucleic acid binding domain such as those found in a CRISPR associated endonuclease such as Cas9 or Cas12a, POT1, TEPB, CspB, K homology (KH) domain, far upstream element (FUSE)-binding protein (FBP), poly(C)-binding protein, a G-quadruplex binding domain including nucleolin, hnRNP, serine/arginine-rich splicing factors (SRSF) 1 and 9, splicing factor U2AF, TRF2, FRM2, and the RNA helicase associated with AU-rich element (RHAU) proteins, FBP-interacting repressor (FIR), hnRNP A1, hnRNP D, or a whirly domain.
[0008] Also provided herein are chimeric constructs that include an RNA hybridized to a DNA, such as that formed in the cell by reverse transcription of an engineered retron non-coding RNA, wherein the RNA comprises one or more RNA binding domain recognition sequences and an msr sequence; and wherein the DNA comprises an msd sequence and a subject expression sequence within the msd sequence. In some embodiments, the subject expression sequence comprises a donor sequence for homology-directed repair (HDR).
[0009] Also provided herein are polypeptides (e.g., fusion proteins) that include an RNA binding domain or single stranded nucleic acid binding domain bound to a DNA break site-localizing domain, and their encoding nucleic acids. In some embodiments, the RNA binding domain is an RNA binding domain of the MS2 coat protein (MCP) polypeptide that binds to a MS2 stem loop sequence, a Pumilio (PUF) recognition sequence, an RNA Recognition Motif (RRM) recognition sequence, a Double-Stranded RNA-Binding Domain (dsRBD) recognition sequence, a Zinc finger (ZF) domain recognition sequence, a G-quadruplex-forming sequence, a Z-alpha, arginine/glycine rich (RGG) domain recognition sequence, or a K Homology (KH) domain recognition sequence. In some embodiments, the single stranded nucleic acid binding domain is a single stranded nucleic acid binding domain of a CRISPR associated endonuclease, POT1, TEPB, CspB, a K homology (KH) domain, a far upstream element (FUSE)-binding protein (FBP), a poly(C)-binding protein, a G-quadruplex binding domain including nucleolin, hnRNP, serine/arginine-rich splicing factors (SRSF) 1 and 9, splicing factor U2AF, TRF2, FRM2, and the RNA helicase associated with AU-rich element (RHAU) proteins, an FBP-interacting repressor (FIR), hnRNP A1, hnRNP D, or a whirly domain. In some embodiments, the DNA break site localizing domain is a DNA break site localizing domain of a polypeptide listed in any of Tables 1 to 5. In some embodiments, the RNA binding domain comprises an RNA binding domain of MS2 coat protein (MCP) and the DNA break site localizing domain comprises a forkhead-associated (FHA) domain. In some embodiments, the polypeptide further comprises a LexA domain located between the RNA binding domain of MCP and the FHA domain. In some embodiments, the LexA domain is from the LexA repressor protein (UniProtKB - P0A7C2).
[0010] Also provided herein are complexes that include the chimeric constructs above non-covalently bound to a polypeptide that includes an RNA binding domain or single stranded nucleic acid binding protein covalently or non-covalently bound to a DNA break site localizing domain.
[0011] Also provided herein are methods for editing DNA in a cell by contacting the cell with (1) a retron that includes (a) one or more RNA binding domain recognition sequences or one or more single stranded nucleic acid binding domain recognition sequences, (b) an msr sequence, (c) an msd sequence, (d) a subject expression sequence within the msd sequence, and (e) a first inverted repeat sequence and a second inverted repeat sequence, (2) a polypeptide comprising an RNA binding domain or single stranded nucleic acid binding domain bound to a DNA break site localizing domain, or its encoding nucleic acid, (3) a reverse transcriptase or a nucleic acid encoding the same, and (4) a sequence specific endonuclease or a nucleic acid encoding the same, thereby editing the DNA of the cell. In some embodiments, the sequence specific endonuclease is a CRISPR associated (Cas) nuclease, a Zinc-finger nuclease, a Transcription activator-like effector nuclease (TALEN), or a meganuclease. In some embodiments, the method comprises contacting the cell with one or more guide RNAs (gRNAs), or one or more nucleic acids encoding the same. In some embodiments, the Cas nuclease is Cas9, Streptococcus pyogenes Cas9 (SpCas9), Cpf1 (Cas12a), Mad7™, C2c1, or FokI-dCas9. In some embodiments, the Cas nuclease is selected from the group consisting of Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas5e (CasD), Cas6, Cas6e, Cas6f, Cas7, Cas8a1, Cas8a2, Cas8b, Cas8c, Cas9 (Csn1 or Csx12), SpCas9, FokI-dCas9, Cas10, Cas10d, Cas12a/Cpf1, Mad7™, CasF, CasG, CasH, Csy1, Csy2, Csy3, Cse1 (CasA), Cse2 (CasB), Cse3 (CasE), Cse4 (CasC), Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, C2c1 and Cu1966, and homologs or modified versions thereof.
[0012] Also provided herein are methods for treating a genetic disease in a subject in need by administering to the subject an effective amount of (1) a retron that includes (a) one or more RNA binding domain recognition sequences or one or more single stranded nucleic acid binding domain recognition sequences, (b) an msr sequence, (c) an msd sequence, (d) a subject expression sequence within the msd sequence, and (e) a first inverted repeat sequence and a second inverted repeat sequence, (2) a polypeptide of an RNA binding domain or a single stranded DNA binding domain covalently bound to a DNA break site localizing domain, or its encoding nucleic acid, (3) a reverse transcriptase or a nucleic acid encoding the same, and (4) a sequence specific endonuclease or a nucleic acid encoding the same, thereby treating the disease. In some embodiments, the sequence specific endonuclease is a CRISPR associated (Cas) nuclease, a Zinc-finger nuclease, a Transcription activator-like effector nuclease (TALEN), or a meganuclease. In some embodiments, the method comprises contacting the cell with one or more guide RNAs (gRNAs), or one or more nucleic acids encoding the same. In some embodiments, the Cas nuclease is Cas9, SpCas9, Cpf1 (Cas12a), Mad7™, C2c1, or FokI-dCas9. In some embodiments, the Cas nuclease is selected from the group consisting of Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas5e (CasD), Cas6, Cas6e, Cas6f, Cas7, Cas8a1, Cas8a2, Cas8b, Cas8c, Cas9 (Csn1 or Csx12), SpCas9, FokI-dCas9, Cas10, Cas10d, Cas12a/Cpf1, Mad7™, CasF, CasG, CasH, Csy1, Csy2, Csy3, Cse1 (CasA), Cse2 (CasB), Cse3 (CasE), Cse4 (CasC), Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, C2c1 and Cu1966, and homologs or modified versions thereof.
BRIEF DESCRIPTION OF THE DRAWINGS
[0013] FIGS. 1A-1D show an overview of the dual retron amplification-donor recruitment system.
[0014] FIGS. 2A and 2B show the levels of retron cDNA produced by the different editing cassettes from FIG. 1.
[0015] FIGS. 3A-3D show a multiplexed editing assay to introduce all possible single nucleotide variants (SNVs) across two genomic regions using either the retron alone, donor recruitment alone, both operating simultaneously and independently, or the dual retron amplification-donor recruitment system, with either Streptococcus pyogenes Cas9 (SpCas9) or Lachnospiraceae bacterium Cas12a (LbCas12a; also known as Cpf1).
DETAILED DESCRIPTION
[0016] Provided herein are compositions and methods to increase efficiency for retron production and its recruitment to the site of DNA breaks in a cell.
[0017] The practice of the present invention will employ, unless otherwise indicated, conventional methods of genome editing, biochemistry, chemistry, immunology, molecular biology and recombinant DNA techniques within the skill of the art. Such techniques are explained fully in the literature. See, e.g., Targeted Genome Editing Using Site-Specific Nucleases: ZFNs, TALENs, and the CRISPR/Cas9 System (T. Yamamoto ed., Springer, 2015); Genome Editing: The Next Step in Gene Therapy (Advances in Experimental Medicine and Biology, T. Cathomen, M. Hirsch, and M. Porteus eds., Springer, 2016); Aachen Press Genome Editing (CreateSpace Independent Publishing Platform, 2015); Handbook of Experimental Immunology, Vols. I-IV (D.M. Weir and C.C. Blackwell eds., Blackwell Scientific Publications); A.L. Lehninger, Biochemistry (Worth Publishers, Inc., current edition); Sambrook, et al. Molecular Cloning: A Laboratory Manual (3rd Edition, 2001); Methods In Enzymology (S. Colowick and N. Kaplan eds., Academic Press, Inc.).
[0018] All publications, patents and patent applications cited herein, whether supra or infra, are hereby incorporated by reference in their entireties.
I. Definitions
[0019] Before the present invention is further described, it is to be understood that this invention is not strictly limited to particular embodiments described, as such may of course vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting, since the scope of the present invention will be limited only by the claims.
[0020] It must be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise. It should further be understood that as used herein, the term “a” entity or “an” entity refers to one or more of that entity. For example, a nucleic acid molecule refers to one or more nucleic acid molecules. As such, the terms “a”, “an”, “one or more” and “at least one” can be used interchangeably. Similarly, the terms “comprising”, “including” and “having” can be used interchangeably.
[0021] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the present invention, the preferred methods and materials are now described. All publications mentioned herein are incorporated herein by reference to disclose and describe the methods and/or materials in connection with which the publications are cited. The publications discussed herein are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that the present invention is not entitled to antedate such publication by virtue of prior invention. Further, the dates of publication provided may be different from the actual publication dates, which may need to be independently confirmed.
[0022] It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable sub-combination. All combinations of the embodiments are specifically embraced by the present invention and are disclosed herein just as if each and every combination was individually and explicitly disclosed. In addition, all sub-combinations are also specifically embraced by the present invention and are disclosed herein just as if each and every such sub-combination was individually and explicitly disclosed herein.
[0023] It is further noted that the claims may be drafted to exclude any optional element. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as “solely,” “only” and the like in connection with the recitation of claim elements or use of a “negative” limitation.
[0024] As used herein, the term “about” means a range of values including the specified value, which a person of ordinary skill in the art would consider reasonably similar to the specified value. In embodiments, about means within a standard deviation using measurements generally acceptable in the art. In embodiments, about means a range extending to +/- 10% of the specified value (e.g., +/- 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, or 10% of the specified value). In embodiments, about means the specified value.
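As a worked example of the +/- 10% embodiment of “about”, the check below treats a measured value as “about” a specified value when it lies within a chosen fraction of that value. The function name `is_about` and the default 10% tolerance are illustrative; the definition above also allows narrower tolerances or a standard-deviation criterion, and this sketch covers only the percentage form.

```python
def is_about(value: float, specified: float, tolerance: float = 0.10) -> bool:
    """True if value lies within +/- tolerance (as a fraction) of specified."""
    if tolerance < 0:
        raise ValueError("tolerance must be non-negative")
    return abs(value - specified) <= tolerance * abs(specified)


print(is_about(108, 100))        # True: within 10%
print(is_about(112, 100))        # False: 12% away
print(is_about(104, 100, 0.05))  # True: within 5%
```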
[0025] The term “genome editing” refers to a type of genetic engineering in which DNA is inserted, replaced, or removed from a target DNA (e.g., the genome of a cell) using one or more nucleases and/or nickases. The nucleases create specific double-strand breaks (DSBs) at desired locations in the genome and harness the cell's endogenous mechanisms to repair the induced break by homology-directed repair (HDR) (e.g., homologous recombination) or by nonhomologous end joining (NHEJ). The nickases create specific single-strand breaks at desired locations in the genome. In one non-limiting example, two nickases can be used to create two single-strand breaks on opposite strands of a target DNA, thereby generating a blunt or a sticky end. Any suitable DNA nuclease can be introduced into a cell to induce genome editing of a target DNA sequence.
[0026] The term “DNA nuclease” refers to an enzyme capable of cleaving the phosphodiester bonds between the nucleotide subunits of DNA and may be an endonuclease or an exonuclease. According to the present invention, the DNA nuclease may be an engineered (e.g., programmable or targetable) DNA nuclease which can be used to induce genome editing of a target DNA sequence. Any suitable DNA nuclease can be used including, but not limited to, CRISPR-associated protein (Cas) nucleases, other endo- or exo-nucleases, variants thereof, fragments thereof, and combinations thereof.
[0027] The term “double-strand break” or “double-strand cut” refers to the severing or cleavage of both strands of the DNA double helix. The DSB may result from cleavage of both strands at the same position, leading to “blunt ends”, or from staggered cleavage resulting in a region of single-stranded DNA at the end of each DNA fragment, or “sticky ends”. A DSB may arise from the action of one or more DNA nucleases.
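The blunt-versus-sticky distinction can be made concrete by comparing where the two strands are cut. In this sketch (an illustration, not part of the disclosure), both cut positions are given as bond indices in top-strand coordinates, where a cut at i severs the bond between bases i-1 and i; the function name is hypothetical, and the EcoRI (G^AATTC) and PstI (CTGCA^G) positions in the usage lines are textbook examples of 5' and 3' overhangs.

```python
def classify_dsb_ends(top_cut: int, bottom_cut: int) -> tuple[str, int]:
    """Classify the ends left by a double-strand break.

    Both arguments are bond indices in top-strand (5'->3') coordinates:
    a cut at i severs the bond between top-strand bases i-1 and i.
    Returns the end type and the single-stranded overhang length in nt.
    """
    offset = bottom_cut - top_cut
    if offset == 0:
        return ("blunt", 0)
    if offset > 0:
        # Top-strand cut lies 5' of the bottom-strand cut: each fragment ends
        # in a single-stranded 5' tail.
        return ("5' overhang", offset)
    # Bottom-strand cut lies further 5' on the top strand: each fragment ends
    # in a single-stranded 3' tail.
    return ("3' overhang", -offset)


print(classify_dsb_ends(1, 5))  # ("5' overhang", 4)  EcoRI, G^AATTC
print(classify_dsb_ends(5, 1))  # ("3' overhang", 4)  PstI, CTGCA^G
print(classify_dsb_ends(3, 3))  # ('blunt', 0)
```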
[0028] The term “nonhomologous end joining” or “NHEJ” refers to a pathway that repairs double-strand DNA breaks in which the break ends are directly ligated without the need for a homologous template.
[0029] The term “homology-directed repair” or “HDR” refers to a mechanism in cells to accurately and precisely repair double-strand DNA breaks using a homologous template to guide repair. The most common form of HDR is homologous recombination (HR), a type of genetic recombination in which nucleotide sequences are exchanged between two similar or identical molecules of DNA.
[0030] As used herein, the term “retron” is used in accordance with its plain ordinary meaning and refers to a DNA sequence found in the genome of many bacterial species that codes for reverse transcriptase and a unique single-stranded DNA/RNA hybrid called multicopy single-stranded DNA (msDNA). The retron msr-msd RNA is the non-coding RNA produced by retron elements and is the immediate precursor to the synthesis of msDNA. The retron msr RNA folds into a characteristic secondary structure that contains a conserved guanosine residue at the end of a stem loop. Synthesis of DNA by the retron-encoded reverse transcriptase (RT) results in a DNA/RNA chimera which is composed of small single-stranded DNA linked to small single-stranded RNA. The RNA strand is joined to the 5' end of the DNA chain via a 2'-5' phosphodiester linkage that occurs from the 2' position of the conserved internal guanosine residue. The retron operon carries a promoter sequence P that controls the synthesis of an RNA transcript carrying three loci: msr, msd, and ret. The ret gene product, a reverse transcriptase, processes the msd/msr portion of the RNA transcript into msDNA. Retron elements are about 2 kb long. They contain a single operon controlling the synthesis of an RNA transcript carrying three loci, msr, msd, and ret, that are involved in msDNA synthesis. The DNA portion of msDNA is encoded by the msd region, the RNA portion is encoded by the msr region, while the product of the ret open-reading frame is a reverse transcriptase similar to the RTs produced by retroviruses and other types of retroelements. Like other reverse transcriptases, the retron RT contains seven regions of conserved amino acids, including a highly conserved tyr-ala-asp-asp (YADD) sequence associated with the catalytic core. The ret gene product is responsible for processing the msd/msr portion of the RNA transcript into msDNA.
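Because the conserved YADD motif is described above as a hallmark of the retron RT catalytic core, a simple first-pass check of a candidate RT protein sequence is an exact motif scan. This is a hypothetical helper for illustration only: exact string matching will miss diverged variants, which are normally identified by alignment to RT family profiles, and the toy sequence in the usage line is not a real retron RT.

```python
import re


def find_yadd(protein: str) -> list[int]:
    """Return 0-based start positions of each exact YADD match in a protein sequence."""
    return [m.start() for m in re.finditer("YADD", protein.upper())]


print(find_yadd("MKLYADDGRTyaddQ"))  # [3, 10]
```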
[0031] As used herein, the term “reverse transcriptase” is used in accordance with its plain and ordinary meaning and refers to an enzyme used to generate complementary DNA (cDNA) from an RNA template, a process termed reverse transcription.
[0032] The terms "polypeptide" and "protein" refer to a polymer of amino acid residues and are not limited to a minimum length. Thus, peptides, oligopeptides, dimers, multimers, and the like, are included within the definition. Both full length proteins and fragments thereof are encompassed by the definition. The terms also include post-expression modifications of the polypeptide, for example, glycosylation, acetylation, phosphorylation, hydroxylation, and the like. Furthermore, for purposes of the present disclosure, a "polypeptide" refers to a protein which includes modifications, such as deletions, additions and substitutions to the native sequence, so long as the protein maintains the desired activity. These modifications may be deliberate, as through site-directed mutagenesis, or may be accidental, such as through mutations of hosts which produce the proteins or errors due to PCR amplification.
[0033] As used herein, the term “single stranded nucleic acid binding domain” refers to a polypeptide or aptamer that preferentially binds to specific sequences of single stranded DNA or single stranded RNA. Polypeptide domains that recognize single stranded nucleic acids include, but are not limited to, CRISPR-associated endonucleases such as Cas13 or Cas14; oligonucleotide/oligosaccharide/oligopeptide-binding (OB) folds, such as in human POT1, Schizosaccharomyces pombe Pot1, Sterkiella nova TEBP, and the CspB protein from Bacillus caldolyticus and Bacillus subtilis; K homology (KH) domains, such as in the KH domain-containing proteins heterogeneous nuclear ribonucleoprotein K (hnRNP K), far upstream element (FUSE)-binding protein (FBP), and poly(C)-binding proteins (PCBP) 1 and 2; RNA recognition motifs (RRMs) that bind DNA, such as in FBP-interacting repressor (FIR), hnRNP A1, and hnRNP D (also known as AUF1); and whirly domains, such as in the mitochondrial whirly protein Why2 and the mammalian transcriptional regulator PurA. See, for example, Dickey TH et al. (2013) Structure 21(7):1074-1084.
[0034] As used herein, the term “RNA binding domain” refers to a polypeptide or aptamer that preferentially binds to specific sequences of a single stranded or double stranded RNA and which, in the case of a polypeptide, can include the entire protein or a functional portion thereof. Non-limiting examples of RNA binding domains include an MS2 coat protein (MCP), Pumilio (PUF), RNA Recognition Motif (RRM), Double-Stranded RNA-Binding Domain (dsRBD), Zinc finger (ZF) domains (CCHH zinc fingers such as TFIIIA, CCCH zinc fingers, CCHC zinc knuckles, RanBP2-type ZFs), Z-alpha, arginine/glycine-rich (RGG) domains, K Homology (KH) domains, and Poly(A) Binding Proteins. Other examples include Fox-1, U1A, pentatricopeptide repeat proteins, hnRNP K homology domains, or antibodies engineered to bind RNA. The term “RNA binding domain recognition sequence” refers to the RNA sequence to which an RNA binding domain preferentially binds.
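Locating an RNA binding domain recognition sequence in a transcript amounts to a pattern scan. The sketch below is illustrative only: the function name `find_recognition_sites` is hypothetical, and the example motif `UGUANAUA` is the commonly cited consensus bound by Pumilio (PUF) domains, used here as an assumption rather than a sequence taken from this disclosure. `N` is treated as any base, and DNA input is read as RNA (T to U).

```python
import re


def find_recognition_sites(transcript: str, motif: str) -> list[int]:
    """Return 0-based start positions of every (possibly overlapping) match
    of `motif` in `transcript`. 'N' in the motif matches any base."""
    rna = transcript.upper().replace("T", "U")
    # Translate the motif into a regular expression; N becomes a base class.
    body = motif.upper().replace("T", "U").replace("N", "[ACGU]")
    # A zero-width lookahead lets overlapping matches be reported.
    return [m.start() for m in re.finditer(f"(?={body})", rna)]


# Hypothetical transcript containing two PUF-type consensus elements.
print(find_recognition_sites("AAUGUAAAUACCUGUACAUAGG", "UGUANAUA"))  # [2, 12]
```

A real design would also weigh binding affinity and RNA secondary structure, which a plain sequence scan ignores.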
[0035] As used herein, the term “DNA break localizing domain” refers to a polypeptide that preferentially binds to regions of DNA damage and/or to DNA repair proteins, and which can include the entire protein or a functional portion thereof. Non-limiting examples of DNA break localizing domains include 14-3-3 proteins, WW domains, Polo-box domains (in PLK1), WD40 repeats (including those in the E3 ligase SCFβTrCP), BRCT domains (including those in BRCA1), and FHA domains (such as in Fkh1p, CHK2, and MDC1). Other examples are provided in Tables 1-5 (see below).
[0036] As used herein, “sequence specific endonuclease” refers to an enzyme that cleaves at a specific sequence within a polynucleotide sequence. In some aspects, the nuclease activity can be partially or completely inhibited, so that only one of the two strands, or neither strand, is cleaved. Non-limiting examples of sequence specific endonucleases include a CRISPR-associated (Cas) nuclease, a zinc-finger nuclease, a transcription activator-like effector nuclease (TALEN), or a meganuclease.
[0037] The term "Cas9" as used herein encompasses type II clustered regularly interspaced short palindromic repeats (CRISPR) system Cas9 endonucleases from any species, and also includes biologically active fragments, variants, analogs, and derivatives thereof that retain Cas9 endonuclease activity (i.e., catalyze site-directed cleavage of DNA to generate double-strand breaks). A Cas9 endonuclease binds to and cleaves DNA at a site comprising a sequence complementary to its bound guide RNA (gRNA).
[0038] A Cas9 polynucleotide, nucleic acid, oligonucleotide, protein, polypeptide, or peptide refers to a molecule derived from any source. The molecule need not be physically derived from an organism but may be synthetically or recombinantly produced. Cas9 sequences from a number of bacterial species are well known in the art and listed in the National Center for Biotechnology Information (NCBI) database. See, for example, NCBI entries for Cas9 from: Streptococcus pyogenes (WP_002989955, WP_038434062, WP_011528583); Campylobacter jejuni (WP_022552435, YP_002344900); Campylobacter coli (WP_060786116); Campylobacter fetus (WP_059434633); Corynebacterium ulcerans (NC_015683, NC_017317); Corynebacterium diphtheriae (NC_016782, NC_016786); Enterococcus faecalis (WP_033919308); Spiroplasma syrphidicola (NC_021284); Prevotella intermedia (NC_017861); Spiroplasma taiwanense (NC_021846); Streptococcus iniae (NC_021314); Belliella baltica (NC_018010); Psychroflexus torquis (NC_018721); Streptococcus thermophilus (YP_820832); Streptococcus mutans (WP_061046374, WP_024786433); Listeria innocua (NP_472073); Listeria monocytogenes (WP_061665472); Legionella pneumophila (WP_062726656); Staphylococcus aureus (WP_001573634); Francisella tularensis (WP_032729892, WP_014548420); Lactobacillus rhamnosus (WP_048482595, WP_032965177); and Neisseria meningitidis (WP_061704949, YP_002342100); all of which sequences (as entered by the date of filing of this application) are herein incorporated by reference.
Any of these sequences or a variant thereof comprising a sequence having at least about 70-100% sequence identity thereto, including any percent identity within this range, such as 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, or 99% sequence identity thereto, can be used for genome editing, as described herein, wherein the variant retains biological activity, such as Cas9 site-directed endonuclease activity. See also Fonfara et al. (2014) Nucleic Acids Res. 42(4):2577-90; Kapitonov et al. (2015) J. Bacteriol. 198(5):797-807; Shmakov et al. (2015) Mol. Cell 60(3):385-397; and Chylinski et al. (2014) Nucleic Acids Res. 42(10):6091-6105 for sequence comparisons and a discussion of the genetic diversity and phylogenetic analysis of Cas9.
[0039] By "derivative" is intended any suitable modification of the native polypeptide of interest, of a fragment of the native polypeptide, or of their respective analogs, such as glycosylation, phosphorylation, polymer conjugation (such as with polyethylene glycol), or other addition of foreign moieties, as long as the desired biological activity of the native polypeptide is retained. Methods for making polypeptide fragments, analogs, and derivatives are generally available in the art.
[0040] By "fragment" is intended a molecule consisting of only a part of the intact full-length sequence and structure. The fragment can include a C-terminal deletion, an N-terminal deletion, and/or an internal deletion of the polypeptide. Active fragments of a particular protein or polypeptide will generally include at least about 5-10 contiguous amino acid residues of the full length molecule, preferably at least about 15-25 contiguous amino acid residues of the full length molecule, and most preferably at least about 20-50 or more contiguous amino acid residues of the full length molecule, or any integer between 5 amino acids and the full length sequence, provided that the fragment in question retains biological activity, such as Cas9 site-directed endonuclease activity.
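The length criterion for a fragment (a part of the full-length sequence that still keeps some minimum number of contiguous residues) can be checked mechanically. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the sequences are toy examples rather than a real Cas9, and biological activity, which the definition also requires, is not and cannot be tested this way. The longest shared run is computed with the standard longest-common-substring dynamic program, so fragments with internal deletions are handled.

```python
def longest_shared_run(fragment: str, full_length: str) -> int:
    """Length of the longest run of contiguous residues present in both
    sequences (longest-common-substring dynamic programming)."""
    best = 0
    prev = [0] * (len(full_length) + 1)
    for a in fragment:
        curr = [0] * (len(full_length) + 1)
        for j, b in enumerate(full_length, start=1):
            if a == b:
                # Extend the run that ended one residue earlier in both sequences.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def meets_fragment_length(fragment: str, full_length: str, min_contiguous: int = 5) -> bool:
    """True if `fragment` is shorter than the full-length sequence and keeps at
    least `min_contiguous` contiguous residues of it. Activity is NOT checked."""
    return (len(fragment) < len(full_length)
            and longest_shared_run(fragment, full_length) >= min_contiguous)


full = "MKTAYIAKQRQISFVKSHFSRQ"             # toy sequence, not a real Cas9
internal_deletion = "MKTAYIAKQ" + "SHFSRQ"  # residues RQISFVK removed
print(longest_shared_run(internal_deletion, full))     # 9
print(meets_fragment_length(internal_deletion, full))  # True
```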
[0041] "Substantially purified" generally refers to isolation of a substance (compound, polynucleotide, nucleic acid, protein, polypeptide, polypeptide composition) such that the substance comprises the majority of the sample in which it resides. Typically, in a sample, a substantially purified component comprises 50%, preferably 80%-85%, more preferably 90-95% of the sample. Techniques for purifying polynucleotides and polypeptides of interest are well-known in the art and include, for example, ion-exchange chromatography, affinity chromatography and sedimentation according to density.
[0042] By "isolated" is meant, when referring to a polypeptide, that the indicated molecule is separate and discrete from the whole organism with which the molecule is found in nature or is present in the substantial absence of other biological macromolecules of the same type. The term "isolated" with respect to a polynucleotide is a nucleic acid molecule devoid, in whole or part, of sequences normally associated with it in nature; or a sequence, as it exists in nature, but having heterologous sequences in association therewith; or a molecule disassociated from the chromosome.
[0043] The terms "polynucleotide," "oligonucleotide," "nucleic acid" and "nucleic acid molecule" are used herein to include a polymeric form of nucleotides of any length, either ribonucleotides or deoxyribonucleotides. This term refers only to the primary structure of the molecule. Thus, the term includes triple-, double- and single-stranded DNA, as well as triple-, double- and single-stranded RNA. It also includes modifications, such as by methylation and/or by capping, and unmodified forms of the polynucleotide. More particularly, the terms "polynucleotide," "oligonucleotide," "nucleic acid" and "nucleic acid molecule" include polydeoxyribonucleotides (containing 2-deoxy-D-ribose), polyribonucleotides (containing D-ribose), any other type of polynucleotide which is an N- or C-glycoside of a purine or pyrimidine base, and other polymers containing nonnucleotidic backbones, for example, polyamide (e.g., peptide nucleic acids (PNAs)) and polymorpholino (commercially available from Anti-Virals, Inc., Corvallis, Oreg., as Neugene) polymers, and other synthetic sequence-specific nucleic acid polymers providing that the polymers contain nucleobases in a configuration which allows for base pairing and base stacking, such as is found in DNA and RNA. There is no intended distinction in length between the terms "polynucleotide," "oligonucleotide," "nucleic acid" and "nucleic acid molecule," and these terms will be used interchangeably.
Thus, these terms include, for example, 3'-deoxy-2',5'-DNA, oligodeoxyribonucleotide N3'-P5' phosphoramidates, 2'-O-alkyl-substituted RNA, double- and single-stranded DNA, as well as double- and single-stranded RNA, microRNA, DNA:RNA hybrids, and hybrids between PNAs and DNA or RNA, and also include known types of modifications, for example, labels which are known in the art, methylation, "caps," substitution of one or more of the naturally occurring nucleotides with an analog (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyladenosine, C5-propynylcytidine, C5-propynyluridine, C5-bromouridine, C5-fluorouridine, C5-iodouridine, C5-methylcytidine, 7-deazaadenosine, 7-deazaguanosine, 8-oxoadenosine, 8-oxoguanosine, O(6)-methylguanine, and 2-thiocytidine), internucleotide modifications such as, for example, those with uncharged linkages (e.g., methyl phosphonates, phosphotriesters, phosphoramidates, carbamates, etc.), with negatively charged linkages (e.g., phosphorothioates, phosphorodithioates, etc.), and with positively charged linkages (e.g., aminoalkylphosphoramidates, aminoalkylphosphotriesters), those containing pendant moieties, such as, for example, proteins (including nucleases, toxins, antibodies, signal peptides, poly-L-lysine, etc.), those with intercalators (e.g., acridine, psoralen, etc.), those containing chelators (e.g., metals, radioactive metals, boron, oxidative metals, etc.), those containing alkylators, those with modified linkages (e.g., alpha anomeric nucleic acids, etc.), as well as unmodified forms of the polynucleotide or oligonucleotide. The term also includes locked nucleic acids (e.g., comprising a ribonucleotide that has a methylene bridge between the 2'-oxygen atom and the 4'-carbon atom). See, for example, Kurreck et al. (2002) Nucleic Acids Res. 30:1911-1918; Elayadi et al. (2001) Curr. Opinion Invest. Drugs 2:558-561; Orum et al. (2001) Curr. Opinion Mol. Ther. 3:239-243; Koshkin et al. (1998) Tetrahedron 54:3607-3630; Obika et al. (1998) Tetrahedron Lett. 39:5401-5404.
[0044] The terms "hybridize" and "hybridization" refer to the formation of complexes between nucleotide sequences which are sufficiently complementary to form duplexes via Watson-Crick base pairing.
[0045] In general, "identity" refers to an exact nucleotide-to-nucleotide or amino acid-to-amino acid correspondence of two polynucleotide or polypeptide sequences, respectively. Percent identity can be determined by a direct comparison of the sequence information between two molecules by aligning the sequences, counting the exact number of matches between the two aligned sequences, dividing by the length of the shorter sequence, and multiplying the result by 100. Readily available computer programs can be used to aid in the analysis, such as ALIGN (Dayhoff, M.O. in Atlas of Protein Sequence and Structure, M.O. Dayhoff ed., 5 Suppl. 3:353-358, National Biomedical Research Foundation, Washington, DC), which adapts the local homology algorithm of Smith and Waterman (Advances in Appl. Math. 2:482-489, 1981) for peptide analysis. Programs for determining nucleotide sequence identity are available in the Wisconsin Sequence Analysis Package, Version 8 (available from Genetics Computer Group, Madison, WI), for example, the BESTFIT, FASTA and GAP programs, which also rely on the Smith and Waterman algorithm. These programs are readily utilized with the default parameters recommended by the manufacturer and described in the Wisconsin Sequence Analysis Package referred to above. For example, percent identity of a particular nucleotide sequence to a reference sequence can be determined using the homology algorithm of Smith and Waterman with a default scoring table and a gap penalty of six nucleotide positions.
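The percent-identity recipe in this paragraph (count exact matches between the aligned sequences, divide by the length of the shorter sequence, multiply by 100) can be written out directly. This is a minimal sketch that assumes the two sequences have already been aligned, with `-` marking gaps; the function name is hypothetical, and producing the alignment itself is left to a program such as those named above.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences:
    exact matches / length of the shorter (ungapped) sequence * 100."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    a, b = aligned_a.upper(), aligned_b.upper()
    # Count columns where both sequences carry the same residue (gaps excluded).
    matches = sum(1 for x, y in zip(a, b) if x == y and x != gap)
    shorter = min(len(a.replace(gap, "")), len(b.replace(gap, "")))
    if shorter == 0:
        raise ValueError("sequences must contain at least one residue")
    return 100.0 * matches / shorter


pid = percent_identity("ACGTACGT", "ACGAAC-T")
print(round(pid, 2))  # 85.71 (6 matches over 7 residues)
print(pid >= 70.0)    # True: would satisfy an "at least 70% identity" threshold
```

Note that the denominator follows the text (shorter sequence); other tools normalize by alignment length instead, which can give a different value for the same alignment.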
[0046] Another method of establishing percent identity in the context of the present disclosure is to use the MPSRCH package of programs copyrighted by the University of Edinburgh, developed by John F. Collins and Shane S. Sturrok, and distributed by IntelliGenetics, Inc. (Mountain View, CA). From this suite of packages, the Smith-Waterman algorithm can be employed where default parameters are used for the scoring table (for example, gap open penalty of 12, gap extension penalty of one, and a gap of six). From the data generated, the "Match" value reflects "sequence identity." Other suitable programs for calculating the percent identity or similarity between sequences are generally known in the art; for example, another alignment program is BLAST, used with default parameters. For example, BLASTN and BLASTP can be used with the following default parameters: genetic code = standard; filter = none; strand = both; cutoff = 60; expect = 10; Matrix = BLOSUM62; Descriptions = 50 sequences; sort by = HIGH SCORE; Databases = non-redundant, GenBank + EMBL + DDBJ + PDB + GenBank CDS translations + Swiss protein + Spupdate + PIR. Details of these programs are readily available.
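The Smith-Waterman local alignment with a gap open penalty of 12 and a gap extension penalty of one can be sketched with Gotoh's affine-gap recurrences. This is an illustrative score-only implementation, not the MPSRCH or BLAST code: the match/mismatch values (+5/-4) are assumptions not given in the text, and so is the convention that a gap of length k costs `gap_open + (k - 1) * gap_extend`.

```python
import math


def smith_waterman_affine(a: str, b: str, match: int = 5, mismatch: int = -4,
                          gap_open: int = 12, gap_extend: int = 1) -> float:
    """Best local-alignment score with affine gaps (Gotoh's recurrences).

    A gap of length k costs gap_open + (k - 1) * gap_extend (assumed convention).
    match/mismatch = +5/-4 are assumed values, not taken from the text.
    """
    a, b = a.upper(), b.upper()
    n, m = len(a), len(b)
    neg = -math.inf
    H = [[0.0] * (m + 1) for _ in range(n + 1)]  # best score ending at (i, j)
    E = [[neg] * (m + 1) for _ in range(n + 1)]  # ... ending in a gap in `a`
    F = [[neg] * (m + 1) for _ in range(n + 1)]  # ... ending in a gap in `b`
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: scores are floored at zero.
            H[i][j] = max(0.0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


print(smith_waterman_affine("ACGTACGT", "ACGT"))    # 20.0
print(smith_waterman_affine("GATTACA", "GATTTACA"))  # 25.0
```

In the second example the ungapped run TTACA (5 x 5 = 25) outscores the full-length gapped alignment (7 x 5 - 12 = 23), showing how a large gap-open penalty discourages short gapped alignments. Recovering the aligned strings would require an additional traceback step.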
[0047] Alternatively, homology can be determined by hybridization of polynucleotides under conditions which form stable duplexes between homologous regions, followed by digestion with single stranded specific nuclease(s), and size determination of the digested fragments. DNA sequences that are substantially homologous can be identified in a Southern hybridization experiment under, for example, stringent conditions, as defined for that particular system. Defining appropriate hybridization conditions is within the skill of the art. See, e.g., Sambrook et al., supra; DNA Cloning, supra; Nucleic Acid Hybridization, supra.
[0048] The term "homologous region" refers to a region of a nucleic acid with homology to another nucleic acid region. Thus, whether a "homologous region" is present in a nucleic acid molecule is determined with reference to another nucleic acid region in the same or a different molecule. Further, since a nucleic acid is often double-stranded, the term "homologous region," as used herein, refers to the ability of nucleic acid molecules to hybridize to each other. For example, a single-stranded nucleic acid molecule can have two homologous regions which are capable of hybridizing to each other. Thus, the term "homologous region" includes nucleic acid segments with complementary sequences. Homologous regions may vary in length but will typically be between 4 and 500 nucleotides (e.g., from about 4 to about 40, from about 40 to about 80, from about 80 to about 120, from about 120 to about 160, from about 160 to about 200, from about 200 to about 240, from about 240 to about 280, from about 280 to about 320, from about 320 to about 360, from about 360 to about 400, from about 400 to about 440, etc.).
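The example of a single-stranded molecule carrying two homologous regions that can hybridize to each other corresponds to an inverted repeat: a k-mer followed, further downstream, by its reverse complement. The sketch below finds such pairs by exact matching only; the function names and the hairpin sequence are hypothetical, and loop-length limits, mismatches, and duplex stability are deliberately ignored.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (U is read as T)."""
    pair = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pair[base] for base in reversed(seq.upper().replace("U", "T")))


def find_inverted_repeats(seq: str, k: int) -> list[tuple[int, int]]:
    """Return (i, j) pairs where the k-mer starting at j is the reverse
    complement of the k-mer starting at i (j >= i + k), i.e. two
    non-overlapping regions of one strand that can hybridize to each other."""
    seq = seq.upper().replace("U", "T")
    hits = []
    for i in range(len(seq) - k + 1):
        partner = reverse_complement(seq[i:i + k])
        j = seq.find(partner, i + k)
        while j != -1:
            hits.append((i, j))
            j = seq.find(partner, j + 1)
    return hits


# Hypothetical hairpin: 5-nt stem, 4-nt loop, complementary 5-nt stem.
print(find_inverted_repeats("GACTGTTTTCAGTC", 5))  # [(0, 9)]
```

Setting `k` anywhere in the 4 to 500 nucleotide range mentioned above changes only the minimum stem length searched for.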
[0049] As used herein, the terms "complementary" or "complementarity" refer to polynucleotides that are able to form base pairs with one another. Base pairs are typically formed by hydrogen bonds between nucleotide units in an anti-parallel orientation between polynucleotide strands. Complementary polynucleotide strands can base pair in a Watson-Crick manner (e.g., A to T, A to U, C to G), or in any other manner that allows for the formation of duplexes. As persons skilled in the art are aware, when using RNA as opposed to DNA, uracil (U) rather than thymine (T) is the base that is considered to be complementary to adenosine. However, when a uracil is denoted in the context of the present disclosure, the ability to substitute a thymine is implied, unless otherwise stated. "Complementarity" may exist between two RNA strands, two DNA strands, or between an RNA strand and a DNA strand. It is generally understood that two or more polynucleotides may be "complementary" and able to form a duplex despite having less than perfect or less than 100% complementarity. Two sequences are "perfectly complementary" or "100% complementary" if at least a contiguous portion of each polynucleotide sequence, comprising a region of complementarity, perfectly base pairs with the other polynucleotide without any mismatches or interruptions within such region. Two or more sequences are considered "perfectly complementary" or "100% complementary" even if either or both polynucleotides contain additional non-complementary sequences as long as the contiguous region of complementarity within each polynucleotide is able to perfectly hybridize with the other. "Less than perfect" complementarity refers to situations where less than all of the contiguous nucleotides within such region of complementarity are able to base pair with each other. Determining the percentage of complementarity between two polynucleotide sequences is a matter of ordinary skill in the art.
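Percent complementarity over a region of complementarity can be computed by aligning the two strands antiparallel and counting Watson-Crick pairs, treating U and T as interchangeable as the paragraph specifies. This is a minimal sketch with hypothetical function names; it assumes the caller has already extracted equal-length regions written 5' to 3', and it does not count G-U wobble or other non-canonical pairs.

```python
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(strand_1: str, strand_2: str) -> float:
    """Percent of positions that form Watson-Crick pairs when two equal-length
    strands, each written 5'->3', are aligned antiparallel. U is treated as T.
    G-U wobble and other non-canonical pairs are not counted."""
    s1 = strand_1.upper().replace("U", "T")
    s2 = strand_2.upper().replace("U", "T")
    if len(s1) != len(s2) or not s1:
        raise ValueError("strands must be non-empty and of equal length")
    # Antiparallel: the 5' end of strand_1 faces the 3' end of strand_2.
    paired = sum((x, y) in WATSON_CRICK for x, y in zip(s1, reversed(s2)))
    return 100.0 * paired / len(s1)


def perfectly_complementary(strand_1: str, strand_2: str) -> bool:
    return percent_complementarity(strand_1, strand_2) == 100.0


print(percent_complementarity("ACGU", "ACGT"))  # 100.0 (RNA strand vs DNA strand)
print(percent_complementarity("AAAA", "TTTA"))  # 75.0
```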
For purposes of Cas9 targeting, a gRNA may comprise a sequence "complementary" to a target sequence (e.g., major or minor allele), capable of sufficient base-pairing to form a duplex (i.e., the gRNA hybridizes with the target sequence). Additionally, the gRNA may comprise a sequence complementary to a sequence adjacent to a PAM sequence, wherein the gRNA also hybridizes with the sequence adjacent to a PAM sequence in a target DNA.
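Finding sequences adjacent to a PAM that a gRNA could target can be sketched as a scan of both strands. The text does not specify a PAM, so this example assumes the widely used Streptococcus pyogenes Cas9 PAM (NGG) and a 20-nt protospacer; the function names are hypothetical, and off-target and allele-specificity checks are not performed. The gRNA spacer carries the protospacer sequence (as RNA), so it base-pairs with the complementary target strand next to the PAM.

```python
import re


def reverse_complement(seq: str) -> str:
    pair = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pair[b] for b in reversed(seq))


def find_spcas9_targets(dna: str, spacer_len: int = 20):
    """List (strand, start, protospacer, PAM) for every NGG PAM with room for a
    full-length protospacer immediately 5' of it. Coordinates are 0-based and
    refer to the strand's own 5'->3' string."""
    dna = dna.upper()
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        # Lookahead so overlapping PAMs (e.g. in 'AGGG') are all reported.
        for m in re.finditer(r"(?=([ACGT]GG))", seq):
            pam_start = m.start()
            if pam_start >= spacer_len:
                start = pam_start - spacer_len
                hits.append((strand, start, seq[start:pam_start], m.group(1)))
    return hits


for strand, start, protospacer, pam in find_spcas9_targets("GACGCATAAAGATGAGACGCTGG"):
    # Spacer RNA = protospacer with T replaced by U.
    print(strand, start, protospacer.replace("T", "U"), pam)
# + 0 GACGCAUAAAGAUGAGACGC TGG
```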
[0050] A "target site" or "target sequence" is the nucleic acid sequence recognized (i.e., sufficiently complementary for hybridization) by a guide RNA (gRNA) or a homology arm of a donor polynucleotide. The target site may be allele-specific (e.g., a major or minor allele).
[0051] As used herein, the term “subject expression sequence” refers to any polynucleotide of any length and any sequence that can be transcribed into RNA. In aspects, the subject expression sequence is a polynucleotide inserted within the msd region of the retron non-coding RNA (ncRNA) which is converted to complementary DNA (cDNA) during reverse transcription. In aspects, the subject expression sequence is a donor polynucleotide.
[0052] The term "donor polynucleotide" or “donor sequence” refers to a polynucleotide that provides a sequence of an intended edit to be integrated into the genome at a target locus by HDR.
[0053] By "homology arm" is meant a portion of a donor polynucleotide that is responsible for targeting the donor polynucleotide to the genomic sequence to be edited in a cell. The donor polynucleotide typically comprises a 5' homology arm that hybridizes to a 5' genomic target sequence and a 3' homology arm that hybridizes to a 3' genomic target sequence flanking a nucleotide sequence comprising the intended edit to the genomic DNA, with the positive or plus strand of the double helix (also called the Watson strand) used arbitrarily as the reference. The homology arms are referred to herein as 5' and 3' (i.e., upstream and downstream) homology arms, which relates to the relative position of the homology arms to the nucleotide sequence comprising the intended edit within the donor polynucleotide. The 5' and 3' homology arms hybridize to regions within the target locus in the genomic DNA to be modified, which are referred to herein as the "5' target sequence" and "3' target sequence," respectively. The nucleotide sequence comprising the intended edit is integrated into the genomic DNA by HDR at the genomic target locus recognized (i.e., sufficiently complementary for hybridization) by the 5' and 3' homology arms.
[0054] "Administering" a nucleic acid, such as a retron, a nucleic acid encoding a fusion of an RNA binding domain or single stranded nucleic acid binding domain and DNA break localizing domain, guide RNA, or Cas9 expression system, to a cell comprises transforming, transducing, transfecting, electroporating, translocating, fusing, phagocytosing, shooting or ballistic methods, etc., i.e., any means by which a nucleic acid can be transported across a cell membrane.
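The donor layout described in paragraph [0053] (5' homology arm, intended edit, 3' homology arm, with the plus strand as reference) can be assembled from a reference sequence. This is a minimal sketch under stated assumptions: `design_donor` is a hypothetical helper, coordinates are 0-based and half-open, the arm length is left to the caller because the text does not fix one, and the toy locus is not a real genomic sequence.

```python
def design_donor(genome: str, edit_start: int, edit_end: int,
                 new_sequence: str, arm_len: int) -> str:
    """Return 5' arm + intended edit + 3' arm.

    genome is the plus (Watson) strand; [edit_start, edit_end) is the 0-based,
    half-open span to replace (equal start and end gives a pure insertion).
    """
    if not 0 <= edit_start <= edit_end <= len(genome):
        raise ValueError("edit coordinates fall outside the genome sequence")
    if edit_start < arm_len or len(genome) - edit_end < arm_len:
        raise ValueError("not enough flanking sequence for the requested arm length")
    five_prime_arm = genome[edit_start - arm_len:edit_start]  # matches plus strand 5' of the edit
    three_prime_arm = genome[edit_end:edit_end + arm_len]     # matches plus strand 3' of the edit
    return five_prime_arm + new_sequence + three_prime_arm


genome = "AAAAACCCCCGGGGGTTTTT"  # toy locus
print(design_donor(genome, 10, 15, "AT", arm_len=5))  # CCCCCATTTTTT (CCCCC | AT | TTTTT)
```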
[0055] By "selectively binds" with reference to a guide RNA is meant that the guide RNA binds preferentially to a target sequence of interest or binds with greater affinity to the target sequence than to other genomic sequences. For example, a gRNA will bind to a substantially complementary sequence and not to unrelated sequences. A gRNA that "selectively binds" to a particular allele, such as a particular mutant allele (e.g., allele comprising a substitution, insertion, or deletion), denotes a gRNA that binds preferentially to the particular target allele, but to a lesser extent to a wild-type allele or other sequences. A gRNA that selectively binds to a particular target DNA sequence will selectively direct binding of an RNA-guided nuclease (e.g., Cas9) to a substantially complementary sequence at the target site and not to unrelated sequences.
[0056] As used herein, the term “recombination target site” denotes a region of a nucleic acid molecule comprising a binding site or sequence-specific motif recognized by a site-specific recombinase that binds at the target site and catalyzes recombination of specific sequences of DNA at the target site. Site-specific recombinases catalyze recombination between two such target sites. The location and relative orientation of the target sites determine the outcome of recombination. For example, translocation occurs if the recombination target sites are on separate DNA molecules.
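The dependence of the recombination outcome on site location and relative orientation can be expressed as a small decision function. The text states only the separate-molecule case; the same-molecule rules used here (direct repeats lead to excision, inverted repeats to inversion) are the general behavior of recombinases such as Cre acting on loxP sites and are included as an assumption. The class and function names are hypothetical, and every intermolecular event is treated as a translocation, ignoring integration into circular molecules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecombinationSite:
    molecule: str     # identifier of the DNA molecule carrying the site
    orientation: str  # "+" or "-" relative to that molecule


def recombination_outcome(site_1: RecombinationSite, site_2: RecombinationSite) -> str:
    if site_1.molecule != site_2.molecule:
        # Separate molecules: strand exchange joins them (translocation).
        return "translocation"
    if site_1.orientation == site_2.orientation:
        # Same molecule, direct repeat: the intervening segment is excised.
        return "excision"
    # Same molecule, inverted repeat: the intervening segment is flipped.
    return "inversion"


print(recombination_outcome(RecombinationSite("chr1", "+"), RecombinationSite("chr2", "+")))
# translocation
```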
[0057] As used herein, the terms "label" and "detectable label" refer to a molecule capable of detection, including, but not limited to, radioactive isotopes, fluorescers, chemiluminescers, chromophores, enzymes, enzyme substrates, enzyme cofactors, enzyme inhibitors, semiconductor nanoparticles, dyes, metal ions, metal sols, ligands (e.g., biotin, streptavidin or haptens) and the like. The term “fluorescer” refers to a substance or a portion thereof which is capable of exhibiting fluorescence in the detectable range. Particular examples of labels which may be used in the practice of the present disclosure include, but are not limited to, SYBR green, SYBR gold, a CAL Fluor dye such as CAL Fluor Gold 540, CAL Fluor Orange 560, CAL Fluor Red 590, CAL Fluor Red 610, and CAL Fluor Red 635, a Quasar dye such as Quasar 570, Quasar 670, and Quasar 705, an Alexa Fluor such as Alexa Fluor 350, Alexa Fluor 488, Alexa Fluor 546, Alexa Fluor 555, Alexa Fluor 594, Alexa Fluor 647, and Alexa Fluor 784, a cyanine dye such as Cy3, Cy3.5, Cy5, Cy5.5, and Cy7, fluorescein, 2',4',5',7'-tetrachloro-4,7-dichlorofluorescein (TET), carboxyfluorescein (FAM), 6-carboxy-4',5'-dichloro-2',7'-dimethoxyfluorescein (JOE), hexachlorofluorescein (HEX), rhodamine, carboxy-X-rhodamine (ROX), tetramethyl rhodamine (TAMRA), FITC, dansyl, umbelliferone, dimethyl acridinium ester (DMAE), Texas red, luminol, NADPH, horseradish peroxidase (HRP), and β-galactosidase.
[0058] "Recombinant" as used herein to describe a nucleic acid molecule means a polynucleotide of genomic, cDNA, viral, semisynthetic, or synthetic origin which, by virtue of its origin or manipulation, is not associated with all or a portion of the polynucleotide with which it is associated in nature. The term "recombinant" as used with respect to a protein or polypeptide means a polypeptide produced by expression of a recombinant polynucleotide. In general, the gene of interest is cloned and then expressed in transformed organisms, as described further below. The host organism expresses the foreign gene to produce the protein under expression conditions.
[0059] The term "transformation" refers to the insertion of an exogenous polynucleotide into a host cell, irrespective of the method used for the insertion. For example, direct uptake, transduction or F-mating are included. The exogenous polynucleotide may be maintained as a non-integrated vector, for example, a plasmid, or alternatively, may be integrated into the host genome.
[0060] "Recombinant host cells," "host cells," "cells," "cell lines," "cell cultures," and other such terms denoting microorganisms or higher eukaryotic cell lines cultured as unicellular entities refer to cells which can be, or have been, used as recipients for recombinant vector or other transferred DNA, and include the original progeny of the original cell which has been transfected.
[0061] A "coding sequence" or a sequence which "encodes" a selected polypeptide, is a nucleic acid molecule which is transcribed (in the case of DNA) and translated (in the case of mRNA) into a polypeptide in vivo when placed under the control of appropriate regulatory sequences (or "control elements"). The boundaries of the coding sequence can be determined by a start codon at the 5' (amino) terminus and a translation stop codon at the 3' (carboxy) terminus. A coding sequence can include, but is not limited to, cDNA from viral, prokaryotic or eukaryotic mRNA, genomic DNA sequences from viral or prokaryotic DNA, and even synthetic DNA sequences. A transcription termination sequence may be located 3' to the coding sequence. The coding sequence may be interrupted by introns, which can be self-splicing group I or group II introns or those which are spliced out by the host cell splicing machinery.
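The rule that a coding sequence runs from a start codon to an in-frame translation stop codon can be illustrated with a short scan-and-translate sketch. It assumes the standard genetic code, an ATG start, and a forward-strand, intron-free input; the function names are hypothetical, and the self-splicing or host-spliced introns mentioned above are not handled.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_coding_sequence(dna: str):
    """Return the first ATG...stop stretch read in a single frame, or None."""
    dna = dna.upper()
    start = dna.find("ATG")
    while start != -1:
        for pos in range(start, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                return dna[start:pos + 3]
        start = dna.find("ATG", start + 1)  # no in-frame stop; try the next ATG
    return None


def translate(cds: str) -> str:
    """Translate a coding sequence, stopping at (and dropping) the stop codon."""
    protein = []
    for pos in range(0, len(cds) - 2, 3):
        i, j, k = (BASES.index(base) for base in cds[pos:pos + 3])
        residue = AMINO_ACIDS[16 * i + 4 * j + k]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


cds = first_coding_sequence("CCATGGCTAAATAGGG")
print(cds, translate(cds))  # ATGGCTAAATAG MAK
```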
[0062] Typical "control elements," include, but are not limited to, transcription promoters, transcription enhancer elements, introns (located anywhere in the transcript), transcription termination signals, polyadenylation sequences (located 3' to the translation stop codon), sequences for optimization of initiation of translation (located 5' to the coding sequence), and translation termination sequences.
[0063] "Operably linked" refers to an arrangement of elements wherein the components so described are configured so as to perform their usual function. Thus, a given promoter operably linked to a coding sequence is capable of effecting the expression of the coding sequence when the proper enzymes are present. The promoter need not be contiguous with the coding sequence, so long as it functions to direct the expression thereof. Thus, for example, intervening untranslated yet transcribed sequences can be present between the promoter sequence and the coding sequence and the promoter sequence can still be considered "operably linked" to the coding sequence.
[0064] "Expression cassette" or "expression construct" refers to an assembly which is capable of directing the expression of the sequence(s) or gene(s) of interest. An expression cassette generally includes control elements, as described above, such as a promoter which is operably linked to (so as to direct transcription of) the sequence(s) or gene(s) of interest, and often includes a polyadenylation sequence as well. Within certain embodiments of the present disclosure, the expression cassette described herein may be contained within a plasmid or viral vector construct (e.g., a vector for genome modification comprising a genome editing cassette comprising a promoter operably linked to a polynucleotide encoding a guide RNA and a donor polynucleotide). In addition to the components of the expression cassette, the construct may also include one or more selectable markers, a signal which allows the construct to exist as single stranded DNA (e.g., an M13 origin of replication), at least one multiple cloning site, and a "mammalian" origin of replication (e.g., an SV40 or adenovirus origin of replication) or “yeast” origin of replication (e.g., a 2-micron vector or centromeric vector with an autonomously replicating sequence (ARS)).
[0065] "Purified polynucleotide" refers to a polynucleotide of interest or fragment thereof which is essentially free, e.g., contains less than about 50%, preferably less than about 70%, and more preferably less than about at least 90%, of the protein with which the polynucleotide is naturally associated. Techniques for purifying polynucleotides of interest are well-known in the art and include, for example, disruption of the cell containing the polynucleotide with a chaotropic agent and separation of the polynucleotide(s) and proteins by ion-exchange chromatography, affinity chromatography and sedimentation according to density.
[0066] The term "transfection" is used to refer to the uptake of foreign DNA by a cell. A cell has been "transfected" when exogenous nucleic acids have been introduced inside the cell membrane. A number of transfection techniques are generally known in the art. See, e.g., Graham et al. (1973) Virology, 52:456; Sambrook et al. (2001) Molecular Cloning, a laboratory manual, 3rd edition, Cold Spring Harbor Laboratories, New York; Davis et al. (1995) Basic Methods in Molecular Biology, 2nd edition, McGraw-Hill; and Chu et al. (1981) Gene 13:197. Such techniques can be used to introduce one or more exogenous nucleic acid moieties into suitable host cells. The term refers to both stable and transient uptake of the genetic material and includes uptake of peptide- or antibody-linked nucleic acids.
[0067] A "vector" is capable of transferring nucleic acid sequences to target cells (e.g., viral vectors, non-viral vectors, particulate carriers, and liposomes). Typically, "vector construct," "expression vector," and "gene transfer vector," mean any nucleic acid construct capable of directing the expression of a nucleic acid of interest and which can transfer nucleic acid sequences to target cells. Thus, the term includes cloning and expression vehicles, as well as plasmid and viral vectors.
[0068] The terms "variant", "analog" and "mutein" refer to biologically active derivatives of the reference molecule that retain desired activity, such as site-directed Cas9 endonuclease activity. In general, the terms "variant" and "analog" refer to compounds having a native polypeptide sequence and structure with one or more amino acid additions, substitutions (generally conservative in nature), and/or deletions, relative to the native molecule, so long as the modifications do not destroy biological activity and which are "substantially homologous" to the reference molecule as defined below. In general, the amino acid sequences of such analogs will have a high degree of sequence homology to the reference sequence, e.g., amino acid sequence homology of more than 50%, generally more than 60%-70%, even more particularly 80%-85% or more, such as at least 90%-95% or more, when the two sequences are aligned. Often, the analogs will include the same number of amino acids but will include substitutions, as explained herein. The term "mutein" further includes polypeptides having one or more amino acid-like molecules including but not limited to compounds comprising only amino and/or imino molecules, polypeptides containing one or more analogs of an amino acid (including, for example, unnatural amino acids, etc.), polypeptides with substituted linkages, as well as other modifications known in the art, both naturally occurring and non-naturally occurring (e.g., synthetic), cyclized, branched molecules and the like. The term also includes molecules comprising one or more N-substituted glycine residues (a "peptoid") and other synthetic amino acids or peptides. (See, e.g., U.S. Patent Nos. 5,831,005; 5,877,278; and 5,977,301; Nguyen et al., Chem. Biol. (2000) 7:463-473; and Simon et al., Proc. Natl. Acad. Sci. USA (1992) 89:9367-9371 for descriptions of peptoids.) Methods for making polypeptide analogs and muteins are known in the art and are described further below.
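The homology tiers above are expressed as percentages of aligned positions. The following is a minimal Python sketch, assuming the two sequences have already been aligned by a separate pairwise alignment tool and that identity is counted only over columns where neither sequence has a gap. The disclosure does not specify a gap convention, so the function name and the gap-skipping rule are illustrative assumptions, not the method of this application.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumption: columns containing a gap in either sequence are skipped;
    other conventions (e.g., dividing by the full alignment length) exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # gapped column: not counted
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Two 8-residue sequences differing at one position.
print(percent_identity("MKTAYIAK", "MKTAHIAK"))
```

Here one mismatch in eight ungapped columns gives 87.5%, which would fall within the "80%-85% or more" tier described above.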
[0069] As explained above, analogs generally include substitutions that are conservative in nature, i.e., those substitutions that take place within a family of amino acids that are related in their side chains. Specifically, amino acids are generally divided into four families: (1) acidic: aspartate and glutamate; (2) basic: lysine, arginine, histidine; (3) non-polar: alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan; and (4) uncharged polar: glycine, asparagine, glutamine, cysteine, serine, threonine, and tyrosine. Phenylalanine, tryptophan, and tyrosine are sometimes classified as aromatic amino acids. For example, it is reasonably predictable that an isolated replacement of leucine with isoleucine or valine, an aspartate with a glutamate, a threonine with a serine, or a similar conservative replacement of an amino acid with a structurally related amino acid, will not have a major effect on the biological activity. As a further example, the polypeptide of interest may include up to about 5-10 conservative or non-conservative amino acid substitutions, or even up to about 15-25 conservative or non-conservative amino acid substitutions, or any integer between 5 and 25, so long as the desired function of the molecule remains intact. One of skill in the art may readily determine regions of the molecule of interest that can tolerate change by reference to Hopp/Woods and Kyte-Doolittle plots, well known in the art.
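The four side-chain families listed above can be encoded as a lookup table to flag whether a given replacement is conservative. The sketch below uses one-letter amino acid codes, ignores the optional aromatic grouping, and uses hypothetical function names. It only classifies substitutions; it says nothing about whether biological activity is actually retained.

```python
# Four side-chain families as listed above (one-letter codes).
AMINO_ACID_FAMILIES = {
    "acidic": set("DE"),               # aspartate, glutamate
    "basic": set("KRH"),               # lysine, arginine, histidine
    "non-polar": set("AVLIPFMW"),      # alanine ... tryptophan
    "uncharged polar": set("GNQCSTY"), # glycine ... tyrosine
}


def family_of(residue: str) -> str:
    for name, members in AMINO_ACID_FAMILIES.items():
        if residue.upper() in members:
            return name
    raise ValueError(f"unrecognized residue: {residue!r}")


def is_conservative(original: str, replacement: str) -> bool:
    """A substitution is treated as conservative if both residues share a family."""
    return family_of(original) == family_of(replacement)


def count_substitutions(reference: str, variant: str) -> tuple[int, int]:
    """Return (conservative, non_conservative) counts for equal-length sequences."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length")
    conservative = non_conservative = 0
    for a, b in zip(reference.upper(), variant.upper()):
        if a == b:
            continue
        if is_conservative(a, b):
            conservative += 1
        else:
            non_conservative += 1
    return conservative, non_conservative


print(count_substitutions("LDTK", "IEAK"))  # L->I and D->E conservative, T->A not
```

Counting substitutions this way could help check a candidate variant against an illustrative cap such as the "up to about 5-10" substitutions mentioned above.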
[0070] "Gene transfer" or "gene delivery" refers to methods or systems for reliably inserting DNA or RNA of interest into a host cell. Such methods can result in transient expression of non-integrated transferred DNA, extrachromosomal replication and expression of transferred replicons (e.g., episomes), or integration of transferred genetic material into the genomic DNA of host cells. Gene delivery expression vectors include, but are not limited to, vectors derived from bacterial plasmid vectors, viral vectors, non-viral vectors, adenoviruses, retroviruses, alphaviruses, pox viruses, and vaccinia viruses.
[0071] The term "derived from" is used herein to identify the original source of a molecule but is not meant to limit the method by which the molecule is made which can be, for example, by chemical synthesis or recombinant means.
[0072] A polynucleotide "derived from" a designated sequence refers to a polynucleotide sequence which comprises a contiguous sequence of at least about 6 nucleotides, preferably at least about 8 nucleotides, more preferably at least about 10-12 nucleotides, and even more preferably at least about 15-20 nucleotides corresponding to, i.e., identical or complementary to, a region of the designated nucleotide sequence. The derived polynucleotide will not necessarily be derived physically from the nucleotide sequence of interest, but may be generated in any manner, including, but not limited to, chemical synthesis, replication, reverse transcription, or transcription, which is based on the information provided by the sequence of bases in the region(s) from which the polynucleotide is derived. As such, it may represent either a sense or an antisense orientation of the original polynucleotide.
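The "derived from" definition above reduces to a string test: does the candidate share a contiguous stretch of at least N nucleotides that is identical or complementary to some region of the designated sequence? A minimal sketch follows, assuming DNA-alphabet input (U is treated as T) and interpreting "complementary" as matching the reverse complement, the antiparallel convention for base pairing. The function names and the default threshold of 6 nucleotides (the lowest value named above) are illustrative choices.

```python
def to_dna(seq: str) -> str:
    """Uppercase and treat RNA uracil as thymine so DNA and RNA can be compared."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq: str) -> str:
    return to_dna(seq).translate(str.maketrans("ACGT", "TGCA"))[::-1]


def is_derived_from(candidate: str, designated: str, min_len: int = 6) -> bool:
    """True if candidate contains >= min_len contiguous nt identical to a region of
    the designated sequence or to a region of its reverse complement."""
    cand, ref = to_dna(candidate), to_dna(designated)
    ref_rc = reverse_complement(ref)
    # Any shared window of exactly min_len implies a shared run of at least min_len.
    for start in range(len(cand) - min_len + 1):
        window = cand[start : start + min_len]
        if window in ref or window in ref_rc:
            return True
    return False


designated = "ATGGCCATTGTAATGGGCCGC"
print(is_derived_from("GGGCCATTGTGGG", designated))  # shares "GGCCAT" (sense)
print(is_derived_from(reverse_complement("CATTGTAATG"), designated))  # antisense
```

Raising `min_len` to 8, 10-12, or 15-20 mirrors the progressively preferred thresholds in the definition.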
[0073] The term "subject" includes both vertebrates and invertebrates, including, without limitation, mammals, including human and non-human mammals such as non-human primates, including chimpanzees and other apes and monkey species; laboratory animals such as mice, rats, rabbits, hamsters, guinea pigs, and chinchillas; domestic animals such as dogs and cats; farm animals such as sheep, goats, pigs, horses, and cows; and birds such as domestic, wild, and game birds, including chickens, turkeys and other gallinaceous birds, ducks, geese, and the like. In some cases, the methods of the present disclosure find use in experimental animals, in veterinary applications, and in the development of animal models for disease, including, but not limited to, rodents including mice, rats, and hamsters; primates; and transgenic animals.
[0074] The terms “subject,” “individual,” and “patient” are used interchangeably herein to refer to a vertebrate, preferably a mammal, more preferably a human. Mammals include, but are not limited to, murines, simians, humans, farm animals, sport animals, and pets. Tissues, cells, and their progeny of a biological entity obtained in vivo or cultured in vitro are also encompassed.
[0075] “Genetic disease” as used herein refers to a disease, partially or completely, directly or indirectly, caused by one or more abnormalities in the genome, especially a condition that is present from birth. The abnormality may be a mutation, an insertion, or a deletion. The abnormality may affect the coding sequence of the gene or its regulatory sequence. The genetic disease may be selected from the group consisting of an inherited muscle disease (e.g., congenital myopathy or a muscular dystrophy), a lysosomal storage disease, a heritable disorder of connective tissue, a neurodegenerative disorder, and a skeletal dysplasia. For example, the genetic disease may be, but is not limited to, Duchenne muscular dystrophy (DMD), Becker's muscular dystrophy, limb-girdle muscular dystrophy, dysferlinopathy, dystroglycanopathy, aspartylglucosaminuria, Batten disease, cystinosis, Fabry disease, Gaucher disease, Pompe disease, Tay-Sachs disease, Sandhoff disease, metachromatic leukodystrophy, mucolipidosis, mucopolysaccharide storage diseases, Niemann-Pick disease, Schindler disease, Krabbe disease, Ehlers-Danlos syndrome, epidermolysis bullosa, Marfan syndrome, neurofibromatosis, spinal muscular atrophy, amyotrophic lateral sclerosis, progressive muscular atrophy, fragile X syndrome, Charcot-Marie-Tooth disease, osteogenesis imperfecta, achondroplasia, or osteopetrosis. Other genetic diseases include hemophilia, cystic fibrosis, Huntington's chorea, familial hypercholesterolemia (LDL receptor defect), hepatoblastoma, Wilson's disease, congenital hepatic porphyria, inherited disorders of hepatic metabolism, Lesch-Nyhan syndrome, sickle cell anemia, thalassaemias, xeroderma pigmentosum, Fanconi’s anemia, retinitis pigmentosa, ataxia telangiectasia, Bloom's syndrome, retinoblastoma, and Tay-Sachs disease.
[0076] The term “ribozyme” refers to an RNA molecule that is capable of catalyzing a biochemical reaction. In some instances, ribozymes function in protein synthesis, catalyzing the linking of amino acids in the ribosome. In other instances, ribozymes participate in various other RNA processing functions, such as splicing, viral replication, and tRNA biosynthesis. In some instances, ribozymes can be self-cleaving. Non-limiting examples of ribozymes include the HDV ribozyme, the Lariat capping ribozyme (formerly called the GIR1 branching ribozyme), the glmS ribozyme, group I and group II self-splicing introns, the hairpin ribozyme, the hammerhead ribozyme, various rRNA molecules, RNase P, the twister ribozyme, the VS ribozyme, the pistol ribozyme, and the hatchet ribozyme. Other examples include the self-cleaving ribozyme-containing R2 elements, the L1Tc retrotransposon found in Trypanosoma cruzi, short interspersed nuclear elements (SINEs) in Schistosomes, Penelope-like elements, and retrozymes. For more information regarding ribozymes, see, e.g., Doherty et al., Ann. Rev. Biophys. Biomol. Struct. 30:457-475 (2001) and Weinberg et al., Nucleic Acids Research 47(18):9480-9494 (2019), each incorporated herein by reference in its entirety for all purposes.
[0077] As used herein, the term “administering” includes oral administration, topical contact, administration as a suppository, and intravenous, intraperitoneal, intramuscular, intralesional, intrathecal, intranasal, or subcutaneous administration to a subject. Administration is by any route, including parenteral and transmucosal (e.g., buccal, sublingual, palatal, gingival, nasal, vaginal, rectal, or transdermal). Parenteral administration includes, e.g., intravenous, intramuscular, intra-arterial, intradermal, subcutaneous, intraperitoneal, intraventricular, and intracranial administration. Other modes of delivery include, but are not limited to, the use of liposomal formulations, intravenous infusion, transdermal patches, etc. Administering also refers to delivery of material, including biological material such as nucleic acids and/or proteins, into cells by transformation, transfection, transduction, ballistic methods, and/or electroporation.
[0078] The term “treating” refers to an approach for obtaining beneficial or desired results including, but not limited to, a therapeutic benefit and/or a prophylactic benefit. By therapeutic benefit is meant any therapeutically relevant improvement in or effect on one or more diseases, conditions, or symptoms under treatment. For prophylactic benefit, the compositions may be administered to a subject at risk of developing a particular disease, condition, or symptom, or to a subject reporting one or more of the physiological symptoms of a disease, even though the disease, condition, or symptom may not have yet been manifested.
[0079] The term “effective amount” or “sufficient amount” refers to the amount of an agent that is sufficient to effect beneficial or desired results. The therapeutically effective amount may vary depending upon one or more of: the subject and disease condition being treated, the weight and age of the subject, the severity of the disease condition, the manner of administration, and the like, which can readily be determined by one of ordinary skill in the art. The specific amount may vary depending on one or more of: the particular agent chosen, the host cell type, the location of the host cell in the subject, the dosing regimen to be followed, whether it is administered in combination with other compounds, timing of administration, and the physical delivery system in which it is carried.
[0080] The term “pharmaceutically acceptable carrier” refers to a substance that aids the administration of an active agent to a cell, an organism, or a subject. “Pharmaceutically acceptable carrier” refers to a carrier or excipient that can be included in the compositions of the invention and that causes no significant adverse toxicological effect on the patient. Non-limiting examples of pharmaceutically acceptable carriers include water, NaCl, normal saline solutions, lactated Ringer's, normal sucrose, normal glucose, cell culture media, and the like. One of skill in the art will recognize that other pharmaceutical carriers are useful in the present invention.
[0081] As used herein, the term “heterologous” refers to biological material that originates from another organism and is introduced, inserted, or incorporated into a recipient (e.g., host) organism. Typically, the heterologous material that is introduced into the recipient organism (e.g., a host cell) is not normally found in that organism. Heterologous material can include, but is not limited to, nucleic acids, amino acids, peptides, proteins, and structural elements such as genes, promoters, and cassettes. A host cell can be, but is not limited to, a bacterium, a yeast cell, a mammalian cell, or a plant cell. The introduction of heterologous material into a host cell or organism can result, in some instances, in the expression of additional heterologous material in or by the host cell or organism. As a non-limiting example, the transformation of a yeast host cell with an expression vector that contains DNA sequences encoding a bacterial protein may result in the expression of the bacterial protein by the yeast cell. The incorporation of heterologous material may be permanent or transient. Likewise, the expression of heterologous material may be permanent or transient.
I. Compositions
[0082] Provided herein are compounds and compositions for altering a nucleic acid sequence in a cell. In aspects, provided herein is a nucleic acid encoding a retron that includes (a) one or more RNA binding domain recognition sequences or one or more single stranded nucleic acid binding domain recognition sequences, (b) an msr sequence, (c) an msd sequence, and (d) a subject expression sequence within the msd sequence. In aspects, the subject expression sequence comprises a donor sequence for homology-directed repair (HDR).
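The four elements recited above can be modeled as an ordered assembly in which the donor is spliced into the msd sequence. The sketch below is hypothetical: the class name, the `donor_offset` parameter, and the 5'-to-3' order (recognition sequences, then msr, then msd) are assumptions made for illustration, and the sequences in the usage example are placeholders rather than real retron, MS2, or donor sequences.

```python
from dataclasses import dataclass


@dataclass
class RetronDesign:
    """Hypothetical container for the four elements of a retron-encoding construct."""

    recognition_sites: list[str]  # (a) RNA- or ssNA-binding domain recognition sequences
    msr: str                      # (b) msr sequence (placeholder)
    msd: str                      # (c) msd sequence (placeholder)
    donor: str                    # (d) subject expression sequence, e.g. an HDR donor
    donor_offset: int             # assumed 0-based insertion point inside the msd

    def msd_with_donor(self) -> str:
        # The donor must lie strictly within the msd, not at either end.
        if not 0 < self.donor_offset < len(self.msd):
            raise ValueError("donor_offset must fall strictly inside the msd sequence")
        return self.msd[: self.donor_offset] + self.donor + self.msd[self.donor_offset :]

    def assemble(self) -> str:
        # Assumed 5'->3' order: recognition sites, msr, then msd carrying the donor.
        return "".join(self.recognition_sites) + self.msr + self.msd_with_donor()


design = RetronDesign(["AAAA", "CCCC"], "GGGGGG", "TTTTTTTTTT", "ACGTACGT", donor_offset=4)
print(design.assemble())
```

Requiring `donor_offset` to fall strictly inside the msd mirrors the requirement that the subject expression sequence sit within the msd sequence. A real design would also need to preserve the msr/msd structural features and any inverted repeats, which this sketch does not model.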
[0083] The RNA binding domain recognition sequence is an RNA sequence specifically bound by an RNA binding domain of a polypeptide or an aptamer. Examples of RNA binding domain recognition sequences that bind polypeptide RNA binding domains include, but are not limited to, the MS2 stem loop sequence, which binds to the MS2 coat protein (MCP); a Pumilio (PUF) recognition sequence; an RNA Recognition Motif (RRM) recognition sequence; a Double-Stranded RNA-Binding Domain (dsRBD) recognition sequence; Zinc finger (ZF) Domain recognition sequences; Z-alpha domain recognition sequences; arginine/glycine rich (RGG) domain recognition sequences; a K Homology (KH) Domain recognition sequence; or a Poly(A) tail. An exemplary MS2 coat protein (MCP) is the bacteriophage MS2 coat protein (see, for example, UniProtKB - J9QBW2 (J9QBW2_BPMS2) and UniProtKB - P03612 (CAPSD_BPMS2)).
[0084] The single stranded nucleic acid binding domain recognition sequence is a single stranded DNA or RNA sequence specifically bound by a single stranded nucleic acid binding domain of a polypeptide or an aptamer. Non-limiting examples of single stranded nucleic acid binding domain recognition sequences are described in Dickey et al., “Single-Stranded DNA-Binding Proteins: Multiple Domains for Multiple Functions,” Structure 21(7), pp. 1074-1084, July 2, 2013, and references cited therein. As described in Dickey et al. (2013), single stranded DNA-binding proteins have a wide range of structures and functions, but many of them contain small autonomous domains whose recognition of ssDNA has been well studied. These domains include four structural topologies that have been structurally characterized with ssDNA: oligonucleotide/oligosaccharide/oligopeptide-binding (OB) folds, K homology (KH) domains, RNA recognition motifs (RRMs), and whirly domains. OB folds are formed from a five-stranded β barrel with interspersed loop and helical elements, show significant structural divergence, and are capable of binding a variety of ligands in addition to ssDNA and ssRNA (Theobald et al., 2003). OB folds can bind ssDNA with high sequence specificity. For example, telomere-end protection (TEP) proteins utilize OB folds to bind, in a sequence-specific manner, the GT-rich 3' ssDNA overhang constitutively found at the end of eukaryotic telomeres (reviewed in Horvath, 2011; Lewis and Wuttke, 2012). Examples of proteins containing OB folds include Pot1, Cdc13, and TEBP, which are responsible for coordinating end protection and telomerase recruitment at the telomere.
KH domains are small domains (approximately 70 aa) characterized by three α helices packed against a three-stranded β sheet (reviewed in Valverde et al., 2008), and KH domains from proteins structurally characterized in complex with ssDNA include heterogeneous nuclear ribonucleoprotein K (hnRNP K), far upstream element (FUSE)-binding protein (FBP), and poly(C)-binding proteins (PCBP) 1 and 2. RRMs most often bind RNA but have also been shown to bind ssDNA (reviewed in Cléry et al., 2008). RRMs are typically about 90 aa in length and form a relatively large β sheet surface (more similar to OB folds than to KH domains) packed against two α helices. The majority of RRMs contain two conserved sequence motifs (RNPs) on strands 1 and 3 that form the primary nucleic acid-binding surface. Residues found elsewhere in the sheet (sometimes including an additional strand) and intervening loops also contribute to nucleic acid binding. Whirly domains are large (approximately 180 aa) domains that contain two roughly parallel four-stranded β sheets with interspersed helical elements; individual domains form tetramers through interaction of the helices, and these tetramers further interact to form hexamers of tetramers (Cappadocia et al., 2010, 2012). See Dickey et al., “Single-Stranded DNA-Binding Proteins: Multiple Domains for Multiple Functions,” Structure 21(7), pp. 1074-1084, July 2, 2013, and references cited therein.
[0085] Thus, in some embodiments, the one or more single stranded nucleic acid binding domain recognition sequences include, but are not limited to, oligonucleotide/oligosaccharide/oligopeptide-binding (OB) folds, such as in human POT1, Schizosaccharomyces pombe Pot1, Sterkiella nova TEBP, Cdc13, and the CspB protein from Bacillus caldolyticus and Bacillus subtilis; K homology (KH) domains, such as in the KH domain-containing proteins heterogeneous nuclear ribonucleoprotein K (hnRNP K), far upstream element (FUSE)-binding protein (FBP), and poly(C)-binding proteins (PCBP) 1 and 2; RNA recognition motifs (RRMs) that bind DNA, such as in FBP-interacting repressor (FIR), hnRNP A1, and hnRNP D (also known as AUF1); and whirly domains, such as in the mitochondrial whirly protein Why2 and the mammalian transcriptional regulator PurA. In some embodiments, the single stranded nucleic acid binding domain is a single stranded nucleic acid binding domain of a G-quadruplex binding protein, such as nucleolin, hnRNP, serine/arginine-rich splicing factors (SRSF) 1 and 9, splicing factor U2AF, TRF2, FRM2, and the RNA helicase associated with AU-rich element (RHAU) proteins (see V. Brázda et al., DNA and RNA quadruplex-binding proteins. Int J Mol Sci. 2014; 15(10):17493-17517. doi:10.3390/ijms151017493).
[0086] Also provided herein are chimeric constructs encoding a retron multicopy single-stranded DNA (msDNA), which comprises an msr RNA covalently attached to an msd DNA, wherein the RNA comprises one or more RNA binding domain recognition sequences and an msr sequence, and wherein the DNA comprises an msd sequence and a subject expression sequence within the msd sequence. In aspects, the subject expression sequence comprises a donor sequence for homology-directed repair (HDR).
[0087] Also provided herein are polypeptides, and nucleic acids encoding them, comprising an RNA binding domain or single stranded nucleic acid binding domain covalently bound to a DNA break site localizing domain. In aspects, the RNA binding domain is an RNA binding domain of a polypeptide that binds to an MS2 stem loop sequence (which is bound by the MS2 coat protein (MCP)), a Pumilio (PUF) recognition sequence, an RNA Recognition Motif (RRM) recognition sequence, a Double-Stranded RNA-Binding Domain (dsRBD) recognition sequence, a Zinc finger (ZF) Domain recognition sequence, a Z-alpha domain recognition sequence, an arginine/glycine rich (RGG) domain recognition sequence, a K Homology (KH) Domain recognition sequence, or a Poly(A) tail.
[0088] In aspects, the single stranded nucleic acid binding domain is a single stranded nucleic acid binding domain of a polypeptide that binds to a specific sequence of a single stranded DNA or RNA. Single stranded nucleic acid binding domains of polypeptides include, but are not limited to, oligonucleotide/oligosaccharide/oligopeptide-binding (OB) folds, such as in human POT1, Schizosaccharomyces pombe Pot1, Sterkiella nova TEBP, and the CspB protein from Bacillus caldolyticus and Bacillus subtilis; K homology (KH) domains, such as in the KH domain-containing proteins heterogeneous nuclear ribonucleoprotein K (hnRNP K), far upstream element (FUSE)-binding protein (FBP), and poly(C)-binding proteins (PCBP) 1 and 2; RNA recognition motifs (RRMs) that bind DNA, such as in FBP-interacting repressor (FIR), hnRNP A1, and hnRNP D (also known as AUF1); and whirly domains, such as in the mitochondrial whirly protein Why2 and the mammalian transcriptional regulator PurA.
[0089] It will be understood that additional RNA binding proteins with well-characterized motifs can be utilized for recruiting the retron msDNA. As an alternative mechanism to recruit the retron via the cDNA, an inverted LexA-LexA repeat with an intervening loop sequence could be inserted into the reverse-transcribed portion of the retron donor as described in FIG. 1B. Upon reverse transcription, these inverted repeats would fold back on one another, creating a highly stable stem loop structure and enabling the LexA DNA binding domain to be utilized. The FHA domain could be replaced with other domains known to bind to double-strand breaks, or the MCP could be fused directly to Cas9 so that the retron donor is present at the cut site when Cas9 cleavage occurs. Alternatively, other RNA binding domains and aptamers could be used in place of the MS2 system, such as the programmable RNA-binding domains of Pumilio/fem-3 mRNA binding factors (PUF domains) (Zhao et al., Nucleic Acids Research, 2018; PMCID: PMC5961129), or CRISPR-Cas systems could be used, in which the scaffold for a deactivated Cas nuclease is introduced in place of the MS2 loops and the deactivated Cas enzyme is fused to the FHA domain.
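The inverted-repeat strategy described above relies on the two arms of the repeat being reverse complements of each other, so that the single-stranded cDNA folds back into a double-stranded stem. A minimal sketch of building and checking such a hairpin follows. The site and loop sequences are hypothetical placeholders, not the LexA operator, and the check only confirms perfect Watson-Crick complementarity of the arms; it does not predict folding stability.

```python
def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def build_inverted_repeat(site: str, loop: str) -> str:
    """site + loop + reverse complement of site, written 5'->3' as single-stranded DNA."""
    return site.upper() + loop.upper() + reverse_complement(site)


def arms_pair(hairpin: str, stem_len: int) -> bool:
    """True if the 5' and 3' arms of length stem_len are perfect reverse complements."""
    if 2 * stem_len > len(hairpin):
        raise ValueError("stem_len too long for this hairpin")
    return hairpin[:stem_len] == reverse_complement(hairpin[-stem_len:])


# Hypothetical 9-nt site and 4-nt loop (placeholders, not a real operator).
hairpin = build_inverted_repeat("GATTACAGG", "TTTT")
print(hairpin, arms_pair(hairpin, 9))
```

Because the folded stem is a duplex containing the site in both strands, a double-stranded DNA binding domain could in principle recognize it; whether a given protein actually binds would need experimental confirmation.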
[0090] In aspects, the DNA break site localizing domain is a DNA break site localizing domain of a polypeptide listed in Tables 1-5 below.
Table 1. Human Proteins for Recruitment to DNA Break
Table 2. Mammalian FOX Genes
Table 3. Human DNA Damage-Binding Genes
Table 4. Human DNA Repair Genes
Table 5. Yeast DNA Repair Genes
[0091] Also provided are constructs encoding a retron multicopy single-stranded DNA (msDNA), which comprises an msr RNA covalently attached to an msd DNA, as well as complexes including a chimera of an RNA hybridized to a DNA, wherein the RNA comprises one or more RNA binding domain recognition sequences and an msr sequence, wherein the DNA comprises an msd sequence and a subject expression sequence within the msd sequence, and wherein the chimera is non-covalently bound to a polypeptide that includes an RNA binding domain or single stranded nucleic acid binding domain bound to a DNA break site localizing domain.
I. Retrons
Exemplary retrons comprising msr, msd, and inverted repeat sequences that can be used in the nucleic acids of the disclosure are provided in Table 6. The retrons in Table 6 also encode reverse transcriptases that can be used in the methods of the disclosure.
Table 6. Exemplary retrons.
Research, Volume 47, Issue 21, 02 December 2019, Pages 11007-11019).
[0092] In some embodiments, the retron encoded by the nucleic acids described herein is a Retron-Eco1 (Ec86) retron and reverse transcriptase system.
II. Methods of use
[0093] Provided herein are methods of editing DNA in a cell comprising contacting the cell with (1) any of the compositions described above encoding a retron that includes (a) one or more RNA binding domain recognition sequences or one or more single stranded nucleic acid binding domain recognition sequences, (b) an msr sequence, (c) an msd sequence, and (d) a subject expression sequence within the msd sequence; (2) a polypeptide comprising an RNA binding domain or single stranded nucleic acid binding domain covalently bound to a DNA break site localizing domain, or a nucleic acid encoding the polypeptide; (3) a reverse transcriptase or a nucleic acid encoding the same; and (4) a sequence specific endonuclease or a nucleic acid encoding the same, thereby editing the DNA of the cell.
[0094] In aspects, the sequence specific endonuclease is a Cas9 endonuclease, a Cas12a endonuclease, a zinc-finger nuclease, a transcription activator-like effector nuclease (TALEN), or a meganuclease. In aspects, the sequence specific endonuclease is a Cas9 endonuclease or a Cas12a endonuclease, and the method comprises administering to the subject one or more guide RNAs (gRNAs), or one or more nucleic acids encoding the same.
[0095] Provided herein are methods of treating a genetic disease in a subject in need thereof comprising administering to the subject an effective amount of (1) any of the compositions described above encoding a retron that includes (a) one or more RNA binding domain recognition sequences or one or more single stranded nucleic acid binding domain recognition sequences, (b) an msr sequence, (c) an msd sequence, and (d) a subject expression sequence within the msd sequence; (2) a polypeptide comprising an RNA binding domain or single stranded nucleic acid binding domain covalently bound to a DNA break site localizing domain, or a nucleic acid encoding the polypeptide; (3) a reverse transcriptase or a nucleic acid encoding the same; and (4) a sequence specific endonuclease or a nucleic acid encoding the same, thereby treating the disease.
[0096] In aspects, the sequence specific endonuclease is a CRISPR-associated nuclease, such as a Cas9 endonuclease or a Cpf1 (also known as Cas12a) endonuclease, a zinc-finger nuclease, a transcription activator-like effector nuclease (TALEN), or a meganuclease. In aspects, the sequence specific endonuclease is a Cas9, Cpf1 (also known as Cas12a), C2c1, FokI-dCas9, dCas13, or dCas14 endonuclease, and the method comprises administering to the subject one or more guide RNAs (gRNAs), or one or more nucleic acids encoding the same.
[0097] Genome editing may be performed on a single cell or a population of cells of interest and can be performed on any type of cell, including any cell from a prokaryotic, eukaryotic, or archaeal organism, including bacteria, archaea, fungi, protists, plants, and animals. Cells from tissues, organs, and biopsies, as well as recombinant cells, genetically modified cells, cells from cell lines cultured in vitro, and artificial cells (e.g., nanoparticles, liposomes, polymersomes, or microcapsules encapsulating nucleic acids) may all be used in the practice of the present disclosure. The methods of the disclosure are also applicable to editing of nucleic acids in cellular fragments, cell components, or organelles comprising nucleic acids (e.g., mitochondria in animal and plant cells, and plastids (e.g., chloroplasts) in plant cells and algae). Cells may be cultured or expanded prior to or after performing genome editing as described herein. In one embodiment, the cells are yeast cells.
[0098] An RNA-guided nuclease can be targeted to a particular genomic sequence (i.e., the genomic target sequence to be modified) by altering its guide RNA sequence. A target-specific guide RNA comprises a nucleotide sequence that is complementary to a genomic target sequence, and thereby mediates binding of the nuclease-gRNA complex by hybridization at the target site. For example, the gRNA can be designed with a sequence complementary to the sequence of a minor allele to target the nuclease-gRNA complex to the site of a mutation. The mutation may comprise an insertion, a deletion, or a substitution. For example, the mutation may include a single nucleotide variation, gene fusion, translocation, inversion, duplication, frameshift, missense, nonsense, or other mutation associated with a phenotype or disease of interest. The targeted minor allele may be a common genetic variant or a rare genetic variant. In certain embodiments, the gRNA is designed to selectively bind to a minor allele with single base-pair discrimination, for example, to allow binding of the nuclease-gRNA complex to a single nucleotide polymorphism (SNP). In particular, the gRNA may be designed to target disease-relevant mutations of interest for the purpose of genome editing to remove the mutation from a gene. Alternatively, the gRNA can be designed with a sequence complementary to the sequence of a major or wild-type allele to target the nuclease-gRNA complex to the allele for the purpose of genome editing to introduce a mutation into a gene in the genomic DNA of the cell, such as an insertion, deletion, or substitution. Such genetically modified cells can be used, for example, to alter phenotype, confer new properties, or produce disease models for drug screening.
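Allele-specific targeting, as described above, requires that the variant base fall within the sequence recognized by the nuclease-gRNA complex. The sketch below enumerates forward-strand candidates for an SpCas9-style design, assuming a 20-nucleotide protospacer immediately 5' of an NGG PAM and a 0-based variant position in a sequence that already carries the allele to be targeted. Each returned spacer is written in the same sense as the protospacer (the DNA equivalent of the guide), so it base-pairs with the opposite strand. The function name is hypothetical, and off-target scoring and the reverse strand are omitted.

```python
def allele_specific_spacers(seq: str, variant_pos: int, spacer_len: int = 20):
    """Forward-strand NGG candidates whose protospacer or PAM covers variant_pos.

    seq should already carry the allele to be targeted; variant_pos is 0-based.
    Returns (start, spacer, pam) tuples.
    """
    seq = seq.upper()
    hits = []
    # The PAM occupies seq[pam_start : pam_start + 3] and needs spacer_len bases upstream.
    for pam_start in range(spacer_len, len(seq) - 2):
        pam = seq[pam_start : pam_start + 3]
        if pam[1:] != "GG":  # NGG: the first PAM base may be anything
            continue
        start = pam_start - spacer_len
        # Only keep sites where the variant lies in the protospacer or the PAM.
        if start <= variant_pos < pam_start + 3:
            hits.append((start, seq[start:pam_start], pam))
    return hits


# Hypothetical 27-nt window with a single AGG PAM; variant at position 10.
window = "ACGTACGTACGTACGTACGTAGGTTTT"
print(allele_specific_spacers(window, variant_pos=10))
```

Because each spacer is taken from the allele-carrying sequence, it differs from the other allele at the variant position, which is the basis for the single base-pair discrimination described above.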
[0099] In certain embodiments, the RNA-guided nuclease used for genome modification is a clustered regularly interspaced short palindromic repeats (CRISPR) system Cas nuclease. Any RNA-guided Cas nuclease capable of catalyzing site-directed cleavage of DNA to allow integration of donor polynucleotides by the HDR mechanism can be used in genome editing, including CRISPR system type I, type II, or type III Cas nucleases. Examples of Cas proteins include Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas5e (CasD), Cas6, Cas6e, Cas6f, Cas7, Cas8a1, Cas8a2, Cas8b, Cas8c, Cas9 (Csn1 or Csx12), Cas10, Cas10d, Cas12a/Cpf1, Mad7™ (INSCRIPTA®), CasF, CasG, CasH, Csy1, Csy2, Csy3, Cse1 (CasA), Cse2 (CasB), Cse3 (CasE), Cse4 (CasC), Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, and Cu1966, and homologs or modified versions thereof.
[0100] In certain embodiments, a type II CRISPR system such as a Cas9 endonuclease is used. Cas9 nucleases from any species, or biologically active fragments, variants, analogs, or derivatives thereof that retain Cas9 endonuclease activity (i.e., catalyze site-directed cleavage of DNA to generate double-strand breaks) may be used to perform genome modification as described herein. The Cas9 need not be physically derived from an organism, but may be synthetically or recombinantly produced. Cas9 sequences from a number of bacterial species are well known in the art and listed in the National Center for Biotechnology Information (NCBI) database. See, for example, NCBI entries for Cas9 from: Streptococcus pyogenes (WP_002989955, WP_038434062, WP_011528583); Campylobacter jejuni (WP_022552435, YP_002344900); Campylobacter coli (WP_060786116); Campylobacter fetus (WP_059434633); Corynebacterium ulcerans (NC_015683, NC_017317); Corynebacterium diphtheriae (NC_016782, NC_016786); Enterococcus faecalis (WP_033919308); Spiroplasma syrphidicola (NC_021284); Prevotella intermedia (NC_017861); Spiroplasma taiwanense (NC_021846); Streptococcus iniae (NC_021314); Belliella baltica (NC_018010); Psychroflexus torquis (NC_018721); Streptococcus thermophilus (YP_820832); Streptococcus mutans (WP_061046374, WP_024786433); Listeria innocua (NP_472073); Listeria monocytogenes (WP_061665472); Legionella pneumophila (WP_062726656); Staphylococcus aureus (WP_001573634); Francisella tularensis (WP_032729892, WP_014548420); Enterococcus faecalis (WP_033919308); Lactobacillus rhamnosus (WP_048482595, WP_032965177); and Neisseria meningitidis (WP_061704949, YP_002342100); all of which sequences (as entered by the date of filing of this application) are herein incorporated by reference.
Any of these sequences or a variant thereof comprising a sequence having at least about 70-100% sequence identity thereto, including any percent identity within this range, such as 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, or 99% sequence identity thereto, can be used for genome editing, as described herein. See also Fonfara et al. (2014) Nucleic Acids Res. 42(4):2577-90; Kapitonov et al. (2015) J. Bacteriol. 198(5):797-807; Shmakov et al. (2015) Mol. Cell 60(3):385-397; and Chylinski et al. (2014) Nucleic Acids Res. 42(10):6091-6105 for sequence comparisons and a discussion of genetic diversity and phylogenetic analysis of Cas9.
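The identity thresholds above are stated without a specific scoring rule. As a minimal sketch, assuming two sequences that have already been aligned to equal length (with '-' marking gaps) and assuming identity is counted over every column that is not a gap in both sequences, percent identity could be computed as follows. The function names are hypothetical, and practical comparisons would normally rely on an alignment tool such as BLAST, whose denominator conventions may differ.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences (hypothetical helper).

    Assumption: '-' marks a gap, columns that are gaps in both sequences are
    ignored, and every remaining column counts toward the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment contains no informative columns")
    # A column is identical only when both residues match and neither is a gap.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str, minimum: float = 70.0) -> bool:
    """True when the variant is at least `minimum` percent identical (default 70%)."""
    return percent_identity(aligned_a, aligned_b) >= minimum


# Two short, made-up peptide fragments differing at one of eight positions.
print(percent_identity("MDKKYSIG", "MDKKYSLG"))  # 87.5
```

Counting gap columns in the denominator penalizes insertions and deletions; a tool that instead divides by the shorter sequence length would report a higher value for the same alignment.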
[0101] The CRISPR-Cas system naturally occurs in bacteria and archaea, where it plays a role in RNA-mediated adaptive immunity against foreign DNA. The bacterial type II CRISPR system uses the endonuclease Cas9, which forms a complex with a guide RNA (gRNA) that specifically hybridizes to a complementary genomic target sequence, where the Cas9 endonuclease catalyzes cleavage to produce a double-stranded break. Targeting of Cas9 typically further relies on the presence of a 3' protospacer-adjacent motif (PAM) in the DNA directly downstream of the gRNA-binding site.
[0102] The genomic target site will typically comprise a nucleotide sequence that is complementary to the gRNA and may further comprise a protospacer adjacent motif (PAM). In certain embodiments, the target site comprises 20-30 base pairs in addition to a 3 base pair PAM. Typically, the first nucleotide of a PAM can be any nucleotide, while the two other nucleotides will depend on the specific Cas9 protein that is chosen. Exemplary PAM sequences are known to those of skill in the art and include, without limitation, NNG, NGN, NAG, and NGG, wherein N represents any nucleotide. In certain embodiments, the allele targeted by a gRNA comprises a mutation that creates a PAM within the allele, wherein the PAM promotes binding of the Cas9-gRNA complex to the allele.
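To make the protospacer/PAM geometry concrete, the following minimal sketch scans both strands of a sequence for NGG PAMs (one of the exemplary PAMs listed above, recognized by SpCas9) and reports the 20-nucleotide protospacer lying immediately 5' of each PAM. The helper names and the fixed 20-nt length are illustrative assumptions; other Cas9 orthologs would need a different PAM pattern, and minus-strand coordinates here are positions on the reverse complement.

```python
import re

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_ngg_sites(seq: str, protospacer_len: int = 20) -> list[tuple[str, int, str, str]]:
    """Return (strand, start, protospacer, pam) for each NGG PAM with a full-length
    protospacer directly 5' of it. `start` is 0-based on the strand being scanned
    (the reverse complement for '-' hits)."""
    hits = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        # The lookahead reports overlapping PAMs, e.g. both NGG frames in "AGGG".
        for match in re.finditer(r"(?=([ACGT]GG))", s):
            pam_start = match.start()
            start = pam_start - protospacer_len
            if start < 0:
                continue  # not enough sequence upstream for a full protospacer
            hits.append((strand, start, s[start:pam_start], match.group(1)))
    return hits


for hit in find_ngg_sites("ACGTTGCAAGCTTACGATCCATGGTTTCCA"):
    print(hit)
```

A real design pipeline would also screen candidate protospacers for off-target matches elsewhere in the genome; that step is outside this sketch.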
[0103] In certain embodiments, the gRNA is 5-50 nucleotides, 10-30 nucleotides, 15-25 nucleotides, 18-22 nucleotides, or 19-21 nucleotides in length, or any length between the stated ranges, including, for example, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, or 35 nucleotides in length. The guide RNA may be a single guide RNA comprising crRNA and tracrRNA sequences in a single RNA molecule, or the guide RNA may comprise two RNA molecules with crRNA and tracrRNA sequences residing in separate RNA molecules.
[0104] In another embodiment, the CRISPR nuclease from Prevotella and Francisella 1 (Cpf1) may be used. Cpf1, also known as Cas12a, is another class II CRISPR/Cas system RNA-guided nuclease with similarities to Cas9 and may be used analogously. Unlike Cas9, Cpf1 does not require a tracrRNA and depends only on a crRNA in its guide RNA, which provides the advantage that shorter guide RNAs can be used for targeting with Cpf1 than with Cas9. Cpf1 is capable of cleaving either DNA or RNA. The PAM sites recognized by Cpf1 have the sequences 5'-YTN-3' (where "Y" is a pyrimidine and "N" is any nucleobase) or 5'-TTN-3' and are located 5' to the gRNA binding site, in contrast to the G-rich PAM site recognized by Cas9, which is located 3' to the gRNA binding site. Cpf1/Cas12a cleavage of DNA produces double-stranded breaks with sticky ends having a 4- or 5-nucleotide overhang. For a discussion of Cpf1, see, e.g., Ledford et al. (2015) Nature 526(7571):17-17; Zetsche et al. (2015) Cell 163(3):759-771; Murovec et al. (2017) Plant Biotechnol. J. 15(8):917-926; Zhang et al. (2017) Front. Plant Sci. 8:177; Fernandes et al. (2016) Postepy Biochem. 62(3):315-326; herein incorporated by reference.
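Because the Cpf1/Cas12a PAM sits on the opposite side of the protospacer from the Cas9 PAM, a site scan differs mainly in which flank is read. The sketch below assumes the PAM is supplied as a regular expression (5'-YTN-3' by default; the TTTV pattern used for LbCas12a in Example 2 would be written "TTT[ACG]") and that protospacers are 23 nt, as in Example 2. It reports the protospacer immediately 3' of each PAM on both strands. The function name is hypothetical.

```python
import re

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_cas12a_sites(seq: str, pam_regex: str = "[CT]T[ACGT]",
                      protospacer_len: int = 23) -> list[tuple[str, int, str, str]]:
    """Return (strand, start, pam, protospacer) for each PAM that has a full-length
    protospacer directly 3' of it. Unlike Cas9, the PAM precedes the protospacer.
    `start` is the 0-based protospacer start on the scanned strand."""
    hits = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        # Lookahead so that overlapping PAM matches are all reported.
        for match in re.finditer(f"(?=({pam_regex}))", s):
            pam = match.group(1)
            start = match.start() + len(pam)
            end = start + protospacer_len
            if end > len(s):
                continue  # protospacer would run off the end of the sequence
            hits.append((strand, start, pam, s[start:end]))
    return hits


# Placeholder sequence: a TTTV PAM followed by a 23-nt protospacer.
print(find_cas12a_sites("TTTA" + "C" * 23, pam_regex="TTT[ACG]"))
```

The permissive YTN default matches far more often than TTTV, so the PAM pattern should follow the specific Cas12a variant being used.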
[0105] In another embodiment, a class 2 type V-A CRISPR-Cas (Cas12a/Cpf1) nuclease can be used, such as MAD7™. MAD7™ is an engineered class 2 type V-A CRISPR-Cas (Cas12a/Cpf1) system isolated from Eubacterium rectale. It is an RNA-guided nuclease with demonstrated gene editing activity in Escherichia coli, yeast, human, mouse, and rat cells. See Liu Z et al., CRISPR J. 2020 Apr;3(2):97-108.
[0106] C2c1 is another class II CRISPR/Cas system RNA-guided nuclease that may be used. C2c1, similarly to Cas9, depends on both a crRNA and a tracrRNA for guidance to target sites. For a description of C2c1, see, e.g., Shmakov et al. (2015) Mol. Cell 60(3):385-397; Zhang et al. (2017) Front. Plant Sci. 8:177; herein incorporated by reference.
[0107] In yet another embodiment, an engineered RNA-guided FokI nuclease may be used. RNA-guided FokI nucleases comprise fusions of inactive Cas9 (dCas9) and the FokI endonuclease (FokI-dCas9), wherein the dCas9 portion confers guide RNA-dependent targeting on FokI. For a description of engineered RNA-guided FokI nucleases, see, e.g., Havlicek et al. (2017) Mol. Ther. 25(2):342-355; Pan et al. (2016) Sci. Rep. 6:35794; Tsai et al. (2014) Nat. Biotechnol. 32(6):569-576; herein incorporated by reference.
[0108] The RNA-guided nuclease can be provided in the form of a protein, such as the nuclease complexed with a gRNA, or provided by a nucleic acid encoding the RNA-guided nuclease, such as an RNA (e.g., messenger RNA) or DNA (expression vector). Codon usage may be optimized to improve production of an RNA-guided nuclease in a particular cell or organism. For example, a nucleic acid encoding an RNA-guided nuclease can be modified to substitute codons having a higher frequency of usage in a yeast cell, a bacterial cell, a human cell, a non-human cell, a mammalian cell, a rodent cell, a mouse cell, a rat cell, or any other host cell of interest, as compared to the naturally occurring polynucleotide sequence. When a nucleic acid encoding the RNA-guided nuclease is introduced into cells, the protein can be transiently, conditionally, or constitutively expressed in the cell.
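One simple way to picture the codon substitution described above is a "most frequent codon" back-translation, in which each amino acid is encoded by the codon used most often in the intended host. The sketch below assumes a small, hypothetical codon-usage table (the frequencies are illustrative placeholders, not measured values for yeast, human, or any other host) and covers only the residues it lists. Practical codon optimization tools usually weigh additional factors such as GC content, repeats, and mRNA structure.

```python
# Hypothetical relative codon frequencies for an unspecified host cell.
# Placeholder values for illustration only; a real design would use a measured
# codon-usage table for the organism of interest and cover all 20 amino acids.
CODON_USAGE: dict[str, dict[str, float]] = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.58, "AAG": 0.42},
    "R": {"AGA": 0.48, "AGG": 0.21, "CGT": 0.14, "CGA": 0.07, "CGC": 0.06, "CGG": 0.04},
    "S": {"TCT": 0.26, "TCA": 0.21, "TCC": 0.16, "AGT": 0.16, "AGC": 0.11, "TCG": 0.10},
    "*": {"TAA": 0.47, "TGA": 0.30, "TAG": 0.23},
}


def most_frequent_codon_design(protein: str,
                               usage: dict[str, dict[str, float]] = CODON_USAGE) -> str:
    """Back-translate a protein (one-letter code, '*' for stop) using the single
    most frequent codon for each residue in `usage`."""
    codons = []
    for residue in protein.upper():
        table = usage.get(residue)
        if table is None:
            raise KeyError(f"no codon-usage entry for residue {residue!r}")
        codons.append(max(table, key=table.get))
    return "".join(codons)


print(most_frequent_codon_design("MKRS*"))  # ATGAAAAGATCTTAA
```

Always choosing the top codon is the simplest strategy; some tools instead sample codons in proportion to their frequencies to avoid over-using a single tRNA species.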
[0109] Donor polynucleotides and gRNAs are readily synthesized by standard techniques, e.g., solid phase synthesis via phosphoramidite chemistry, as disclosed in U.S. Patent Nos. 4,458,066 and 4,415,732, incorporated herein by reference; Beaucage et al., Tetrahedron (1992) 48:2223-2311; and Applied Biosystems User Bulletin No. 13 (1 April 1987). Other chemical synthesis methods include, for example, the phosphotriester method described by Narang et al., Meth. Enzymol. (1979) 68:90 and the phosphodiester method disclosed by Brown et al., Meth. Enzymol. (1979) 68:109. In view of the short lengths of gRNAs (typically about 20 nucleotides in length) and donor polynucleotides (typically about 100-150 nucleotides), gRNA-donor polynucleotide cassettes can be produced by standard oligonucleotide synthesis techniques and subsequently ligated into vectors. Moreover, libraries of gRNA-donor polynucleotide cassettes directed against thousands of genomic targets can be readily created using highly parallel array-based oligonucleotide library synthesis methods (see, e.g., Cleary et al. (2004) Nature Methods 1:241-248; Svensen et al. (2011) PLoS One 6(9):e24906).
[0110] In addition, adapter sequences can be added to oligonucleotides to facilitate high-throughput amplification or sequencing. For example, a pair of adapter sequences can be added at the 5' and 3' ends of an oligonucleotide to allow amplification or sequencing of multiple oligonucleotides simultaneously by the same set of primers. Additionally, restriction sites can be incorporated into oligonucleotides to facilitate cloning of oligonucleotides into vectors. For example, oligonucleotides comprising gRNA-donor polynucleotide cassettes can be designed with a common 5' restriction site and a common 3' restriction site to facilitate ligation into the genome modification vectors. A restriction digest that selectively cleaves each oligonucleotide at the common 5' restriction site and the common 3' restriction site is performed to produce restriction fragments that can be cloned into vectors (e.g., plasmids or viral vectors), followed by transformation of cells with the vectors comprising the gRNA-donor polynucleotide cassettes. A restriction site can also be added between the gRNA and donor polynucleotide sequences to enable a second cloning step for the introduction of a guide RNA scaffold sequence or other constructs into the vector.
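The cassette layout described above (a common 5' site, the gRNA, an internal site for a later scaffold-cloning step, the donor, and a common 3' site, all flanked by adapters) can be assembled and sanity-checked programmatically. In this minimal sketch the function names, the adapter sequences, and the use of EcoRI (GAATTC), XhoI (CTCGAG), and BamHI (GGATCC) sites are illustrative assumptions. The check requires each restriction site to occur exactly once on either strand of the finished oligonucleotide, so that an internal or junction-created site, which would fragment the cassette during digestion, is flagged.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_site(seq: str, site: str) -> int:
    """Count occurrences of a restriction site on both strands.
    Palindromic sites (site == reverse complement) are counted once."""
    variants = {site.upper(), reverse_complement(site)}
    return sum(seq.upper().count(v) for v in variants)


def build_cassette_oligo(guide: str, donor: str, adapter_5p: str, adapter_3p: str,
                         site_5p: str, site_mid: str, site_3p: str) -> str:
    """Assemble adapter-site-gRNA-site-donor-site-adapter (hypothetical layout)
    and reject designs in which any restriction site is not unique."""
    oligo = (adapter_5p + site_5p + guide + site_mid + donor + site_3p + adapter_3p).upper()
    for site in (site_5p, site_mid, site_3p):
        n = count_site(oligo, site)
        if n != 1:
            raise ValueError(f"site {site} occurs {n} times; expected exactly 1")
    return oligo


# Placeholder guide, donor, and adapters; real adapters would carry primer-binding sites.
oligo = build_cassette_oligo("A" * 20, "C" * 10, "TTTT", "GGGG",
                             site_5p="GAATTC", site_mid="CTCGAG", site_3p="GGATCC")
print(oligo)
```

Checking the assembled oligonucleotide, rather than each part separately, also catches sites that only appear across a junction between two parts.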
[0111] Amplification of polynucleotides encoding gRNA-donor polynucleotide cassettes may be performed, for example, before ligation into genome modification vectors or before sequencing and after barcoding. Any method for amplifying oligonucleotides may be used, including, but not limited to, polymerase chain reaction (PCR), isothermal amplification, nucleic acid sequence-based amplification (NASBA), transcription mediated amplification (TMA), strand displacement amplification (SDA), and ligase chain reaction (LCR). In one embodiment, the genome editing cassettes comprise common 5' and 3' priming sites to allow amplification of the gRNA-donor polynucleotide sequences in parallel with a set of universal primers. In another embodiment, a set of selective primers is used to selectively amplify a subset of the gRNA-donor polynucleotides from a pooled mixture.
[0112] Cells that are transformed with recombinant polynucleotides comprising the genome editing cassettes may be prokaryotic cells or eukaryotic cells and are preferably designed for high-efficiency incorporation of gRNA-donor polynucleotide libraries by transformation. Methods of introducing nucleic acids into a host cell are well known in the art. Commonly used methods of transformation include chemically-induced transformation, typically using divalent cations (e.g., CaCl2), and electroporation. See, e.g., Sambrook et al. (2001) Molecular Cloning, a laboratory manual, 3rd edition, Cold Spring Harbor Laboratories, New York; Davis et al. (1995) Basic Methods in Molecular Biology, 2nd edition, McGraw-Hill; and Chu et al. (1981) Gene 13:197; herein incorporated by reference in their entireties.
[0113] Normally, random diffusion of donor DNA to a DNA break is rate-limiting for homologous repair. Active donor recruitment may be used to increase the frequency of cells genetically modified by HDR. The method for active donor recruitment comprises: a) introducing into a cell a fusion protein comprising a protein that selectively binds to the DNA break connected to a polypeptide comprising a nucleic acid binding domain; and b) introducing into the cell a donor polynucleotide comprising i) a nucleotide sequence sufficiently complementary to hybridize to a sequence adjacent to the DNA break, and ii) a nucleotide sequence comprising a binding site recognized by the nucleic acid binding domain of the fusion protein, wherein the nucleic acid binding domain selectively binds to the binding site on the donor polynucleotide to produce a complex between the donor polynucleotide and the fusion protein, thereby recruiting the donor polynucleotide to the DNA break and promoting HDR.
[0114] The DNA break may be created by a site-specific nuclease, such as, but not limited to, a Cas nuclease (e.g., Cas9, Cpf1, or C2c1), an engineered RNA-guided FokI nuclease, a zinc finger nuclease (ZFN), a transcription activator-like effector-based nuclease (TALEN), a restriction endonuclease, a meganuclease, a homing endonuclease, and the like. Any site-specific nuclease that selectively cleaves a sequence at the target integration site for the donor polynucleotide may be used.
[0115] The DNA break may be a single-stranded (nick) or double-stranded DNA break. If the DNA break is a single-stranded DNA break, the fusion protein used comprises a protein that selectively binds to the single-stranded DNA break, whereas if the DNA break is a double-stranded DNA break, the fusion protein used comprises a protein that selectively binds to the double-stranded DNA break. The fusion protein can also recognize both single-stranded and double-stranded DNA breaks.
[0116] In the fusion, the protein that selectively binds to the DNA break can be, for example, an RNA-guided nuclease, such as a Cas nuclease (e.g., Cas9 or Cpf1) or an engineered RNA-guided FokI nuclease.
[0117] Donor polynucleotides may be single-stranded or double-stranded and may be composed of RNA or DNA. A donor polynucleotide comprising DNA can be produced from a donor polynucleotide comprising RNA, if desired, by reverse transcription using reverse transcriptase either in the cell (e.g. by a retron reverse transcriptase) or outside the cell (e.g. by a recombinant reverse transcriptase such as M-MLV).
[0118] The RNA binding domain may be any protein or domain from a protein that binds a known RNA sequence. Examples of such proteins are well known in the art. Nonlimiting examples of RNA binding domains include domains of proteins that bind to an MS2 stem loop sequence, a Pumilio (PUF) recognition sequence, an RNA Recognition Motif (RRM) recognition sequence, a Double-Stranded RNA-Binding Domain (dsRBD) recognition sequence, a Zinc finger (ZF) Domain recognition sequence, a Z-alpha, arginine/glycine rich (RGG) domain recognition sequence, a K Homology (KH) Domain recognition sequence, or a poly(A) tail.
[0119] The single stranded nucleic acid binding domain may be any protein or domain from a protein that binds a known single stranded nucleic acid sequence. Examples of such proteins are well known in the art. Single stranded nucleic acid binding domains of polypeptides include, but are not limited to, oligonucleotide/oligosaccharide/oligopeptide-binding (OB) folds, such as in human POT1, Schizosaccharomyces pombe Pot1, Sterkiella nova TEBP, and the CspB protein from Bacillus caldolyticus and Bacillus subtilis; K homology (KH) domains, such as in the KH domain-containing proteins heterogeneous nuclear ribonucleoprotein K (hnRNP K), far upstream element (FUSE)-binding protein (FBP), and poly(C)-binding proteins (PCBP) 1 and 2; RNA recognition motifs (RRMs) which bind DNA, such as in FBP-interacting repressor (FIR), hnRNP A1, and hnRNP D (also known as AUF1); and whirly domains, such as in the mitochondrial whirly protein Why2 and the mammalian transcriptional regulator PurA.
[0120] In another embodiment, the fusion protein may comprise an FHA phosphothreonine-binding domain, wherein the donor polynucleotide is selectively recruited to a DNA break having a protein comprising a phosphorylated threonine residue located sufficiently close to the DNA break for the FHA phosphothreonine-binding domain to bind to the phosphorylated threonine residue. The FHA phosphothreonine-binding domain may be combined with any RNA binding domain (e.g., fusion with MCP) or single stranded nucleic acid binding domain (e.g., OB-fold) for donor recruitment.
[0121] Without being bound by theory, it is contemplated that the donor recruitment protein includes a fusion of a polypeptide domain from any protein that has an RNA binding domain or single stranded nucleic acid binding domain with a polypeptide domain from any protein that has a DNA break localizing domain.
[0122] Non-limiting examples of DNA break localizing domains include domains of proteins that bind to areas of DNA damage and/or DNA repair proteins. Phospho-Ser/Thr-binding domains have emerged as crucial regulators of cell cycle progression and DNA damage signaling. Such domains include 14-3-3 proteins, WW domains, Polo-box domains (in PLK1), WD40 repeats (including those in the E3 ligase SCFβTrCP), BRCT domains (including those in BRCA1), and FHA domains (such as in CHK2 and MDC1). These domains all have the potential to be used in donor recruitment systems. FHA domains are conserved between eukaryotes and bacteria and thus would also have utility for donor recruitment in bacteria as well as in eukaryotes. Examples of proteins or genes encoding such proteins are provided, without limitation, in Tables 1-5. Additional genes/proteins are known in the art and can be found, for example, by searching public gene or protein databases for genes or proteins known to have a role in DNA repair or binding of DNA damage (e.g., gene ontology term analysis). It is contemplated that proteins from any species can be used (e.g., eukaryotic proteins, proteins from yeast, mammalian cells, including human proteins, and/or from fungi). In embodiments, the donor recruitment protein comprises a polypeptide sequence from a DNA break-recruiting protein from the same kingdom, phylum or division, class, order, family, genus, and/or species as the cell to be genetically modified.
[0123] In some embodiments, the fusion protein comprises an RNA binding domain of MS2 coat protein (MCP) joined to a forkhead-associated (FHA) domain. In some embodiments, the fusion protein comprises an RNA binding domain of MS2 coat protein (MCP) joined to an FHA phosphothreonine-binding domain. In some embodiments, the fusion protein comprises a LexA domain located between the RNA binding domain of MCP and the FHA domain. In some embodiments, the LexA domain is from the LexA repressor protein (UniProtKB - P0A7C2).
[0124] In certain embodiments, an inhibitor of the non-homologous end joining (NHEJ) pathway is used to further increase the frequency of cells genetically modified by HDR. Examples of inhibitors of the NHEJ pathway include any compound (agent) that inhibits or blocks either expression or activity of any protein component in the NHEJ pathway. Protein components of the NHEJ pathway include, but are not limited to, Ku70, Ku86, DNA-dependent protein kinase (DNA-PK), Rad50, MRE11, NBS1, DNA ligase IV, and XRCC4. An exemplary inhibitor is wortmannin, which inhibits at least one protein component (e.g., DNA-PK) of the NHEJ pathway. Another exemplary inhibitor is Scr7 (5,6-bis((E)-benzylideneamino)-2-mercaptopyrimidin-4-ol), which inhibits joining of double-strand breaks (DSBs) (Maruyama et al. (2015) Nat. Biotechnol. 33(5):538-542; Lin et al. (2016) Sci. Rep. 6:34531). RNA interference or CRISPR interference may also be used to block expression of a protein component of the NHEJ pathway (e.g., DNA-PK or DNA ligase IV). For example, small interfering RNAs (siRNAs), hairpin RNAs, and other RNA or RNA:DNA species which can be cleaved or dissociated in vivo to form siRNAs may be used to inhibit the NHEJ pathway by RNA interference. Alternatively, deactivated Cas9 (dCas9) together with single guide RNAs (sgRNAs) complementary to the promoter or exonic sequences of genes of the NHEJ pathway can be used in transcriptional repression by CRISPR interference. Alternatively, an HDR enhancer such as RS-1 may be used to increase the frequency of HDR in cells (Song et al. (2016) Nat. Commun. 7:10548).
[0125] It is understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and scope of the appended claims. All publications, patents, and patent applications cited herein are hereby incorporated by reference in their entirety for all purposes.
EXAMPLES
Example 1: Recruitment of retron-amplified donor DNA to double-strand breaks for enhanced homology-directed repair
[0126] The combination of amplifying many copies of single-stranded donor in the nucleus and recruiting those copies directly to targeted double-strand breaks is an effective strategy in traditionally hard-to-edit organisms and cell lines of industrial and medical importance (for example, Aspergillus niger and human iPSCs and T-cells). Described herein are methods to enhance homology-directed repair (HDR) by recruiting single-stranded donor deoxyribonucleic acid (DNA) expressed with the bacterial retron system to double-strand breaks (DSBs). Specifically, it was demonstrated that donor recruitment technology (a.k.a. MAGESTIC) may be adapted for the retron system by introducing MS2 ribonucleic acid (RNA) stem-loops into the retron and fusing the forkhead-associated (FHA) donor recruitment domain to the MS2 coat protein (MCP), which binds to the MS2 RNA. In principle, this approach may apply to any cell that may utilize HDR for DSB repair.
[0127] A donor recruitment system whereby a LexA-FHA fusion protein, consisting of the LexA DNA binding domain (DBD) and the forkhead-associated (FHA) domain of Fkh1p, recruits donor plasmids to clustered regularly interspaced short palindromic repeats (CRISPR) double-strand breaks (DSBs) was described previously. In this system, the LexA-DBD binds to an array of LexA sites on the double-stranded DNA (dsDNA) donor plasmid, and the FHA domain binds to phosphothreonine-containing proteins which accumulate at DSBs. As single-stranded DNA (ssDNA) is known to be a superior substrate for homology-directed repair, combining the FHA recruitment system with the ssDNA donor retron amplification system was sought. Because the LexA DBD does not bind to single-stranded DNA, advantage was taken of the unique dual RNA-DNA structure found in the mature retron msDNA, and two MS2 stem-loops were inserted directly upstream of the 5' end of the retron.
FIG. 1A shows the expression locus for the retron donor (Triose-phosphate DeHydrogenase 3 (TDH3) promoter) and guide (small nucleolar RNA 52 (SNR52) promoter). Two MS2 stem-loop repeats are inserted between the 5' Hepatitis Delta Virus (HDV) ribozyme and the 5' end of the retron. The retron donor introduces a CC-to-TG mutation, which results in a premature termination codon. The guide-donor plasmid also harbors a tandem array of 4 LexA sites to enable direct comparison of the results with the previously demonstrated LexA-FHA donor recruitment system.
[0128] FIG. 1B shows the mature retron msDNA transcripts after the HDV ribozyme has cleaved off the 5' cap, and after reverse transcription of the msd region and removal of the msd RNA component by host cell RNase H activity. The 3' inverted repeat is still shown as base-paired to the 5' inverted repeat, although it is likely removed along with the 3' polyA tail by host cell 3'-5' exonucleases. As shown in the left drawing of FIG. 1B, the donor recruitment module consists of the MS2 coat protein (MCP) fused to the forkhead-associated (FHA) domain of fork head protein homolog 1 (Fkh1p), optionally containing a LexA DNA binding domain (DBD) in between to allow for simultaneous recruitment of double-stranded plasmid donor and single-stranded retron donor. MCP binds to the MS2 stem loops, which are linked to the retron donor via the branched G of the retron msr RNA through the unusual 2' ribonucleic acid (RNA)-5' deoxyribonucleic acid (DNA) linkage catalyzed by the retron reverse transcriptase (RT) during initiation of complementary DNA (cDNA) synthesis. The FHA domain binds to phosphothreonine motifs on several proteins which localize to double-strand breaks, including Mutator Phenotype (Mph1p), Fdo1p, and other unidentified protein(s). The middle drawing of FIG. 1B shows a control retron donor lacking the MS2 loops. The right drawing of FIG. 1B shows an alternate method for recruitment of the retron based on two inverted repeats of the LexA sequence downstream of the donor, which would be bound by the LexA-FHA fusion protein.
[0129] The top panel of FIG. 1C shows an ADE2 (ADEnine requiring) editing assay with a strain harboring Cas9 and a high-copy plasmid harboring a retron donor that introduces a premature translation termination codon in the ADE2 open reading frame (ORF) and an artificially weakened guide sequence harboring genomic mismatches at positions 20, 19, and 18 from the protospacer adjacent motif (PAM) (i.e., a 17-mer guide). From left to right are strains without RT, with the RT, and then with either LexA-FHA, MCP-FHA, or MCP-LexA-FHA fusion proteins. These strains were transformed with the guide-donor plasmids and plated on agar plates; after 2 days of growth, the cells were washed off the plates for additional liquid outgrowth in selective media for the indicated number of generations. Editing efficiency was quantified by amplicon sequencing of the ADE2 target locus. Non-homologous end joining-mediated insertions or deletions (NHEJ indels) were identified as small indels at the Cas9 cleavage site. NHEJ indels were only observed at appreciable levels in the Cas9-only setup, indicating that all methods of donor DNA enhancement work to mitigate indel formation. Recruitment of the retron with either MCP-FHA or MCP-LexA-FHA significantly enhanced editing at time zero. The bottom panel of FIG. 1C shows that the dual retron amplification-donor recruitment system enables editing with an even further weakened guide, with mismatches from 20 to 17 bp from the PAM (i.e., a 16-mer guide). Recruitment via the LexA inverted repeats does not improve editing over the HDV-retron donor control, while the MCP-FHA and MCP-LexA-FHA systems both improve editing substantially. For comparison, prime editing efficiency at the same target site is shown with a full 20-mer guide and the same CC-to-TG edit encoded in the RT template of the prime editing guide RNA (pegRNA).
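The weakened guides described above can be expressed as a simple sequence operation: mismatches are introduced at PAM-distal positions (numbered from the PAM, so position 20 is the 5'-most nucleotide of a 20-nt SpCas9 spacer), leaving a shorter contiguous PAM-proximal match. The sketch below assumes each mismatch is created by substituting the Watson-Crick complement of the genomic base; the source does not state which substitutions were used, and the function names and placeholder protospacer are hypothetical.

```python
# Assumption: each mismatch swaps the genomic base for its complement.
_SWAP = {"A": "T", "T": "A", "C": "G", "G": "C"}


def weaken_guide(protospacer: str, pam_distances: list[int]) -> str:
    """Return a guide matching `protospacer` except at the given positions,
    counted 1..len(protospacer) from the PAM-proximal (3') end."""
    bases = list(protospacer.upper())
    n = len(bases)
    for distance in pam_distances:
        if not 1 <= distance <= n:
            raise ValueError(f"position {distance} is outside 1..{n}")
        index = n - distance  # distance 1 -> last base, distance n -> first base
        bases[index] = _SWAP[bases[index]]
    return "".join(bases)


def pam_proximal_match_length(guide: str, protospacer: str) -> int:
    """Length of the contiguous match starting at the PAM-proximal end."""
    length = 0
    for g, t in zip(reversed(guide.upper()), reversed(protospacer.upper())):
        if g != t:
            break
        length += 1
    return length


target = "ACGTACGTACGTACGTACGT"  # placeholder 20-nt protospacer
guide_17 = weaken_guide(target, [20, 19, 18])
guide_16 = weaken_guide(target, [20, 19, 18, 17])
print(pam_proximal_match_length(guide_17, target),
      pam_proximal_match_length(guide_16, target))  # 17 16
```

Counting from the PAM matches the convention in the figure description, so "mismatches at positions 20, 19, and 18" map directly to the argument list.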
[0130] In summary, the results in FIG. 1C show that the effect of retron recruitment is more than simply the combined effect of donor recruitment and the retron separately, as the MCP-FHA and MCP-LexA-FHA constructs perform the best. The data also show that retron-based editing is possible even when the guide is truncated down to a 16-mer, which is expected to eliminate the cleavage capacity of Cas9 but may still permit nicking activity. In this scenario, MS2-based retron recruitment again shows improved editing over the retron alone or an alternative retron donor recruitment construct with an inverted LexA-LexA repeat. For comparison, all of these systems outperform prime editing in yeast.
[0131] FIG. 2A shows the levels of retron cDNA produced by the different editing cassettes from FIG. 1. FIG. 2B shows next-generation sequencing (NGS)-based quantification of retron donor cDNA levels in the absence of Cas9 or donor recruitment proteins. Primers were designed to amplify both the single-stranded donor template and the genomic target. The donor encodes a CC-to-TG mutation in the middle (asterisk), so the ratio of reads containing the donor mutation relative to the wild type (WT) genomic locus is proportional to the ratio of donor cDNA to genome copies. The different cassettes are sorted left to right from greatest to least retron cDNA produced. There is no Cas9 or editing in these experiments, so the 17-mer and 16-mer guides serve as additional replicate experiments. The primers also amplify the double-stranded donor on the retron guide cassette, which resides on a high-copy 2-micron vector (except for the TDH3-HDV cassette labeled "on Cen/Ars"). To account for the background level of plasmid donor, the same retron cassettes were transformed into cells lacking an RT, and the donor:genome ratio in such cells grown in glucose was first subtracted from the levels observed in the cells with the RT. The genome has two strands that can bind both primers in the first round of polymerase chain reaction (PCR), while the donor has only one strand, so the donor:genome ratio is multiplied by 2 to obtain the values on the y-axis, cDNA copies per genome equivalent.
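The normalization described above reduces to a short calculation: the donor:WT read ratio in cells with the RT, minus the same ratio in matched cells lacking the RT (the plasmid-donor background), multiplied by 2 to correct for the genome contributing two primer-binding strands versus one for the single-stranded cDNA. The sketch below restates that arithmetic; the function and argument names are hypothetical, and the read counts in the example are invented for illustration only.

```python
def cdna_copies_per_genome(donor_reads_rt: int, wt_reads_rt: int,
                           donor_reads_no_rt: int, wt_reads_no_rt: int) -> float:
    """Estimate retron cDNA copies per genome equivalent from amplicon read counts.

    donor_reads_*: reads carrying the donor-encoded CC-to-TG edit
    wt_reads_*:    reads matching the wild-type genomic locus
    *_no_rt:       matched control lacking the RT (plasmid-donor background)
    """
    if wt_reads_rt <= 0 or wt_reads_no_rt <= 0:
        raise ValueError("wild-type read counts must be positive")
    ratio_with_rt = donor_reads_rt / wt_reads_rt
    plasmid_background = donor_reads_no_rt / wt_reads_no_rt
    # Genome: two template strands per copy; ssDNA cDNA: one strand per copy.
    return 2.0 * (ratio_with_rt - plasmid_background)


# Invented counts: 4,005 edited vs 10 WT reads with RT; 250 vs 1,000 without RT.
print(cdna_copies_per_genome(4005, 10, 250, 1000))  # 800.5
```

Subtracting the background before applying the factor of 2 mirrors the order of operations given in the text.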
[0132] In addition to ssDNA being a better donor template for HDR than dsDNA, the number of copies of retron donor can vastly exceed the highest levels of donor plasmids observed in cells. By expressing the retron from a Pol II promoter with the HDV ribozyme at the 5' end, >500 copies of ssDNA per cell were achieved. By contrast, high-copy two-micron plasmids in budding yeast only accumulate to ~20-30 copies per cell (Karim et al., FEMS Yeast Research, 2013, PMCID: PMC3546148).
[0133] In summary, the retron donors driven by the TDH3 promoter (5'HDV-3'none) produce similar levels of cDNA (~800 copies per cell) to those observed with the GAL7 promoter. Interestingly, the addition of the 2X-MS2 loops slightly reduces retron cDNA levels (~600 copies per cell). Apparently, this reduction is more than offset by the recruitment function of MCP-FHA. In other words, simply producing more retron is not as effective as recruiting the retron directly to the cut site.
Example 2: Comparison of different retron donor systems in multiplexed editing.
[0134] Further experiments were performed to test how the different donor DNA systems perform in the context of multiplexed editing, where all possible single nucleotide variants (SNVs) across a genomic region are introduced into a pool of cells such that each cell receives a single edit. To show that the system is versatile and applicable to different nucleases, the experiments were performed with two different nucleases, SpCas9 and LbCas12a. As shown in FIG. 3A, two different windows rich in NGG protospacer adjacent motifs (PAMs) and TTTV PAMs were chosen for SpCas9 (20-bp guides) and LbCas12a (23-bp guides), respectively. For each window, two different sets (A and B) of non-overlapping guides were designed so that each SNV could be unambiguously attributed to a single guide. Each guide was paired with a library of donor DNAs with all possible SNVs across the target sequence, including the PAM. Each region was analyzed by next-generation sequencing (NGS) to quantify the levels of each SNV along the targeted region (FIG. 3B). For visualization purposes, all three SNVs at each position are combined into a single column. The arrows at the top of the plot denote the position and directionality of the guides, with PAMs for SpCas9 and LbCas12a represented by the end and beginning of each arrow, respectively. The fraction in the upper right-hand corner represents the total edited SNV fraction for each donor system, which is plotted for ease of visualization in bar chart format in FIG. 3C, with the total edited fractions obtained in replicate experiments denoted by circles and triangles.
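Enumerating the designed variants for one target is straightforward: every position in the target (protospacer plus PAM) receives each of the three non-reference bases, so a target of L nucleotides yields 3L SNVs. The sketch below is a hypothetical helper under that assumption; it does not reproduce the exact window boundaries or SNV counts of the libraries in FIG. 3, which depend on design details not given here.

```python
BASES = "ACGT"


def all_snvs(target: str, offset: int = 0) -> list[tuple[int, str, str]]:
    """Return (position, ref, alt) for every single-nucleotide variant of `target`.
    `offset` converts local indices to coordinates in the surrounding region."""
    variants = []
    for i, ref in enumerate(target.upper()):
        if ref not in BASES:
            raise ValueError(f"unexpected base {ref!r} at index {i}")
        for alt in BASES:
            if alt != ref:
                variants.append((offset + i, ref, alt))
    return variants


def apply_snv(target: str, variant: tuple[int, str, str], offset: int = 0) -> str:
    """Return the donor-encoded sequence carrying a single variant."""
    position, ref, alt = variant
    i = position - offset
    if target[i].upper() != ref:
        raise ValueError("reference base does not match target")
    return target[:i] + alt + target[i + 1:]


protospacer_plus_pam = "ACGTACGTACGTACGTACGT" + "AGG"  # placeholder 20-nt spacer + NGG PAM
library = all_snvs(protospacer_plus_pam)
print(len(library))  # 69
```

Keeping guides non-overlapping, as in the experimental design, means each (position, ref, alt) tuple maps to exactly one guide.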
[0135] In FIG. 3D, the edit fractions for each designed variant in SpCas9 set A (438 SNVs for 6 guides), SpCas9 set B (339 SNVs for 5 guides), LbCas12a set A (348 SNVs for 4 guides), and LbCas12a set B (348 SNVs for 4 guides) are plotted as box plots to demonstrate how the abundance distribution of individual variants varies for each donor DNA enhancement system and nuclease. Designed edits which were not observed are indicated by the numbers at the bottom of each box plot, and are visualized by adding a pseudo-fraction of 1e-05 to all variants. Note that the retron appears to benefit editing to a greater extent with LbCas12a than with SpCas9. Also note that retron donor recruitment with the MCP fusion proteins shows a marked improvement over separate plasmid recruitment and retron expression (LexA-FHA + RT) for LbCas12a. Direct comparison of overall editing efficiency between SpCas9 and LbCas12a is complicated by the fact that the SpCas9 and LbCas12a libraries were synthesized separately and exhibited different oligo error rates.
[0136] In summary, relative to the retron alone, recruiting the retron to cut sites with the MCP fusion improves editing dramatically and, in the case of LbCas12a, outperforms LexA-FHA donor recruitment alone by a large margin. Of note, cleavage of DNA by LbCas12a produces a 5' overhang of 4-5 nt, whereas cleavage of DNA by Cas9 produces a blunt end. Therefore, while not being bound by theory, the single-stranded retron donor could initiate pairing more efficiently at LbCas12a-cleaved ends than at Cas9-cleaved ends, owing to the difference in DNA termini produced by the nucleases.
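Adding a fixed pseudo-fraction keeps unobserved variants (edit fraction of zero) on the plot, presumably because the box plots use a logarithmic axis on which zero cannot be shown. The sketch below assumes a log10 transform and uses the 1e-05 pseudo-fraction stated above; the function name and the summary it returns are hypothetical.

```python
import math
import statistics

PSEUDO_FRACTION = 1e-05  # added to every variant, as described for FIG. 3D


def summarize_edit_fractions(fractions: list[float]) -> dict[str, float]:
    """Log-transform per-variant edit fractions with a pseudo-fraction and
    report the number of designed variants that were never observed."""
    if not fractions:
        raise ValueError("no variants supplied")
    unobserved = sum(1 for f in fractions if f == 0)
    # Zero fractions land at log10(1e-05) = -5 instead of being undefined.
    log_values = [math.log10(f + PSEUDO_FRACTION) for f in fractions]
    return {
        "n_variants": len(fractions),
        "n_unobserved": unobserved,
        "median_log10": statistics.median(log_values),
    }
```

Because the pseudo-fraction is added to observed variants as well, very rare edits near 1e-05 are shifted upward noticeably; this is why the unobserved count is reported separately rather than read off the plot.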
Exemplary Embodiments
[0137] Exemplary embodiments provided in accordance with the presently disclosed subject matter include, but are not limited to, the claims and the following embodiments:
1. A nucleic acid encoding a retron comprising: a. one or more RNA binding domain recognition sequences or one or more single stranded nucleic acid binding domain recognition sequences; b. an msr sequence; c. an msd sequence; d. a subject expression sequence within the msd sequence; and, e. a first inverted repeat sequence and a second inverted repeat sequence.
2. The nucleic acid of embodiment 1, wherein the subject expression sequence comprises a donor sequence for homology-directed repair (HDR).
3. The nucleic acid of embodiment 1, wherein said RNA binding domain recognition sequence is an MS2 stem loop sequence, a Pumilio (PUF) recognition sequence, an RNA Recognition Motif (RRM) recognition sequence, a Double-Stranded RNA-Binding Domain (dsRBD) recognition sequence, a Zinc finger (ZF) Domain recognition sequence, a G-quadruplex-forming sequence, a Z-alpha, arginine/glycine rich (RGG) domain recognition sequence, or a K Homology (KH) Domain recognition sequence.
4. The nucleic acid of embodiment 1, wherein the single stranded nucleic acid binding domain recognition sequence is a sequence recognized by the single stranded nucleic acid binding domain of a CRISPR associated endonuclease, POT1, TEPB, CspB, a K homology (KH) domain, a far upstream element (FUSE)-binding protein (FBP), a poly(C)-binding protein, a G-quadruplex binding domain including nucleolin, hnRNP, serine/arginine-rich splicing factors (SRSF) 1 and 9, splicing factor U2AF, TRF2, FRM2, and the RNA helicase associated with AU-rich element (RHAU) proteins, an FBP-interacting repressor (FIR), hnRNP A1, hnRNP D, or a whirly domain.
5. A chimeric construct comprising an RNA hybridized to a DNA, wherein the RNA comprises one or more RNA binding domain recognition sequences and an msr sequence; and wherein the DNA comprises an msd sequence and a subject expression sequence within the msd sequence.
6. The chimeric construct of embodiment 5, wherein the subject expression sequence comprises a donor sequence for homology-directed repair (HDR).
7. A polypeptide comprising an RNA binding domain or single stranded nucleic acid binding domain bound to a DNA break site localizing domain.
8. The polypeptide of embodiment 7, wherein the RNA binding domain is an RNA binding domain of a polypeptide that binds to an MS2 stem loop sequence, a Pumilio (PUF) recognition sequence, an RNA Recognition Motif (RRM) recognition sequence, a Double-Stranded RNA-Binding Domain (dsRBD) recognition sequence, a Zinc finger (ZF) Domain recognition sequence, a Z-alpha, arginine/glycine rich (RGG) domain recognition sequence, or a K Homology (KH) Domain recognition sequence.
9. The polypeptide of embodiment 7, wherein the single stranded nucleic acid binding domain is a single stranded nucleic acid binding domain of a CRISPR associated endonuclease, POT1, TEPB, CspB, a K homology (KH) domain, a far upstream element (FUSE)-binding protein (FBP), a poly(C)-binding protein, an FBP-interacting repressor (FIR), hnRNP A1, hnRNP D, or a whirly domain.
10. The polypeptide of embodiment 7, wherein the DNA break site localizing domain is a DNA break site localizing domain of a polypeptide listed in any of Tables 1 to 5.
11. The polypeptide of embodiment 7, wherein the RNA binding domain comprises an RNA binding domain of MS2 coat protein (MCP) and the DNA break site localizing domain comprises a forkhead-associated (FHA) domain.
12. The polypeptide of embodiment 11, further comprising a LexA domain located between the RNA binding domain of MCP and the FHA domain.
13. A nucleic acid encoding the polypeptide of any one of embodiments 7 to 12.
14. The chimeric construct of embodiment 5 non-covalently bound to the polypeptide of any one of embodiments 7 to 12.
15. A method of editing DNA in a cell, comprising contacting the cell with the nucleic acid of any one of embodiments 1 to 4 and the polypeptide of any one of embodiments 7 to 12 or the nucleic acid of embodiment 13, a reverse transcriptase or a nucleic acid encoding the same, and a sequence specific endonuclease or a nucleic acid encoding the same, thereby editing the DNA of the cell.
16. The method of embodiment 15, wherein the sequence specific endonuclease is a CRISPR associated (Cas) nuclease, a Zinc-finger nuclease, a Transcription activator-like effector nuclease (TALEN), or a meganuclease.
17. The method of embodiment 15 or 16, wherein the method comprises contacting the cell with one or more guide RNAs (gRNAs), or one or more nucleic acids encoding the same.
18. The method of embodiment 16 or 17, wherein the Cas nuclease is Cas9, SpCas9, Cpf1 (Cas12a), Mad7™, C2c1, or FokI-dCas9.
19. A method of treating a genetic disease in a subject in need comprising administering to the subject the nucleic acid of any one of embodiments 1 to 4 and the polypeptide of any one of embodiments 7 to 12 or the nucleic acid of embodiment 13, a reverse transcriptase or a nucleic acid encoding the same, and a sequence specific endonuclease or a nucleic acid encoding the same, thereby editing the DNA.
20. The method of embodiment 19, wherein the sequence specific endonuclease is a Cas nuclease, a Zinc-finger nuclease, a Transcription activator-like effector nuclease (TALEN), or a meganuclease.
21. The method of embodiment 19 or 20, wherein the method comprises administering to the subject one or more guide RNAs (gRNAs), or one or more nucleic acids encoding the same.
22. The method of embodiment 20 or 21, wherein the Cas nuclease is Cas9, SpCas9, Cpf1 (Cas12a), Mad7™, C2c1, or FokI-dCas9.
23. The method of any one of embodiments 16 to 22, wherein the Cas nuclease is selected from the group consisting of Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas5e (CasD), Cas6, Cas6e, Cas6f, Cas7, Cas8a1, Cas8a2, Cas8b, Cas8c, Cas9 (Csn1 or Csx12), SpCas9, FokI-dCas9, Cas10, Cas10d, Cas12a/Cpf1, Mad7™, CasF, CasG, CasH, Csy1, Csy2, Csy3, Cse1 (CasA), Cse2 (CasB), Cse3 (CasE), Cse4 (CasC), Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, C2c1 and Cu1966, and homologs or modified versions thereof.
[0138] All publications and patent applications mentioned in this disclosure are herein incorporated by reference to the same extent as if each individual publication or patent application was specifically and individually indicated to be incorporated by reference.
[0139] No admission is made that any reference cited herein constitutes prior art. The discussion of the references states what their authors assert, and the Applicant reserves the right to challenge the accuracy and pertinence of the cited documents. It will be clearly understood that, although a number of information sources, including scientific journal articles, patent documents, and textbooks, may be referred to herein, this reference does not constitute an admission that any of these documents forms part of the common general knowledge in the art.
[0140] The discussion of the general methods given herein is intended for illustrative purposes only. Other alternative methods and alternatives will be apparent to those of skill in the art upon review of this disclosure and are to be included within the scope of this application.
[0141] While particular alternatives of the present disclosure have been disclosed, it is to be understood that various modifications and combinations are possible and are contemplated within the scope of the appended claims. There is no intention, therefore, of limitations to the exact abstract and disclosure herein presented.
INFORMAL SEQUENCE LISTING
MSR:
ATGCGCACCCTTAGCGAGAGGTTTATCATTAAGGTCAACCTCTGGATGTTGTTTCGGCATCCTGCATTGAATCTGAGTTACT (SEQ ID NO:1)
MSD:
TCTGAGTTACTGTCTGTnTCCT (SEQ ID NO:2) (first fragment), programmable loop, AGGAAACCCGTTTCTTCTGACGTAAGGGTGCGCA (SEQ ID NO:3) (second fragment with inverted repeat).
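As a consistency check on the listing, the sketch below searches for the longest 3' end of the second MSD fragment (SEQ ID NO:3) whose reverse complement occurs in the MSR (SEQ ID NO:1). It assumes exact Watson-Crick pairing with no mismatches or bulges, which is a simplification, and it is not presented as the method used to define the inverted repeats in this disclosure.

```python
MSR = (
    "ATGCGCACCCTTAGCGAGAGGTTTATCATTAAGGTCAACCTCTGGATGTTGTTTC"
    "GGCATCCTGCATTGAATCTGAGTTACT"
)  # SEQ ID NO:1
MSD_FRAGMENT_2 = "AGGAAACCCGTTTCTTCTGACGTAAGGGTGCGCA"  # SEQ ID NO:3

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_inverted_repeat(upstream: str, tail: str, min_len: int = 6) -> str:
    """Longest 3' suffix of `tail` whose reverse complement appears in `upstream`.

    Returns an empty string if no suffix of at least `min_len` nt pairs exactly.
    """
    for k in range(len(tail), min_len - 1, -1):
        suffix = tail[-k:]
        if reverse_complement(suffix) in upstream:
            return suffix
    return ""


if __name__ == "__main__":
    repeat = longest_inverted_repeat(MSR, MSD_FRAGMENT_2)
    partner = reverse_complement(repeat)
    print(repeat, partner, MSR.find(partner))  # TAAGGGTGCGCA TGCGCACCCTTA 1
```

Under this exact-match criterion, the 3' end TAAGGGTGCGCA of the second MSD fragment pairs with TGCGCACCCTTA near the 5' end of the MSR.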
SV40 NLS-linker at N-terminus of MCP-LexA-FHA and MCP-FHA fusion constructs:
MPPKKKRKVGSGS (SEQ ID NO:4)
MCP domain:
N terminus-
ASNFTQFVLVDNGGTGDVTVAPSNFANGVAEWISSNSRSQAYKVTCSVRQSSAQKR
KYTIKVEVPKVATQTVGGVELPVAAWRSYLNMELTIPIFATNSDCELIVKAMQGLLK
DGNPIPSAIAANSGIY - C terminus (SEQ ID NO:5)
LexA DNA binding domain:
N terminus-
MKALTARQQEVFDLIRDHISQTGMPPTRAEIAQRLGFRSPNAAEEHLKALARKGVIEIVSGASRGIRLLQEEEEGLPLVGRVAAGEPLLAQQHIEGHYQVDPSLFKPNADFLLRVSGMSMKDIGIMDGDLLAVHKTQDVRNGQVVVARIDDEVTVKRLKKQGNKVELLPENSEFKPIVVDLRQQSFTIEGLAVGVIRNGDWL - C terminus (SEQ ID NO:6)
Glycine-serine-rich linker and SV40 NLS separating domains of MCP and LexA DBD:
N terminus-SAGGGGSGGGGSGGGGSGPKKKRKVAAAGSG-C terminus (SEQ ID NO:7)
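The modular layout of this linker can be checked by scanning it for its repeated units. In this hedged sketch the motif choices are assumptions made for illustration: `GGGGS` for the glycine-serine repeat and `PKKKRKV` as the SV40 NLS core, which also appears in the N-terminal NLS-linker of SEQ ID NO:4.

```python
from typing import List

SV40_NLS_LINKER = "MPPKKKRKVGSGS"  # SEQ ID NO:4
GS_NLS_LINKER = "SAGGGGSGGGGSGGGGSGPKKKRKVAAAGSG"  # SEQ ID NO:7


def find_motif(seq: str, motif: str) -> List[int]:
    """0-based start positions of every (possibly overlapping) occurrence of motif."""
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits


if __name__ == "__main__":
    for name, seq in [("SEQ ID NO:4", SV40_NLS_LINKER), ("SEQ ID NO:7", GS_NLS_LINKER)]:
        print(name, "GGGGS:", find_motif(seq, "GGGGS"), "NLS:", find_motif(seq, "PKKKRKV"))
```

Read N to C, SEQ ID NO:7 contains three GGGGS repeats followed by the NLS core, consistent with its description as a glycine-serine-rich linker plus SV40 NLS.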
FHA domain:
N terminus-
MSVTSREQKFSGKYSSYTAQDRQGLVNAVTCVLSSSSDPVAVSSDYSNSLSIAREVNAYAKIAGCDWTYYVQKLEVTIGRNTDSLNLNAVPGTVVKKNIDIDLGPAKIVSRKHAAIRFNLESGSWELQIFGRNGAKVNFRRIPTGPDSPPTVLQSGCIIDIGGVQMIFILPEQETIISDYCLNHLMPKLLSTYGTNGNNNPLLRNIIEGSTYLREQRLQEEARLQRLDHL - C terminus (SEQ ID NO:8)
NLS-linker-MCP-linker-NLS-linker-LexA-linker-FHA:
N terminus-
MPPKKKRKVGSGSASNFTQFVLVDNGGTGDVTVAPSNFANGVAEWISSNSRSQAYK
VTCSVRQSSAQKRKYTIKVEVPKVATQTVGGVELPVAAWRSYLNMELTIPIFATNSD
CELIVKAMQGLLKDGNPIPSAIAANSGIYSAGGGGSGGGGSGGGGSGPKKKRKVAA
AGSGMKALTARQQEVFDLIRDHISQTGMPPTRAEIAQRLGFRSPNAAEEHLKALARK
GVIEIVSGASRGIRLLQEEEEGLPLVGRVAAGEPLLAQQHIEGHYQVDPSLFKPNADF
LLRVSGMSMKDIGIMDGDLLAVHKTQDVRNGQVVVARIDDEVTVKRLKKQGNKVE
LLPENSEFKPIVVDLRQQSFTIEGLAVGVIRNGDWLEFPGIRMSVTSREQKFSGKYSSY
TAQDRQGLVNAVTCVLSSSSDPVAVSSDYSNSLSIAREVNAYAKIAGCDWTYYVQK
LEVTIGRNTDSLNLNAVPGTVVKKNIDIDLGPAKIVSRKHAAIRFNLESGSWELQIFGR
NGAKVNFRRIPTGPDSPPTVLQSGCIIDIGGVQMIFTLPEQETITSDYCLNHLMPKLLST
YGTNGNNNPLLRMIEGSTYLREQRLQEEARLQRLDHL* - C terminus (SEQ ID NO:9)
NLS-linker-MCP-linker-NLS-linker-FHA:
N terminus-
MPPKKKRKVGSGSASNFTQFVLVDNGGTGDVTVAPSNFANGVAEWISSNSRSQAYK
VTCSVRQSSAQKRKYTIKVEVPKVATQTVGGVELPVAAWRSYLNMELTIPIFATNSD
CELIVKAMQGLLKDGNPIPSAIAANSGIYSAGGGGSGGGGSGGGGSGPKKKRKVAA
AGSGIRMSVTSREQKFSGKYSSYTAQDRQGLVNAVTCVLSSSSDPVAVSSDYSNSLSI
AREVNAYAKIAGCDWTYYVQKLEVTIGRNTDSLNLNAVPGTVVKKNIDIDLGPAKI
VSRKHAAIRFNLESGSWELQIFGRNGAKVNFRRIPTGPDSPPTVLQSGCIIDIGGVQMI
FILPEQETOSDYCLNHLMPKLLSTYGTNGNNNPLLRNIEGSTYLREQRLQEEARLQR
LDHL* - C terminus (SEQ ID NO:10)
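Because the full-length fusions are concatenations of listed parts, a small helper can report where each domain starts and ends and flag characters that are not standard one-letter amino-acid codes, which is useful for catching transcription errors when copying sequence listings. This is a sketch under stated assumptions: the part names are illustrative, only the two short linkers are filled in, and treating a single trailing `*` as a stop is a convention chosen here.

```python
from typing import List, Tuple

STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def assemble_fusion(parts: List[Tuple[str, str]]) -> Tuple[str, List[Tuple[str, int, int]]]:
    """Concatenate (name, sequence) parts N- to C-terminal.

    Returns the fused sequence and 1-based inclusive (name, start, end) boundaries.
    """
    pieces: List[str] = []
    boundaries: List[Tuple[str, int, int]] = []
    position = 1
    for name, seq in parts:
        pieces.append(seq)
        boundaries.append((name, position, position + len(seq) - 1))
        position += len(seq)
    return "".join(pieces), boundaries


def invalid_residues(seq: str) -> List[Tuple[int, str]]:
    """0-based (index, character) pairs that are not standard amino acids.

    A single trailing '*' is treated as a stop and ignored.
    """
    body = seq[:-1] if seq.endswith("*") else seq
    return [(i, ch) for i, ch in enumerate(body) if ch not in STANDARD_AA]


if __name__ == "__main__":
    fused, layout = assemble_fusion(
        [
            ("SV40 NLS-linker (SEQ ID NO:4)", "MPPKKKRKVGSGS"),
            ("GS linker + SV40 NLS (SEQ ID NO:7)", "SAGGGGSGGGGSGGGGSGPKKKRKVAAAGSG"),
        ]
    )
    for name, start, end in layout:
        print(f"{name}: {start}-{end}")
    print(invalid_residues(fused + "*"))
```

The full constructs could be assembled the same way from SEQ ID NOs: 4, 5, 7, 6 and 8. Comparing the listings suggests short junction residues (EFPGIR between LexA and FHA in SEQ ID NO:9, and IR before FHA in SEQ ID NO:10), but that reading is an inference, not a statement from the source.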

Claims

WHAT IS CLAIMED IS:
1. A nucleic acid encoding a retron comprising: a. one or more RNA binding domain recognition sequences or one or more single stranded nucleic acid binding domain recognition sequences; b. an msr sequence; c. an msd sequence; d. a subject expression sequence within the msd sequence; and, e. a first inverted repeat sequence and a second inverted repeat sequence.
2. The nucleic acid of claim 1, wherein the subject expression sequence comprises a donor sequence for homology-directed repair (HDR).
3. The nucleic acid of claim 1, wherein said RNA binding domain recognition sequence is an MS2 stem loop sequence, a Pumilio (PUF) recognition sequence, an RNA Recognition Motif (RRM) recognition sequence, a Double-Stranded RNA-Binding Domain (dsRBD) recognition sequence, a Zinc finger (ZF) Domain recognition sequence, a G-quadruplex-forming sequence, a Z-alpha, arginine/glycine rich (RGG) domain recognition sequence, or a K Homology (KH) Domain recognition sequence.
4. The nucleic acid of claim 1, wherein the single stranded nucleic acid binding domain recognition sequence is a sequence recognized by the single stranded nucleic acid binding domain of a CRISPR associated endonuclease, POT1, TEPB, CspB, a K homology (KH) domain, a far upstream element (FUSE)-binding protein (FBP), a poly(C)-binding protein, a G-quadruplex binding domain including nucleolin, hnRNP, serine/arginine-rich splicing factors (SRSF) 1 and 9, splicing factor U2AF, TRF2, FRM2, and the RNA helicase associated with AU-rich element (RHAU) proteins, an FBP-interacting repressor (FIR), hnRNP A1, hnRNP D, or a whirly domain.
5. A chimeric construct comprising an RNA hybridized to a DNA, wherein the RNA comprises one or more RNA binding domain recognition sequences and an msr sequence; and wherein the DNA comprises an msd sequence and a subject expression sequence within the msd sequence.
6. The chimeric construct of claim 5, wherein the subject expression sequence comprises a donor sequence for homology-directed repair (HDR).
7. A polypeptide comprising an RNA binding domain or single stranded nucleic acid binding domain bound to a DNA break site localizing domain.
8. The polypeptide of claim 7, wherein the RNA binding domain is an RNA binding domain of a polypeptide that binds to an MS2 stem loop sequence, a Pumilio (PUF) recognition sequence, an RNA Recognition Motif (RRM) recognition sequence, a Double-Stranded RNA-Binding Domain (dsRBD) recognition sequence, a Zinc finger (ZF) Domain recognition sequence, a G-quadruplex-forming sequence, a Z-alpha, arginine/glycine rich (RGG) domain recognition sequence, or a K Homology (KH) Domain recognition sequence.
9. The polypeptide of claim 7, wherein the single stranded nucleic acid binding domain is a single stranded nucleic acid binding domain of a CRISPR associated endonuclease, POT1, TEPB, CspB, a K homology (KH) domain, a far upstream element (FUSE)-binding protein (FBP), a poly(C)-binding protein, a G-quadruplex binding domain including nucleolin, hnRNP, serine/arginine-rich splicing factors (SRSF) 1 and 9, splicing factor U2AF, TRF2, FRM2, and the RNA helicase associated with AU-rich element (RHAU) proteins, an FBP-interacting repressor (FIR), hnRNP A1, hnRNP D, or a whirly domain.
10. The polypeptide of claim 7, wherein the DNA break site localizing domain is a DNA break site localizing domain of a polypeptide listed in any of Tables 1 to 5.
11. The polypeptide of claim 7, wherein the RNA binding domain comprises an RNA binding domain of MS2 coat protein (MCP) and the DNA break site localizing domain comprises a forkhead-associated (FHA) domain.
12. The polypeptide of claim 11, further comprising a LexA domain located between the RNA binding domain of MCP and the FHA domain.
13. A nucleic acid encoding the polypeptide of claim 7.
14. The chimeric construct of claim 5 non-covalently bound to the polypeptide of claim 7.
15. A method of editing DNA in a cell, comprising contacting the cell with the nucleic acid of claim 1 and the polypeptide of claim 7 or the nucleic acid of claim 13, a reverse transcriptase or a nucleic acid encoding the same, and a sequence specific endonuclease or a nucleic acid encoding the same, thereby editing the DNA of the cell.
16. The method of claim 15, wherein the sequence specific endonuclease is a CRISPR associated (Cas) nuclease, a Zinc-finger nuclease, a Transcription activator-like effector nuclease (TALEN), or a meganuclease.
17. The method of claim 15, wherein the method comprises contacting the cell with one or more guide RNAs (gRNAs), or one or more nucleic acids encoding the same.
18. The method of claim 16, wherein the Cas nuclease is selected from the group consisting of Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas5e (CasD), Cas6, Cas6e, Cas6f, Cas7, Cas8a1, Cas8a2, Cas8b, Cas8c, Cas9 (Csn1 or Csx12), SpCas9, FokI-dCas9, Cas10, Cas10d, Cas12a/Cpf1, Mad7™, CasF, CasG, CasH, Csy1, Csy2, Csy3, Cse1 (CasA), Cse2 (CasB), Cse3 (CasE), Cse4 (CasC), Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, C2c1 and Cu1966, and homologs or modified versions thereof.
19. A method of treating a genetic disease in a subject in need comprising administering to the subject the nucleic acid of claim 1 and the polypeptide of claim 7 or the nucleic acid of claim 13, a reverse transcriptase or a nucleic acid encoding the same, and a sequence specific endonuclease or a nucleic acid encoding the same, thereby editing the DNA.
20. The method of claim 19, wherein the sequence specific endonuclease is a Cas nuclease, a Zinc-finger nuclease, a Transcription activator-like effector nuclease (TALEN), or a meganuclease.
21. The method of claim 19, wherein the method comprises administering to the subject one or more guide RNAs (gRNAs), or one or more nucleic acids encoding the same.
22. The method of claim 20, wherein the Cas nuclease is selected from the group consisting of Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas5e (CasD), Cas6, Cas6e, Cas6f, Cas7, Cas8a1, Cas8a2, Cas8b, Cas8c, Cas9 (Csn1 or Csx12), SpCas9, FokI-dCas9, Cas10, Cas10d, Cas12a/Cpf1, Mad7™, CasF, CasG, CasH, Csy1, Csy2, Csy3, Cse1 (CasA), Cse2 (CasB), Cse3 (CasE), Cse4 (CasC), Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, C2c1 and Cu1966, and homologs or modified versions thereof.
PCT/US2022/073130 2021-06-23 2022-06-23 Compositions and methods for efficient retron recruitment to dna breaks WO2022272294A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202163214196P 2021-06-23 2021-06-23
US63/214,196 2021-06-23

Publications (1)

Publication Number Publication Date
WO2022272294A1 true WO2022272294A1 (en) 2022-12-29

Family

ID=84545857

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2022/073130 WO2022272294A1 (en) 2021-06-23 2022-06-23 Compositions and methods for efficient retron recruitment to dna breaks

Country Status (1)

Country Link
WO (1) WO2022272294A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016148994A1 (en) * 2015-03-13 2016-09-22 The Jackson Laboratory A three-component crispr/cas complex system and uses thereof
WO2018172556A1 (en) * 2017-03-24 2018-09-27 Curevac Ag Nucleic acids encoding crispr-associated proteins and uses thereof
WO2019055878A2 (en) * 2017-09-15 2019-03-21 The Board Of Trustees Of The Leland Stanford Junior University Multiplex production and barcoding of genetically engineered cells
US20190330619A1 (en) * 2016-09-09 2019-10-31 The Board Of Trustees Of The Leland Stanford Junior University High-throughput precision genome editing
US20210010006A1 (en) * 2019-07-08 2021-01-14 Inscripta, Inc. Increased nucleic acid-guided cell editing via a lexa-rad51 fusion protein

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22829520

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22829520

Country of ref document: EP

Kind code of ref document: A1