WO2021178709A1 - Methods and compositions for modulating a genome - Google Patents
Methods and compositions for modulating a genome
- Publication number
- WO2021178709A1 PCT/US2021/020933
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sequence
- polypeptide
- domain
- dna
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
- C12N9/16—Hydrolases (3) acting on ester bonds (3.1)
- C12N9/22—Ribonucleases [RNase]; Deoxyribonucleases [DNase]
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/102—Mutagenizing nucleic acids
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/10—Transferases (2.)
- C12N9/12—Transferases (2.) transferring phosphorus containing groups, e.g. kinases (2.7)
- C12N9/1241—Nucleotidyltransferases (2.7.7)
- C12N9/1276—RNA-directed DNA polymerase (2.7.7.49), i.e. reverse transcriptase or telomerase
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K2319/00—Fusion polypeptide
- C07K2319/80—Fusion polypeptide containing a DNA binding domain, e.g. LacI or Tet-repressor
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/11—DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
- C12N15/113—Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2310/00—Structure or type of the nucleic acid
- C12N2310/10—Type of nucleic acid
- C12N2310/20—Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPR]
Definitions
- compositions, systems and methods for altering a genome at one or more locations in a host cell, tissue or subject, in vivo or in vitro; in particular, novel compositions, systems and methods for the introduction of exogenous genetic elements into a host genome.
- compositions or methods can include one or more of the following enumerated embodiments.
- a system for modifying DNA comprising: (a) a polypeptide or a nucleic acid encoding a polypeptide, wherein the polypeptide comprises (i) a reverse transcriptase domain and (ii) an endonuclease domain; and (b) a template RNA (or DNA encoding the template RNA) comprising (i) a sequence that binds the polypeptide and (ii) a heterologous object sequence, wherein the polypeptide comprises a mutation inactivating and/or deleting a nucleolar localization signal.
- a system for modifying DNA comprising: (a) a polypeptide or a nucleic acid encoding a polypeptide, wherein the polypeptide comprises (i) a first target DNA binding domain, e.g., comprising a first Zn finger domain, (ii) a reverse transcriptase domain, (iii) an endonuclease domain, and (iv) a second target DNA binding domain, e.g., comprising a second Zn finger domain, heterologous to the first target DNA binding domain; and optionally (b) a template RNA (or DNA encoding the template RNA) comprising (i) a sequence that binds the polypeptide and (ii) a heterologous object sequence, wherein (a) binds to a smaller number of target DNA sequences in a target cell than a similar polypeptide that comprises only the first target DNA binding domain, e.g., wherein the presence of the second target DNA binding domain in the polypeptide with the first DNA binding domain
- a system for modifying DNA comprising: (a) a polypeptide or a nucleic acid encoding a polypeptide, wherein the polypeptide comprises (i) a reverse transcriptase domain and (ii) an endonuclease domain; and optionally, (b) a template RNA (or DNA encoding the template RNA) comprising (i) a sequence that binds the polypeptide and (ii) a heterologous object sequence, wherein the system is capable of cutting the first strand of the target DNA at least twice (e.g., twice), and optionally wherein the cuts are at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, 100, or 200 nucleotides away from one another (and optionally no more than 500, 400, 300, 200, or 100 nucleotides away from one another).
- a system for modifying DNA comprising: (a) a polypeptide or a nucleic acid encoding a polypeptide, wherein the polypeptide comprises (i) a reverse transcriptase domain and (ii) an endonuclease domain; and optionally, (b) a template RNA (or DNA encoding the template RNA) comprising (i) a sequence that binds the polypeptide and (ii) a heterologous object sequence, wherein the system is capable of cutting the first strand and the second strand of the target DNA, and wherein the distance between the cuts is the same as the distance between cuts made by the endonuclease domain, e.g., the endonuclease domain of a naturally occurring retrotransposase.
- a system for modifying DNA comprising: (a) a polypeptide or a nucleic acid encoding a polypeptide, wherein the polypeptide comprises (i) a reverse transcriptase domain and (ii) an endonuclease domain; and (b) a template RNA (or DNA encoding the template RNA) comprising (i) a sequence that binds the polypeptide and (ii) a heterologous object sequence, wherein (a), (b), or (a) and (b) further comprises a 5’ UTR and/or 3’ UTR operably linked to the sequence encoding the polypeptide, the heterologous object sequence (e.g., a coding sequence contained in the heterologous object sequence), or both.
- a system for modifying DNA comprising: (a) a polypeptide or a nucleic acid encoding the polypeptide, wherein the polypeptide comprises (i) a reverse transcriptase (RT) domain, (ii) a DNA-binding domain (DBD); and (iii) an endonuclease domain, e.g., a nickase domain; and (b) a template RNA (or DNA encoding the template RNA) comprising (e.g., from 5’ to 3’) (i) optionally a sequence that binds a target site (e.g., a second strand of a site in a target genome), (ii) optionally a sequence that binds the polypeptide, (iii) a heterologous object sequence, and (iv) a 3’ target homology domain; wherein: (i) the polypeptide comprises a heterologous targeting domain (e.g., in the DBD or the endonuclease
- a system for modifying DNA comprising: (a) a polypeptide or a nucleic acid encoding a polypeptide, wherein the polypeptide comprises (i) a reverse transcriptase domain and (ii) an endonuclease domain; and (b) a template RNA (or DNA encoding the template RNA) comprising (i) a sequence that binds the polypeptide, (ii) a heterologous object sequence, and (iii) a ribozyme that is heterologous to (a)(i), (a)(ii), (b)(i), or a combination thereof.
- the ribozyme is heterologous to (b)(i).
- the template RNA comprises (iv) a second ribozyme, e.g., that is endogenous to (a)(i), (a)(ii), (b)(i), or a combination thereof, e.g., wherein the second ribozyme is endogenous to (b)(i).
- a system for modifying DNA comprising: optionally (a) a polypeptide or a nucleic acid encoding a polypeptide, wherein the polypeptide comprises (i) a reverse transcriptase domain and (ii) an endonuclease domain; and (b) a template RNA (or DNA encoding the template RNA) comprising (i) a sequence that binds the polypeptide, (ii) a heterologous object sequence, (iii) a 5’ UTR capable of being cleaved into a fragment and a cleaved template RNA, wherein the 5’ UTR is optionally the sequence that binds the polypeptide, wherein the 5’ UTR comprises one or more mutations (e.g., relative to a wildtype 5’ UTR, e.g., described herein) which increase the affinity of the fragment for the cleaved template RNA, e.g., such that the fragment hybridizes to the cleaved template RNA
- a system for modifying DNA comprising: (a) a polypeptide or a nucleic acid encoding a polypeptide, wherein the polypeptide comprises (i) a reverse transcriptase domain and (ii) an endonuclease domain; and (b) a template RNA (or DNA encoding the template RNA) comprising (i) a sequence that binds the polypeptide and (ii) a heterologous object sequence, wherein (a), (b), or (a) and (b) comprise an intron that increases the expression of the polypeptide, the heterologous object sequence (e.g., a coding sequence situated in the heterologous object sequence), or both.
- a method of modifying a target DNA strand in a cell, tissue or subject comprising administering a system to a cell, wherein the system comprises: (a) a polypeptide or a nucleic acid encoding a polypeptide, wherein the polypeptide comprises (i) a reverse transcriptase domain and (ii) an endonuclease domain; and (b) a template RNA (or DNA encoding the template RNA) comprising (i) a sequence that binds the polypeptide and (ii) a heterologous object sequence, wherein the system reverse transcribes the template RNA sequence into the target DNA strand, thereby modifying the target DNA strand, and wherein the cell has decreased Rad51 repair pathway activity, decreased expression of Rad51 or a component of the Rad51 repair pathway, or does not comprise a functional Rad51 repair pathway, e.g., does not comprise a functional Rad51 gene, e.g., comprises a mutation (e.g., deletion) inactivating one
- a system for modifying DNA comprising: (a) a polypeptide or a nucleic acid encoding a polypeptide, wherein the polypeptide comprises (i) a reverse transcriptase domain and (ii) an endonuclease domain; and (b) a template RNA (or DNA encoding the template RNA) comprising (i) a sequence that binds the polypeptide and (ii) a heterologous object sequence, wherein the heterologous object sequence comprises a sequence, e.g., a gene or fragment thereof, of any of Tables 10A-10D or 11A-11G.
- a system for modifying DNA comprising: (a) a polypeptide or a nucleic acid encoding a polypeptide, wherein the polypeptide comprises (i) a reverse transcriptase domain and (ii) an endonuclease domain, wherein the polypeptide is modified for enhanced activity or altered specificity; and (b) a template RNA (or DNA encoding the template RNA) comprising (i) a sequence that binds the polypeptide and (ii) a heterologous object sequence.
- a system for modifying DNA comprising: (a) a polypeptide or a nucleic acid encoding a polypeptide, wherein the polypeptide comprises (i) a reverse transcriptase domain and (ii) an endonuclease domain; and (b) a template RNA (or DNA encoding the template RNA) comprising (i) a sequence that binds the polypeptide and (ii) a heterologous object sequence, wherein the template RNA comprises one or more chemical modifications selected from dihydrouridine, inosine, 7-methylguanosine, 5-methylcytidine (5mC), 5′ Phosphate ribothymidine, 2′-O-methyl ribothymidine, 2′-O-ethyl ribothymidine, 2′-fluoro ribothymidine, C-5 propynyl-deoxycytidine (pdC), C-5 propynyl-deoxyuridine (pdU),
- a system for modifying DNA comprising: (a) a polypeptide or a nucleic acid encoding a polypeptide, wherein the polypeptide comprises (i) a target DNA binding domain, (ii) a reverse transcriptase domain, optionally (iii) an endonuclease domain, wherein the polypeptide comprises a heterologous linker replacing a portion of (i), (ii), or (iii), or replacing an endogenous linker connecting two of (i), (ii), or (iii); and optionally (b) a template RNA (or DNA encoding the template RNA) comprising (i) a sequence that binds the polypeptide and (ii) a heterologous object sequence.
- a system for modifying DNA comprising: (a) a polypeptide or a nucleic acid encoding a polypeptide, wherein the polypeptide comprises (i) a reverse transcriptase domain and (ii) an endonuclease domain; and (b) a template RNA (or DNA encoding the template RNA) comprising (i) a sequence that binds the polypeptide, (ii) a heterologous object sequence, (iii) a first homology domain having at least 5 or at least 10 bases of 100% identity to a target DNA strand, at the 5’ end of the template RNA, and (iv) a second homology domain having at least 5 or at least 10 bases of 100% identity to a target DNA strand, at the 3’ end of the template RNA.
- polypeptide comprises a mutation inactivating and/or deleting a nucleolar localization signal.
- activity of the nucleolar localization signal is reduced by at least 50%, 60%, 70%, 80%, 90%, 95%, or 99%.
- polypeptide comprises a nuclear localization signal (NLS), e.g., an endogenous NLS or an exogenous NLS.
- NLS nuclear localization signal
- the polypeptide of (a) comprises a target DNA binding domain (e.g., the endonuclease domain comprises a target DNA binding domain), e.g., a first target DNA binding domain, or (a) further comprises a target DNA binding domain, e.g., a first target binding domain.
- the polypeptide of (a) further comprises a second target DNA binding domain, e.g., a Zn finger domain, that is heterologous, e.g., to the first target DNA binding domain or to the endonuclease domain.
- the endonuclease domain comprises the second target DNA binding domain.
- polypeptide of (a) binds to a smaller number of target DNA sequences than a similar polypeptide that comprises only the first target DNA binding domain or the second target DNA binding domain, e.g., wherein the presence of the second target DNA binding domain in the polypeptide with the first target DNA binding domain refines the target sequence specificity of the polypeptide relative to the polypeptide target sequence specificity of the polypeptide comprising only the first target DNA binding domain.
- the second target DNA binding domain binds to a genomic DNA sequence that is less than 100, 90, 80, 70, 60, 50, 40, 30, 20, 15, 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 nucleotides away from a genomic sequence to which the first target DNA binding domain binds.
- the second target DNA binding domain binds to a genomic DNA sequence that is 1-100, 1-90, 1-80, 1-70, 1-60, 1-50, 1-40, 1-30, 1-20, 1-10, 1-5, 5-100, 5-90, 5-80, 5-70, 5-60, 5-50, 5-40, 5-30, 5-20, 5-10, 10-100, 10-90, 10-80, 10-70, 10-60, 10-50, 10-40, 10-30, 10-20, 20-100, 20-90, 20-80, 20-70, 20-60, 20-50, 20-40, 20-30, 30-100, 30-90, 30-80, 30-70, 30-60, 30-50, 30-40, 40-100, 40-90, 40-80, 40-70, 40-60, 40-50, 50-100, 50-90, 50-80, 50-70, 50-60, 60-100, 60-90, 60-80, 60-70, 70-100, 70-90, 70-80, 80-100, 80-90, or 90-100 nucleotides away from
- the first or second target DNA binding domain comprises a CRISPR/Cas protein, a TAL Effector domain, a Zn finger domain, or a meganuclease domain.
- the first target DNA binding domain comprises a CRISPR/Cas protein and the second target DNA binding domain comprises a TAL effector domain.
- the first target DNA binding domain comprises a CRISPR/Cas protein and the second target DNA binding domain comprises a Zn finger domain.
- the first target DNA binding domain comprises a CRISPR/Cas protein and the second target DNA binding domain comprises a CRISPR/Cas protein.
- the first target DNA binding domain comprises a CRISPR/Cas protein and the second target DNA binding domain comprises a meganuclease domain.
- the first target DNA binding domain comprises a TAL effector domain and the second target DNA binding domain comprises a Zn finger domain.
- the first target DNA binding domain comprises a TAL effector domain and the second target DNA binding domain comprises a TAL effector domain.
- the first target DNA binding domain comprises a TAL effector domain and the second target DNA binding domain comprises a meganuclease domain.
- the first target DNA binding domain comprises a Zn finger domain and the second target DNA binding domain comprises a Zn finger domain.
- the first target DNA binding domain comprises a Zn finger domain and the second target DNA binding domain comprises a meganuclease domain.
- the second DNA binding domain binds to a sequence in a genomic safe harbor (GSH) site or a genomic Natural Harbor™ site.
- GSH genomic safe harbor
- the system is capable of cutting the first strand of the target DNA and the second strand of the target DNA, e.g., wherein the cuts are at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, 100, or 200 nucleotides away from one another (and optionally no more than 500, 400, 300, 200, or 100 nucleotides away from one another).
- the system is capable of cutting the first strand of the target DNA at least twice (e.g., twice), e.g., wherein the cuts are at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, 100, or 200 nucleotides away from one another (and optionally no more than 500, 400, 300, 200, or 100 nucleotides away from one another).
- the cuts are 1-500, 1-400, 1-300, 1-200, 1-100, 1-90, 1-80, 1-70, 1-60, 1-50, 1-40, 1-30, 1-20, 1-10, 1-5, 5-500, 5-400, 5-300, 5-200, 5-100, 5-90, 5-80, 5-70, 5-60, 5-50, 5-40, 5-30, 5-20, 5-10, 10-500, 10-400, 10-300, 10-200, 10-100, 10-90, 10-80, 10-70, 10-60, 10-50, 10-40, 10-30, 10-20, 20-500, 20-400, 20-300, 20-200, 20-100, 20-90, 20-80, 20-70, 20-60, 20-50, 20-40, 20-30, 30-500, 30-400, 30-300, 30-200, 30-100, 30-90, 30-80, 30-70,
- the distance between the cuts is the same as the distance between cuts made by the endonuclease domain, e.g., the endonuclease domain of a naturally occurring retrotransposase.
- the two cuts are both made by the same endonuclease domain (e.g., a CRISPR/Cas protein, e.g., directed by a plurality of gRNAs, e.g., disposed in the template RNA).
- the polypeptide further comprises a second endonuclease domain.
- the first endonuclease domain e.g., nickase
- the second endonuclease domain e.g., nickase
- the first endonuclease domain makes one of the two cuts to the to-be-edited strand of the target DNA
- the 5’ UTR comprises a 5’ UTR from complement factor 3 (C3) or a functional fragment or variant thereof.
- C3 complement factor 3
- the 3’ UTR comprises a 3’ UTR from orosomucoid 1 (ORM1) or a functional fragment or variant thereof.
- the 5’ UTR increases the rate of translation, e.g., relative to an otherwise similar nucleic acid comprising the endogenous UTR(s) associated with the heterologous object sequence or a minimal 5’ UTR and a minimal 3’ UTR
- the 3’ UTR increases nucleic acid half-life, e.g., relative to an otherwise similar nucleic acid comprising the endogenous UTR(s) associated with the heterologous object sequence or a minimal 5’ UTR and a minimal 3’ UTR, or iii) both i) and ii).
- the template RNA comprises a ribozyme that is heterologous to (a)(i), (a)(ii), (b)(i), or a combination thereof.
- the heterologous ribozyme replaced a ribozyme endogenous to (a)(i), (a)(ii), (b)(i), or a combination thereof.
- the template RNA comprises a second ribozyme, e.g., that is endogenous to (a)(i), (a)(ii), (b)(i), or a combination thereof.
- heterologous ribozyme is situated in a 5’ UTR or 3’ UTR of the template RNA.
- the heterologous ribozyme is 5’ of the heterologous object sequence or 3’ of the heterologous object sequence.
- the heterologous ribozyme is capable of cleaving RNA comprising the ribozyme, e.g., 5’ of the ribozyme, 3’ of the ribozyme, or within the ribozyme.
- heterologous ribozyme is 5’ of the heterologous object sequence and cleaves 3’ of the heterologous ribozyme, e.g., wherein the heterologous ribozyme is a synthetic or naturally occurring hammerhead ribozyme.
- heterologous ribozyme is 3’ of the heterologous object sequence and cleaves 5’ of the heterologous ribozyme, e.g., wherein the heterologous ribozyme is chosen from an HDV family ribozyme or a hatchet ribozyme.
- the template RNA further comprises a ribozyme-hybridizing region, e.g., a template with altered targeting, such as through a homology arm, comprises a modified 5’ UTR comprising the ribozyme-hybridizing region.
- a portion of the ribozyme hybridizes (e.g., via Watson-Crick base pairing) to sequence 5’ or 3’ of the ribozyme.
- the ribozyme sequence is altered from its natural sequence by at least 1, 2, 3, 4, 5, 6, 8, 9, 10, 15, 20, 25 or more basepairs.
- the ribozyme sequence is altered from its natural sequence in order to hybridize to a homology arm that is 5’ or 3’ of the target ribozyme
- the system integrates a heterologous object sequence into a target genome with a greater efficiency than an otherwise similar system lacking the heterologous ribozyme, e.g., wherein at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% more cells show integration in the presence of the system comprising the heterologous ribozyme compared to the system lacking the heterologous ribozyme.
- the template RNA comprises a 5’ UTR capable of being cleaved into a fragment and a cleaved template RNA.
- the template RNA comprises a ribozyme which cleaves the template RNA, e.g., in the 5’ UTR.
- the 5’ UTR comprises one or more mutations (e.g., relative to a wildtype 5’ UTR described herein, e.g., in Tables 1 or 3, or from a protein domain listed in Table 2).
- the one or more mutations increase the affinity of the fragment for the cleaved template RNA, e.g., such that the fragment hybridizes to the cleaved template RNA (e.g., the 5’ UTR of the cleaved template RNA) under stringent conditions, e.g., wherein the stringent conditions for hybridization include hybridization in 4x sodium chloride/sodium citrate (SSC), at about 65°C, followed by a wash in 1xSSC, at about 65°C.
- SSC sodium chloride/sodium citrate
- the intron is situated in a coding sequence of the heterologous object sequence.
- the intron is situated in the forward direction in relation to the coding sequence of the heterologous object sequence.
- the intron is situated in the reverse direction in relation to the coding sequence of the heterologous object sequence.
- the intron is spliced after transcription of the template RNA and before target-primed reverse transcription into target, e.g., genomic, DNA.
- the intron is spliced after transcription of the heterologous object sequence after the heterologous object sequence is integrated in the target, e.g., genomic, DNA.
- the intron comprises a microRNA binding site.
- the endonuclease domain e.g., an endonuclease domain of R2Tg or R2-1_ZA
- a motif e.g., GG or AAGG, TAAGGT, or TTAAGGTAGC
- the heterologous DNA binding domain recognizes a genomic DNA sequence, wherein the motif and the genomic DNA sequence are within 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, 80-100, 100-150, 150-200, or 200-250 nucleotides of each other, optionally wherein the motif recognized by the endonuclease domain comprises 4, 5, 6, 7, 8, 9, or 10 consecutive nucleotides of TTAAGGTAGC, AAGGTAGCCAAA, or TAAGGTAGCCAAA, or wherein the motif recognized by the endonuclease domain comprises 2 or 3 consecutive nucleotides of AAGG.
- the motif is upstream of the genomic DNA sequence, e.g., the motif is about 30-80, 40-70, 50-60, or 55 nt upstream of the genomic DNA sequence.
- the motif is downstream of the genomic DNA sequence, e.g., the motif is about 10-30, 15-25, or 20 nt downstream of the genomic DNA sequence.
- the motif is in the same orientation as the genomic DNA sequence or in the reverse complement orientation as the genomic DNA sequence.
- heterologous DNA binding domain e.g., a zinc finger domain
- a linker e.g., a linker of Table 38
- the system comprises one or more circular RNA molecules (circRNAs).
- the circRNA encodes the Gene Writer polypeptide.
- the circRNA comprises a template RNA.
- circRNA is delivered to a host cell.
- the circRNA is capable of being linearized, e.g., in a host cell, e.g., in the nucleus of the host cell.
- the circRNA comprises a cleavage site.
- the circRNA further comprises a second cleavage site.
- cleavage site can be cleaved by a ribozyme, e.g., a ribozyme comprised in the circRNA (e.g., by autocleavage).
- a ribozyme e.g., a ribozyme comprised in the circRNA (e.g., by autocleavage).
- the circRNA comprises a ribozyme sequence.
- the ribozyme sequence is capable of autocleavage, e.g., in a host cell, e.g., in the nucleus of the host cell.
- the ribozyme is an inducible ribozyme.
- ribozyme is a protein-responsive ribozyme, e.g., a ribozyme responsive to a nuclear protein, e.g., a genome-interacting protein, e.g., an epigenetic modifier, e.g., EZH2.
- ribozyme is a nucleic acid-responsive ribozyme.
- a target nucleic acid molecule e.g., an RNA molecule, e.g., an mRNA, miRNA, ncRNA, lncRNA, tRNA, snRNA, or mtRNA.
- the ribozyme is responsive to a target protein (e.g., an MS2 coat protein).
- the target protein is localized to the cytoplasm or localized to the nucleus (e.g., an epigenetic modifier or a transcription factor).
- ribozyme comprises the ribozyme sequence of a B2 or ALU retrotransposon, or a nucleic acid sequence having at least 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence identity thereto.
- the ribozyme comprises the sequence of a tobacco ringspot virus hammerhead ribozyme, or a nucleic acid sequence having at least 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence identity thereto.
- the ribozyme comprises the sequence of a hepatitis delta virus (HDV) ribozyme, or a nucleic acid sequence having at least 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence identity thereto.
- HDV hepatitis delta virus
- the ribozyme is activated by a moiety expressed in a target cell or target tissue.
- the ribozyme is activated by a moiety expressed in a target subcellular compartment (e.g., a nucleus, nucleolus, cytoplasm, or mitochondria).
- ribozyme is comprised in a circular RNA or a linear RNA.
- a system comprising a first circular RNA encoding the polypeptide of a Gene Writing system; and a second circular RNA comprising the template RNA of a Gene Writing system.
- the template RNA e.g., the 5’ UTR
- the template RNA comprises a ribozyme that is heterologous to (a)(i) (the reverse transcriptase domain), (a)(ii) (the endonuclease domain), (b)(i) (a sequence of the template RNA that binds the polypeptide), or a combination thereof.
- the heterologous ribozyme is capable of cleaving RNA comprising the ribozyme, e.g., 5’ of the ribozyme, 3’ of the ribozyme, or within the ribozyme.
- a lipid nanoparticle comprising the system, polypeptide (or RNA encoding the same), nucleic acid molecule, or DNA encoding the system or polypeptide, of any preceding embodiment.
- LNP lipid nanoparticle
- a system comprising a first lipid nanoparticle comprising the polypeptide (or DNA or RNA encoding the same) of a Gene Writing system (e.g., as described herein); and a second lipid nanoparticle comprising a nucleic acid molecule of a Gene Writing System (e.g., as described herein).
- the LNP of any preceding embodiment, further comprising one or more neutral lipids, e.g., DSPC, DPPC, DMPC, DOPC, POPC, DOPE, SM, a steroid, e.g., cholesterol, and/or one or more polymer-conjugated lipids, e.g., a pegylated lipid, e.g., PEG-DAG, PEG-PE, PEG-S-DAG, PEG-cer or a PEG dialkyloxypropylcarbamate.
- neutral lipid e.g., DSPC, DPPC, DMPC, DOPC, POPC, DOPE, SM
- a steroid e.g., cholesterol
- polymer conjugated lipid e.g., a pegylated lipid, e.g., PEG-DAG, PEG-PE, PEG-S-DAG, PEG-cer or a PEG dialkyloxypropylcarbamate.
- lipid nanoparticle LNP
- lipid nanoparticle or a formulation comprising a plurality of the lipid nanoparticles
- reactive impurities e.g., aldehydes
- a preselected level of reactive impurities e.g., aldehydes
- the lipid nanoparticle is comprised in a formulation comprising a plurality of the lipid nanoparticles.
- lipid nanoparticle formulation is produced using one or more lipid reagents comprising less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, or 0.1% total reactive impurity (e.g., aldehyde) content.
- lipid nanoparticle formulation is produced using one or more lipid reagents comprising less than 3% total reactive impurity (e.g., aldehyde) content.
- lipid nanoparticle formulation is produced using one or more lipid reagents comprising less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, or 0.1% of any single reactive impurity (e.g., aldehyde) species.
- lipid reagents comprising less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, or 0.1% of any single reactive impurity (e.g., aldehyde) species.
- lipid nanoparticle formulation is produced using one or more lipid reagents comprising less than 0.1% of any single reactive impurity (e.g., aldehyde) species.
- lipid nanoparticle formulation comprises less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, or 0.1% total reactive impurity (e.g., aldehyde) content.
- lipid nanoparticle formulation comprises less than 3% total reactive impurity (e.g., aldehyde) content.
- lipid nanoparticle formulation comprises less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, or 0.1% of any single reactive impurity (e.g., aldehyde) species.
- lipid nanoparticle formulation comprises less than 0.3% of any single reactive impurity (e.g., aldehyde) species.
- lipid nanoparticle formulation comprises less than 0.1% of any single reactive impurity (e.g., aldehyde) species.
- lipid reagents used for a lipid nanoparticle as described herein or a formulation thereof comprise less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, or 0.1% total reactive impurity (e.g., aldehyde) content.
- lipid reagents used for a lipid nanoparticle as described herein or a formulation thereof comprise less than 3% total reactive impurity (e.g., aldehyde) content.
- lipid reagents used for a lipid nanoparticle as described herein or a formulation thereof comprise less than 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.4%, 0.3%, 0.2%, or 0.1% of any single reactive impurity (e.g., aldehyde) species.
- any single reactive impurity e.g., aldehyde
- LC liquid chromatography
- MS/MS tandem mass spectrometry
- the total aldehyde content and/or quantity of reactive impurity (e.g., aldehyde) species is determined by detecting one or more chemical modifications of a nucleic acid molecule (e.g., as described herein) associated with the presence of reactive impurities (e.g., aldehydes), e.g., in the lipid reagents.
- reactive impurities e.g., aldehydes
- nucleotide or nucleoside e.g., a ribonucleotide or ribonucleoside, e.g., comprised in or isolated from a nucleic acid molecule, e.g., as described herein
- lipid reagents e.g., as described in Example 27.
- a method of modifying a target DNA strand in a cell, tissue or subject comprising administering any preceding numbered system to the cell, tissue or subject, wherein the system reverse transcribes the template RNA sequence into the target DNA strand, thereby modifying the target DNA strand, and wherein the cell has decreased Rad51 repair pathway activity, decreased expression of Rad51 or a component of the Rad51 repair pathway, or does not comprise a functional Rad51 repair pathway, e.g., does not comprise a functional Rad51 gene, e.g., comprises a mutation (e.g., deletion) inactivating one or both copies of the Rad51 gene or another gene in the Rad51 repair pathway.
- a mutation e.g., deletion
- a host cell e.g., a mammalian cell, e.g., a human cell
- the host cell has decreased Rad51 repair pathway activity, decreased expression of Rad51 or a component of the Rad51 repair pathway, or does not comprise a functional Rad51 repair pathway, e.g., does not comprise a functional Rad51 gene, e.g., comprises a mutation (e.g., deletion) inactivating one or both copies of the Rad51 gene or another gene in the Rad51 repair pathway.
- polypeptide binds a promoter region, a 5’ UTR region, an exon, an intron, or a 3’ UTR region of a sequence, e.g., a gene or fragment thereof, of any of Tables 10A-10D or 11A-11G.
- the polypeptide further comprises a heterologous linker replacing a portion of (i) a target DNA binding domain, (ii) a reverse transcriptase domain, optionally (iii) an endonuclease domain, or replacing an endogenous linker connecting two of (i), (ii), or (iii), wherein optionally the linker is a linker of Table 38.
- the heterologous linker replaces, e.g., deletes, a portion of (i).
- the heterologous linker replaces, e.g., deletes, a portion of (ii).
- heterologous linker replaces, e.g., deletes, a portion of (iii).
- the heterologous linker replaces, e.g., deletes, a portion of (i) and (ii).
- the heterologous linker replaces, e.g., deletes, a portion of (i) and (iii).
- the heterologous linker replaces, e.g., deletes, a portion of (ii) and (iii).
- heterologous linker replaces, e.g., deletes, the endogenous linker connecting (i) and (ii).
- heterologous linker replaces, e.g., deletes, the endogenous linker connecting (i) and (iii).
- heterologous linker replaces, e.g., deletes, the endogenous linker connecting (ii) and (iii).
- heterologous linker comprises an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to the amino acid sequence SGSETPGTSESATPES (SEQ ID NO: 1023) or GGGS (SEQ ID NO: 1024).
- the heterologous linker comprises at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 125, 150, 200, 300, 400, or 500 amino acids.
- the tissue is liver, lung, skin, muscle tissue (e.g., skeletal muscle), eye or ocular tissue, blood, blood cells, immune cells, or central nervous system.
- the cell is a hematopoietic stem cell (HSC), a T-cell, a B cell, or a Natural Killer (NK) cell.
- HSC hematopoietic stem cell
- NK Natural Killer
- heterologous object sequence comprises an open reading frame in a 5’ to 3’ orientation on the template RNA.
- 170. The system of any preceding embodiments, wherein the heterologous object sequence comprises an open reading frame in a 3’ to 5’ orientation on the template RNA.
- 171. The system of any of the preceding embodiments, wherein the sequence that binds the polypeptide is a 3’ untranslated sequence.
- 172. The system of any preceding embodiments, wherein the template RNA further comprises a 5’ untranslated sequence.
- the template RNA further comprises a promoter operably linked to the heterologous object sequence
- the heterologous object sequence can, in some embodiments, comprise a promoter operably linked to a sequence, such as a protein coding sequence.
- the promoter is disposed between the 5’ untranslated sequence and the heterologous object sequence.
- the promoter is disposed between the 3’ untranslated sequence that binds the polypeptide and the heterologous object sequence.
- the 5’ untranslated sequence is a sequence of column 5 of Table 3, or a sequence having at least 80% identity thereto.
- the 3’ untranslated sequence is a sequence of column 6 of Table 3, or a sequence having at least 80% identity thereto.
- the heterologous object sequence comprises an enzyme, a membrane protein, a blood factor, an intracellular protein, an extracellular protein, a structural protein, a signaling protein, a regulatory protein, a transport protein, a sensory protein, a motor protein, a defense protein, a storage protein, an immune receptor protein, (e.g.
- RNA comprises at least 5 bases or at least 10 bases of 100% identity to a target DNA strand, at the 5’ end of the template RNA.
- the template RNA comprises at least 5 bases or at least 10 bases of 100% identity to a target DNA strand, at the 3’ end of the template RNA.
- a method of modifying a target DNA strand in a cell, tissue, or subject comprising administering the system of any preceding embodiments to the cell, tissue, or subject, thereby modifying the target DNA strand.
- the method of any preceding embodiments which results in the addition of at least 5 base pairs of exogenous DNA sequence to the genome of the cell.
- the method of any preceding embodiments which results in the addition of at least 100 base pairs of exogenous DNA sequence to the genome of the cell.
- any preceding embodiments which results in about 50-100% of insertions of the heterologous object sequence into the target DNA being non-truncated.
- the nucleic acid of (a) is not integrated into the genome of the cell.
- the template RNA comprises at least 5 or at least 10 bases of 100% identity to the target DNA strand, at the 5’ end of the template RNA.
- the template RNA comprises at least 5 or at least 10 bases of 100% identity to the target DNA strand, at the 3’ end of the template RNA.
- the heterologous object sequence encodes a therapeutic polypeptide or that encodes a mammalian (e.g., human) polypeptide, or a fragment or variant thereof.
- the heterologous object sequence comprises a tissue specific promoter or enhancer;
- iii. the heterologous object sequence encodes a polypeptide of greater than 250, 300, 400, 500, or 1,000 amino acids, and optionally up to 7,500 amino acids; iv. the heterologous object sequence encodes a fragment of a mammalian gene but does not encode the full mammalian gene, e.g., encodes one or more exons but does not encode a full-length protein; v. the heterologous object sequence encodes one or more introns; vi. the heterologous object sequence is other than a GFP, e.g., is other than a fluorescent protein or is other than a reporter protein; or vii. the heterologous object sequence is other than a T cell chimeric antigen receptor.
- polypeptide is derived from an avian retrotransposase, e.g., an avian retrotransposase of column 8 of Table 3, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.
- an avian retrotransposase e.g., an avian retrotransposase of column 8 of Table 3, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.
- the avian retrotransposase is a retrotransposase from Taeniopygia guttata, Geospiza fortis, Zonotrichia albicollis, or Tinamus guttatus, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.
- the polypeptide is derived from a retrotransposase of column 8 of Table 3, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.
- the template RNA comprises a sequence of Table 3 (e.g., one or both of a 5’ untranslated region of column 6 of Table 3 and a 3’ untranslated region of column 7 of Table 3), or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.
- Table 3 e.g., one or both of a 5’ untranslated region of column 6 of Table 3 and a 3’ untranslated region of column 7 of Table 3
- the nucleic acid encoding the polypeptide and the template RNA or a nucleic acid encoding the template RNA are separate nucleic acids;
- ii. the template RNA does not encode an active reverse transcriptase, e.g., comprises an inactivated mutant reverse transcriptase, e.g., as described in Examples 1-2, or does not comprise a reverse transcriptase sequence; or iii. the template RNA does not encode an active endonuclease, e.g., comprises an inactivated endonuclease or does not comprise an endonuclease; or iv. the template RNA comprises one or more chemical modifications.
- the template RNA (or DNA encoding the template RNA) further comprises a promoter operably linked to the heterologous object sequence, wherein the promoter is disposed between the 5’ untranslated sequence that binds the polypeptide and the heterologous sequence, or wherein the promoter is disposed between the 3’ untranslated sequence that binds the polypeptide and the heterologous sequence.
- the template RNA (or DNA encoding the template RNA) further comprises a 5’ untranslated sequence that binds the polypeptide and a 3’ untranslated sequence that binds the polypeptide, and wherein the heterologous object sequence comprises an open reading frame (or the reverse complement thereof) in a 5’ to 3’ orientation on the template RNA; or wherein the heterologous object sequence comprises an open reading frame (or the reverse complement thereof) in a 3’ to 5’ orientation on the template RNA.
- at least one of the reverse transcriptase domain, endonuclease domain, or target DNA binding domain is heterologous.
- polypeptide comprises a sequence at least 80% identical (e.g., at least 85%, 90%, 95%, 97%, 98%, 99%, 100% identical) to a reverse transcriptase domain of a purinic/apyrimidinic endonuclease (APE)-type non-LTR retrotransposon and a sequence at least 80% identical (e.g., at least 85%, 90%, 95%, 97%, 98%, 99%, 100% identical) to an endonuclease domain of an APE-type non-LTR retrotransposon.
- APE purinic/apyrimidinic endonuclease
- polypeptide comprises a sequence at least 80% identical (e.g., at least 85%, 90%, 95%, 97%, 98%, 99%, 100% identical) to a reverse transcriptase domain of a restriction enzyme-like endonuclease (RLE)-type non-LTR retrotransposon and a sequence at least 80% identical (e.g., at least 85%, 90%, 95%, 97%, 98%, 99%, 100% identical) to an endonuclease domain of a RLE-type non-LTR retrotransposon.
- RLE restriction enzyme-like endonuclease
- the RT domain comprises a sequence selected of Table 1 or 3 or a sequence of a reverse transcriptase domain of Table 2, wherein the RT domain further comprises a number of substitutions relative to the natural sequence, e.g., at least 1, 2, 3, 4, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100 substitutions.
- the RT domain comprises a sequence selected of Table 1 or 3 or a sequence of a reverse transcriptase domain of Table 2, or a sequence that has at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.
- the RT domain comprises a sequence selected of Table 1 or 3 or a sequence of a reverse transcriptase domain of Table 2, wherein the RT domain comprises a sequence selected of Table 1 or 3 or a sequence of a reverse transcriptase domain of Table 2, or a sequence that has at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or
- the template RNA comprises a promoter operably linked to the heterologous object sequence.
- the polypeptide further comprises (iii) a DNA-binding domain.
- the polypeptide comprises a sequence at least 80% identical (e.g., at least 85%, 90%, 95%, 97%, 98%, 99%, 100% identical) to the sequence of SEQ ID NO: 1016.
- the polypeptide comprises a sequence at least 80% identical (e.g., at least 85%, 90%, 95%, 97%, 98%, 99%, 100% identical) to a sequence in column 8 of Table 3.
- the system or method of any of the preceding embodiments, wherein the nucleic acid encoding the polypeptide and the template RNA or the nucleic acid encoding the template RNA are covalently linked, e.g., are part of a fusion nucleic acid.
- the fusion nucleic acid comprises RNA.
- the fusion nucleic acid comprises DNA.
- RNA comprises a pseudoknot sequence, e.g., 5’ of the heterologous object sequence.
- the RNA comprises a stem-loop sequence or a helix, 5’ of the pseudoknot sequence.
- the RNA comprises one or more (e.g., 2, 3, or more) stem-loop sequences or helices 3’ of the pseudoknot sequence, e.g., 3’ of the pseudoknot sequence and 5’ of the heterologous object sequence.
- RNA-cleaving activity e.g., cis-RNA-cleaving activity.
- the RNA comprises at least one stem-loop sequence or helix, e.g., 3’ of the heterologous object sequence, e.g., 1, 2, 3, 4, 5 or more stem-loop sequences, hairpins, or helices.
- polypeptide comprises a sequence of at least 50 amino acids (e.g., at least 100, 150, 200, 300, 500 amino acids) having at least 80% identity (e.g., at least 85%, 90%, 95%, 97%, 98%, 99%, 100% identity) to a sequence of a polypeptide listed in Table 1-3, or a reverse transcriptase domain or endonuclease domain thereof.
- polypeptide comprises a sequence of at least 50 amino acids (e.g., at least 100, 150, 200, 300, 500 amino acids) having at least 80% identity (e.g., at least 85%, 90%, 95%, 97%, 98%, 99%, 100% identity) to a sequence of a polypeptide listed in any of Tables 1-3 or a reverse transcriptase domain, endonuclease domain, or DNA binding domain thereof.
- polypeptide comprises a sequence of at least 50 amino acids (e.g., at least 100, 150, 200, 300, 500 amino acids) having at least 80% identity (e.g., at least 85%, 90%, 95%, 97%, 98%, 99%, 100% identity) to the amino acid sequence of column 8 of Table 3, or a reverse transcriptase domain, endonuclease domain, or DNA binding domain thereof.
- amino acids e.g., at least 100, 150, 200, 300, 500 amino acids
- 80% identity e.g., at least 85%, 90%, 95%, 97%, 98%, 99%, 100% identity
- the template RNA comprises a sequence of Table 3 (e.g., one or both of a 5’ untranslated region of column 6 of Table 3 and a 3’ untranslated region of column 7 of Table 3), or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.
- Table 3 e.g., one or both of a 5’ untranslated region of column 6 of Table 3 and a 3’ untranslated region of column 7 of Table 3
- the template RNA comprises a sequence of about 100-125 bp from a 3’ untranslated region of column 7 of Table 3, e.g., wherein the sequence comprises nucleotides 1-100, 101-200, or 201-325 of the 3’ untranslated region of column 7 of Table 3, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.
- 227. Any above-numbered system or method, wherein (a) comprises RNA and (b) comprises RNA.
- Any above-numbered system which is capable of modifying DNA by insertion of a heterologous object sequence in the presence of an inhibitor of a DNA repair pathway (e.g., SCR7, a PARP inhibitor), or in a cell line deficient for a DNA repair pathway (e.g., a cell line deficient for the nucleotide excision repair pathway or the homology-directed repair pathway).
- a DNA repair pathway e.g., SCR7, a PARP inhibitor
- a cell line deficient for a DNA repair pathway e.g., a cell line deficient for the nucleotide excision repair pathway or the homology-directed repair pathway.
- Any above-numbered system which does not cause formation of a detectable level of double stranded breaks in a target cell.
- Any above-numbered system which is capable of modifying DNA using reverse transcriptase activity, and optionally in the absence of homologous recombination activity.
- any above-numbered system wherein the template RNA has been treated to reduce secondary structure, e.g., was heated, e.g., to a temperature that reduces secondary structure, e.g., to at least 70, 75, 80, 85, 90, or 95°C.
- a host cell e.g., a mammalian cell, e.g., a human cell comprising any preceding numbered system.
- the cell, tissue or subject is a mammalian (e.g., human) cell, tissue or subject.
- the method of any of the preceding embodiments, wherein the cell is a fibroblast.
- the method of any of the preceding embodiments, wherein the cell is a primary cell.
- the method of any of the preceding embodiments, wherein the cell is not immortalized.
- 241. A method of modifying the genome of a mammalian cell, comprising contacting the cell with the system of any preceding embodiments.
- a method of adding at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 500, 1000 bp of exogenous DNA to the genome of a mammalian cell comprising contacting the cell with a system of any preceding embodiments, wherein the method does not comprise contacting the mammalian cell with DNA, or wherein the method comprises contacting the mammalian cell with a composition comprising less than 1%, 0.5%, 0.2%, 0.1%, 0.05%, 0.02%, or 0.01% DNA by mass or by molar amount of nucleic acid.
- a method of adding at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 500, 1000 bp of exogenous DNA to the genome of a mammalian cell comprising contacting the cell with a system of any preceding embodiments, wherein the method delivers only RNA to the mammalian cell.
- a method of adding at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 500, 1000 bp of exogenous DNA to the genome of a mammalian cell comprising contacting the cell with a system of any preceding embodiments, wherein the method delivers RNA and protein to the mammalian cell.
- a method of modifying the genome of a human cell comprising contacting the cell with a system of any preceding embodiments, wherein the method results in insertion of the heterologous object sequence into the human cell’s genome, wherein the human cell does not show upregulation of any DNA repair genes and/or tumor suppressor genes, or wherein no DNA repair gene and/or tumor suppressor gene is upregulated by more than 10%, 5%, 2%, or 1%, e.g., wherein upregulation is measured by RNA-seq, e.g., as described in Example 14 of PCT/US2019/048607, incorporated herein by reference.
- a method of adding an exogenous coding region to the genome of a cell comprising contacting the cell with a system of any preceding embodiments, wherein the template RNA comprises the non-coding strand of the exogenous coding region, wherein optionally the template RNA does not comprise a coding strand of the exogenous coding region, wherein optionally the delivery comprises non-viral delivery.
- a cell e.g., a mammalian cell
- the template RNA comprises the non-coding strand of the exogenous coding region, wherein optionally the template RNA does not comprise a coding strand of the exogenous coding region, wherein optionally the delivery comprises non-viral delivery.
- a method of expressing a polypeptide in a cell comprising contacting the cell with a system of any preceding embodiments, wherein the template RNA comprises a non-coding strand that is the reverse complement of a sequence that would encode the polypeptide, wherein optionally the template RNA does not comprise a coding strand encoding the polypeptide, wherein optionally the delivery comprises non-viral delivery.
- the sequence that is inserted into the mammalian genome is a sequence that is exogenous to the mammalian genome.
- the system operates independently of a DNA template.
- the cell is part of a tissue.
- the mammalian cell is euploid, is not immortalized, is part of an organism, is a primary cell, is non-dividing, is a hepatocyte, or is from a subject having a genetic disease.
- the contacting comprises contacting the cell with a plasmid, virus, viral-like particle, virosome, liposome, vesicle, exosome, fusosome, or lipid nanoparticle.
- the contacting comprises using non-viral delivery.
- any preceding embodiments which comprises contacting the cell with the template RNA (or DNA encoding the template RNA), wherein the template RNA comprises the non-coding strand of an exogenous coding region, wherein optionally the template RNA does not comprise a coding strand of the exogenous coding region, wherein optionally the delivery comprises non-viral delivery, thereby adding the exogenous coding region to the genome of the cell.
- any preceding embodiments which comprises contacting the cell with the template RNA (or DNA encoding the template RNA), wherein the template RNA comprises a non-coding strand that is the reverse complement of a sequence that would encode the polypeptide, wherein optionally the template RNA does not comprise a coding strand encoding the polypeptide, wherein optionally the delivery comprises non-viral delivery, thereby expressing the polypeptide in the cell.
- the contacting comprises administering (a) and (b) to a subject, e.g., intravenously.
- the method of any preceding embodiments wherein the contacting comprises administering a dose of (a) and (b) to a subject at least twice.
- 264. The method of any preceding embodiments, wherein (a) and (b) are administered separately.
- 265. The method of any preceding embodiments, wherein (a) and (b) are administered together.
- 266. The method of any preceding embodiments, wherein the nucleic acid of (a) is not integrated into the genome of the host cell.
- any preceding numbered method wherein the sequence that binds the polypeptide has one or more of the following characteristics: (a) is at the 3’ end of the template RNA; (b) is at the 5’ end of the template RNA; (c) is a non-coding sequence; (d) is a structured RNA; or (e) forms at least 1 hairpin loop structure.
- the template RNA further comprises a sequence comprising at least 20 nucleotides of at least 80% identity (e.g., at least 85%, 90%, 95%, 97%, 98%, 99%, 100% identity) to a target DNA strand.
- the template RNA further comprises a sequence comprising at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150 nucleotides of at least 80% identity (e.g., at least 85%, 90%, 95%, 97%, 98%, 99%, 100% identity) to a target DNA strand.
- nucleotides of at least 80% identity e.g., at least 85%, 90%, 95%, 97%, 98%, 99%, 100% identity
- any preceding numbered method wherein the sequence comprising at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150 nucleotides, or about: 2-10, 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, 80-90, 90-100, 10-100, or 2-100 nucleotides, of at least 80% identity to a target DNA strand is at the 3’ end of the template RNA.
- the template RNA further comprises a sequence comprising at least 100 nucleotides of at least 80% identity (e.g., at least 85%, 90%, 95%, 97%, 98%, 99%, 100% identity) to a target DNA strand, e.g., at the 3’ end of the template RNA.
- the site in the target DNA strand to which the sequence comprises at least 80% identity is proximal to (e.g., within about: 0-10, 10-20, 20-30, 30-50, or 50-100 nucleotides of) a target site on the target DNA strand that is recognized (e.g., bound and/or cleaved) by the polypeptide comprising the endonuclease.
- any preceding numbered method wherein the sequence comprising at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150 nucleotides, or about: 2-10, 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, 80-90, 90-100, 10-100, or 2-100 nucleotides, of at least 80% identity to a target DNA strand is at the 3’ end of the template RNA; optionally wherein the site in the target DNA strand to which the sequence comprises at least 80% identity is proximal to (e.g., within about: 0-10, 10-20, or 20-30 nucleotides of) a target site on the target DNA strand that is recognized (e.g., bound and/or cleaved) by the polypeptide comprising the endonuclease.
- the target site is the site in the human genome that has the closest identity to a native target site of the polypeptide comprising the endonuclease, e.g., wherein the target site in the human genome has at least about: 16, 17, 18, 19, or 20 nucleotides identical to the native target site.
- the template RNA has at least 3, 4, 5, 6, 7, 8, 9, or 10 bases of 100% identity to the target DNA strand.
- the at least 3, 4, 5, 6, 7, 8, 9, or 10 bases of 100% identity to the target DNA strand are at the 3’ end of the template RNA.
- any preceding numbered method wherein the at least 3, 4, 5, 6, 7, 8, 9, or 10 bases of 100% identity to the target DNA strand are at the 5’ end of the template RNA.
- the template RNA comprises at least 3, 4, 5, 6, 7, 8, 9, or 10 bases of 100% identity to the target DNA strand at the 5’ end of the template RNA and at least 3, 4, 5, 6, 7, 8, 9, or 10 bases of 100% identity to the target DNA strand at the 3’ end of the template RNA.
- any preceding numbered method, wherein the heterologous object sequence is between 50-50,000 base pairs (e.g., between 50-40,000 bp, between 500-30,000 bp, between 500-20,000 bp, between 100-15,000 bp, between 500-10,000 bp, between 50-10,000 bp, between 50-5,000 bp).
- the heterologous object sequence is at least 10, 25, 50, 100, 150, 200, 250, 300, 400, 500, 600, or 700 bp.
- Any preceding numbered method, wherein the heterologous object sequence is at least 715, 750, 800, 950, 1,000, 2,000, 3,000, or 4,000 bp.
- any preceding numbered method, wherein the heterologous object sequence is less than 5,000, 10,000, 15,000, 20,000, 30,000, or 40,000 bp.
- 283. Any preceding numbered method, wherein the heterologous object sequence is less than 700, 600, 500, 400, 300, 200, 150, or 100 bp.
- the heterologous object sequence comprises: (a) an open reading frame, e.g., a sequence encoding a polypeptide, e.g., an enzyme (e.g., a lysosomal enzyme), a membrane protein, a blood factor, an exon, an intracellular protein (e.g., a cytoplasmic protein, a nuclear protein, an organellar protein such as a mitochondrial protein or lysosomal protein), an extracellular protein, a structural protein, a signaling protein, a regulatory protein, a transport protein, a sensory protein, a motor protein, a defense protein, or a storage protein; (b) a non-coding and/or regulatory sequence, e.g., a sequence that binds a transcriptional modulator, e.g., a promoter, an enhancer, an insulator; (c) a splice acceptor site; (d) a polyA site; (e) an epigenetic
- any preceding numbered method, wherein the target DNA is a genomic safe harbor (GSH) site.
- 286. Any preceding numbered method, wherein the target DNA is a genomic Natural Harbor™ site.
- 287. Any preceding numbered method, which results in insertion of the heterologous object sequence into a target site in the genome at an average copy number of at least 0.01, 0.025, 0.05, 0.075, 0.1, 0.15, 0.2, 0.25, 0.3, 0.4, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 2.5, 3, 4, or 5 copies per genome.
- Any preceding numbered method which results in about 25-100%, 50-100%, 60-100%, 70-100%, 75-95%, or 80-90% of integrants into a target site in the genome being non-truncated, as measured by an assay described herein, e.g., an assay of Example 6.
- 289. Any preceding numbered method, which results in insertion of the heterologous object sequence only at one target site in the genome of the cell.
- any preceding numbered method which results in insertion of the heterologous object sequence into a target site in a cell, wherein the inserted heterologous sequence comprises less than 10%, 5%, 2%, 1%, 0.5%, 0.2%, or 0.1% mutations (e.g., SNPs or one or more deletions, e.g., truncations or internal deletions) relative to the heterologous sequence prior to insertion, e.g., as measured by an assay of Example 12 of PCT/US2019/048607, incorporated herein by reference.
- mutations e.g., SNPs or one or more deletions, e.g., truncations or internal deletions
- any preceding numbered method which results in insertion of the heterologous object sequence into a target site in a plurality of cells, wherein less than 10%, 5%, 2%, or 1% of copies of the inserted heterologous sequence comprise a mutation (e.g., a SNP or a deletion, e.g., a truncation or an internal deletion), e.g., as measured by an assay of Example 12 of PCT/US2019/048607, incorporated herein by reference.
- a mutation e.g., a SNP or a deletion, e.g., a truncation or an internal deletion
- any preceding numbered method which results in insertion of the heterologous object sequence into a target cell genome, and wherein the target cell does not show upregulation of p53, or shows upregulation of p53 by less than 10%, 5%, 2%, or 1%, e.g., wherein upregulation of p53 is measured by p53 protein level, e.g., according to the method described in Example 30 of PCT/US2019/048607, incorporated herein by reference, or by the level of p53 phosphorylated at Ser15 and Ser20. 293.
- any preceding numbered method which results in insertion of the heterologous object sequence into a target cell genome, and wherein the target cell does not show upregulation of any DNA repair genes and/or tumor suppressor genes, or wherein no DNA repair gene and/or tumor suppressor gene is upregulated by more than 10%, 5%, 2%, or 1%, e.g., wherein upregulation is measured by RNA-seq, e.g., as described in Example 14 of PCT/US2019/048607, incorporated herein by reference. 294.
- any preceding numbered method which results in insertion of the heterologous object sequence into the target site (e.g., at a copy number of 1 insertion or more than one insertion) in about 1-80% of cells in a population of cells contacted with the system, e.g., about: 1-10%, 10- 20%, 20-30%, 30-40%, 40-50%, 50-60%, 60-70%, or 70-80% of cells, e.g., as measured using single cell ddPCR, e.g., as described in Example 17 of PCT/US2019/048607, incorporated herein by reference. 295.
- any preceding numbered method which results in insertion of the heterologous object sequence into the target site at a copy number of 1 insertion in about 1-80% of cells in a population of cells contacted with the system, e.g., about: 1-10%, 10-20%, 20-30%, 30-40%, 40- 50%, 50-60%, 60-70%, or 70-80% of cells, e.g., as measured using colony isolation and ddPCR, e.g., as described in Example 18 of PCT/US2019/048607, incorporated herein by reference. 296.
- any preceding numbered method which results in insertion of the heterologous object sequence into the target site (on-target insertions) at a higher rate than insertion into a non-target site (off-target insertions) in a population of cells, wherein the ratio of on-target insertions to off-target insertions is greater than 10:1, 20:1, 30:1, 40:1, 50:1, 60:1, 70:1, 80:1, 90:1, 100:1, 200:1, 500:1, or 1,000:1, e.g., using an assay of Example 11 of PCT/US2019/048607, incorporated herein by reference. 297.
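As an illustration of the ratio criterion in the preceding embodiment, the following minimal sketch reports which of the listed N:1 thresholds a pair of insertion counts exceeds. The function name and the example counts are hypothetical, and the sketch does not reproduce the assay of Example 11 of PCT/US2019/048607; it only shows the arithmetic of a strict "greater than" comparison.

```python
from fractions import Fraction

# N:1 ratio thresholds listed in the embodiment (10:1 through 1,000:1).
THRESHOLDS = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 500, 1000]


def thresholds_exceeded(on_target: int, off_target: int) -> list:
    """Return the listed thresholds N for which on:off is strictly greater than N:1.

    Counts are hypothetical inputs (assumption), e.g. numbers of insertion
    events observed at the target site and at non-target sites.
    """
    if on_target < 0 or off_target < 0:
        raise ValueError("counts must be non-negative")
    if off_target == 0:
        # No off-target events observed: the ratio is unbounded when any
        # on-target event was seen, and undefined (treated as none) otherwise.
        return list(THRESHOLDS) if on_target > 0 else []
    ratio = Fraction(on_target, off_target)  # exact arithmetic, no rounding
    return [t for t in THRESHOLDS if ratio > t]
```

For example, 950 on-target and 10 off-target insertions give a ratio of 95:1, which exceeds the 10:1 through 90:1 thresholds but not 100:1. A ratio of exactly 10:1 exceeds none of them, because the embodiment recites "greater than".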
- Any above-numbered method results in insertion of a heterologous object sequence in the presence of an inhibitor of a DNA repair pathway (e.g., SCR7, a PARP inhibitor), or in a cell line deficient for a DNA repair pathway (e.g., a cell line deficient for the nucleotide excision repair pathway or the homology-directed repair pathway).
- a DNA repair pathway e.g., SCR7, a PARP inhibitor
- a cell line deficient for a DNA repair pathway e.g., a cell line deficient for the nucleotide excision repair pathway or the homology-directed repair pathway.
- a method of making a system for modifying DNA comprising: (a) providing a template nucleic acid (e.g., a template RNA or DNA) comprising a heterologous homology sequence having at least 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% homology to a sequence comprised in a target DNA molecule, and/or (b) providing a polypeptide of the system (e.g., comprising a DNA-binding domain (DBD) and/or an endonuclease domain) comprising a heterologous targeting domain that binds specifically to a sequence comprised in the target DNA molecule.
- a template nucleic acid e.g., a template RNA or DNA
- a heterologous homology sequence having at least 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% homology to a sequence comprised in a target DNA molecule
- a polypeptide of the system e.g., comprising
- a) comprises introducing into the template nucleic acid (e.g., a template RNA or DNA) a heterologous homology sequence having at least 50%, 60%, 70%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% homology to the sequence comprised in a target DNA molecule
- b) comprises introducing into the polypeptide of the system (e.g., comprising a DNA-binding domain (DBD) and/or an endonuclease domain) the heterologous targeting domain that binds specifically to a sequence comprised in the target DNA molecule.
- DBD DNA-binding domain
- endonuclease domain an endonuclease domain
- the introducing of (a) comprises inserting the homology sequence into the template nucleic acid.
- the introducing of (a) comprises replacing a segment of the template nucleic acid with the homology sequence.
- the introducing of (a) comprises mutating one or more nucleotides (e.g., at least 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, or 100 nucleotides) of the template nucleic acid, thereby producing a segment of the template nucleic acid having the sequence of the homology sequence.
- nucleotides e.g., at least 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, or 100 nucleotides
- the introducing of (b) comprises inserting the amino acid sequence of the targeting domain into the amino acid sequence of the polypeptide. 306.
- the introducing of (b) comprises inserting a nucleic acid sequence encoding the targeting domain into a coding sequence of the polypeptide comprised in a nucleic acid molecule. 307.
- the introducing of (b) comprises replacing at least a portion of the polypeptide with the targeting domain. 308.
- the introducing of (a) comprises mutating one or more amino acids (e.g., at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 400, 500, or more amino acids) of the polypeptide. 309.
- one or more amino acids e.g., at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 400, 500, or more amino acids
- the motif recognized by the endonuclease domain e.g., at least 2, 4, 6, 8, 10, 20, 30, 40, or at least 50 nt, or no more than 50, 40, 30, 20, 10, 8, 6, 4, or 2) or less than 3 less than Gene Write polypeptide
- the DNA binding domain is modified such that the binding of the Gene Writer polypeptide to the new target site results in the proper positioning of the endonuclease domain to the core motif to enable endonuclease activity
- the motif recognized by the endonuclease domain comprises 4, 5, 6, 7, 8, 9, or 10 consecutive nucleotides of TTAAGGTAGC, AAGGTAGCCAAA, or TAAGGTAGCCAAA, or wherein the motif recognized by the endonuclease domain comprises 2 or 3, or 4 consecutive nucleotides of AAGG.
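The "consecutive nucleotides" language in the preceding embodiment can be read as a substring test. The sketch below, under that assumption, finds the longest run of consecutive motif nucleotides present in a candidate site. The function names and the `min_run` parameter are hypothetical; only the three motif strings come from the text.

```python
# Motif sequences recited in the embodiment.
MOTIFS = ("TTAAGGTAGC", "AAGGTAGCCAAA", "TAAGGTAGCCAAA")


def longest_shared_run(site: str, motif: str) -> int:
    """Length of the longest stretch of consecutive motif nucleotides found in site."""
    site, motif = site.upper(), motif.upper()
    # Try the longest windows first so the first hit is the maximum.
    for k in range(len(motif), 0, -1):
        for i in range(len(motif) - k + 1):
            if motif[i:i + k] in site:
                return k
    return 0


def matches_motif(site: str, min_run: int = 4) -> bool:
    """True if site contains at least min_run consecutive nucleotides of any motif."""
    return any(longest_shared_run(site, m) >= min_run for m in MOTIFS)
```

With these definitions, the hypothetical site `GGGAAGGCCC` shares the 4-nucleotide run `AAGG` with `TTAAGGTAGC` and so passes with `min_run=4`, whereas `CCCCCCCC` shares at most 2 consecutive nucleotides with any motif and does not.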
- a method for modifying a target site in genomic DNA in a cell comprising contacting the cell with: (a) a polypeptide or a nucleic acid encoding the polypeptide, wherein the polypeptide comprises (i) a reverse transcriptase (RT) domain, (ii) a DNA-binding domain (DBD); and (iii) an endonuclease domain, e.g., a nickase domain; and (b) a template RNA (or DNA encoding the template RNA) comprising (e.g., from 5’ to 3’) (i) optionally a sequence that binds the target site (e.g., a second strand of a site in a target genome), (ii) optionally a sequence that binds the polypeptide, (iii) a heterologous object sequence, and (iv) a 3’ target homology domain, wherein: (i) the polypeptide comprises a heterologous targeting domain (e.
- a method of making a system for modifying the genome of a mammalian cell comprising: a) providing a template RNA as described in any of the preceding embodiments, e.g., wherein the template RNA comprises (i) a sequence that binds a polypeptide comprising a reverse transcriptase domain and an endonuclease domain, and (ii) a heterologous object sequence; and b) treating the template RNA to reduce secondary structure, e.g., heating the template RNA, e.g., to at least 70, 75, 80, 85, 90, or 95°C, and c) subsequently cooling the template RNA, e.g., to a temperature that allows for secondary structure, e.g., to less than or equal to 30, 25, or 20°C.
- the template RNA comprises (i) a sequence that binds a polypeptide comprising a reverse transcriptase domain and an endonuclease domain, and (ii) a heterologous
- any preceding embodiments which further comprises contacting the template RNA with a polypeptide that comprises (i) a reverse transcriptase domain and (ii) an endonuclease domain, or with a nucleic acid (e.g., RNA) encoding the polypeptide. 314.
- the method of any preceding embodiments which further comprises contacting the template RNA with a cell. 315.
- the system or method of any of the preceding embodiments, wherein the heterologous object sequence encodes a therapeutic polypeptide.
- the heterologous object sequence encodes a mammalian (e.g., human) polypeptide, or a fragment or variant thereof. 317.
- the heterologous object sequence encodes an enzyme (e.g., a lysosomal enzyme), a blood factor (e.g., Factor I, II, V, VII, X, XI, XII or XIII), a membrane protein, an exon, an intracellular protein (e.g., a cytoplasmic protein, a nuclear protein, an organellar protein such as a mitochondrial protein or lysosomal protein), an extracellular protein, a structural protein, a signaling protein, a regulatory protein, a transport protein, a sensory protein, a motor protein, a defense protein, or a storage protein.
- an enzyme e.g., a lysosomal enzyme
- a blood factor e.g., Factor I, II, V, VII, X, XI, XII or XIII
- a membrane protein e.g., an exon
- an intracellular protein e.g., a cytoplasmic protein,
- heterologous object sequence comprises a tissue specific promoter or enhancer. 319.
- the heterologous object sequence encodes a polypeptide of greater than 250, 300, 400, 500, or 1,000 amino acids, and optionally up to 1300 amino acids.
- 320. The system or method of any of the preceding embodiments, wherein the heterologous object sequence encodes a fragment of a mammalian gene but does not encode the full mammalian gene, e.g., encodes one or more exons but does not encode a full-length protein. 321.
- heterologous object sequence encodes one or more introns.
- heterologous object sequence is other than a GFP, e.g., is other than a fluorescent protein or is other than a reporter protein.
- polypeptide comprises (i) a reverse transcriptase domain and (ii) an endonuclease domain, wherein one or both of (i) or (ii) are derived from an avian retrotransposase, e.g., have a sequence of Table 1 or 3, or of a protein domain listed in Table 2, or at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto. 324.
- polypeptide comprises (i) a reverse transcriptase domain and (ii) an endonuclease domain, wherein one or both of (i) or (ii) are derived from an avian retrotransposase, and wherein one or both of (i) or (ii) further comprises a number of substitutions relative to the natural sequence, e.g., at least 1, 2, 3, 4, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100 substitutions. 325.
- polypeptide has an activity at 37°C that is no less than 70%, 75%, 80%, 85%, 90%, or 95% of its activity at 25°C under otherwise similar conditions.
- nucleic acid encoding the polypeptide and the template RNA or a nucleic acid encoding the template RNA are separate nucleic acids.
- the template RNA does not encode an active reverse transcriptase, e.g., comprises an inactivated mutant reverse transcriptase, e.g., as described in Example 1 or 2 of PCT/US2019/048607, incorporated herein by reference, or does not comprise a reverse transcriptase sequence.
- the template RNA comprises one or more chemical modifications.
- the heterologous object sequence is disposed between the promoter and the sequence that binds the polypeptide. 330.
- the promoter is disposed between the heterologous object sequence and the sequence that binds the polypeptide.
- the heterologous object sequence comprises an open reading frame (or the reverse complement thereof) in a 5’ to 3’ orientation on the template RNA.
- the heterologous object sequence comprises an open reading frame (or the reverse complement thereof) in a 3’ to 5’ orientation on the template RNA.
- polypeptide comprises (a) a reverse transcriptase domain and (b) an endonuclease domain, wherein at least one of (a) or (b) is heterologous.
- polypeptide comprises (a) a target DNA binding domain, (b) a reverse transcriptase domain and (c) an endonuclease domain, wherein at least one of (a), (b) or (c) is heterologous. 335.
- a polypeptide or a nucleic acid encoding the polypeptide wherein the polypeptide comprises (i) a reverse transcriptase (RT) domain, (ii) a DNA-binding domain (DBD); and (iii) an endonuclease domain; wherein the DBD and/or the endonuclease domain comprise a heterologous targeting domain that binds specifically to a sequence comprised in a target DNA molecule (e.g., a genomic DNA).
- RT reverse transcriptase
- DBD DNA-binding domain
- an endonuclease domain comprise a heterologous targeting domain that binds specifically to a sequence comprised in a target DNA molecule (e.g., a genomic DNA).
- a polypeptide or a nucleic acid encoding a polypeptide wherein the polypeptide comprises (i) a first target DNA binding domain, e.g., comprising a first Zn finger domain, (ii) a reverse transcriptase domain, (iii) an endonuclease domain, and (iv) a second target DNA binding domain, e.g., comprising a second Zn finger domain, heterologous to the first target DNA binding domain.
- a first target DNA binding domain e.g., comprising a first Zn finger domain
- a reverse transcriptase domain e.g., an endonuclease domain
- a second target DNA binding domain e.g., comprising a second Zn finger domain
- a polypeptide or a nucleic acid encoding a polypeptide wherein the polypeptide comprises (i) a target DNA binding domain, (ii) a reverse transcriptase domain, optionally (iii) an endonuclease domain, wherein the polypeptide comprises a heterologous linker replacing a portion of (i), (ii), or (iii), or replacing an endogenous linker connecting two of (i), (ii), or (iii). 339.
- the heterologous linker replaces, e.g., deletes, a portion of (i).
- heterologous linker replaces, e.g., deletes, a portion of (ii) and (iii). 345.
- heterologous linker replaces, e.g., deletes, the endogenous linker connecting (i) and (ii). 346.
- heterologous linker replaces, e.g., deletes, the endogenous linker connecting (i) and (iii). 347.
- heterologous linker replaces, e.g., deletes, the endogenous linker connecting (ii) and (iii).
- heterologous linker comprises an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to the amino acid sequence SGSETPGTSESATPES (SEQ ID NO: 1023) or GGGS (SEQ ID NO: 1024).
- a host cell comprising the nucleic acid of any preceding embodiments. 353.
- a pharmaceutical composition comprising any preceding numbered system, nucleic acid, polypeptide, or vector; and a pharmaceutically acceptable excipient or carrier.
- the pharmaceutically acceptable excipient or carrier is selected from a vector (e.g., a viral or plasmid vector), a vesicle (e.g., a liposome, an exosome, a natural or synthetic lipid bilayer), or a lipid nanoparticle.
- a vector e.g., a viral or plasmid vector
- a vesicle e.g., a liposome, an exosome, a natural or synthetic lipid bilayer
- a lipid nanoparticle e.g., a lipid nanoparticle.
- polypeptide comprises an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to the amino acid sequence SGSETPGTSESATPES (SEQ ID NO: 1023) or GGGS (SEQ ID NO: 1024).
- the reverse transcriptase domain comprises an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to the amino acid sequence SGSETPGTSESATPES (SEQ ID NO: 1023) or GGGS (SEQ ID NO: 1024).
- the retrotransposase comprises an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to the amino acid sequence SGSETPGTSESATPES (SEQ ID NO: 1023) or GGGS (SEQ ID NO: 1024). 361.
- polypeptide, reverse transcriptase domain, or retrotransposase comprises a linker comprising an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to the amino acid sequence SGSETPGTSESATPES (SEQ ID NO: 1023) or GGGS (SEQ ID NO: 1024). 362.
- the polypeptide comprises a DNA binding domain covalently attached to the remainder of the polypeptide by a linker, e.g., a linker comprising at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 125, 150, 200, 300, 400, or 500 amino acids.
- a linker e.g., a linker comprising at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 125, 150, 200, 300, 400, or 500 amino acids.
- linker is attached to the remainder of the polypeptide at a position in the N-terminal side of an alpha helical region of the polypeptide, e.g., at a position corresponding to version v1 as described in Example 26 of PCT/US2019/048607, incorporated herein by reference. 365.
- RNA binding motif e.g., a -1 RNA binding motif
- linker is attached to the remainder of the polypeptide at a position in the C-terminal side of a random coil region of the polypeptide, e.g., N-terminal relative to a DNA binding motif (e.g., a c-myb DNA binding motif), e.g., at a position corresponding to version v3 as described in Example 26 of PCT/US2019/048607, incorporated herein by reference. 367.
- a DNA binding motif e.g., a c-myb DNA binding motif
- linker comprises an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to the amino acid sequence SGSETPGTSESATPES (SEQ ID NO: 1023) or GGGS (SEQ ID NO: 1024).
- SGSETPGTSESATPES SEQ ID NO: 1023
- GGGS SEQ ID NO: 1024
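The percent-identity thresholds recited for the linker sequences can be illustrated with a simplified calculation. The sketch below assumes an ungapped, position-by-position comparison of equal-length sequences, which is an assumption made here for illustration: this section does not state how identity is computed, and in practice it is normally determined from a sequence alignment. The function names and candidate sequences are hypothetical.

```python
LINKER_1023 = "SGSETPGTSESATPES"  # SEQ ID NO: 1023
LINKER_1024 = "GGGS"              # SEQ ID NO: 1024

# Identity thresholds (percent) recited in the linker embodiments.
IDENTITY_THRESHOLDS = (70, 75, 80, 85, 90, 95, 96, 97, 98, 99, 100)


def ungapped_identity(candidate: str, reference: str) -> float:
    """Percent of positions that match, for equal-length sequences only.

    Simplification (assumption): no gaps and no alignment step.
    """
    if len(candidate) != len(reference):
        raise ValueError("this sketch only compares equal-length sequences")
    matches = sum(a == b for a, b in zip(candidate.upper(), reference.upper()))
    return 100.0 * matches / len(reference)


def thresholds_met(percent_identity: float) -> list:
    """Return the recited thresholds satisfied by an 'at least N%' reading."""
    return [t for t in IDENTITY_THRESHOLDS if percent_identity >= t]
```

Under these assumptions, a single substitution in the 16-residue linker gives 15/16 = 93.75% identity, which satisfies the 70% through 90% thresholds but not 95%. A single substitution in the 4-residue GGGS linker gives 75%.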
- a polynucleotide sequence comprising at least about 500, 1000, 2000, 2500, 2600, 2700, 2800, 2900, or 3000 contiguous nucleotides from the 3’ end of the template RNA sequence is integrated into a target cell genome. 370.
- nucleic acid sequence of the template RNA integrates into the genomes of a population of target cells at a copy number of at least about 0.21, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, or 1.0 integrants/genome. 371.
- nucleic acid sequence of the template RNA integrates into the genomes of a population of target cells at a copy number of at least about 0.085, 0.09, 0.1, 0.15, or 0.2 integrants/genome. 372.
- nucleic acid sequence of the template RNA integrates into the genomes of a population of target cells at a copy number of at least about 0.036, 0.04, 0.05, 0.06, 0.07, or 0.08 integrants/genome. 373.
- polypeptide comprises a functional endonuclease domain (e.g., wherein the endonuclease domain does not comprise a mutation that abolishes endonuclease activity, e.g., as described herein).
- polypeptide comprises an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to the R2 polypeptide from a medium ground finch, e.g., Geospiza fortis (e.g., as described herein), or a functional fragment thereof.
- a medium ground finch e.g., Geospiza fortis (e.g., as described herein)
- polypeptide comprises an amino acid sequence of the R2 polypeptide from a medium ground finch, e.g., Geospiza fortis (e.g., as described herein), or a functional fragment thereof, and further comprises a number of substitutions relative to the natural sequence, e.g., at least 1, 2, 3, 4, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100 substitutions. 376.
- a medium ground finch e.g., Geospiza fortis (e.g., as described herein)
- a number of substitutions relative to the natural sequence e.g., at least 1, 2, 3, 4, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100 substitutions. 376.
- the reverse transcriptase domain comprises an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to the R2 polypeptide from a medium ground finch, e.g., Geospiza fortis (e.g., as described herein), or a functional fragment thereof. 377.
- a medium ground finch e.g., Geospiza fortis (e.g., as described herein)
- the reverse transcriptase domain comprises an amino acid sequence of the R2 polypeptide from a medium ground finch, e.g., Geospiza fortis (e.g., as described herein), or a functional fragment thereof and further comprises a number of substitutions relative to the natural sequence, e.g., at least 1, 2, 3, 4, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100 substitutions. 378.
- the retrotransposase comprises an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to the R2 polypeptide from a medium ground finch, e.g., Geospiza fortis (e.g., as described herein), or a functional fragment thereof. 379.
- the retrotransposase comprises an amino acid sequence of the R2 polypeptide from a medium ground finch, e.g., Geospiza fortis (e.g., as described herein), or a functional fragment thereof and further comprises a number of substitutions relative to the natural sequence, e.g., at least 1, 2, 3, 4, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100 substitutions. 380.
- nucleic acid sequence of the template RNA integrates into the genomes of a population of target cells at a copy number of at least about 0.21 integrants/genome. 381.
- polypeptide comprises an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to the R4 polypeptide from a large roundworm, e.g., Ascaris lumbricoides (e.g., as described herein), or a functional fragment thereof. 382.
- polypeptide comprises an amino acid sequence of the R4 polypeptide from a large roundworm, e.g., Ascaris lumbricoides (e.g., as described herein), or a functional fragment thereof, and further comprises a number of substitutions relative to the natural sequence, e.g., at least 1, 2, 3, 4, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100 substitutions. 383.
- the reverse transcriptase domain comprises an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to the R4 polypeptide from a large roundworm, e.g., Ascaris lumbricoides (e.g., as described herein), or a functional fragment thereof. 384.
- the reverse transcriptase domain comprises an amino acid sequence of the R4 polypeptide from a large roundworm, e.g., Ascaris lumbricoides (e.g., as described herein), or a functional fragment thereof and further comprises a number of substitutions relative to the natural sequence, e.g., at least 1, 2, 3, 4, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100 substitutions. 385.
- the retrotransposase comprises an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to the R4 polypeptide from a large roundworm, e.g., Ascaris lumbricoides (e.g., as described herein), or a functional fragment thereof. 386.
- the retrotransposase comprises an amino acid sequence of the R4 polypeptide from a large roundworm, e.g., Ascaris lumbricoides (e.g., as described herein), or a functional fragment thereof and further comprises a number of substitutions relative to the natural sequence, e.g., at least 1, 2, 3, 4, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100 substitutions. 387.
- nucleic acid sequence of the template RNA integrates into the genomes of a population of target cells at a copy number of at least about 0.085 integrants/genome. 388.
- H2AX phosphorylation e.g., gamma H2AX
- ATM phosphorylation e.g., ATM phosphorylation
- ATR phosphorylation e.g., ATR phosphorylation
- Chk1 phosphorylation e.g., Chk2 phosphorylation
- p53 phosphorylation e.g., gamma H2AX
- a site-specific nuclease e.g., Cas9
- any preceding embodiments, wherein the p53 protein level is determined according to the method described in Example 30 of PCT/US2019/048607, incorporated herein by reference.
- 391. Any preceding numbered embodiment, wherein introduction of the system into a target cell results in upregulation of p53 phosphorylation level in the target cell to a level that is less than about 0.001%, 0.005%, 0.01%, 0.05%, 0.1%, 0.5%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 30%, 40%, 45%, 50%, 55%, 60%, 70%, 80%, or 90% of the p53 phosphorylation level induced by introducing a site-specific nuclease, e.g., Cas9, that targets the same genomic site as said system.
- any preceding embodiments, wherein the p21 protein level is determined according to the method described in Example 30 of PCT/US2019/048607, incorporated herein by reference.
- 394. Any preceding numbered embodiment, wherein introduction of the system into a target cell results in upregulation of H2AX phosphorylation level in the target cell to a level that is less than about 0.001%, 0.005%, 0.01%, 0.05%, 0.1%, 0.5%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 30%, 40%, 45%, 50%, 55%, 60%, 70%, 80%, or 90% of the H2AX phosphorylation level induced by introducing a site-specific nuclease, e.g., Cas9, that targets the same genomic site as said system.
- a system for modifying DNA comprising: (a) a polypeptide or a nucleic acid encoding the polypeptide, wherein the polypeptide comprises (i) a reverse transcriptase (RT) domain, (ii) a DNA-binding domain (DBD); and (iii) an endonuclease domain, e.g., a nickase domain; and (b) a template RNA (or DNA encoding the template RNA) comprising (e.g., from 5’ to 3’) (i) optionally a sequence (e.g., a CRISPR spacer) that binds a target site (e.g., a non-edited strand of a site in a target genome), (ii) optionally a sequence that binds the polypeptide, (iii) a heterologous object sequence, and (iv) a 3’ homology domain.
- RT reverse transcriptase
- DBD DNA-binding domain
- a system for modifying DNA comprising: (a) a polypeptide or a nucleic acid encoding the polypeptide, wherein the polypeptide comprises (i) a reverse transcriptase (RT) domain, (ii) a DNA-binding domain (DBD); and (iii) an endonuclease domain, e.g., a nickase domain; and (b) a template RNA (or DNA encoding the template RNA) comprising (e.g., from 5’ to 3’) (i) optionally a sequence that binds a target site (e.g., a non-edited strand of a site in a target genome), (ii) optionally a sequence that binds the polypeptide, (iii) a heterologous object sequence, and (iv) a 3’ homology domain, wherein the RT domain has a sequence of Table 1 or 3, or of a protein domain listed in Table 2, or a sequence having at least 70%
- a system for modifying DNA comprising: (a) a polypeptide or a nucleic acid encoding the polypeptide, wherein the polypeptide comprises (i) a reverse transcriptase (RT) domain, (ii) a DNA-binding domain (DBD); and (iii) an endonuclease domain, e.g., a nickase domain; and (b) a template RNA (etRNA) (or DNA encoding the template RNA) comprising (e.g., from 5’ to 3’) (i) optionally a sequence that binds a target site (e.g., a non-edited strand of a site in a target genome), (ii) optionally a sequence that binds the polypeptide, (iii) a heterologous object sequence, and (iv) a 3’ homology domain, wherein the system is capable of producing an insertion into the target site of at least 45, 50, 55, 60, 65,
- a system for modifying DNA comprising: (a) a polypeptide or a nucleic acid encoding the polypeptide, wherein the polypeptide comprises (i) a reverse transcriptase (RT) domain, (ii) a DNA-binding domain (DBD); and (iii) an endonuclease domain, e.g., a nickase domain; and (b) a template RNA (or DNA encoding the template RNA) comprising (e.g., from 5’ to 3’) (i) optionally a sequence that binds a target site (e.g., a non-edited strand of a site in a target genome), (ii) optionally a sequence that binds the polypeptide, (iii) a heterologous object sequence, and (iv) a 3’ homology domain, wherein the heterologous object sequence is at least 74, 75, 76, 77, 78, 79, 80, 81,
- the RT domain is heterologous to the DBD; the DBD is heterologous to the endonuclease domain; or the RT domain is heterologous to the endonuclease domain. 414.
- a system for modifying DNA comprising: (a) a polypeptide or a nucleic acid encoding the polypeptide, wherein the polypeptide comprises (i) a reverse transcriptase (RT) domain, (ii) a DNA-binding domain (DBD); and (iii) an endonuclease domain, e.g., a nickase domain; and (b) a template RNA (or DNA encoding the template RNA) comprising (e.g., from 5’ to 3’) (i) optionally a sequence that binds a target site (e.g., a non-edited strand of a site in a target genome), (ii) optionally a sequence that binds the polypeptide, (iii) a heterologous object sequence, and (iv) a 3’ homology domain, wherein the system is capable of producing a deletion into the target site of at least 81, 85, 90, 95, 100, 110, 120, 130
- a system for modifying DNA comprising: (a) a polypeptide or a nucleic acid encoding the polypeptide, wherein the polypeptide comprises (i) a reverse transcriptase (RT) domain, (ii) a DNA-binding domain (DBD); and (iii) an endonuclease domain, e.g., a nickase domain; and (b) a template RNA (or DNA encoding the template RNA) comprising (e.g., from 5’ to 3’) (i) optionally a sequence that binds a target site (e.g., a non-edited strand of a site in a target genome), (ii) optionally a sequence that binds the polypeptide, (iii) a heterologous object sequence, and (iv) a 3’ homology domain, wherein (a)(ii) and/or (a)(iii) comprises a TALE molecule; a zinc finger molecule;
- a system for modifying DNA comprising: (a) a polypeptide or a nucleic acid encoding the polypeptide, wherein the polypeptide comprises (i) a reverse transcriptase (RT) domain, (ii) a DNA-binding domain (DBD); and (iii) an endonuclease domain, e.g., a nickase domain; and (b) a template RNA (or DNA encoding the template RNA) comprising (e.g., from 5’ to 3’) (i) optionally a sequence (e.g., a CRISPR spacer) that binds a target site (e.g., a non-edited strand of a site in a target genome), (ii) optionally a sequence that binds the polypeptide, (iii) a heterologous object sequence, and (iv) a 3’ homology domain, wherein the endonuclease domain, e.g., RT
- a system for modifying DNA comprising: (a) a polypeptide or a nucleic acid encoding the polypeptide, wherein the polypeptide comprises (i) a reverse transcriptase (RT) domain, (ii) a DNA-binding domain (DBD); and (iii) an endonuclease domain, e.g., a nickase domain; and (b) a template RNA (or DNA encoding the template RNA) comprising (e.g., from 5’ to 3’) (i) optionally a sequence that binds a target site (e.g., a non-edited strand of a site in a target genome), (ii) a sequence that specifically binds the RT domain, (iii) a heterologous object sequence, and (iv) a 3’ homology domain.
- RT reverse transcriptase
- DBD DNA-binding domain
- an endonuclease domain e.g.,
- a system for modifying DNA comprising: (a) a first polypeptide or a nucleic acid encoding the first polypeptide, wherein the first polypeptide comprises (i) a reverse transcriptase (RT) domain and (ii) optionally a DNA-binding domain, (b) a second polypeptide or a nucleic acid encoding the second polypeptide, wherein the second polypeptide comprises (i) a DNA-binding domain (DBD); (ii) an endonuclease domain, e.g., a nickase domain; and (c) a template RNA (or DNA encoding the template RNA) comprising (e.g., from 5’ to 3’) (i) optionally a sequence that binds the second polypeptide (e.g., that
- a system for modifying DNA comprising: (a) a polypeptide or a nucleic acid encoding the polypeptide, wherein the polypeptide comprises (i) a reverse transcriptase (RT) domain, and (ii) a DNA-binding domain (DBD); and (iii) an endonuclease domain, e.g., a nickase domain; (b) a first template RNA (or DNA encoding the RNA) comprising (e.g., from 5’ to 3’) (i) a sequence that binds the polypeptide (e.g., that binds (a)(ii) and/or (a)(iii)) and (ii) a sequence that binds a target site (e.g., a non-edited strand of a site in a target genome), (e.g., wherein the first RNA comprises a gRNA); (c) a second template RNA (or DNA en
- the second template RNA comprises (i).
- 422. The system of any preceding embodiments, wherein the first template RNA comprises a first conjugating domain and the second template RNA comprises a second conjugating domain.
- 423. The system of any preceding embodiments, wherein the first and second conjugating domains are capable of hybridizing to one another, e.g., under stringent conditions.
- 424. The system of any preceding embodiments, wherein association of the first conjugating domain and the second conjugating domain colocalizes the first template RNA and the second template RNA.
- 425. The system of any previous embodiment, wherein the template RNA comprises (i).
- 426. The system of any previous embodiment, wherein the template RNA comprises (ii). 427.
- the template RNA comprises (i) and (ii). 428.
- a template RNA (or DNA encoding the template RNA) comprising a targeting domain (e.g., a heterologous targeting domain) that binds specifically to a sequence comprised in the target DNA molecule (e.g., a genomic DNA), a sequence that specifically binds an RT domain of a polypeptide, and a heterologous object sequence.
- a targeting domain e.g., a heterologous targeting domain
- the polypeptide comprises a heterologous targeting domain that binds specifically to a sequence comprised in the target DNA molecule (e.g., a genomic DNA).
- heterologous targeting domain binds to a different nucleic acid sequence than the unmodified polypeptide.
- the polypeptide does not comprise a functional endogenous targeting domain (e.g., wherein the polypeptide does not comprise an endogenous targeting domain).
- the heterologous targeting domain comprises a zinc finger (e.g., a zinc finger that binds specifically to the sequence comprised in the target DNA molecule). 433.
- the heterologous targeting domain comprises a Cas domain (e.g., a Cas9 domain, or a mutant or variant thereof, e.g., a Cas9 domain that binds specifically to the sequence comprised in the target DNA molecule).
- a Cas domain e.g., a Cas9 domain, or a mutant or variant thereof, e.g., a Cas9 domain that binds specifically to the sequence comprised in the target DNA molecule.
- the Cas domain is associated with a guide RNA (gRNA).
- gRNA guide RNA
- the heterologous targeting domain comprises an endonuclease domain (e.g., a heterologous endonuclease domain). 436.
- the template nucleic acid molecule comprises at least one (e.g., one or two) heterologous homology sequence having at least 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% homology to a sequence comprised in a target DNA molecule (e.g., a genomic DNA).
- a target DNA molecule e.g., a genomic DNA.
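The percentage thresholds recited above (e.g., at least 85%, 90%, 95%) can be illustrated with a minimal sketch. It assumes that "homology" is read as position-by-position identity between two pre-aligned sequences of equal length; the function names and the example sequences are hypothetical and are not taken from the disclosure.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of matching positions between two pre-aligned, equal-length sequences.

    Assumption: no gaps are scored and no alignment is performed here.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    matches = sum(1 for x, y in zip(seq_a.upper(), seq_b.upper()) if x == y)
    return 100.0 * matches / len(seq_a)


def meets_threshold(seq_a: str, seq_b: str, threshold: float = 85.0) -> bool:
    """True if the two sequences share at least `threshold` percent identity."""
    return percent_identity(seq_a, seq_b) >= threshold


# Hypothetical 10-nt homology sequence and target with one mismatch.
homology = "ACGTACGTAC"
target = "ACGTACGTAA"
print(percent_identity(homology, target))
print(meets_threshold(homology, target, 85.0))
```

A real comparison would normally run an alignment first; this sketch only shows how a threshold from the list is applied once the two sequences are aligned.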
- one of the at least one heterologous homology sequences is positioned at or within about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, or 100 nucleotides of the 5’ end of the template nucleic acid molecule.
- heterologous homology sequence binds within 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleotides of a nick site (e.g., produced by a nickase, e.g., an endonuclease domain, e.g., as described herein) in the target DNA molecule. 443.
- a nick site e.g., produced by a nickase, e.g., an endonuclease domain, e.g., as described herein
- the heterologous homology sequence has less than 50%, 40%, 30%, 20%, 10%, 5%, 4%, 3%, 2%, or 1% sequence identity with a nucleic acid sequence complementary to an endogenous homology sequence of an unmodified form of the template RNA. 444.
- heterologous homology sequence has at least 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% homology to a sequence of the target DNA molecule that is different from the sequence bound by an endogenous homology sequence (e.g., replaced by the heterologous homology sequence). 445.
- heterologous homology sequence comprises a sequence (e.g., at its 3’ end) having at least 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% homology to a sequence positioned 5’ to a nick site of the target DNA molecule (e.g., a site nicked by a nickase, e.g., an endonuclease domain as described herein). 446.
- the heterologous homology sequence comprises a sequence (e.g., at its 5’ end) suitable for priming target-primed reverse transcription (TPRT) initiation.
- TPRT target-primed reverse transcription
- heterologous homology sequence has at least 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% homology to a sequence positioned within about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, or 100 nucleotides of (e.g., 3’ relative to) a target insertion site, e.g., for a heterologous object sequence (e.g., as described herein), in the target DNA molecule. 448.
- the template nucleic acid molecule comprises a guide RNA (gRNA), e.g., as described herein. 449.
- gRNA spacer sequence e.g., at or within 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, or 100 nucleotides of its 5’ end.
- a template RNA (or DNA encoding the template RNA) comprising (e.g., from 5’ to 3’) (i) a sequence that binds a target site (e.g., a second strand of a site in a target genome), (ii) a sequence that specifically binds an RT domain of a polypeptide, (iii) a heterologous object sequence, and (iv) a 3’ target homology domain. 451.
- the template RNA of any preceding embodiments further comprising (v) a sequence that binds an endonuclease and/or a DNA-binding domain of a polypeptide (e.g., the same polypeptide comprising the RT domain). 452.
- RT domain comprises a sequence selected from Table 1 or 3, or a sequence of a reverse transcriptase domain of Table 2, or a sequence that has at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto. 453.
- the RT domain comprises a sequence selected from Table 1 or 3, or a sequence of a reverse transcriptase domain of Table 2, wherein the RT domain further comprises a number of substitutions relative to the natural sequence, e.g., at least 1, 2, 3, 4, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100 substitutions. 454.
- the template RNA of any preceding embodiments, wherein the sequence that specifically binds the RT domain is a sequence, e.g., a UTR sequence, that binds the RT domain in a wild-type context, or a sequence having at least 70, 75, 80, 85, 90, 95, or 99% identity thereto. 456.
- a template RNA (or DNA encoding the template RNA) comprising from 5’ to 3’: (ii) a sequence that binds an endonuclease and/or a DNA-binding domain of a polypeptide, (i) a sequence that binds a target site (e.g., a second strand of a site in a target genome), (iii) a heterologous object sequence, and (iv) a 3’ target homology domain. 457.
- a template RNA (or DNA encoding the template RNA) comprising from 5’ to 3’: (iii) a heterologous object sequence, (iv) a 3’ target homology domain, (i) a sequence that binds a target site (e.g., a second strand of a site in a target genome), and (ii) a sequence that binds an endonuclease and/or a DNA-binding domain of a polypeptide. 458.
- a template RNA (or DNA encoding the template RNA) comprising (e.g., from 5’ to 3’) (i) optionally a sequence that binds a target site (e.g., a non-edited strand of a site in a target genome), (ii) optionally a sequence that binds an endonuclease and/or a DNA-binding domain of a polypeptide, (iii) a heterologous object sequence, and (iv) a 3’ homology domain. 459.
- the template RNA of any preceding embodiments, wherein the template RNA comprises (ii). 461.
- a template RNA (or DNA encoding the template RNA) comprising (e.g., from 5’ to 3’) (i) a sequence that binds a target site (e.g., a non-edited strand of a site in a target genome), (ii) a sequence that specifically binds an RT domain of a polypeptide, (iii) a heterologous object sequence, and (iv) a 3’ homology domain. 462.
- the template RNA of any preceding embodiments wherein the RT domain comprises a sequence selected from Table 1 or 3, or of a protein domain listed in Table 2, or a sequence that has at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto. 463.
- the template RNA of any preceding embodiments further comprising (v) a sequence that binds an endonuclease and/or a DNA-binding domain of a polypeptide (e.g., the same polypeptide comprising the RT domain). 464.
- the template RNA of any preceding embodiments wherein the sequence of (ii) specifically binds an RT domain of Table 1 or 3, or listed in Table 2, or an RT domain sequence that has at least 70, 75, 80, 85, 90, 95, or 99% identity thereto. 465.
- the template RNA of any preceding embodiments, wherein the sequence that specifically binds the RT domain is a sequence of Table 1 or 3, or of a protein domain listed in Table 2, or a sequence having at least 70, 75, 80, 85, 90, 95, or 99% identity thereto. 466.
- a template RNA (or DNA encoding the template RNA) comprising from 5’ to 3’: (ii) a sequence that binds an endonuclease and/or a DNA-binding domain of a polypeptide, (i) a sequence that binds a target site (e.g., a non-edited strand of a site in a target genome), (iii) a heterologous object sequence, and (iv) a 3’ homology domain. 467.
- a template RNA (or DNA encoding the template RNA) comprising from 5’ to 3’: (iii) a heterologous object sequence, (iv) a 3’ homology domain, (i) a sequence that binds a target site (e.g., a non-edited strand of a site in a target genome), and (ii) a sequence that binds an endonuclease and/or a DNA-binding domain of a polypeptide. 468.
- the system or template RNA of any preceding embodiments, wherein the template RNA, first template RNA, or second template RNA comprises a sequence that specifically binds the RT domain. 469.
- a system for modifying DNA comprising: (a) a first template RNA (or DNA encoding the first template RNA) comprising (i) a sequence that binds an endonuclease domain, e.g., a nickase domain, and/or a DNA-binding domain (DBD) of a polypeptide, and (ii) a sequence that binds a target site (e.g., a non-edited strand of a site in a target genome), (e.g., wherein the first RNA comprises a gRNA); (b) a second template RNA (or DNA encoding the second template RNA) comprising (i) a sequence that specifically binds a reverse transcriptase (RT) domain of a polypeptide (e.g., the polypeptide of (a)),
- nucleic acid encoding the first template RNA and the nucleic acid encoding the second template RNA are two separate nucleic acids.
- nucleic acid encoding the first template RNA and the nucleic acid encoding the second template RNA are part of the same nucleic acid molecule, e.g., are present on the same vector. 477.
- a polypeptide or a nucleic acid encoding the polypeptide wherein the polypeptide comprises (i) a reverse transcriptase (RT) domain, (ii) a DNA-binding domain (DBD); and (iii) an endonuclease domain, e.g., a nickase domain, wherein the RT domain has a sequence of Table 1 or 3, or of a protein domain listed in Table 2, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto. 478.
- RT reverse transcriptase
- DBD DNA-binding domain
- an endonuclease domain e.g., a nickase domain
- a system for modifying DNA comprising: (a) a first polypeptide or a nucleic acid encoding the polypeptide, wherein the polypeptide comprises a reverse transcriptase (RT) domain, wherein the RT domain has a sequence of Table 1 or 3, or of a protein domain listed in Table 2, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto; and optionally a DNA-binding domain (DBD) (e.g., a first DBD); and (b) a second polypeptide or a nucleic acid encoding the polypeptide, wherein the polypeptide comprises (i) a DBD (e.g., a second DBD); and (ii) an endonuclease domain, e.g., a nickase domain.
- nucleic acid encoding the first polypeptide and the nucleic acid encoding the second polypeptide are two separate nucleic acids.
- nucleic acid encoding the first polypeptide and the nucleic acid encoding the second polypeptide are part of the same nucleic acid molecule, e.g., are present on the same vector. 481.
- an RNA of the system e.g., template RNA, the RNA encoding the polypeptide of (a), or an RNA expressed from a heterologous object sequence integrated into a target DNA
- a microRNA binding site e.g., in a 3’ UTR.
- a first miRNA e.g., miR-142
- a second miRNA e.g., miR-182 or miR-183
- RNA expressed from a heterologous object sequence integrated into a target DNA comprises at least 2, 3, or 4 miRNA binding sites, e.g., wherein the miRNA binding sites are recognized by the same or different miRNAs.
- domain refers to a structure of a biomolecule that contributes to a specified function of the biomolecule.
- a domain may comprise a contiguous region (e.g., a contiguous sequence) or distinct, non-contiguous regions (e.g., non-contiguous sequences) of a biomolecule.
- protein domains include, but are not limited to, an endonuclease domain, a DNA binding domain, a reverse transcription domain; an example of a domain of a nucleic acid is a regulatory domain, such as a transcription factor binding domain.
- exogenous when used with reference to a biomolecule (such as a nucleic acid sequence or polypeptide) means that the biomolecule was introduced into a host genome, cell or organism by the hand of man.
- a nucleic acid that is added into an existing genome, cell, tissue or subject using recombinant DNA techniques or other methods is exogenous to the existing nucleic acid sequence, cell, tissue or subject.
- first strand and second strand, as used to describe the individual DNA strands of target DNA, distinguish the two DNA strands based upon the strand on which the reverse transcriptase domain initiates polymerization, e.g., based upon where target primed synthesis initiates.
- the first strand refers to the strand of the target DNA upon which the reverse transcriptase domain initiates polymerization, e.g., where target primed synthesis initiates.
- the second strand refers to the other strand of the target DNA.
- Genomic safe harbor site A genomic safe harbor site is a site in a host genome that is able to accommodate the integration of new genetic material, e.g., such that the inserted genetic element does not cause significant alterations of the host genome posing a risk to the host cell or organism.
- a GSH site generally meets 1, 2, 3, 4, 5, 6, 7, 8 or 9 of the following criteria: (i) is located >300kb from a cancer-related gene; (ii) is >300kb from a miRNA/other functional small RNA; (iii) is >50kb from a 5’ gene end; (iv) is >50kb from a replication origin; (v) is >50kb away from any ultraconserved element; (vi) has low transcriptional activity (i.e., no mRNA +/- 25 kb); (vii) is not in a copy number variable region; (viii) is in open chromatin; and/or (ix) is unique, with 1 copy in the human genome.
- GSH sites in the human genome that meet some or all of these criteria include (i) the adeno-associated virus site 1 (AAVS1), a naturally occurring site of integration of AAV virus on chromosome 19; (ii) the chemokine (C-C motif) receptor 5 (CCR5) gene, a chemokine receptor gene known as an HIV-1 coreceptor; (iii) the human ortholog of the mouse Rosa26 locus; (iv) the rDNA locus; (v) the albumin locus, e.g., for liver cell applications; and (vi) the T-cell receptor alpha constant (TRAC) locus, e.g., for T-cell applications.
- AAVS1 adeno-associated virus site 1
- CCR5 chemokine receptor 5
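The nine genomic safe harbor criteria listed above lend themselves to a simple checklist. The following is a minimal sketch, assuming that each criterion has already been measured for a candidate site; the `CandidateSite` fields, the function names, and the example values are hypothetical, and only the numeric cutoffs (300 kb, 50 kb, no mRNA within 25 kb, 1 copy) come from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CandidateSite:
    """Pre-computed annotations for one candidate site (hypothetical fields)."""
    kb_to_cancer_gene: float
    kb_to_small_rna: float
    kb_to_gene_5prime_end: float
    kb_to_replication_origin: float
    kb_to_ultraconserved: float
    mrna_within_25kb: bool
    in_copy_number_variable_region: bool
    in_open_chromatin: bool
    copies_in_genome: int


def gsh_criteria_met(site: CandidateSite) -> List[bool]:
    """Evaluate criteria (i) through (ix) in the order given in the text."""
    return [
        site.kb_to_cancer_gene > 300,             # (i)
        site.kb_to_small_rna > 300,               # (ii)
        site.kb_to_gene_5prime_end > 50,          # (iii)
        site.kb_to_replication_origin > 50,       # (iv)
        site.kb_to_ultraconserved > 50,           # (v)
        not site.mrna_within_25kb,                # (vi) low transcriptional activity
        not site.in_copy_number_variable_region,  # (vii)
        site.in_open_chromatin,                   # (viii)
        site.copies_in_genome == 1,               # (ix)
    ]


def gsh_score(site: CandidateSite) -> int:
    """Number of the nine criteria that the site satisfies."""
    return sum(gsh_criteria_met(site))


# Hypothetical site values, for illustration only.
site = CandidateSite(500, 400, 80, 60, 70, False, False, True, 1)
print(gsh_score(site))
```

Because the text says a site "generally meets 1, 2, 3, 4, 5, 6, 7, 8 or 9" of the criteria, the sketch returns a count rather than a pass/fail verdict.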
- heterologous when used to describe a first element in reference to a second element means that the first element and second element do not exist in nature disposed as described.
- a heterologous polypeptide, nucleic acid molecule, construct or sequence refers to (a) a polypeptide, nucleic acid molecule or portion of a polypeptide or nucleic acid molecule sequence that is not native to a cell in which it is expressed, (b) a polypeptide or nucleic acid molecule or portion of a polypeptide or nucleic acid molecule that has been altered or mutated relative to its native state, or (c) a polypeptide or nucleic acid molecule with an altered expression as compared to the native expression levels under similar conditions.
- a heterologous regulatory sequence e.g., promoter, enhancer
- a heterologous domain of a polypeptide or nucleic acid sequence e.g., a DNA binding domain of a polypeptide or nucleic acid encoding a DNA binding domain of a polypeptide
- a heterologous nucleic acid molecule may exist in a native host cell genome, but may have an altered expression level or have a different sequence or both.
- heterologous nucleic acid molecules may not be endogenous to a host cell or host genome but instead may have been introduced into a host cell by transformation (e.g., transfection, electroporation), wherein the added molecule may integrate into the host genome or can exist as extra-chromosomal genetic material either transiently (e.g., mRNA) or semi-stably for more than one generation (e.g., episomal viral vector, plasmid or other self-replicating vector).
- Mutation or Mutated when applied to nucleic acid sequences means that nucleotides in a nucleic acid sequence may be inserted, deleted or changed compared to a reference (e.g., native) nucleic acid sequence. A single alteration may be made at a locus (a point mutation) or multiple nucleotides may be inserted, deleted or changed at a single locus. In addition, one or more alterations may be made at any number of loci within a nucleic acid sequence. A nucleic acid sequence may be mutated by any method known in the art.
- Nucleic acid molecule refers to both RNA and DNA molecules including, without limitation, cDNA, genomic DNA and mRNA, and also includes synthetic nucleic acid molecules, such as those that are chemically synthesized or recombinantly produced, such as RNA templates, as described herein.
- the nucleic acid molecule can be double-stranded or single-stranded, circular or linear. If single-stranded, the nucleic acid molecule can be the sense strand or the antisense strand. Unless otherwise indicated, and as an example for all sequences described herein under the general format “SEQ. ID NO:,” “nucleic acid comprising SEQ. ID NO:1” refers to a nucleic acid, at least a portion of which has either (i) the sequence of SEQ. ID NO:1, or (ii) a sequence complementary to SEQ. ID NO:1.
- the choice between the two is dictated by the context in which SEQ. ID NO:1 is used. For instance, if the nucleic acid is used as a probe, the choice between the two is dictated by the requirement that the probe be complementary to the desired target.
- Nucleic acid sequences of the present disclosure may be modified chemically or biochemically or may contain non-natural or derivatized nucleotide bases, as will be readily appreciated by those of skill in the art.
- Such modifications include, for example, labels, methylation, substitution of one or more naturally occurring nucleotides with an analog, inter-nucleotide modifications such as uncharged linkages (for example, methyl phosphonates, phosphotriesters, phosphoramidates, carbamates, etc.), charged linkages (for example, phosphorothioates, phosphorodithioates, etc.), pendant moieties, (for example, polypeptides), intercalators (for example, acridine, psoralen, etc.), chelators, alkylators, and modified linkages (for example, alpha anomeric nucleic acids, etc.).
- uncharged linkages for example, methyl phosphonates, phosphotriesters, phosphoramidates, carbamates, etc.
- charged linkages for example, phosphorothioates, phosphorodithioates, etc.
- pendant moieties for example, polypeptides
- intercalators for example, acridine
- Gene expression unit is a nucleic acid sequence comprising at least one regulatory nucleic acid sequence operably linked to at least one effector sequence.
- a first nucleic acid sequence is operably linked with a second nucleic acid sequence when the first nucleic acid sequence is placed in a functional relationship with the second nucleic acid sequence.
- a promoter or enhancer is operably linked to a coding sequence if the promoter or enhancer affects the transcription or expression of the coding sequence.
- Operably linked DNA sequences may be contiguous or non-contiguous. Where necessary to join two protein-coding regions, operably linked sequences may be in the same reading frame.
- Host The terms host genome or host cell, as used herein, refer to a cell and/or its genome into which protein and/or genetic material has been introduced.
- a host genome or host cell may be an isolated cell or cell line grown in culture, or genomic material isolated from such a cell or cell line, or may be a host cell or host genome which composes living tissue or an organism. In some instances, a host cell may be an animal cell or a plant cell, e.g., as described herein.
- a host cell may be a bovine cell, horse cell, pig cell, goat cell, sheep cell, chicken cell, or turkey cell. In certain instances, a host cell may be a corn cell, soy cell, wheat cell, or rice cell.
- Operative association describes a functional relationship between two nucleic acid sequences, such as 1) a promoter and 2) a heterologous object sequence, and means, in such example, the promoter and heterologous object sequence (e.g., a gene of interest) are oriented such that, under suitable conditions, the promoter drives expression of the heterologous object sequence.
- the template nucleic acid may be single-stranded, e.g., in either the (+) or (-) orientation, but an operative association between promoter and heterologous object sequence means that, whether or not the template nucleic acid will transcribe in a particular state, when it is in the suitable state (e.g., is in the (+) orientation, in the presence of required catalytic factors and NTPs, etc.), it does accurately transcribe.
- Operative association applies analogously to other pairs of nucleic acids, including other tissue-specific expression control sequences (such as enhancers, repressors and microRNA recognition sequences), IR/DR, ITRs, UTRs, or homology regions and heterologous object sequences or sequences encoding a transposase.
- Pseudoknot sequence refers to a nucleic acid (e.g., RNA) having a sequence with suitable self-complementarity to form a pseudoknot structure, e.g., having: a first segment, a second segment between the first segment and a third segment, wherein the third segment is complementary to the first segment, and a fourth segment, wherein the fourth segment is complementary to the second segment.
- the pseudoknot may optionally have additional secondary structure, e.g., a stem loop disposed in the second segment, a stem-loop disposed between the second segment and third segment, sequence before the first segment, or sequence after the fourth segment.
- the pseudoknot may have additional sequence between the first and second segments, between the second and third segments, or between the third and fourth segments.
- the segments are arranged, from 5’ to 3’: first, second, third, and fourth.
- the first and third segments comprise five base pairs of perfect complementarity.
- the second and fourth segments comprise 10 base pairs, optionally with one or more (e.g., two) bulges.
- the second segment comprises one or more unpaired nucleotides, e.g., forming a loop.
- the third segment comprises one or more unpaired nucleotides, e.g., forming a loop.
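The segment arrangement described above (first, second, third, fourth from 5’ to 3’, with the third complementary to the first and the fourth complementary to the second) can be checked with a short sketch. It assumes strict Watson-Crick pairing with no bulges, loops, or wobble pairs, which is narrower than the definition in the text; the function names and example segments are hypothetical.

```python
# Watson-Crick RNA pairs only (assumption: no G-U wobble pairs).
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA segment written 5' to 3'."""
    return "".join(PAIR[base] for base in reversed(rna.upper()))


def is_complementary(seg_a: str, seg_b: str) -> bool:
    """True if seg_b pairs perfectly, in antiparallel fashion, with seg_a."""
    return len(seg_a) == len(seg_b) and reverse_complement(seg_a) == seg_b.upper()


def has_pseudoknot_pairing(first: str, second: str, third: str, fourth: str) -> bool:
    """Check the two pairings named in the definition: first/third and second/fourth."""
    return is_complementary(first, third) and is_complementary(second, fourth)


# Hypothetical segments: a 5-bp first/third pairing and a 10-bp second/fourth pairing.
first = "GGCAC"
second = "AUGGCUAGCU"
third = reverse_complement(first)
fourth = reverse_complement(second)
print(has_pseudoknot_pairing(first, second, third, fourth))
```

Because the second segment lies between the first and third, and the fourth follows the third, the two helices interleave, which is what distinguishes a pseudoknot from two independent stem-loops.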
- Stem-loop sequence refers to a nucleic acid sequence (e.g., RNA sequence) with sufficient self-complementarity to form a stem-loop, e.g., having a stem comprising at least two (e.g., 3, 4, 5, 6, 7, 8, 9, or 10) base pairs, and a loop with at least three (e.g., four) bases.
- the stem may comprise mismatches or bulges.
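A stem-loop of the kind defined above can be recognized with a minimal sketch. It assumes a perfectly paired stem that begins at both ends of the sequence, Watson-Crick pairs only, and a loop length counted in unpaired nucleotides; mismatches and bulges, which the definition allows, are not handled. The function name and example sequences are hypothetical.

```python
from typing import Optional, Tuple

# Watson-Crick RNA pairs only (assumption: no G-U wobble pairs).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def stem_loop_shape(seq: str, min_stem: int = 2, min_loop: int = 3) -> Optional[Tuple[int, int]]:
    """Return (stem_base_pairs, loop_nucleotides) or None if no stem-loop is found.

    Pairs bases from the two ends inward and stops as soon as a pair would
    leave fewer than `min_loop` unpaired nucleotides, or the bases do not pair.
    """
    seq = seq.upper()
    i, j = 0, len(seq) - 1
    stem = 0
    while j - i - 1 >= min_loop and (seq[i], seq[j]) in PAIRS:
        stem += 1
        i += 1
        j -= 1
    if stem < min_stem:
        return None
    return stem, j - i + 1


# Hypothetical sequences.
print(stem_loop_shape("GGGAAAACCC"))
print(stem_loop_shape("AAAAAAA"))
```

The defaults mirror the minimums in the definition (at least two base pairs in the stem and at least three in the loop).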
- tissue-specific expression-control sequence(s) means nucleic acid elements that increase or decrease the level of a transcript comprising the heterologous object sequence in the target tissue in a tissue-specific manner, e.g., preferentially in an on-target tissue(s), relative to an off-target tissue(s).
- a tissue-specific expression-control sequence preferentially drives or represses transcription, activity, or the half-life of a transcript comprising the heterologous object sequence in the target tissue in a tissue-specific manner, e.g., preferentially in an on-target tissue(s), relative to an off-target tissue(s).
- tissue-specific expression-control sequences include tissue-specific promoters, repressors, enhancers, or combinations thereof, as well as tissue-specific microRNA recognition sequences.
- Tissue specificity refers to on-target (tissue(s) where expression or activity of the template nucleic acid is desired or tolerable) and off-target (tissue(s) where expression or activity of the template nucleic acid is not desired or is not tolerable).
- a tissue-specific promoter (such as a promoter in a template nucleic acid or controlling expression of a transposase) drives expression preferentially in on-target tissues, relative to off-target tissues.
- a microRNA that binds the tissue-specific microRNA recognition sequences is preferentially expressed in off-target tissues, relative to on-target tissues, thereby reducing expression of a template nucleic acid (or transposase) in off-target tissues.
- a promoter and a microRNA recognition sequence that are specific for the same tissue, such as the target tissue, have contrasting functions (promote and repress, respectively, with concordant expression levels, i.e., high levels of the microRNA in off-target tissues and low levels in on-target tissues, while promoters drive high expression in on-target tissues and low expression in off-target tissues) with regard to the transcription, activity, or half-life of an associated sequence in that tissue.
- the linker region at the C-terminus of the DNA-binding domain of R2Tg can be truncated and modified. Deletions in the Natural Linker from the myb domain at A or B to positions 1 or 2 along with replacement by 3GS or XTEN synthetic linkers were constructed (A). Integration efficiency was measured in HEK293T cells by ddPCR (B).
- Figure 3a ddPCR assay measuring percentage of integrations from all lentiviral integrated landing pads per cell
- Figure 3b Amplicon-sequencing and NGS analysis of indels present at landing pads sites.
- Figure 6. AAVS1 ZFP fusion to a Retrotransposase Gene Writer with or without functional DNA binding domain.
- Figure 7. Schematic of second strand nicking.
- a Cas9 nickase is fused to a Gene Writer protein. The Gene Writer protein introduces a nick in a DNA strand through its EN domain (shown as *), and the fused Cas9 nickase introduces a nick on either the top or bottom DNA strand (shown as X).
- FIG. 1 Schematic of donor transgene flanked by UTRs and homology to the cut site.
- Figure 10. Schematic of constructs.
- A Schematic of Gene Writer protein.
- B Schematic of donor transgene flanked by UTRs and homology to the cut site.
- C Schematic of Cas9 constructs used.
- Figure 11. The schematics for mRNA encoding Gene Writer (A). The native untranslated regions (UTRs) were replaced by 5’ and 3’ UTRs optimized for the protein expression (shown as 5’ UTRexp and 3’ UTRexp). The Gene Writer protein expression was assayed by HiBit assay by probing HiBit tag expression (B).
- the Gene Writing activity with non-native UTRs is stimulated by the presence of the RNA template bearing the retrotransposon native UTRs.
- Figure 13 Delivery of Gene Writer system using mRNA encoding the polypeptide and plasmid DNA encoding the RNA template for retrotransposition.
- HA homology arm
- K Kozak sequence
- pA poly A signal
- AMa A. maritima
- Rx other species of retrotransposon.
- Introns are shown by curved lines. 5’ HA: 5’ homology arm; 3’ HA: 3’ homology arm; 5’ UTR: Retrotransposon-specific 5’ UTR; 3’ UTR: Retrotransposon-specific 3’ UTR; GOI: gene of interest.
- Orange blocks correspond to the sequence designed to be expressed from the genomic location harboring its own cell specific promoter, poly(A) signal and UTRs for the protein expression (5’ and 3’ UTRexp). The sequence can be oriented in the sense (shown above) or the antisense orientation related to retrotransposon UTRs and homology arms.
- the intron can be located within GOI, or within UTRexp. Figure 16.
- the Gene Writer mRNA at 0.5 µg/well was co-transfected with the RNA templates with or without enzymatically added cap 1 and the poly(A) tail.
- the Gene Writer mRNA to RNA transgene ratio was 1:1.
- Figure 19 The modules comprising a typical Gene Writer RNA template, where individual modules can be combined, re-arranged, and/or left out to produce a Gene Writer template.
- Figure 20 Construct diagram of driver and transgene plasmids. Homology arms (HA) and stuffer sequences are variable in this set of experiments. Figure 21.
- A,B Integration efficiency as measured across the 3’ junction between transgene and host rDNA.
- C,D Integration efficiency as measured across the 5’ junction.
- Figure 22 Example illustration of homology shift design tested for +/-3bp. Red indicates homology to 5’ of the wildtype (WT) nick site, and blue indicates homology 3’ to the nick. 3’ shifted constructs (+) begin 3’ homology farther downstream from the nick. 5’ shifted constructs (-) incorporate homology from the 5’ of the nick into the 3’ homology arm.
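The homology shift design described for Figure 22 can be sketched as a window that slides along the target sequence. The sketch assumes that the nick is given as a 0-based index of the first base 3’ of the nick and that the unshifted 3’ homology arm starts exactly at that index; the function name, target sequence, and arm length are hypothetical.

```python
def shifted_3prime_homology(target: str, nick_index: int, arm_length: int, shift: int) -> str:
    """Return the 3' homology arm sequence for a given shift in base pairs.

    shift > 0: the arm begins farther downstream (3') of the nick.
    shift < 0: the arm incorporates bases from 5' of the nick.
    """
    start = nick_index + shift
    end = start + arm_length
    if start < 0 or end > len(target):
        raise ValueError("shifted homology arm falls outside the target sequence")
    return target[start:end]


# Hypothetical target; the nick lies between index 7 and index 8.
target = "AAAACCCCGGGGTTTT"
for shift in range(-3, 4):
    print(shift, shifted_3prime_homology(target, nick_index=8, arm_length=4, shift=shift))
```

Iterating `shift` over -3 to +3 reproduces the +/-3 bp range named in the figure description.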
- Figure 23. 3’ integration results from shifting the 3’ homology arm of the transgene. Each data point represents a replicate, while the bar represents the mean of two replicates.
- Figure 24. (A) Timeline of experiment. (B) Schematic of R2Tg and transgene construct configurations. (C) Western blot against Rad51 shows loss of Rad51 protein expression at day 3. Figure 25. U2OS cells were treated with a non-targeting control siRNA (ctrl) or siRNA against Rad51, along with R2Tg WT or control RT and EN mutants. ddPCR at the 3’ (A) or 5’ (B) junction was used to assess integration efficiency on day 3. Figure 26.
- A Sequence map of Ribozyme of R2 element from Taeniopygia guttata (R2Tg) in context of modules of Gene Writer transgene molecule RNA.
- the Ribozyme features are denoted as: P, base-paired region; P′, base-paired region complement strand; L, loop at end of P region; J, nucleotides joining base-paired regions.
- B Prediction of ribozyme secondary structure of R2Tg. Shaded box indicates a predicted catalytic position that could be used to inactivate the ribozyme. This Figure discloses SEQ ID NO: 1592.
- Figure 27. The Ribozyme features are denoted as: P, base-paired region; P′, base-paired region complement strand; L, loop at end of P region; J, nucleotides joining base-paired regions.
- Ribozyme of R2 element from Taeniopygia guttata (R2Tg) in context of modules of Gene Writer transgene molecule RNA.
- Figure 28. Prediction of ribozyme secondary structure of R2 element from Taeniopygia guttata (SEQ ID NO: 1592).
- Figures 29A and 29B are a series of diagrams showing examples of configurations of Gene Writers using domains derived from a variety of sources.
- Gene Writers as described herein may or may not comprise all domains depicted.
- a GeneWriter may, in some instances, lack an RNA-binding domain, or may have single domains that fulfill the functions of multiple domains, e.g., a Cas9 domain for DNA binding and endonuclease activity.
- Exemplary domains that can be included in a GeneWriter polypeptide include DNA binding domains (e.g., comprising a DNA binding domain of an element of a sequence listed in any of Tables 1 or 3, or a domain listed in Table 2; a zinc finger; a TAL domain; Cas9; dCas9; nickase Cas9; a transcription factor, or a meganuclease), RNA binding domains (e.g., comprising an RNA binding domain of B-box protein, MS2 coat protein, dCas, or an element of a sequence listed in any of Tables 1 or 3, or a domain listed in Table 2), reverse transcriptase domains (e.g., comprising a reverse transcriptase domain of an element of a sequence listed in any of Tables 1 or 3, or a domain listed in Table 2), and/or an endonuclease domain (e.g., comprising an endonuclease domain of an element of a sequence listed in any
- Figures 30A and 30B illustrate mutations to the DNA binding motifs in a Gene Writer polypeptide that inhibit native site integration.
- Figure 30A discloses a general domain structure of an R2Tg retrotransposase (top), comprising a DNA-binding domain containing multiple predicted DNA-binding elements (bottom). The two zinc finger motifs and c-myb motif indicated in the protein were mutated according to Example 30.
- Figure 30B illustrates that integration activity for the mutants of the ZF1, ZF2, and c-myb domains was assessed in HEK293T cells by analyzing native rDNA site integration frequency using ddPCR.
- Figure 31A shows the predicted binding and cleavage locations in the target site of the R2Tg retrotransposase.
- Figure 31B shows that the cleavage site of the R2Tg retrotransposase was validated by analysis of genome alterations resulting from endonuclease activity. Plasmid DNA encoding the R2Tg retrotransposase was nucleofected into U2OS cells and genomic DNA was harvested after three days. Target site amplicons were generated using site-specific primers and sequenced to determine the location of genome alterations indicative of endonuclease activity. Shown here is a graph depicting the frequency of insertions (circles) and deletions (triangles) per nucleotide of sequence (x-axis).
- FIG. 32A shows determination of sequence determinants for endonuclease activity of a retrotransposase by schematic representation of Landing pad screen.
- Figure 33A shows that a lentiviral expression vector was used to clone landing pads containing a native R2 retrotransposase target site or sites comprising mutations relative to the native site. Lentiviral constructs were packaged and used to transduce U2OS cells for generating cell lines with the landing pads integrated into the genome.
- the landing pad additionally comprised a green fluorescent protein (GFP) reporter cassette for titer determinations.
- Figure 33B shows Landing pad sequences comprising wild-type or mutational variants of the R2 site.
- a native rDNA sequence landing pad containing the unmodified rDNA sequence (WT_R2Tg) was used as a positive control.
- a series of 16 landing pads are shown with mutated regions indicated in dark gray and the GG cleavage site in light gray (left). The graph (right) was used to visualize the magnitude of each target site change on endonuclease activity of the enzyme.
- FIG. 33 shows the overview of landing pad screen for retargeting a Gene Writer polypeptide. Schematic of the landing pad library built to analyze the sequences recognized in R2Tg retargeting. The AAVS1-ZF binding site (dark gray and labeled AAVS1) was used as a DNA binding motif for retargeting, and all landing pads were built in the context of the human AAVS1 genomic sequence.
- rDNA sequence (black) was added to the AAVS1 sequence in various ways: (Category 1) different lengths of rDNA sequence, (Category 2) different distances between the AAVS1 ZF binding site and the rDNA sequence, (Category 3) different orientations of the rDNA sequence relative to the AAVS1 site. Categories 1, 2, and 3 were explored combinatorially, resulting in landing pads of various rDNA sequence lengths and various distances and orientations relative to the AAVS1 ZF binding site. The AAGG minimum sequence for R2Tg cleavage was maintained in all landing pads (black box with white fill). Each landing pad was designed with a unique barcode at the 3’ end of the sequence to enable computational extraction and analysis of landing pad sequences from the pool.
- Figure 34 represents sequencing-based determination of landing pad representation in U2OS pool.
- the landing pad pool of U2OS cells was sequenced and analyzed to determine barcode representation. Approximately 94% of landing pads were represented by at least 10,000 reads (horizontal black bar). The x-axis indicates landing pad identity and the y-axis shows the total reads for that barcode.
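- As an illustration of the barcode-representation analysis described above, the following is a minimal sketch, not the analysis actually used. It assumes barcodes have already been extracted from the reads; the function name `barcode_representation` and its parameters are hypothetical.

```python
from collections import Counter


def barcode_representation(read_barcodes, known_barcodes, min_reads=10_000):
    """Count reads per landing pad barcode and report library coverage.

    read_barcodes  -- iterable of barcode strings, one per sequencing read
                      (assumed to be extracted upstream of this function)
    known_barcodes -- collection of barcodes designed into the library
    min_reads      -- read threshold for calling a landing pad "represented"

    Returns (reads per known barcode, fraction of barcodes at or above
    the threshold).
    """
    known = set(known_barcodes)
    if not known:
        raise ValueError("known_barcodes must not be empty")
    # Reads whose barcode is not part of the designed library are ignored.
    counts = Counter(bc for bc in read_barcodes if bc in known)
    per_pad = {bc: counts.get(bc, 0) for bc in known}
    represented = sum(1 for n in per_pad.values() if n >= min_reads)
    return per_pad, represented / len(per_pad)
```

With the default threshold, the returned fraction corresponds to the kind of figure quoted above (landing pads represented by at least 10,000 reads).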
- Figures 35A and 35B disclose that generation of indel signatures in a landing pad library enables screening of chimeric Gene Writer polypeptides.
- Figure 35A shows a landing pad library comprising various compositions of AAVS1 and R2 rDNA target sequences was treated with a full-length R2Tg retrotransposase fused to a zinc finger for AAVS1 sequence recognition.
- Figures 36A and 36B disclose that generation of indel signatures in a landing pad library enables screening of chimeric Gene Writer polypeptides.
- Figure 36A shows a landing pad library comprising various compositions of AAVS1 and R2 rDNA target sequences was treated with a full-length R2Tg retrotransposase fused to a zinc finger for AAVS1 sequence recognition. Amplicon sequencing was performed and insertion frequencies at the GG target site (y-axis) are plotted for each landing pad (x-axis). A representative number of 230 landing pads is shown on the x-axis. The negative control lacking any rDNA sequence did not harbor any insertions.
- FIG. 36B is an illustrative representation of landing pad configurations found to contain signatures of endonuclease activity.
- Figures 37A and 37B describe a luciferase activity assay for primary cells. LNPs formulated according to Example 38 were analyzed for delivery of cargo to primary human (A) and mouse (B) hepatocytes, according to Example 39. The luciferase assay revealed dose-responsive luciferase activity from cell lysates, indicating successful delivery of RNA to the cells and expression of Firefly luciferase from the mRNA cargo.
- Figure 38 shows LNP-mediated delivery of RNA cargo to the murine liver.
- Firefly luciferase mRNA-containing LNPs were formulated and delivered to mice intravenously (iv), and liver samples were harvested and assayed for luciferase activity at 6, 24, and 48 hours post-administration.
- Reporter activity by the various formulations followed the ranking LIPIDV005>LIPIDV004>LIPIDV003.
- RNA expression was transient and enzyme levels returned to near vehicle background by 48 hours post-administration.
- Figure 39 shows improved expression of Cas-RT fusions through choice of linker sequence.
- U2OS cells were transfected with Cas-RT expression plasmids harboring various linkers from Table 42 fusing the Cas9(N863A) nickase to the RT domain of an RNA-binding domain mutated R2Bm retrotransposase.
- Cell lysates were collected and analyzed by Western blot using a primary antibody against Cas9. A primary antibody against vinculin (left) or GAPDH (right) was included as a loading control.
- Cas9 controls on the left represent titration of a Cas9 expression plasmid.
- the disclosure provides retrotransposon-based systems for inserting a sequence of interest into the genome.
- This disclosure is based, in part, on a bioinformatic analysis to identify retrotransposase sequences and the associated 5’ UTR and 3’ UTR from a variety of organisms (see Table 3).
- Gene Writer™ genome editors: Non-long terminal repeat (LTR) retrotransposons are a type of mobile genetic element that is widespread in eukaryotic genomes. They include two classes: the apurinic/apyrimidinic endonuclease (APE)-type and the restriction enzyme-like endonuclease (RLE)-type.
- the APE class retrotransposons are comprised of two functional domains: an endonuclease/DNA binding domain, and a reverse transcriptase domain.
- the RLE class are comprised of three functional domains: a DNA binding domain, a reverse transcription domain, and an endonuclease domain.
- the reverse transcriptase domain of non-LTR retrotransposon functions by binding an RNA sequence template and reverse transcribing it into the host genome’s target DNA.
- the RNA sequence template has a 3’ untranslated region which is specifically bound to the transposase, and a variable 5’ region generally having Open Reading Frame(s) (“ORF”) encoding transposase proteins.
- the RNA sequence template may also comprise a 5’ untranslated region which specifically binds the retrotransposase.
- Reverse transcription by non-LTR retrotransposons occurs via a unique process described as target-primed reverse transcription (Luan et al. Cell 72, 595-605 (1993)).
- a first single-stranded nick is generated by an endonuclease domain of the retrotransposase, releasing a free 3’-OH.
- the retrotransposon RNA, bound by the retrotransposase using structural features at the 3’ end, is then primed by the target site with polymerization at the free 3’-OH and used as a template for reverse transcription.
- a second nick is targeted to the second DNA strand and the new free 3’-OH is used to initiate second strand synthesis.
- Some non-LTR retrotransposons, e.g., R2, are believed to additionally require interaction with a second retrotransposase unit at the 5’ end of the retrotransposon RNA for this second nick, which is activated upon the release of the 5’ end (Craig, Mobile DNA III, ASM, ed.3 (2015)).
- non-LTR retrotransposons can be functionally modularized and/or modified to target, edit, modify or manipulate a target DNA sequence, e.g., to insert an object (e.g., heterologous) nucleic acid sequence into a target genome, e.g., a mammalian genome, by reverse transcription.
- Gene Writer™ gene editors: Such modularized and modified nucleic acids, polypeptide compositions and systems are described herein and are referred to as Gene Writer™ gene editors.
- a Gene Writer™ gene editor system comprises: (A) a polypeptide or a nucleic acid encoding a polypeptide, wherein the polypeptide comprises (i) a reverse transcriptase domain, and either (x) an endonuclease domain that contains DNA binding functionality or (y) an endonuclease domain and separate DNA binding domain; and (B) a template RNA comprising (i) a sequence that binds the polypeptide and (ii) a heterologous insert sequence.
- the Gene Writer genome editor protein may comprise a DNA-binding domain, a reverse transcriptase domain, and an endonuclease domain.
- the Gene Writer genome editor protein may comprise a reverse transcriptase domain and an endonuclease domain.
- the elements of the Gene Writer™ gene editor polypeptide can be derived from sequences of non-LTR retrotransposons, e.g., APE-type or RLE-type retrotransposons or portions or domains thereof.
- the RLE-type non-LTR retrotransposon is from the R2, NeSL, HERO, R4, or CRE clade.
- the Gene Writer genome editor is derived from R4 element X4_Line, which is found in the human genome.
- the APE-type non-LTR retrotransposon is from the R1, or Tx1 clade.
- the Gene Writer genome editor is derived from Tx1 element Mare6, which is found in the human genome.
- the RNA template element of a Gene Writer™ gene editor system is typically heterologous to the polypeptide element and provides an object sequence to be inserted (reverse transcribed) into the host genome.
- the Gene Writer genome editor protein is capable of target primed reverse transcription.
- the Gene Writer genome editor protein is capable of second strand synthesis.
- the Gene Writer genome editor is combined with a second polypeptide.
- the second polypeptide is derived from an APE-type non-LTR retrotransposon.
- the second polypeptide has a zinc knuckle-like motif.
- the second polypeptide is a homolog of Gag proteins. In some embodiments, the second polypeptide possesses specific binding activity for the RNA template. In some embodiments, the second polypeptide aids in localization of the RNA template to the nucleus. In embodiments, the disclosure provides a nucleic acid molecule or a system for retargeting, e.g., of a Gene Writer polypeptide or nucleic acid molecule, or of a system as described herein.
- Retargeting (e.g., of a Gene Writer polypeptide or nucleic acid molecule, or of a system as described herein) generally comprises: (i) directing the polypeptide to bind and cleave at the target site; and/or (ii) designing the template RNA to have complementarity to the target sequence.
- the template RNA has complementarity to the target sequence 5’ of the first-strand nick, e.g., such that the 3’ end of the template RNA anneals and the 5’ end of the target site serves as the primer, e.g., for target-primed reverse transcription (TPRT).
- the endonuclease domain of the polypeptide and the 5’ end of the RNA template are also modified as described.
- Polypeptide component of Gene Writer gene editor system RT domain In certain aspects of the present invention, the reverse transcriptase domain of the Gene Writer system is based on a reverse transcriptase domain of an APE-type or RLE-type non-LTR retrotransposon.
- a wild-type reverse transcriptase domain of an APE-type or RLE-type non-LTR retrotransposon can be used in a Gene Writer system or can be modified (e.g., by insertion, deletion, or substitution of one or more residues) to alter the reverse transcriptase activity for target DNA sequences.
- the reverse transcriptase is altered from its natural sequence to have altered codon usage, e.g. improved for human cells.
- the reverse transcriptase domain is a heterologous reverse transcriptase from a different retrovirus, LTR-retrotransposon, or non-LTR retrotransposon.
- a Gene Writer system includes a polypeptide that comprises a reverse transcriptase domain of an RLE-type non-LTR retrotransposon from the R2, NeSL, HERO, R4, or CRE clade, or of an APE-type non-LTR retrotransposon from the R1 or Tx1 clade.
- a Gene Writer™ system includes a polypeptide that comprises a reverse transcriptase domain of a non-LTR retrotransposon, LTR retrotransposon, group II intron, diversity-generating element, retron, telomerase, retroplasmid, retrovirus, or an engineered polymerase listed in Table 1 or Table 3.
- a Gene Writer™ system includes a polypeptide that comprises a reverse transcriptase domain listed in Table 2.
- the amino acid sequence of the reverse transcriptase domain of a Gene Writer system is at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99% identical to the amino acid sequence of a reverse transcriptase domain of a non-LTR retrotransposon, LTR retrotransposon, group II intron, diversity-generating element, retron, telomerase, retroplasmid, retrovirus, or an engineered polymerase whose sequence is referenced in Table 1 or Table 3, or to a peptide comprising a reverse transcriptase domain listed in Table 2.
- the RT domain has a sequence selected from Table 1 or 3, or a sequence of a peptide comprising an RT domain selected from Table 2, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.
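- The percent-identity thresholds above can be checked with a short calculation once two sequences have been aligned. The following is a minimal sketch under stated assumptions: the alignment is computed elsewhere, gaps are written as "-", and identity is taken as identical columns divided by all aligned columns (other conventions exist and are not specified here). The function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: identity = identical residue columns / aligned columns,
    where columns that are gaps ("-") in both sequences are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        return 0.0
    # A gap aligned to a residue counts as a non-identical column.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)
```

A candidate RT domain would then satisfy, for example, the "at least 70%" embodiment when `percent_identity(candidate, reference) >= 70`.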
- the RT domain is derived from the RT of a retrovirus, e.g., HIV-1 RT, Moloney Murine Leukemia Virus (MMLV) RT, avian myeloblastosis virus (AMV) RT, Rous Sarcoma Virus (RSV) RT.
- the RT domain is derived from the RT of a Group II intron, e.g., the group II intron maturase RT from Eubacterium rectale (MarathonRT) (Zhao et al. RNA 24:2 (2018)), the RT domain from LtrA, the RT TGIRT (or trt).
- the RT domain is derived from the RT of a retron, e.g., the reverse transcriptase from Ec86 (RT86).
- the RT domain is derived from a diversity-generating retroelement, e.g., from the RT of Brt.
- the RT domain is derived from the RT of a retroplasmid, e.g., the RT from the Mauriceville plasmid.
- the RT domain is derived from a non-LTR retrotransposon, e.g., the RT from R2Bm, the RT from R2Tg, the RT from LINE-1, the RT from Penelope or a Penelope-like element (PLE).
- the RT domain is derived from an LTR retrotransposon, e.g., the reverse transcriptase from Ty1.
- the RT domain is derived from a telomerase, e.g., TERT.
- the reverse transcriptase contains the InterPro domain IPR000477. In some embodiments, the reverse transcriptase contains the pfam domain PF00078. In some embodiments, the reverse transcriptase contains the InterPro domain IPR013103. In some embodiments, the RT contains the pfam domain PF07727.
- the reverse transcriptase contains a conserved protein domain of the cd00304 RT_like family, e.g., cd01644 (RT_pepA17), cd01645 (RT_Rtv), cd01646 (RT_Bac_retron_I), cd01647 (RT_LTR), cd01648 (TERT), cd01650 (RT_nLTR_like), cd01651 (RT_G2_intron), cd01699 (RNA_dep_RNAP), cd01709 (RT_like_1), cd03487 (RT_Bac_retron_II), cd03714 (RT_DIRS1), cd03715 (RT_ZFREV_like).
- Proteins containing these domains can additionally be found by searching the domains on protein databases, such as InterPro (Mitchell et al. Nucleic Acids Res 47, D351-360 (2019)), UniProt (The UniProt Consortium Nucleic Acids Res 47, D506-515 (2019)), or the conserved domain database (Lu et al. Nucleic Acids Res 48, D265-268 (2020)), or by scanning open reading frames for reverse transcriptase domains using prediction tools, for example InterProScan.
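- As a sketch of how candidate reverse transcriptases might be flagged from domain annotations, the following filters proteins by the InterPro and pfam accessions named above. It assumes the annotations have already been parsed (e.g., from a domain-prediction tool) into a mapping of protein identifier to accessions; the function name `find_rt_candidates` and the input format are hypothetical.

```python
# RT-associated accessions named in the text above.
RT_SIGNATURES = frozenset({"IPR000477", "PF00078", "IPR013103", "PF07727"})


def find_rt_candidates(domain_hits):
    """Return proteins annotated with at least one RT signature.

    domain_hits -- dict mapping a protein identifier to an iterable of
                   domain accessions (hypothetical, pre-parsed input)

    Returns a dict mapping each matching protein identifier to the sorted
    list of RT signatures found on it.
    """
    candidates = {}
    for protein_id, accessions in domain_hits.items():
        found = RT_SIGNATURES.intersection(accessions)
        if found:
            candidates[protein_id] = sorted(found)
    return candidates
```

The same filter could be extended with the cd accessions listed above by adding them to `RT_SIGNATURES`.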
- the diversity of reverse transcriptases e.g., comprising RT domains
- the RT domain exhibits enhanced stringency of target-primed reverse transcription (TPRT) initiation, e.g., relative to an endogenous RT domain.
- the RT domain initiates TPRT when the 3 nt in the target site immediately upstream of the first strand nick, e.g., the genomic DNA priming the RNA template, have at least 66% or 100% complementarity to the 3 nt of homology in the RNA template.
- the RT domain initiates TPRT when there are less than 5 nt mismatched (e.g., less than 1, 2, 3, 4, or 5 nt mismatched) between the template RNA homology and the target DNA priming reverse transcription.
- the RT domain is modified such that the stringency for mismatches in priming the TPRT reaction is increased, e.g., wherein the RT domain does not tolerate any mismatches or tolerates fewer mismatches in the priming region relative to a wild-type (e.g., unmodified) RT domain.
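- The priming criteria above (e.g., 66% or 100% complementarity over 3 nt, or a maximum number of mismatches) can be illustrated with a small calculation. This is a minimal sketch assuming simple Watson-Crick pairing between the target DNA strand that primes synthesis and the homology region of the template RNA, both written 5’ to 3’; the function name `priming_complementarity` is hypothetical and no wobble pairing is modeled.

```python
# Watson-Crick partner in RNA for each DNA base.
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def priming_complementarity(target_dna, template_rna):
    """Compare the DNA priming region with the template RNA homology.

    target_dna   -- DNA immediately upstream of the first-strand nick, 5'->3'
    template_rna -- homology region of the template RNA, 5'->3'

    The strands anneal antiparallel, so the RNA is reversed before the
    position-by-position comparison.
    Returns (number of mismatches, fraction of complementary positions).
    """
    if len(target_dna) != len(template_rna) or not target_dna:
        raise ValueError("sequences must be non-empty and of equal length")
    rna_3_to_5 = template_rna.upper()[::-1]
    mismatches = sum(
        1
        for d, r in zip(target_dna.upper(), rna_3_to_5)
        if DNA_TO_RNA_PAIR.get(d) != r
    )
    return mismatches, 1 - mismatches / len(target_dna)
```

For a 3 nt priming region, one mismatch gives a fraction of 2/3, which meets a 66% threshold but not a 100% threshold.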
- the RT domain comprises a HIV-1 RT domain.
- the HIV-1 RT domain initiates lower levels of synthesis even with three nucleotide mismatches relative to an alternative RT domain (e.g., as described by Jamburuthugoda and Eickbush J Mol Biol 407(5):661-672 (2011); incorporated herein by reference in its entirety).
- the RT domain forms a dimer (e.g., a heterodimer or homodimer).
- the RT domain is monomeric.
- an RT domain e.g., a retroviral RT domain, naturally functions as a monomer or as a dimer (e.g., heterodimer or homodimer).
- an RT domain naturally functions as a monomer, e.g., is derived from a virus wherein it functions as a monomer.
- Exemplary monomeric RT domains, their viral sources, and the RT signatures associated with them can be found in Table 30 with descriptions of domain signatures in Table 32.
- the RT domain of a system described herein comprises an amino acid sequence of Table 30, or a functional fragment or variant thereof, or a sequence having at least 70%, 80%, 90%, 95%, or 99% identity thereto.
- the RT domain is selected from an RT domain from murine leukemia virus (MLV; sometimes referred to as MoMLV) (e.g., P03355), porcine endogenous retrovirus (PERV) (e.g., UniProt Q4VFZ2), mouse mammary tumor virus (MMTV) (e.g., UniProt P03365), Mason-Pfizer monkey virus (MPMV) (e.g., UniProt P07572), bovine leukemia virus (BLV) (e.g., UniProt P03361), human T-cell leukemia virus-1 (HTLV-1) (e.g., UniProt P03362), human foamy virus (HFV) (e.g., UniProt P14350), simian foamy virus (SFV) (e.g., UniProt P23074), or bovine foamy/syncytial virus (BFV/BSV) (e.g., UniProt O41894), or a functional fragment or
- an RT domain is dimeric in its natural functioning. Exemplary dimeric RT domains, their viral sources, and the RT signatures associated with them can be found in Table 31 with descriptions of domain signatures in Table 32.
- the RT domain of a system described herein comprises an amino acid sequence of Table 31, or a functional fragment or variant thereof, or a sequence having at least 70%, 80%, 90%, 95%, or 99% identity thereto.
- the RT domain is derived from a virus wherein it functions as a dimer.
- the RT domain is selected from an RT domain from avian sarcoma/leukemia virus (ASLV) (e.g., UniProt A0A142BKH1), Rous sarcoma virus (RSV) (e.g., UniProt P03354), avian myeloblastosis virus (AMV) (e.g., UniProt Q83133), human immunodeficiency virus type I (HIV-1) (e.g., UniProt P03369), human immunodeficiency virus type II (HIV-2) (e.g., UniProt P15833), simian immunodeficiency virus (SIV) (e.g., UniProt P05896), bovine immunodeficiency virus (BIV) (e.g., UniProt P19560), equine infectious anemia virus (EIAV) (e.g., UniProt P03371), or feline immunodeficiency virus (FIV) (e
- Naturally heterodimeric RT domains may, in some embodiments, also be functional as homodimers.
- dimeric RT domains are expressed as fusion proteins, e.g., as homodimeric fusion proteins or heterodimeric fusion proteins.
- the RT function of the system is fulfilled by multiple RT domains (e.g., as described herein).
- the multiple RT domains are fused or separate, e.g., may be on the same polypeptide or on different polypeptides.
- a GeneWriter described herein comprises an integrase domain, e.g., wherein the integrase domain may be part of the RT domain.
- an RT domain (e.g., as described herein) comprises an integrase domain.
- an RT domain (e.g., as described herein) lacks an integrase domain, or comprises an integrase domain that has been inactivated by mutation or deleted.
- a GeneWriter described herein comprises an RNase H domain, e.g., wherein the RNase H domain may be part of the RT domain.
- an RT domain (e.g., as described herein) comprises an RNase H domain, e.g., an endogenous RNAse H domain or a heterologous RNase H domain.
- an RT domain (e.g., as described herein) lacks an RNase H domain.
- an RT domain (e.g., as described herein) comprises an RNase H domain that has been added, deleted, mutated, or swapped for a heterologous RNase H domain.
- mutation of an RNase H domain yields a polypeptide exhibiting lower RNase activity, e.g., as determined by the methods described in Kotewicz et al. Nucleic Acids Res 16(1):265-277 (1988) (incorporated herein by reference in its entirety), e.g., lower by at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or 90% compared to an otherwise similar domain without the mutation.
- RNase H activity is abolished.
- an RT domain is mutated to increase fidelity compared to an otherwise similar domain without the mutation. For instance, in some embodiments, a YADD (SEQ ID NO: 1547) or YMDD (SEQ ID NO: 1548) motif in an RT domain (e.g., in a reverse transcriptase) is replaced with YVDD (SEQ ID NO: 1549).
- reverse transcriptase domains are modified, for example by site-specific mutation.
- reverse transcriptase domains comprise a number of amino acid substitutions relative to the natural sequence, e.g., at least 1, 2, 3, 4, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100 substitutions.
- the reverse transcriptase domain is engineered to bind a heterologous template RNA.
- Table 1 Exemplary reverse transcriptase domains from different types of sources. Sources include Group II intron, non-LTR retrotransposon, retrovirus, LTR retrotransposon, diversity-generating retroelement, retron, telomerase, retroplasmid, and evolved DNA polymerase. Also included are the associated RT signatures from the InterPro, pfam, and cd databases. Although the evolved polymerase RTX can perform RNA-dependent DNA polymerization, no RT signatures were identified by InterProScan, so polymerase signatures are included instead.
- Table 2 InterPro descriptions of signatures present in reverse transcriptases in Table 1.
- Table 30 Exemplary monomeric retroviral reverse transcriptases and their RT domain signatures
- Table 31 Exemplary dimeric retroviral reverse transcriptases and their RT domain signatures
- Table 32 InterPro descriptions of signatures present in reverse transcriptases in Table 30 (monomeric viral RTs) and Table 31 (dimeric viral RTs).
- Endonuclease domain In certain embodiments, the endonuclease/DNA binding domain of an APE-type retrotransposon or the endonuclease domain of an RLE-type retrotransposon can be used or can be modified (e.g., by insertion, deletion, or substitution of one or more residues) in a Gene Writer system described herein. In some embodiments the endonuclease domain or endonuclease/DNA binding domain is altered from its natural sequence to have altered codon usage, e.g. improved for human cells.
- the endonuclease element is a heterologous endonuclease element, such as FokI nuclease, a type-II restriction enzyme-like endonuclease (RLE-type nuclease), or another RLE-type endonuclease (also known as REL).
- the heterologous endonuclease activity has nickase activity and does not form double stranded breaks.
- the heterologous endonuclease is a CRISPR-associated nuclease, e.g., Cas9, or a CRISPR-associated nuclease with nickase activity, e.g., a Cas9 nickase.
- the amino acid sequence of an endonuclease domain of a Gene Writer system described herein may be at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99% identical to the amino acid sequence of an endonuclease domain of a retrotransposon whose DNA sequence is referenced in Table 1, 2, or 3.
- the heterologous endonuclease is FokI or a functional fragment thereof.
- the heterologous endonuclease is a Holliday junction resolvase or homolog thereof, such as the Holliday junction resolving enzyme from Sulfolobus solfataricus (Ssol Hje) (Govindaraju et al., Nucleic Acids Research 44:7, 2016).
- the heterologous endonuclease is the endonuclease of the large fragment of a spliceosomal protein, such as Prp8 (Mahbub et al., Mobile DNA 8:16, 2017).
- a Gene Writer polypeptide described herein may comprise a reverse transcriptase domain from an APE- or RLE-type retrotransposon and an endonuclease domain that comprises FokI or a functional fragment thereof.
- homologous endonuclease domains are modified, for example by site-specific mutation, to alter DNA endonuclease activity.
- endonuclease domains are modified to remove any latent DNA-sequence specificity.
- supplemental endonuclease activity may be beneficial for improving the resolution of the integration event (Anzalone et al., Nature 576, 149-157 (2019)).
- the endonuclease element of the polypeptide provides the nick for initiating target-primed reverse transcription and an additional heterologous domain of the polypeptide provides additional endonuclease activity.
- the additional endonuclease activity is provided by a nickase.
- the additional endonuclease activity may be provided by a heterologous DNA-binding element that also possesses endonuclease activity, e.g., a Cas9 nickase.
- the additional endonuclease activity may be contained within the first Gene Writer polypeptide.
- the additional endonuclease activity may be provided by a separate polypeptide.
- a Gene Writer polypeptide described herein comprises an endonuclease domain that cleaves at a predefined location in a target DNA sequence, e.g., as measured using an assay of Example 32 herein.
- the endonuclease domain cleaves at a GG site in a target DNA sequence. In some embodiments, the endonuclease domain cleaves at an AAGG site in a target DNA sequence. In some embodiments, a target DNA sequence described herein comprises a GG or AAGG motif, e.g., a naturally occurring motif in the human genome.
- DNA binding domain: In certain aspects, the DNA-binding domain of a Gene Writer polypeptide described herein is selected, designed, or constructed for binding to a desired host DNA target sequence. In certain embodiments, the DNA-binding domain of the engineered RLE is a heterologous DNA-binding protein or domain relative to a native retrotransposon sequence.
- the heterologous DNA binding element is a zinc-finger element or a TAL effector element, e.g., a zinc-finger or TAL polypeptide or functional fragment thereof.
- the heterologous DNA binding element is a sequence-guided DNA binding element, such as Cas9, Cpf1, or other CRISPR-related protein that has been altered to have no endonuclease activity.
- the heterologous DNA binding element retains endonuclease activity.
- the heterologous DNA binding element retains only single-stranded DNA cleavage activity, e.g., is a DNA nickase, e.g., is a Cas9 nickase.
- the heterologous DNA binding element with endonuclease activity replaces the endonuclease element of the polypeptide.
- the heterologous DNA binding element with endonuclease activity supplements the endonuclease element of the polypeptide, e.g., causes an additional nick at the target site.
- the heterologous DNA-binding domain can be any one or more of Cas9, TAL domain, ZF domain, Myb domain, combinations thereof, or multiples thereof.
- the heterologous DNA-binding domain is a DNA binding domain of a retrotransposon described in a table herein.
- DNA binding domains are modified, for example by site-specific mutation, increasing or decreasing DNA-binding elements (for example, number and/or specificity of zinc fingers), etc., to alter DNA-binding specificity and affinity.
- the DNA binding domain is altered from its natural sequence to have altered codon usage, e.g. improved for human cells.
- a polypeptide described herein comprises a mutation in a DNA binding domain.
- the mutation reduces or abrogates DNA-binding activity of the DNA binding domain, e.g., to less than 50%, 40%, 30%, 20%, 10%, 5%, 2%, or 1% of the corresponding wild-type sequence, e.g., in an assay of Example 30.
- the mutation may be, e.g., in a ZF1 domain, a ZF2 domain, or a c-myb domain.
- the mutation may be a point mutation.
- the mutation may be in a C residue (e.g., C to S), for instance in a C residue in a ZF1 or ZF2 domain; in an R residue (e.g., R to A), for instance in an R residue in a c-myb domain; or in a W residue (e.g., W to A), for instance in a W residue in a c-myb domain; or any combination thereof.
- the polypeptide comprising a mutation in a DNA binding domain further comprises a heterologous DNA binding domain.
- a naturally occurring AAGG sequence in the genome is used as a seed for retargeting an R2 retrotransposase-based Gene Writing system, wherein the DNA binding domain is mutated or replaced with a heterologous DNA binding domain such that the binding of the Gene Writer polypeptide to the new target site results in the proper positioning of the endonuclease domain to the AAGG motif to enable endonuclease activity.
- a target DNA sequence described herein comprises a motif recognized by an endonuclease domain (e.g., a GG or AAGG motif), e.g., a naturally occurring motif in the human genome.
- a GeneWriter comprises a DNA binding domain (e.g., a heterologous DNA binding domain) that binds near the motif recognized by the endonuclease domain, e.g., in such a way that the endonuclease domain of the GeneWriter is positioned to cleave the motif.
- the DNA binding domain binds a site that is within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, or 60 nucleotides of the motif recognized by an endonuclease domain (e.g., the GG or AAGG motif).
- the DNA binding domain may bind a site that is upstream or downstream of the GG or AAGG motif.
- the DNA binding domain may bind a site that is in the same orientation or the reverse complement orientation compared to the motif recognized by an endonuclease domain (e.g., the GG or AAGG motif).
- a retargeted GeneWriter polypeptide comprises (i) an endonuclease domain that recognizes a motif, and (ii) a heterologous DNA binding domain that recognizes a genomic DNA sequence.
- the motif is about 30-80, 40-70, 50-60, or 55 nt upstream of the genomic DNA sequence, wherein optionally the motif and the genomic DNA sequence are in the same orientation.
- the motif is about 10-30, 15-25, or 20 nt downstream of the genomic DNA sequence, wherein optionally the motif is in the reverse orientation to the genomic DNA sequence.
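- The spacing and orientation relationships described above can be explored computationally when choosing a retargeting site. The following is a minimal sketch, assuming the binding site is annotated on the forward strand with 0-based coordinates and that distance is counted as the number of nucleotides between the binding site and the motif; the function names and the returned tuple layout are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_motifs_near_site(seq, site_start, site_end, motif="AAGG", max_distance=60):
    """List motif occurrences within max_distance nt of a binding site.

    seq         -- forward-strand DNA sequence
    site_start  -- 0-based start of the DNA binding site (inclusive)
    site_end    -- 0-based end of the DNA binding site (exclusive)

    Returns tuples (motif_start, orientation, position, distance), where
    orientation is "same" (motif on the forward strand) or "reverse"
    (motif on the opposite strand), and position is "upstream" or
    "downstream" of the binding site on the forward strand.
    Occurrences overlapping the binding site are skipped.
    """
    seq = seq.upper()
    hits = []
    patterns = ((motif.upper(), "same"), (reverse_complement(motif), "reverse"))
    for pattern, orientation in patterns:
        start = seq.find(pattern)
        while start != -1:
            end = start + len(pattern)
            if end <= site_start:
                hit = (start, orientation, "upstream", site_start - end)
            elif start >= site_end:
                hit = (start, orientation, "downstream", start - site_end)
            else:
                hit = None  # overlaps the binding site
            if hit is not None and hit[3] <= max_distance:
                hits.append(hit)
            start = seq.find(pattern, start + 1)
    return sorted(hits)
```

Candidate sites could then be filtered for the configurations named above, e.g., a same-orientation motif roughly 55 nt upstream or a reverse-orientation motif roughly 20 nt downstream of the binding site.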
- the DNA binding domain comprises a meganuclease domain (e.g., as described herein, e.g., in the endonuclease domain section), or a functional fragment thereof.
- the meganuclease domain possesses endonuclease activity, e.g., double-strand cleavage and/or nickase activity.
- the meganuclease domain has reduced activity, e.g., lacks endonuclease activity, e.g., the meganuclease is catalytically inactive.
- a catalytically inactive meganuclease is used as a DNA binding domain, e.g., as described in Fonfara et al. Nucleic Acids Res 40(2):847-860 (2012), incorporated herein by reference in its entirety.
- the DNA binding domain comprises one or more modifications relative to a wild-type DNA binding domain, e.g., a modification via directed evolution, e.g., phage-assisted continuous evolution (PACE).
- the host DNA-binding site integrated into by the Gene Writer system can be in a gene, in an intron, in an exon, an ORF, outside of a coding region of any gene, in a regulatory region of a gene, or outside of a regulatory region of a gene.
- the engineered RLE may bind to one or more than one host DNA sequence.
- a Gene Writing system is used to edit a target locus in multiple alleles.
- a Gene Writing system is designed to edit a specific allele.
- a Gene Writing polypeptide may be directed to a specific sequence that is only present on one allele, e.g., comprises a template RNA with homology to a target allele, e.g., a gRNA or annealing domain, but not to a second cognate allele.
- a Gene Writing system can alter a haplotype-specific allele.
- a Gene Writing system that targets a specific allele preferentially targets that allele, e.g., has at least a 2, 4, 6, 8, or 10-fold preference for a target allele.
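The fold preference recited above can be illustrated with a short calculation. The sketch below assumes, purely for illustration, that preference is measured as the ratio of editing frequencies at the target allele and at the cognate allele; the function names and the choice of measurement are hypothetical and are not specified by the disclosure.

```python
def allele_fold_preference(target_edit_rate: float, other_edit_rate: float) -> float:
    """Ratio of the editing frequency at the target allele to that at the
    cognate (non-target) allele. Illustrative definition only."""
    if other_edit_rate <= 0:
        raise ValueError("other_edit_rate must be positive to form a ratio")
    return target_edit_rate / other_edit_rate


def meets_preference(target_edit_rate: float, other_edit_rate: float,
                     min_fold: float = 2.0) -> bool:
    """True if the fold preference is at least min_fold (e.g., 2, 4, 6, 8, or 10)."""
    return allele_fold_preference(target_edit_rate, other_edit_rate) >= min_fold
```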
- a Gene Writer™ gene editor system RNA further comprises an intracellular localization sequence, e.g., a nuclear localization sequence.
- the nuclear localization sequence may be an RNA sequence that promotes the import of the RNA into the nucleus.
- the nuclear localization signal is located on the template RNA.
- the retrotransposase polypeptide is encoded on a first RNA, and the template RNA is a second, separate, RNA, and the nuclear localization signal is located on the template RNA and not on an RNA encoding the retrotransposase polypeptide.
- the RNA encoding the retrotransposase is targeted primarily to the cytoplasm to promote its translation, while the template RNA is targeted primarily to the nucleus to promote its retrotransposition into the genome.
- the nuclear localization signal is at the 3’ end, 5’ end, or in an internal region of the template RNA. In some embodiments the nuclear localization signal is 3’ of the heterologous sequence (e.g., is directly 3’ of the heterologous sequence) or is 5’ of the heterologous sequence (e.g., is directly 5’ of the heterologous sequence). In some embodiments the nuclear localization signal is placed outside of the 5’ UTR or outside of the 3’ UTR of the template RNA.
- the nuclear localization signal is placed between the 5’ UTR and the 3’ UTR, wherein optionally the nuclear localization signal is not transcribed with the transgene (e.g., the nuclear localization signal is in an anti-sense orientation or is downstream of a transcriptional termination signal or polyadenylation signal).
- the nuclear localization sequence is situated inside of an intron.
- a plurality of the same or different nuclear localization signals are in the RNA, e.g., in the template RNA.
- the nuclear localization signal is less than 5, 10, 25, 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500, 600, 700, 800, 900 or 1000 bp in length.
- RNA nuclear localization sequences can be used. For example, Lubelsky and Ulitsky, Nature 555 (107-111), 2018 describe RNA sequences which drive RNA localization into the nucleus.
- the nuclear localization signal is a SINE-derived nuclear RNA localization (SIRLOIN) signal.
- the nuclear localization signal binds a nuclear-enriched protein.
- the nuclear localization signal binds the HNRNPK protein.
- the nuclear localization signal is rich in pyrimidines, e.g., is a C/T rich, C/U rich, C rich, T rich, or U rich region.
- the nuclear localization signal is derived from a long non-coding RNA.
- the nuclear localization signal is derived from MALAT1 long non-coding RNA or is the 600 nucleotide M region of MALAT1 (described in Miyagawa et al., RNA 18, (738-751), 2012).
- the nuclear localization signal is derived from BORG long non-coding RNA or is an AGCCC motif (described in Zhang et al., Molecular and Cellular Biology 34, 2318-2329 (2014)).
- the nuclear localization sequence is described in Shukla et al., The EMBO Journal e98452 (2016).
- the nuclear localization signal is derived from a non-LTR retrotransposon, an LTR retrotransposon, retrovirus, or an endogenous retrovirus.
- a polypeptide described herein comprises one or more (e.g., 2, 3, 4, 5) nuclear targeting sequences, for example, a nuclear localization sequence (NLS), e.g., as described above.
- the NLS is a bipartite NLS.
- an NLS facilitates the import of a protein comprising an NLS into the cell nucleus.
- the NLS is fused to the N-terminus of a Gene Writer described herein.
- the NLS is fused to the C-terminus of the Gene Writer. In some embodiments, the NLS is fused to the N-terminus or the C-terminus of a Cas domain. In some embodiments, a linker sequence is disposed between the NLS and the neighboring domain of the Gene Writer. In some embodiments, an NLS comprises the amino acid sequence MDSLLMNRRKFLYQFKNVRWAKGRRETYLC(SEQ ID NO 1585) 1591), or a functional fragment or variant thereof. Exemplary NLS sequences are also described in PCT/EP2000/011690, the contents of which are incorporated herein by reference for their disclosure of exemplary nuclear localization sequences.
- an NLS comprises an amino acid sequence as disclosed in Table 39.
- An NLS of this table may be utilized with one or more copies in a polypeptide in one or more locations in a polypeptide, e.g., 1, 2, 3 or more copies of an NLS in an N-terminal domain, within peptide domains, between peptide domains, in a C-terminal domain, or in a combination of locations, in order to improve subcellular localization to the nucleus.
- Multiple unique sequences may be used within a single polypeptide. Sequences may be naturally monopartite or bipartite, e.g., having one or two stretches of basic amino acids, or may be used as chimeric bipartite sequences.
- the NLS is a bipartite NLS.
- a bipartite NLS typically comprises two basic amino acid clusters separated by a spacer sequence (which may be, e.g., about 10 amino acids in length).
- a monopartite NLS typically lacks a spacer.
- An example of a bipartite NLS is the nucleoplasmin NLS, having the sequence KR[PAATKKAGQA]KKKK (SEQ ID NO: 1591), wherein the spacer is bracketed.
- Another exemplary bipartite NLS has the sequence PKKKRKVEGADKRTADGSEFESPKKKRKV (SEQ ID NO: 1593).
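The bipartite arrangement described above (two basic clusters separated by a spacer of about 10 amino acids) can be sketched as a simple pattern search. This is a heuristic illustration only: the cluster sizes, the 8-12 residue spacer window, and the function name are assumptions chosen for the example and are not a validated NLS predictor.

```python
import re

# Heuristic pattern (assumption): a first basic cluster of two K/R residues,
# a spacer of 8-12 residues (shortest fit preferred), and a second cluster
# of three or more K/R residues.
_BIPARTITE = re.compile(r"([KR]{2})(.{8,12}?)([KR]{3,})")


def find_bipartite_nls(protein: str):
    """Return (first_cluster, spacer, second_cluster) for the first match,
    or None if no bipartite-like arrangement is found."""
    match = _BIPARTITE.search(protein.upper())
    return match.groups() if match else None


# Nucleoplasmin NLS from the text, written without the brackets.
nucleoplasmin = "KRPAATKKAGQAKKKK"
clusters = find_bipartite_nls(nucleoplasmin)
```

For the nucleoplasmin sequence, the spacer recovered by this pattern is the bracketed segment PAATKKAGQA.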
- a Gene Writer™ gene editor system polypeptide further comprises an intracellular localization sequence, e.g., a nuclear localization sequence and/or a nucleolar localization sequence.
- the nuclear localization sequence and/or nucleolar localization sequence may be amino acid sequences that promote the import of the protein into the nucleus and/or nucleolus, where it can promote integration of heterologous sequence into the genome.
- a Gene Writer gene editor system polypeptide (e.g., a retrotransposase, e.g., a polypeptide according to any of Tables 1, 2, or 3 herein) further comprises a nucleolar localization sequence.
- the retrotransposase polypeptide is encoded on a first RNA, and the template RNA is a second, separate, RNA, and the nucleolar localization signal is encoded on the RNA encoding the retrotransposase polypeptide and not on the template RNA.
- the nucleolar localization signal is located at the N-terminus, C-terminus, or in an internal region of the polypeptide.
- a plurality of the same or different nucleolar localization signals are used.
- the nuclear localization signal is less than 5, 10, 25, 50, 75, or 100 amino acids in length.
- Various polypeptide nucleolar localization signals can be used. For example, Yang et al., Journal of Biomedical Science 22, 33 (2015), describe a nuclear localization signal that also functions as a nucleolar localization signal.
- the nucleolar localization signal may also be a nuclear localization signal.
- the nucleolar localization signal may overlap with a nuclear localization signal.
- the nucleolar localization signal may comprise a stretch of basic residues.
- the nucleolar localization signal may be rich in arginine and lysine residues. In some embodiments, the nucleolar localization signal may be derived from a protein that is enriched in the nucleolus. In some embodiments, the nucleolar localization signal may be derived from a protein enriched at ribosomal RNA loci. In some embodiments, the nucleolar localization signal may be derived from a protein that binds rRNA. In some embodiments, the nucleolar localization signal may be derived from MSP58. In some embodiments, the nucleolar localization signal may be a monopartite motif. In some embodiments, the nucleolar localization signal may be a bipartite motif.
- the nucleolar localization signal may consist of multiple monopartite or bipartite motifs. In some embodiments, the nucleolar localization signal may consist of a mix of monopartite and bipartite motifs. In some embodiments, the nucleolar localization signal may be a dual bipartite motif. In some embodiments, the nucleolar localization motif may be KRASSQALGTIPKRRSSSRFIKRKK (SEQ ID NO: 1530). In some embodiments, the nucleolar localization signal may be derived from nuclear factor-κB-inducing kinase.
- the nucleolar localization signal may be an RKKRKKK motif (SEQ ID NO: 1531) (described in Birbach et al., Journal of Cell Science, 117 (3615-3624), 2004). Since an endogenous nucleolar localization signal may help drive the Gene Writer polypeptide to the nucleolus for those polypeptides derived from retrotransposons naturally targeting the rDNA, e.g., R1, R2, R4, R8, R9, it may be beneficial to inactivate this signal when retargeting to a site outside of the rDNA.
- An endogenous nucleolar localization signal (NoLS) can be computationally predicted using a published algorithm trained on validated proteins that localize to the nucleolus (Scott, M.
- the predicted NoLS sequence is based on amino acid sequence, amino acid sequence context, and predicted secondary structure of the retrotransposase.
- the identified sequence is typically rich with basic amino acids (Scott, M. S., et al, Nucleic Acids Research, 38(21), 7388– 7399 (2010)) and mutating these residues to simple side-chain, non-basic, amino acids or removing them from the polypeptide chain can prevent localization to the nucleolus (Yang, C. P., et. al., Journal of Biomedical Science, 22(1), 1–15. (2015), Martin, R. M., et.
- the NoLS sequence is located in the amino acid region of a retrotransposase that is between the reverse transcriptase domain and the restriction-like endonuclease domain.
- a predicted NoLS region contains lysine, arginine, histidine, and/or glutamine amino acids and nucleolar localization is inactivated by mutation of one or more of these residues to alanine and/or removal from the polypeptide.
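The alanine substitution described above can be sketched as a string operation on a polypeptide sequence. In this minimal example the region boundaries are assumed to be supplied by a separate prediction step; the function name, the 0-based half-open indexing, and the example polypeptide are hypothetical.

```python
# Residues named in the text as candidates for mutation in a predicted NoLS region.
NOLS_RESIDUES = set("KRHQ")


def inactivate_nols(protein: str, start: int, end: int) -> str:
    """Replace K, R, H, and Q with alanine inside protein[start:end].

    start/end are 0-based, half-open coordinates of a predicted NoLS region
    (assumed to come from an upstream predictor, not computed here).
    """
    if not 0 <= start <= end <= len(protein):
        raise ValueError("region must lie within the polypeptide")
    region = protein[start:end]
    mutated = "".join("A" if aa in NOLS_RESIDUES else aa for aa in region)
    return protein[:start] + mutated + protein[end:]


# Hypothetical polypeptide carrying the RKKRKKK motif at positions 2-8.
example = inactivate_nols("MSRKKRKKKLE", 2, 9)
```

Removal of the residues, the other option recited above, would correspond to returning `protein[:start] + protein[end:]` instead.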
- a nucleic acid described herein (e.g., an RNA encoding a GeneWriter polypeptide, or a DNA encoding the RNA) comprises a microRNA binding site.
- the microRNA binding site is used to increase the target-cell specificity of a GeneWriter system.
- the microRNA binding site can be chosen on the basis that it is recognized by a miRNA that is present in a non-target cell type, but that is not present (or is present at a reduced level relative to the non-target cell) in a target cell type.
- when the RNA encoding the GeneWriter polypeptide is present in a non-target cell, it would be bound by the miRNA, and when the RNA encoding the GeneWriter polypeptide is present in a target cell, it would not be bound by the miRNA (or bound but at reduced levels relative to the non-target cell).
- binding of the miRNA to the RNA encoding the GeneWriter polypeptide may reduce production of the GeneWriter polypeptide, e.g., by degrading the mRNA encoding the polypeptide or by interfering with translation. Accordingly, the heterologous object sequence would be inserted into the genome of target cells more efficiently than into the genome of non-target cells.
- a system having a microRNA binding site in the RNA encoding the GeneWriter polypeptide (or encoded in the DNA encoding the RNA) may also be used in combination with a template RNA that is regulated by a second microRNA binding site, e.g., as described herein in the section entitled “Template RNA component of Gene Writer™ gene editor system.”
- a miRNA is selected from Table 4 of WO2020014209, which is hereby incorporated by reference.
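The selection criterion described above (a miRNA present in the non-target cell type and absent or reduced in the target cell type) can be sketched as a filter over expression levels. The data layout, the `min_ratio` threshold, and the miRNA and cell-type names used in any call are hypothetical placeholders, not values from the disclosure.

```python
def candidate_mirnas(expression: dict, target: str, non_target: str,
                     min_ratio: float = 10.0) -> list:
    """Return miRNAs expressed in the non-target cell type at a level at least
    min_ratio times their level in the target cell type.

    expression maps miRNA name -> {cell type -> expression level}.
    A miRNA that is undetected in the non-target cell type is never selected.
    """
    picks = []
    for mirna, levels in expression.items():
        in_target = levels.get(target, 0.0)
        in_non_target = levels.get(non_target, 0.0)
        if in_non_target > 0 and in_non_target >= min_ratio * in_target:
            picks.append(mirna)
    return sorted(picks)
```

A binding site for a miRNA returned by such a filter would be expected to reduce polypeptide production mainly in the non-target cell type, consistent with the mechanism described above.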
- the DNA encoding a Gene Writer polypeptide comprises a promoter sequence, e.g., a tissue specific promoter sequence.
- the tissue-specific promoter is used to increase the target-cell specificity of a Gene Writer™ system.
- the promoter can be chosen on the basis that it is active in a target cell type but not active in (or active at a lower level in) a non-target cell type.
- a system having a tissue-specific promoter sequence in the DNA of the polypeptide may also be used in combination with a microRNA binding site, e.g., in the template RNA or a nucleic acid encoding a Gene Writer™ protein, e.g., as described herein.
- a system having a tissue-specific promoter sequence in the DNA encoding the Gene Writer polypeptide may also be used in combination with a DNA encoding the RNA template driven by a tissue-specific promoter, e.g., to achieve higher levels of RNA template in target cells than in non-target cells.
- a tissue-specific promoter is selected from Table 3 of WO2020014209, which is hereby incorporated by reference.
- a skilled artisan can, based on the Accession numbers provided in Tables 1-3, determine the nucleic acid and corresponding polypeptide sequences of each retrotransposon and domains thereof, e.g., by using routine sequence analysis tools such as Basic Local Alignment Search Tool (BLAST) or CD-Search for conserved domain analysis.
- Other sequence analysis tools are known and can be found, e.g., at https://molbiol-tools.ca, for example, at https://molbiol-tools.ca/Motifs.htm.
- SEQ ID NOs 1-112 align with each row in Table 1
- SEQ ID NOs 113-1015 align with the first 903 rows of Table 2.
- Tables 1-3 herein provide the sequences of exemplary transposons, including the amino acid sequence of the retrotransposase, and sequences of 5’ and 3’ untranslated regions to allow the retrotransposase to bind the template RNA, and the full transposon nucleic acid sequence.
- a 5’ UTR of any of Tables 1-3 allows the retrotransposase to bind the template RNA.
- a 3’ UTR of any of Tables 1-3 allows the retrotransposase to bind the template RNA.
- a polypeptide for use in any of the systems described herein can be a polypeptide of any of Tables 1-3 herein, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto.
- the system further comprises one or both of a 5’ or 3’ untranslated region of any of Tables 1-3 herein (or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto), e.g., from the same transposon as the polypeptide referred to in the preceding sentence, as indicated in the same row of the same table.
- the system comprises one or both of a 5’ or 3’ untranslated region of any of Tables 1-3 herein, e.g., a segment of the full transposon sequence that encodes an RNA that is capable of binding a retrotransposase, and/or the sub-sequence provided in the column entitled Predicted 5’ UTR or Predicted 3’ UTR.
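The percent identity thresholds recited above can be illustrated with a minimal calculation over two sequences that have already been aligned. This sketch assumes one common convention (identical columns divided by aligned columns, with "-" as the gap character); other conventions exist, and tools such as BLAST report identity over their own local alignments. The function name and example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where both sequences have a gap are ignored; a column counts as
    identical only if both residues match and neither is a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


def meets_identity(aligned_a: str, aligned_b: str, threshold: float = 70.0) -> bool:
    """True if identity is at least the threshold (e.g., 70, 80, 90, 95, 99)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```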
- a polypeptide for use in any of the systems described herein can be a molecular reconstruction or ancestral reconstruction based upon the aligned polypeptide sequence of multiple retrotransposons.
- a 5’ or 3’ untranslated region for use in any of the systems described herein can be a molecular reconstruction based upon the aligned 5’ or 3’ untranslated region of multiple retrotransposons.
- a skilled artisan can, based on the Accession numbers provided herein, align polypeptides or nucleic acid sequences, e.g., by using routine sequence analysis tools such as Basic Local Alignment Search Tool (BLAST) or CD-Search for conserved domain analysis.
- Molecular reconstructions can be created based upon sequence consensus, e.g. using approaches described in Ivics et al., Cell 1997, 501 – 510 ; Wagstaff et al., Molecular Biology and Evolution 2013, 88-99.
- the retrotransposon from which the 5’ or 3’ untranslated region or polypeptide is derived is a young or a recently active mobile element, as assessed via phylogenetic methods such as those described in Boissinot et al., Molecular Biology and Evolution 2000, 915-928.
- Table 3 shows exemplary Gene Writer proteins and associated sequences from a variety of retrotransposases, identified using data mining. Column 1 indicates the family to which the retrotransposon belongs. Column 2 lists the element name. Column 3 indicates an accession number, if any. Column 4 lists an organism in which the retrotransposase is found. Column 5 lists the DNA sequence of the retrotransposon.
- Column 6 lists the predicted 5’ untranslated region, and column 7 lists the predicted 3’ untranslated region; both are segments of the sequence of column 5 that are predicted to allow the template RNA to bind the retrotransposase of column 8. (It is understood that columns 5-7 show the DNA sequence, and that an RNA sequence according to any of columns 5-7 would typically include uracil rather than thymidine.)
- Column 8 lists the predicted retrotransposase sequence encoded in the retrotransposon of column 5.
- a writing domain (e.g., RT domain) comprises an RNA-binding domain, e.g., that specifically binds to an RNA sequence.
- a template RNA comprises an RNA sequence that is specifically bound by the RNA-binding domain of the writing domain.
- Template nucleic acid binding domain The Gene Writer polypeptide typically contains regions capable of associating with the Gene Writer template nucleic acid (e.g., template RNA).
- the template nucleic acid binding domain is an RNA binding domain.
- the RNA binding domain is a modular domain that can associate with RNA molecules containing specific signatures, e.g., structural motifs, e.g., secondary structures present in the 3’ UTR in non-LTR retrotransposons.
- the template nucleic acid binding domain (e.g., RNA binding domain) is contained within the reverse transcription domain, e.g., the reverse transcriptase-derived component has a known signature for RNA preference, e.g., secondary structures present in the 3’ UTR in non-LTR retrotransposons.
- the template nucleic acid binding domain (e.g., RNA binding domain) is contained within the DNA binding domain.
- the DNA binding domain is a CRISPR-associated protein that recognizes the structure of a template nucleic acid (e.g., template RNA) comprising a gRNA.
- the gRNA is a short synthetic RNA composed of a scaffold sequence that participates in CRISPR-associated protein binding and a user-defined ~20 nucleotide targeting sequence for a genomic target.
- the structure of a complete gRNA was described by Nishimasu et al. Cell 156, P935-949 (2014).
- the gRNA (also referred to as sgRNA for single-guide RNA) consists of crRNA- and tracrRNA-derived sequences connected by an artificial tetraloop.
- the crRNA sequence can be divided into guide (20 nt) and repeat (12 nt) regions, whereas the tracrRNA sequence can be divided into anti-repeat (14 nt) and three tracrRNA stem loops (Nishimasu et al. Cell 156, P935-949 (2014)).
- guide RNA sequences are generally designed to have a length of between 17 – 24 nucleotides (e.g., 19, 20, or 21 nucleotides) and be complementary to a targeted nucleic acid sequence. Custom gRNA generators and algorithms are available commercially for use in the design of effective guide RNAs.
- the gRNA comprises two RNA components from the native CRISPR system, e.g. crRNA and tracrRNA.
- the gRNA may also comprise a chimeric, single guide RNA (sgRNA) containing sequence from both a tracrRNA (for binding the nuclease) and at least one crRNA (to guide the nuclease to the sequence targeted for editing/binding).
- a gRNA comprises a nucleic acid sequence that is complementary to a DNA sequence associated with a target gene.
- a polypeptide comprises a DNA-binding domain comprising a CRISPR-associated protein that associates with a gRNA that allows the DNA-binding domain to bind a target genomic DNA sequence.
- the gRNA is comprised within the template nucleic acid (e.g., template RNA), thus the DNA-binding domain is also the template nucleic acid binding domain.
- the polypeptide possesses RNA binding function in multiple domains, e.g., can bind a gRNA structure in a CRISPR-associated DNA binding domain and a 3’ UTR structure in a non-LTR retrotransposon derived reverse transcription domain.
- a Gene Writer polypeptide possesses the function of DNA target site cleavage via an endonuclease domain.
- the endonuclease domain is also a DNA-binding domain.
- the endonuclease domain is also a template nucleic acid (e.g., template RNA) binding domain.
- a polypeptide comprises a CRISPR-associated endonuclease domain that binds a template RNA comprising a gRNA, binds a target DNA sequence (e.g., with complementarity to a portion of the gRNA), and cuts the target DNA sequence.
- the endonuclease/DNA binding domain of an APE-type retrotransposon or the endonuclease domain of an RLE-type retrotransposon can be used or can be modified (e.g., by insertion, deletion, or substitution of one or more residues) in a Gene Writer system described herein.
- the endonuclease domain or endonuclease/DNA binding domain is altered from its natural sequence to have altered codon usage, e.g. improved for human cells.
- the endonuclease element is a heterologous endonuclease element, such as Fok1 nuclease, a type-II restriction l-like endonuclease (RLE-type nuclease), or another RLE-type endonuclease (also known as REL).
- the heterologous endonuclease activity has nickase activity and does not form double stranded breaks.
- the amino acid sequence of an endonuclease domain of a Gene Writer system described herein may be at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99% identical to the amino acid sequence of an endonuclease domain of a retrotransposon whose DNA sequence is referenced in Table 1, 2, or 3.
- the heterologous endonuclease is Fok1 or a functional fragment thereof.
- the heterologous endonuclease is a Holliday junction resolvase or homolog thereof, such as the Holliday junction resolving enzyme from Sulfolobus solfataricus––Ssol Hje (Govindaraju et al., Nucleic Acids Research 44:7, 2016).
- the heterologous endonuclease is the endonuclease of the large fragment of a spliceosomal protein, such as Prp8 (Mahbub et al., Mobile DNA 8:16, 2017).
- the heterologous endonuclease is derived from a CRISPR-associated protein, e.g., Cas9.
- the heterologous endonuclease is engineered to have only ssDNA cleavage activity, e.g., only nickase activity, e.g., be a Cas9 nickase.
- a Gene Writer polypeptide described herein may comprise a reverse transcriptase domain from an APE- or RLE-type retrotransposon and an endonuclease domain that comprises Fok1 or a functional fragment thereof.
- homologous endonuclease domains are modified, for example by site-specific mutation, to alter DNA endonuclease activity.
- endonuclease domains are modified to remove any latent DNA-sequence specificity.
- the endonuclease domain has nickase activity and does not form double stranded breaks.
- the endonuclease domain forms single stranded breaks at a higher frequency than double stranded breaks, e.g., at least 90%, 95%, 96%, 97%, 98%, or 99% of the breaks are single stranded breaks, or less than 10%, 5%, 4%, 3%, 2%, or 1% of the breaks are double stranded breaks.
- the endonuclease forms substantially no double stranded breaks.
- the endonuclease does not form detectable levels of double stranded breaks.
- the endonuclease domain has nickase activity that nicks the target site DNA of the to-be-edited strand; e.g., in some embodiments, the endonuclease domain cuts the genomic DNA of the target site near to the site of alteration on the strand that will be extended by the writing domain. In some embodiments, the endonuclease domain has nickase activity that nicks the target site DNA of the to-be-edited strand and does not nick the target site DNA of the non-edited strand.
- a polypeptide comprises a CRISPR-associated endonuclease domain having nickase activity and that does not form double stranded breaks
- said CRISPR-associated endonuclease domain nicks the target site DNA strand containing the PAM site (e.g., and does not nick the target site DNA strand that does not contain the PAM site).
- the endonuclease domain has nickase activity that nicks the target site DNA of the to-be-edited strand and the non-edited strand.
- where a writing domain (e.g., RT domain) of a polypeptide described herein polymerizes (e.g., reverse transcribes) from the heterologous object sequence of a template nucleic acid (e.g., template RNA), the cellular DNA repair machinery must repair the nick on the to-be-edited DNA strand.
- the target site DNA now contains two different sequences for the to-be-edited DNA strand: one corresponding to the original genomic DNA and a second corresponding to that polymerized from the heterologous object sequence.
- the additional nick is positioned at least 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, or 150 nucleotides 5’ or 3’ of the target site modification (e.g., the insertion, deletion, or substitution) or to the nick on the to-be-edited strand.
- an additional nick to the non-edited strand may promote second strand synthesis.
- the polypeptide comprises a single domain having endonuclease activity (e.g., a single endonuclease domain) and said domain nicks both the to-be-edited strand and the non-edited strand.
- the endonuclease domain may be a CRISPR-associated endonuclease domain
- the template nucleic acid comprises a gRNA that directs nicking of the to-be-edited strand and an additional gRNA that directs nicking of the non-edited strand.
- the polypeptide comprises a plurality of domains having endonuclease activity, and a first endonuclease domain nicks the to-be-edited strand and a second endonuclease domain nicks the non-edited strand (optionally, the first endonuclease domain does not (e.g., cannot) nick the non-edited strand and the second endonuclease domain does not (e.g., cannot) nick the to-be-edited strand).
- the endonuclease domain is capable of nicking a first strand and a second strand.
- the first and second strand nicks occur at the same position in the target site but on opposite strands.
- the second strand nick occurs in a staggered location, e.g., upstream or downstream, from the first nick.
- the endonuclease domain generates a target site deletion if the second strand nick is upstream of the first strand nick.
- the endonuclease domain generates a target site duplication if the second strand nick is downstream of the first strand nick.
- the endonuclease domain generates no duplication and/or deletion if the first and second strand nicks occur in the same position of the target site (e.g., as described in Gladyshev and Arkhipova Gene 2009, incorporated by reference herein in its entirety).
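The relationship between the relative positions of the two nicks and the resulting target site change can be sketched as follows. The example assumes, for illustration only, that both nick positions are expressed on a single coordinate axis running 5' to 3' along the first strand, so that "upstream" means a smaller coordinate; the function name and this convention are hypothetical.

```python
def target_site_outcome(first_nick: int, second_nick: int) -> tuple:
    """Classify the expected target site change from two staggered nicks.

    Both positions use one shared coordinate system (assumed: increasing in
    the 5'->3' direction of the first strand). Returns (outcome, length),
    where length is the offset between the nicks in nucleotides.
    """
    offset = abs(second_nick - first_nick)
    if second_nick < first_nick:
        # Second strand nick upstream of the first strand nick.
        return ("deletion", offset)
    if second_nick > first_nick:
        # Second strand nick downstream of the first strand nick.
        return ("duplication", offset)
    # Nicks at the same position on opposite strands.
    return ("none", 0)
```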
- the endonuclease domain has altered activity depending on protein conformation or RNA-binding status, e.g., which promotes the nicking of the first or second strand (e.g., as described in Christensen et al. PNAS 2006; incorporated by reference herein in its entirety).
- the endonuclease domain comprises a meganuclease, or a functional fragment thereof.
- the endonuclease domain comprises a homing endonuclease, or a functional fragment thereof.
- the endonuclease domain comprises a meganuclease from the LAGLIDADG (SEQ ID NO: 1594), GIY-YIG, HNH, His-Cys Box, or PD-(D/E) XK families, or a functional fragment or variant thereof, e.g., which possess conserved amino acid motifs, e.g., as indicated in the family names.
- the endonuclease domain comprises a meganuclease, or fragment thereof, chosen from, e.g., I-SmaMI (Uniprot F7WD42), I-SceI (Uniprot P03882), I-AniI (Uniprot P03880), I-DmoI (Uniprot P21505), I-CreI (Uniprot P05725), I-TevI (Uniprot P13299), I-OnuI (Uniprot Q4VWW5), or I-BmoI (Uniprot Q9ANR6).
- the meganuclease is naturally monomeric, e.g., I-SceI, I-TevI, or dimeric, e.g., I-CreI, in its functional form.
- the LAGLIDADG (SEQ ID NO: 1594) meganucleases with a single copy of the LAGLIDADG (SEQ ID NO: 1594) motif generally form homodimers, whereas members with two copies of the LAGLIDADG (SEQ ID NO: 1594) motif are generally found as monomers.
- a meganuclease that normally forms as a dimer is expressed as a fusion, e.g., the two subunits are expressed as a single ORF and, optionally, connected by a linker, e.g., an I-CreI dimer fusion (Rodriguez-Fornes et al. Gene Therapy 2020; incorporated by reference herein in its entirety).
- a meganuclease, or a functional fragment thereof is altered to favor nickase activity for one strand of a double-stranded DNA molecule, e.g., I-SceI (K122I and/or K223I) (Niu et al.
- a meganuclease or functional fragment thereof possessing this preference for single-strand cleavage is used as an endonuclease domain, e.g., with nickase activity.
- an endonuclease domain comprises a meganuclease, or a functional fragment thereof, which naturally targets or is engineered to target a safe harbor site, e.g., an I-CreI targeting SH6 site (Rodriguez-Fornes et al., supra).
- an endonuclease domain comprises a meganuclease, or a functional fragment thereof, with a sequence tolerant catalytic domain, e.g., I-TevI recognizing the minimal motif CNNNG (Kleinstiver et al. PNAS 2012).
- a target sequence tolerant catalytic domain is fused to a DNA binding domain, e.g., to direct activity, e.g., by fusing I-TevI to: (i) zinc fingers to create Tev-ZFEs (Kleinstiver et al. PNAS 2012), (ii) other meganucleases to create MegaTevs (Wolfs et al. Nucleic Acids Res 2014), and/or (iii) Cas9 to create TevCas9 (Wolfs et al. PNAS 2016).
- the endonuclease domain comprises a restriction enzyme, e.g., a Type IIS or Type IIP restriction enzyme.
- the endonuclease domain comprises a Type IIS restriction enzyme, e.g., FokI, or a fragment or variant thereof.
- the endonuclease domain comprises a Type IIP restriction enzyme, e.g., PvuII, or a fragment or variant thereof.
- a dimeric restriction enzyme is expressed as a fusion such that it functions as a single chain, e.g., a FokI dimer fusion (Minczuk et al. Nucleic Acids Res 36(12):3926-3938 (2008)).
- an endonuclease domain or DNA binding domain comprises a Cas protein, e.g., a Streptococcus pyogenes Cas9 (SpCas9) or a functional fragment or variant thereof.
- the endonuclease domain or DNA binding domain comprises a modified SpCas9.
- the modified SpCas9 comprises a modification that alters protospacer-adjacent motif (PAM) specificity.
- the PAM has specificity for the nucleic acid sequence 5’-NGT-3’.
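Locating candidate PAM sites for a motif such as 5’-NGT-3’ can be sketched as a scan over one DNA strand, treating N as any base. The function name and the example sequence are hypothetical; the sketch scans a single strand only and does not model cut-site position or spacer requirements.

```python
def find_pam_sites(dna: str, pam: str = "NGT") -> list[int]:
    """Return 0-based start positions where `pam` matches `dna` (5'->3').

    'N' in the PAM matches any base; every other PAM letter must match exactly.
    Only the given strand is scanned (the reverse strand is not considered).
    """
    dna = dna.upper()
    pam = pam.upper()
    hits = []
    for i in range(len(dna) - len(pam) + 1):
        window = dna[i:i + len(pam)]
        if all(p == "N" or p == base for p, base in zip(pam, window)):
            hits.append(i)
    return hits
```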
- the modified SpCas9 comprises one or more amino acid substitutions, e.g., at one or more of positions L1111, D1135, G1218, E1219, A1322, or R1335, e.g., selected from L1111R, D1135V, G1218R, E1219F, A1322R, R1335V.
- the modified SpCas9 comprises the amino acid substitution T1337R and one or more additional amino acid substitutions, e.g., selected from L1111, D1135L, S1136R, G1218S, E1219V, D1332A, D1332S, D1332T, D1332V, D1332L, D1332K, D1332R, R1335Q, T1337, T1337L, T1337Q, T1337I, T1337V, T1337F, T1337S, T1337N, T1337K, T1337H, T1337Q, and T1337M, or corresponding amino acid substitutions thereto.
- the modified SpCas9 comprises: (i) one or more amino acid substitutions selected from D1135L, S1136R, G1218S, E1219V, A1322R, R1335Q, and T1337; and (ii) one or more amino acid substitutions selected from L1111R, G1218R, E1219F, D1332A, D1332S, D1332T, D1332V, D1332L, D1332K, D1332R, T1337L, T1337I, T1337V, T1337F, T1337S, T1337N, T1337K, T1337R, T1337H, T1337Q, and T1337M, or corresponding amino acid substitutions thereto.
- a Gene Writer may comprise a Cas protein as listed in Table 40.
- the predicted or validated nickase mutations for installing Nickase activity in the Cas protein as shown in Table 40, are based on the signature of the SpCas9(N863A) mutation.
- a system described herein comprises a GeneWriter protein of Table 3 and a Cas protein of Table 40A.
- a GeneWriter protein of Table 3 is fused to a Cas protein of Table 40A.
- Table 40A: CRISPR/Cas Proteins, Species, and Mutations.
- Table 40B provides parameters to define the necessary components for designing gRNA and/or Template RNAs to apply Cas variants listed in Table 3A for Gene Writing.
- Tier indicates preferred Cas variants if they are available for use at a given locus.
- the cut site indicates the validated or predicted protospacer adjacent motif (PAM) requirements, validated or predicted location of cut site (relative to the most upstream base of the PAM site).
- the gRNA for a given enzyme can be assembled by concatenating the crRNA, Tetraloop, and tracrRNA sequences, and further adding a 5’ spacer of a length within Spacer (min) and Spacer (max) that matches a protospacer at a target site.
- the predicted location of the ssDNA nick at the target is important for designing the 3’ region of a Template RNA that needs to anneal to the sequence immediately 5’ of the nick in order to initiate target primed reverse transcription.
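The gRNA assembly described above (a 5’ spacer followed by crRNA, tetraloop, and tracrRNA) can be sketched as simple string concatenation. This is a minimal illustration only: the function name `assemble_grna` and the component strings are placeholders chosen for the example, not entries from Table 40B, and the length limits are passed in by the caller.

```python
def assemble_grna(spacer, crrna, tetraloop, tracrrna, spacer_min, spacer_max):
    """Return spacer + crRNA + tetraloop + tracrRNA as one RNA string (5' to 3')."""
    # Accept a DNA-alphabet protospacer and emit the RNA alphabet.
    spacer = spacer.upper().replace("T", "U")
    if not spacer_min <= len(spacer) <= spacer_max:
        raise ValueError(
            f"spacer length {len(spacer)} is outside {spacer_min}-{spacer_max} nt"
        )
    return spacer + crrna + tetraloop + tracrrna


# Placeholder component strings for illustration only (not Table 40B entries).
example = assemble_grna(
    "ACGTACGTACGTACGTACGT",  # hypothetical 20 nt protospacer match
    "GUUUUAGAGCUA",          # hypothetical crRNA repeat portion
    "GAAA",                  # hypothetical tetraloop
    "UAGCAAGUUAAAAU",        # hypothetical tracrRNA portion
    18,
    22,
)
print(example)
```

For a real design, the component sequences and the Spacer (min)/Spacer (max) bounds would be taken from the row of Table 40B for the chosen enzyme.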
- an endonuclease domain or DNA binding domain comprises a Cas domain, e.g., a Cas9 domain.
- the endonuclease domain or DNA binding domain comprises a nuclease-active Cas domain, a Cas nickase (nCas) domain, or a nuclease-inactive Cas (dCas) domain.
- the endonuclease domain or DNA binding domain comprises a nuclease-active Cas9 domain, a Cas9 nickase (nCas9) domain, or a nuclease-inactive Cas9 (dCas9) domain.
- the endonuclease domain or DNA binding domain comprises a Cas9 domain of Cas9 (e.g., dCas9 and nCas9), Cas12a/Cpf1, Cas12b/C2c1, Cas12c/C2c3, Cas12d/CasY, Cas12e/CasX, Cas12g, Cas12h, or Cas12i.
- the endonuclease domain or DNA binding domain comprises a Cas9 (e.g., dCas9 and nCas9), Cas12a/Cpf1, Cas12b/C2c1, Cas12c/C2c3, Cas12d/CasY, Cas12e/CasX, Cas12g, Cas12h, or Cas12i.
- the endonuclease domain or DNA binding domain comprises an S. pyogenes or an S. thermophilus Cas9, or a functional fragment thereof.
- the endonuclease domain or DNA binding domain comprises a Cas9 sequence, e.g., as described in Chylinski, Rhun, and Charpentier (2013) RNA Biology 10:5, 726-737; incorporated herein by reference.
- the endonuclease domain or DNA binding domain comprises the HNH nuclease subdomain and/or the RuvC1 subdomain of a Cas, e.g., Cas9, e.g., as described herein, or a variant thereof.
- the endonuclease domain or DNA binding domain comprises Cas12a/Cpf1, Cas12b/C2c1, Cas12c/C2c3, Cas12d/CasY, Cas12e/CasX, Cas12g, Cas12h, or Cas12i.
- the endonuclease domain or DNA binding domain comprises a Cas polypeptide (e.g., enzyme), or a functional fragment thereof.
- the Cas polypeptide (e.g., enzyme) is selected from Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas5d, Cas5t, Cas5h, Cas5a, Cas6, Cas7, Cas8, Cas8a, Cas8b, Cas8c, Cas9 (e.g., Csn1 or Csx12), Cas10, Cas10d, Cas12a/Cpf1, Cas12b/C2c1, Cas12c/C2c3, Cas12d/CasY, Cas12e/CasX, Cas12g, Cas12h, Cas12i, Csy1, Csy2, Csy3, Csy4, Cse1, Cse2, Cse3, Cse4, Cse5e, Csc1, Csc2, Csa5, Csn1, Csn2, Csm1, Csm2, Csm3, Csm4, Csm
- the Cas9 comprises one or more substitutions, e.g., selected from H840A, D10A, P475A, W476A, N477A, D1125A, W1126A, and D1127A.
- the Cas9 comprises one or more mutations at positions selected from: D10, G12, G17, E762, H840, N854, N863, H982, H983, A984, D986, and/or A987, e.g., one or more substitutions selected from D10A, G12A, G17A, E762A, H840A, N854A, N863A, H982A, H983A, A984A, and/or D986A.
- the endonuclease domain or DNA binding domain comprises a Cas (e.g., Cas9) sequence from Corynebacterium ulcerans, Corynebacterium diphtheriae, Spiroplasma syrphidicola, Prevotella intermedia, Spiroplasma taiwanense, Streptococcus iniae, Belliella baltica, Psychroflexus torquis, Streptococcus thermophilus, Listeria innocua, Campylobacter jejuni, Neisseria meningitidis, Streptococcus pyogenes, or Staphylococcus aureus, or a fragment or variant thereof.
- an endonuclease domain or DNA binding domain comprises a Cpf1 domain, e.g., comprising one or more substitutions, e.g., at position D917, E1006, D1255, or any combination thereof, e.g., selected from D917A, E1006A, D1255A, D917A/E1006A, D917A/D1255A, E1006A/D1255A, and D917A/E1006A/D1255A.
- an endonuclease domain or DNA binding domain comprises spCas9, spCas9-VRQR (SEQ ID NO: 1696), spCas9-VRER (SEQ ID NO: 1697), xCas9 (sp), saCas9, saCas9-KKH, spCas9-MQKSER (SEQ ID NO: 1698), spCas9-LRKIQK (SEQ ID NO: 1699), or spCas9-LRVSQL (SEQ ID NO: 1700).
- an endonuclease domain or DNA binding domain comprises an amino acid sequence as listed in Table 37 below, or an amino acid sequence having at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence identity thereto.
- the endonuclease domain or DNA-binding domain comprises an amino acid sequence that has no more than 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 differences (e.g., mutations) relative to any of the amino acid sequences described herein.
- Table 37 Each of the Reference Sequences are incorporated by reference in their entirety.
- a Gene Writing polypeptide has an endonuclease domain comprising a Cas9 nickase, e.g., Cas9 H840A.
- the Cas9 H840A has the following amino acid sequence: Cas9 nickase (H840A):
- a Gene Writing polypeptide comprises the RT domain from a retroviral reverse transcriptase, e.g., a wild-type M-MLV RT, e.g., comprising the following sequence: M-MLV (WT):
- a Gene Writing polypeptide comprises the RT domain from a retroviral reverse transcriptase, e.g., an M-MLV RT, e.g., comprising the following sequence: TLNIEDEHRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSI
- a Gene Writing polypeptide comprises the RT domain from a retroviral reverse transcriptase, e.g.
- the Gene Writing polypeptide further comprises one additional amino acid at the N-terminus of the sequence of amino acids 659-1329 of NP_057933, e.g., as shown below: Q Q ( Q ) Core RT (bold), annotated per above RNAseH (underlined), annotated per above
- the Gene Writing polypeptide further comprises one additional amino acid at the C-terminus of the sequence of amino acids 659-1329 of NP_057933.
- the Gene Writing polypeptide comprises an RNaseH1 domain (e.g., amino acids 1178-1318 of NP_057933).
- M-MLV RT may comprise one or more mutations from a wild-type sequence that may improve features of the RT, e.g., thermostability, processivity, and/or template binding.
- an M-MLV RT domain comprises, relative to the M-MLV (WT) sequence above, one or more mutations, e.g., selected from D200N, L603W, T330P, T306K, W313F, D524G, E562Q, D583N, P51L, S67R, E67K, T197A, H204R, E302K, F309N, L435G, N454K, H594Q, D653N, R110S, K103L, e.g., a combination of mutations, such as D200N, L603W, and T330P, optionally further including T306K and W313F.
- an M-MLV RT used herein comprises the mutations D200N, L603W, T330P, T306K and W313F.
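Substitution names such as D200N encode a wild-type residue, a 1-based position, and a replacement residue. The sketch below shows one way such names could be applied to a sequence string while checking that the stated wild-type residue is actually present. The helper name `apply_substitutions` and the five-residue toy sequence are hypothetical and exist only for illustration; they are not the M-MLV RT sequence.

```python
import re


def apply_substitutions(sequence, substitutions):
    """Apply substitutions written like 'D200N' (wild type, 1-based position, new residue)."""
    residues = list(sequence)
    for name in substitutions:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", name)
        if match is None:
            raise ValueError(f"cannot parse substitution {name!r}")
        wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{name}: position is outside the sequence")
        # Compare against the unmodified input so the wild-type check is order independent.
        if sequence[position - 1] != wild_type:
            raise ValueError(
                f"{name}: expected {wild_type} but found {sequence[position - 1]}"
            )
        residues[position - 1] = new
    return "".join(residues)


# Toy sequence for illustration only; not an M-MLV RT sequence.
toy_sequence = "MDKLT"
print(apply_substitutions(toy_sequence, ["D2N", "T5P"]))  # prints MNKLP
```

The wild-type check is the useful part in practice: it catches numbering offsets, for example when a construct carries an extra N-terminal residue relative to the reference sequence.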
- the mutant M-MLV RT comprises the following amino acid sequence: M-MLV (PE2):
- a Gene Writer polypeptide comprises a flexible linker between the endonuclease and the RT domain, e.g., a linker comprising the amino acid sequence SGGSSGGSSGSETPGTSESATPESSGGSSGGSS (SEQ ID NO: 1601).
- an RT domain of a Gene Writer polypeptide may be located C-terminal to the endonuclease domain.
- an RT domain of a Gene Writer polypeptide may be located N-terminal to the endonuclease domain.
- a Gene Writer polypeptide comprises a dCas9 sequence comprising a D10A and/or H840A mutation, e.g., the following sequence: G D Y L E A L G G N D T E G E K E S N (
- a template RNA molecule for use in the system comprises, from 5’ to 3’ (1) a gRNA spacer; (2) a gRNA scaffold; (3) heterologous object sequence (4) 3’ homology domain.
- (1) is a Cas9 spacer of approximately 18-22 nt, e.g., 20 nt.
- (2) is a gRNA scaffold comprising one or more hairpin loops, e.g., 1, 2, or 3 loops, for associating the template with a nickase Cas9 domain.
- the gRNA scaffold carries the sequence, from 5’ to 3’, G G
- the heterologous object sequence is, e.g., 7-74, e.g., 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, or 80-90 nt in length.
- the first (most 5’) base of the sequence is not C.
- the 3’ homology domain that binds the target priming sequence after nicking occurs is, e.g., 3-20 nt, e.g., 7-15 nt, e.g., 12-14 nt. In some embodiments, the 3’ homology domain has 40-60% GC content.
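The template RNA layout and the numeric constraints listed above (spacer of about 18-22 nt, a 3’ homology domain of 3-20 nt with 40-60% GC content, and a first base that is not C) lend themselves to a small checker. The following is a sketch under stated assumptions: the function names are hypothetical, the "first base is not C" rule is assumed to refer to the heterologous object sequence, and the example sequences are invented.

```python
def gc_fraction(seq):
    """Fraction of G or C bases in a non-empty sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def check_template(spacer, hetero, homology):
    """Return a list of violated design constraints (empty if none)."""
    problems = []
    if not 18 <= len(spacer) <= 22:
        problems.append("spacer is not 18-22 nt")
    # Assumption: the 'first base is not C' rule applies to the heterologous object sequence.
    if hetero[:1].upper() == "C":
        problems.append("heterologous object sequence starts with C")
    if not 3 <= len(homology) <= 20:
        problems.append("3' homology domain is not 3-20 nt")
    elif not 0.40 <= gc_fraction(homology) <= 0.60:
        problems.append("3' homology domain GC content is not 40-60%")
    return problems


def assemble_template(spacer, scaffold, hetero, homology):
    """Concatenate the four parts in 5' to 3' order."""
    return spacer + scaffold + hetero + homology


# Invented example sequences; a real design would use locus-specific sequences.
print(check_template("ACGU" * 5, "AUGGCCAAGU", "ACGUACGUACGUA"))
```

Returning a list of problems, rather than raising on the first one, makes it easier to screen many candidate designs and report every violated constraint at once.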
- a second gRNA associated with the system may help drive complete integration. In some embodiments, the second gRNA may target a location that is 0-200 nt away from the first-strand nick, e.g., 0-50, 50-100, 100-200 nt away from the first-strand nick.
- the second gRNA can only bind its target sequence after the edit is made, e.g., the gRNA binds a sequence present in the heterologous object sequence, but not in the initial target sequence.
- a Gene Writing system described herein is used to make an edit in HEK293, K562, U2OS, or HeLa cells.
- a Gene Writing system is used to make an edit in primary cells, e.g., primary cortical neurons from E18.5 mice.
- a reverse transcriptase or RT domain (e.g., as described herein) comprises a MoMLV RT sequence or variant thereof.
- the MoMLV RT sequence comprises one or more mutations selected from D200N, L603W, T330P, T306K, W313F, D524G, E562Q, D583N, P51L, S67R, E67K, T197A, H204R, E302K, F309N, L435G, N454K, H594Q, D653N, R110S, and K103L.
- the MoMLV RT sequence comprises a combination of mutations, such as D200N, L603W, and T330P, optionally further including T306K and/or W313F.
- an endonuclease domain (e.g., as described herein) comprises nCas9, e.g., comprising the H840A mutation.
- the heterologous object sequence (e.g., of a system as described herein) is about 1-50, 50-100, 100-200, 200-300, 300-400, 400-500, 500-600, 600-700, 700-800, 800-900, 900-1000, or more, nucleotides in length.
- the RT and endonuclease domains are joined by a flexible linker, e.g., comprising the amino acid sequence SGGSSGGSSGSETPGTSESATPESSGGSSGGSS (SEQ ID NO: 1601).
- the endonuclease domain is N-terminal relative to the RT domain. In some embodiments, the endonuclease domain is C-terminal relative to the RT domain. In some embodiments, the system incorporates a heterologous object sequence into a target site by TPRT, e.g., as described herein. In some embodiments, a system or method described herein involves a CRISPR DNA targeting enzyme or system described in US Pat. App. Pub. No.20200063126, 20190002889, or 20190002875 (each of which is incorporated by reference herein in its entirety) or a functional fragment or variant thereof.
- a GeneWriter polypeptide or Cas endonuclease described herein comprises a polypeptide sequence of any of the applications mentioned in this paragraph.
- a template RNA or guide RNA comprises a nucleic acid sequence of any of the applications mentioned in this paragraph.
- the template nucleic acid (e.g., template RNA) component of a Gene Writer genome editing system described herein typically is able to bind the Gene Writer genome editing protein of the system.
- the binding region may be a structured RNA region, e.g., having at least 1, 2 or 3 hairpin loops, capable of binding the Gene Writer genome editing protein of the system.
- the binding region may associate the template nucleic acid (e.g., template RNA) with any of the polypeptide modules.
- the reverse transcription domain of the polypeptide e.g., specifically bind to the RT domain.
- the template nucleic acid may contain a binding region derived from a non-LTR retrotransposon, e.g., a 3’ UTR from a non-LTR retrotransposon.
- the binding region may also provide DNA target recognition, e.g., a gRNA hybridizing to the target DNA sequence and binding the polypeptide, e.g., a Cas9 domain.
- the template nucleic acid may associate with multiple components of the polypeptide, e.g., DNA binding domain and reverse transcription domain.
- a system or method described herein comprises a single template nucleic acid (e.g., template RNA). In some embodiments a system or method described herein comprises a plurality of template nucleic acids (e.g., template RNAs).
- a system described herein comprises a first RNA comprising (e.g., from 5’ to 3’) a sequence that binds the Gene Writer polypeptide (e.g., the DNA-binding domain and/or the endonuclease domain, e.g., a gRNA) and a sequence that binds a target site (e.g., a non-edited strand of a site in a target genome), and a second RNA (e.g., a template RNA) comprising (e.g., from 5’ to 3’) optionally a sequence that binds the Gene Writer polypeptide (e.g., that specifically binds the RT domain), a heterologous object sequence, and a 3’ homology domain.
- when the system comprises a plurality of nucleic acids, each nucleic acid comprises a conjugating domain.
- a conjugating domain enables association of nucleic acid molecules, e.g., by hybridization of complementary sequences.
- a template nucleic acid molecule described herein comprises a 5’ homology region and/or a 3’ homology region.
- the 5’ homology region comprises a nucleic acid sequence having at least 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity with a nucleic acid sequence comprised in a target nucleic acid molecule.
- the nucleic acid sequence in the target nucleic acid molecule is within about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, or 100 nucleotides of (e.g., 5’ relative to) a target insertion site, e.g., for a heterologous object sequence, e.g., comprised in the template nucleic acid molecule.
- the 3’ homology region comprises a nucleic acid sequence having at least 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity with a nucleic acid sequence comprised in a target nucleic acid molecule.
- the nucleic acid sequence in the target nucleic acid molecule is within about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, or 100 nucleotides of (e.g., 3’ relative to) a target insertion site, e.g., for a heterologous object sequence, e.g., comprised in the template nucleic acid molecule.
- the 5’ homology region is heterologous to the remainder of the template nucleic acid molecule.
- the 3’ homology region is heterologous to the remainder of the template nucleic acid molecule.
- a template nucleic acid (e.g., template RNA) comprises a 3’ target homology domain.
- a 3’ target homology domain is disposed 3’ of the heterologous object sequence and is complementary to a sequence adjacent to a site to be modified by a system described herein, or comprises no more than 1, 2, 3, 4, or 5 mismatches to a sequence complementary to the sequence adjacent to a site to be modified by the system/Gene Writer™.
- the 3’ homology region binds within 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleotides of a nick site in the target nucleic acid molecule.
- binding of the 3’ homology region to the target nucleic acid molecule permits initiation of target-primed reverse transcription (TPRT), e.g., with the 3’ homology region acting as a primer for TPRT.
- the 3’ target homology domain anneals to the target site, which provides a binding site and the 3’ hydroxyl for the initiation of TPRT by a Gene Writer polypeptide.
- the 3’ target homology domain is 3-5, 5-10, 10-30, 10-25, 10-20, 10-19, 10-18, 10-17, 10-16, 10-15, 10-14, 10-13, 10-12, 10-11, 11-30, 11-25, 11-20, 11-19, 11-18, 11-17, 11-16, 11-15, 11-14, 11-13, 11-12, 12-30, 12-25, 12-20, 12-19, 12-18, 12-17, 12-16, 12-15, 12-14, 12-13, 13-30, 13-25, 13-20, 13-19, 13-18, 13-17, 13-16, 13-15, 13-14, 14-30, 14-25, 14-20, 14-19, 14-18, 14-17, 14-16, 14-15, 15-30, 15-25, 15-18, 15-17, 15-16, 16-30, 16-25, 16-20, 16-19, 16-18, 16-17, 17-30, 17-25, 17-20, 17-19, 17-18, 18-30, 18-25, 18-20, 18-19, 19-30, 19-25, 19-20, 20-30, 20-25, or 25-30 nt in length.
- the template nucleic acid may comprise a gRNA (e.g., pegRNA).
- the heterologous RNA binding domain is a CRISPR/Cas protein, e.g., Cas9.
- the region of the template nucleic acid, e.g., template RNA, comprising the gRNA adopts an underwound ribbon-like structure of gRNA bound to target DNA (e.g., as described in Mulepati et al. Science 19 Sep 2014: Vol. 345, Issue 6203, pp. 1479-1484). Without wishing to be bound by theory, this non-canonical structure is thought to be facilitated by rotation of every sixth nucleotide out of the RNA-DNA hybrid.
- the region of the template nucleic acid, e.g., template RNA, comprising the gRNA may tolerate increased mismatching with the target site at some interval, e.g., every sixth base.
- the region of the template nucleic acid, e.g., template RNA, comprising the gRNA comprising homology to the target site may possess wobble positions at a regular interval, e.g., every sixth base, that do not need to base pair with the target site.
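The idea of wobble positions at a regular interval can be illustrated by comparing a guide region with its target while ignoring every sixth position. This is a minimal sketch with assumptions stated plainly: the function name is hypothetical, both strings are written in the same sense and alphabet so that equal characters stand for a base-paired position, and positions 6, 12, 18, and so on (counted from 1 at the 5’ end) are taken as the wobble positions.

```python
def mismatches_outside_wobble(guide, target, interval=6):
    """Return 1-based positions that differ, skipping every `interval`-th position."""
    if len(guide) != len(target):
        raise ValueError("guide and target must have the same length")
    return [
        index + 1
        for index, (g, t) in enumerate(zip(guide.upper(), target.upper()))
        if g != t and (index + 1) % interval != 0
    ]


guide = "ACGUACGUACGU"
# A difference at position 6 falls on a wobble position and is ignored.
print(mismatches_outside_wobble(guide, "ACGUAGGUACGU"))
# A difference at position 5 is reported.
print(mismatches_outside_wobble(guide, "ACGUUCGUACGU"))
```

The register of the wobble positions (whether counting starts at the first base or is offset) is an assumption here and would need to be confirmed for the particular CRISPR system.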
- a template nucleic acid, e.g., template RNA comprises a guide RNA (gRNA) with inducible activity.
- Inducible activity may be achieved by the template nucleic acid, e.g., template RNA, further comprising (in addition to the gRNA) a blocking domain, wherein the sequence of a portion of or all of the blocking domain is at least partially complementary to a portion or all of the gRNA.
- the blocking domain is thus capable of hybridizing or substantially hybridizing to a portion of or all of the gRNA.
- the blocking domain and inducibly active gRNA are disposed on the template nucleic acid, e.g., template RNA, such that the gRNA can adopt a first conformation where the blocking domain is hybridized or substantially hybridized to the gRNA, and a second conformation where the blocking domain is not hybridized or not substantially hybridized to the gRNA.
- in the first conformation, the gRNA is unable to bind to the Gene Writer polypeptide (e.g., the template nucleic acid binding domain, DNA binding domain, or endonuclease domain (e.g., a CRISPR/Cas protein)) or binds with substantially decreased affinity compared to an otherwise similar template RNA lacking the blocking domain.
- in the second conformation, the gRNA is able to bind to the Gene Writer polypeptide (e.g., the template nucleic acid binding domain, DNA binding domain, or endonuclease domain (e.g., a CRISPR/Cas protein)).
- whether the gRNA is in the first or second conformation can influence whether the DNA binding or endonuclease activities of the Gene Writer polypeptide (e.g., of the CRISPR/Cas protein the Gene Writer polypeptide comprises) are active.
- hybridization of the gRNA to the blocking domain can be disrupted using an opener molecule.
- an opener molecule comprises an agent that binds to a portion or all of the gRNA or blocking domain and inhibits hybridization of the gRNA to the blocking domain.
- the opener molecule comprises a nucleic acid, e.g., comprising a sequence that is partially or wholly complementary to the gRNA, blocking domain, or both.
- providing the opener molecule at a selected time and/or location may allow for spatial and temporal control of the activity of the gRNA, CRISPR/Cas protein, or Gene Writer system comprising the same.
- the opener molecule is exogenous to the cell comprising the Gene Writer polypeptide and/or template nucleic acid.
- the opener molecule comprises an endogenous agent (e.g., endogenous to the cell comprising the Gene Writer polypeptide and/or template nucleic acid comprising the gRNA and blocking domain).
- an inducible gRNA, blocking domain, and opener molecule may be chosen such that the opener molecule is an endogenous agent expressed in a target cell or tissue, e.g., thereby ensuring activity of a Gene Writer system in the target cell or tissue.
- an inducible gRNA, blocking domain, and opener molecule may be chosen such that the opener molecule is absent or not substantially expressed in one or more non-target cells or tissues, e.g., thereby ensuring that activity of a Gene Writer system does not occur or substantially occur in the one or more non-target cells or tissues, or occurs at a reduced level compared to a target cell or tissue.
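The relationship between a gRNA, a blocking domain, and a nucleic acid opener can be sketched in terms of sequence complementarity. The example below is a deliberate simplification: it treats hybridization as perfect Watson-Crick complementarity only (no G-U wobble pairs, no thermodynamics), and the helper names and the short sequence are hypothetical.

```python
# Watson-Crick complement table for the RNA alphabet.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA string."""
    return rna.upper().translate(COMPLEMENT)[::-1]


def design_blocker(grna, start, length):
    """Blocking domain complementary to grna[start:start + length]."""
    return reverse_complement(grna[start:start + length])


def fully_complementary(a, b):
    """True if a and b can pair along their full length (perfect pairing only)."""
    return reverse_complement(a) == b.upper()


grna = "GACUGACUAGCU"                 # invented sequence for illustration
blocker = design_blocker(grna, 0, 8)  # pairs with the first 8 nt of the gRNA
opener = reverse_complement(blocker)  # pairs with the blocker, competing with the gRNA
print(blocker, opener, fully_complementary(blocker, grna[:8]))
```

In this toy model the opener has the same sequence as the blocked gRNA segment, which is one way an opener could compete for the blocking domain; an opener could equally be designed against the gRNA portion or only part of either strand, as the text allows.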
- Exemplary blocking domains, opener molecules, and uses thereof are described in PCT App.
- the template nucleic acid may comprise one or more UTRs (e.g. from an R2-type retrotransposon) and a gRNA.
- the UTR facilitates interaction of the template nucleic acid (e.g., template RNA) with the writing domain, e.g., reverse transcriptase domain, of the Gene Writer polypeptide.
- the gRNA facilitates interaction with the template nucleic acid binding domain (e.g., RNA binding domain) of the polypeptide.
- the gRNA directs the polypeptide to the matching target sequence, e.g., in a target cell genome.
- the template nucleic acid may contain only the reverse transcriptase binding motif (e.g., 3’ UTR from R2) and the gRNA may be provided as a second nucleic acid molecule (e.g., second RNA molecule) for target site recognition.
- the template nucleic acid containing the RT-binding motif may exist on the same molecule as the gRNA, but be processed into two RNA molecules by cleavage activity (e.g. ribozyme).
- a template RNA may be customized to correct a given mutation in the genomic DNA of a target cell (e.g., ex vivo or in vivo, e.g., in a target tissue or organ, e.g., in a subject).
- the mutation may be a disease-associated mutation relative to the wild-type sequence.
- sets of empirical parameters help ensure optimal initial in silico designs of template RNAs (or portions thereof).
- the following design parameters may be employed.
- design is initiated by acquiring approximately 500 bp (e.g., up to 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, or 700 bp, and optionally at least 20, 30, 40, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, or 650 bp) flanking sequence on either side of the mutation to serve as the target region.
- a template nucleic acid comprises a gRNA. Methodology for designing gRNAs is known to those of skill in the art.
- a gRNA comprises a sequence (e.g., a CRISPR spacer) that binds a target site.
- the sequence (e.g., a CRISPR spacer) that binds a target site for use in targeting a template nucleic acid to a target region is selected by considering the particular Gene Writer polypeptide (e.g., endonuclease domain or writing domain, e.g., comprising a CRISPR/Cas domain) being used (e.g., for Cas9, a protospacer-adjacent motif (PAM) of NGG immediately 3’ of a 20 nt gRNA binding region).
- the CRISPR spacer is selected by ranking first by whether the PAM will be disrupted by the Gene Writing induced edit. In some embodiments, disruption of the PAM may increase edit efficiency.
- the PAM can be disrupted by also introducing (e.g., as part of or in addition to another modification to a target site in genomic DNA) a silent mutation (e.g., a mutation that does not alter an amino acid residue encoded by the target nucleic acid sequence, if any) in the target site during Gene Writing.
- the CRISPR spacer is selected by ranking sequences by the proximity of their corresponding genomic site to the desired edit location.
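The spacer selection logic described above (find 20 nt sites followed by an NGG PAM, prefer those whose PAM the edit disrupts, then prefer those closest to the edit) can be sketched as a scan and sort. Several simplifications are assumed and are not taken from the source: only the given strand is scanned, the edit is a single 0-based position, a PAM counts as disrupted only if the edit falls on one of its two G bases, and the nick is placed 3 bp 5’ of the PAM as is typical for SpCas9. The function name is hypothetical.

```python
def rank_spacers(target, edit_pos, spacer_len=20):
    """Rank candidate NGG-adjacent spacers on one strand for a single-base edit."""
    target = target.upper()
    candidates = []
    for start in range(len(target) - spacer_len - 2):
        pam_start = start + spacer_len
        if target[pam_start + 1:pam_start + 3] != "GG":
            continue  # the three bases after the spacer are not N-G-G
        cut = pam_start - 3  # assumed SpCas9-like nick, 3 bp 5' of the PAM
        candidates.append({
            "start": start,
            "spacer": target[start:pam_start],
            "pam": target[pam_start:pam_start + 3],
            "pam_disrupted": edit_pos in (pam_start + 1, pam_start + 2),
            "distance": abs(edit_pos - cut),
        })
    # PAM-disrupting spacers first, then nearest to the edit.
    return sorted(candidates, key=lambda c: (not c["pam_disrupted"], c["distance"]))


# Invented 48 nt region containing two PAMs (TGG and CGG).
region = "A" * 20 + "TGG" + "A" * 17 + "CGG" + "A" * 5
for candidate in rank_spacers(region, edit_pos=41):
    print(candidate["start"], candidate["pam"], candidate["pam_disrupted"], candidate["distance"])
```

A fuller design tool would also scan the reverse strand, handle multi-base edits, and use the PAM and cut-site parameters for the specific Cas variant.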
- the gRNA comprises a gRNA scaffold.
- the gRNA scaffold used may be a standard scaffold (e.g., for Cas9, 5’- ’(SEQ ID NO: 1603)), or may contain one or more nucleotide substitutions.
- the heterologous object sequence has at least 90% identity, e.g., at least 90%, 95%, 98%, 99%, or 100% identity, or comprises no more than 1, 2, 3, 4, or 5 positions of non-identity to the target site 3’ of the first strand nick (e.g., immediately 3’ of the first strand nick or up to 1, 2, 3, 4, or 5 nucleotides 3’ of the first strand nick), with the exception of any insertion, substitution, or deletion that may be written into the target site by the Gene Writer.
- the 3’ target homology domain contains at least 90% identity, e.g., at least 90%, 95%, 98%, 99%, or 100% identity, or comprises no more than 1, 2, 3, 4, or 5 positions of non-identity to the target site 5’ of the first strand nick (e.g., immediately 5’ of the first strand nick or up to 1, 2, 3, 4, or 5 nucleotides 3’ of the first strand nick).
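The identity criteria above (at least 90% identity, or no more than a small number of positions of non-identity) can be checked with a simple ungapped comparison. This is a sketch only: it assumes the two sequences are already aligned, of equal length, and written in the same sense, and the function names and thresholds as defaults are illustrative choices.

```python
def count_mismatches(a, b):
    """Number of positions that differ between two equal-length, non-empty sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


def percent_identity(a, b):
    """Ungapped percent identity between two equal-length sequences."""
    return 100.0 * (len(a) - count_mismatches(a, b)) / len(a)


def meets_identity_criteria(a, b, min_identity=90.0, max_mismatches=5):
    """True if either the percent-identity or the mismatch-count criterion is met."""
    return percent_identity(a, b) >= min_identity or count_mismatches(a, b) <= max_mismatches


print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
```

Note that for short sequences the two criteria diverge: five mismatches in a 12 nt homology domain pass the mismatch-count test while falling far below 90% identity, so a designer may wish to apply the stricter of the two.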
- the template possesses one or more sequences aiding in association of the template with the Gene Writer polypeptide.
- these sequences may be derived from retrotransposon UTRs.
- the UTRs may be located flanking the desired insertion sequence.
- a sequence with target site homology may be located outside of one or both UTRs.
- the sequence with target site homology can anneal to the target sequence to prime reverse transcription.
- the 5’ and/or 3’ UTR may be located terminal to the target site homology sequence, e.g., such that target primed reverse transcription excludes reverse transcription of the 5’ and/or 3’ UTR.
- the Gene Writer system may result in the insertion of a desired payload without any additional sequence (e.g. gene expression unit without UTRs used to bind the Gene Writer protein).
- the template nucleic acid can be designed to result in insertions, mutations, or deletions at the target DNA locus.
- the template nucleic acid may be designed to cause an insertion in the target DNA.
- the RNA template may be designed to write a deletion into the target DNA.
- the template nucleic acid may match the target DNA upstream and downstream of the desired deletion, wherein the reverse transcription will result in the copying of the upstream and downstream sequences from the template nucleic acid (e.g., template RNA) without the intervening sequence, e.g., causing deletion of the intervening sequence.
- the template nucleic acid may be designed to write an edit into the target DNA.
- the template RNA may match the target DNA sequence with the exception of one or more nucleotides, wherein the reverse transcription will result in the copying of these edits into the target DNA, e.g., resulting in mutations, e.g., transition or transversion mutations.
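The three template designs above (insertion, deletion, and nucleotide edit) all amount to specifying the desired post-edit sequence as upstream sequence, optional new sequence, and downstream sequence. The sketch below works at the level of the desired edited DNA only; it does not model strand orientation or the conversion to a template RNA, and the function name and coordinates are hypothetical.

```python
def write_edit(target, start, end, replacement=""):
    """Return the edited sequence with target[start:end] replaced by `replacement`.

    Deletion: replacement is empty. Insertion: start == end. Substitution: same length.
    """
    if not 0 <= start <= end <= len(target):
        raise ValueError("invalid edit coordinates")
    upstream, downstream = target[:start], target[end:]
    return upstream + replacement + downstream


target = "AAACCCGGGTTT"                  # invented sequence for illustration
print(write_edit(target, 3, 6))          # deletion of CCC
print(write_edit(target, 6, 6, "TATA"))  # insertion of TATA
print(write_edit(target, 4, 5, "T"))     # single-base substitution C to T
```

For the deletion case this mirrors the description above: the upstream and downstream sequences are joined directly, so copying the joined sequence omits the intervening bases.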
- a Gene Writer system is capable of producing an insertion into the target site of at least 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100 nucleotides (and optionally no more than 500, 400, 300, 200, or 100 nucleotides).
- a Gene Writer system is capable of producing an insertion into the target site of at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100 nucleotides (and optionally no more than 500, 400, 300, 200, or 100 nucleotides).
- a Gene Writer system is capable of producing an insertion into the target site of at least 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5 or 10 kilobases (and optionally no more than 1, 5, 10, or 20 kilobases).
- a Gene Writer system is capable of producing a deletion of at least 81, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, or 200 nucleotides (and optionally no more than 500, 400, 300, or 200 nucleotides). In some embodiments, a Gene Writer system is capable of producing a deletion of at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, or 200 nucleotides (and optionally no more than 500, 400, 300, or 200 nucleotides).
- a Gene Writer system is capable of producing a deletion of at least 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5 or 10 kilobases (and optionally no more than 1, 5, 10, or 20 kilobases).
- a Gene Writer system is capable of producing a substitution into the target site of at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, or 100 or more nucleotides.
- the substitution is a transition mutation.
- the substitution is a transversion mutation. In some embodiments, the substitution converts an adenine to a thymine, an adenine to a guanine, an adenine to a cytosine, a guanine to a thymine, a guanine to a cytosine, a guanine to an adenine, a thymine to a cytosine, a thymine to an adenine, a thymine to a guanine, a cytosine to an adenine, a cytosine to a guanine, or a cytosine to a thymine.
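Whether a given base substitution is a transition or a transversion follows from the purine/pyrimidine class of the two bases: a transition stays within a class (A and G, or C and T), and a transversion crosses classes. A short sketch, with a hypothetical function name:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Classify a DNA base substitution as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError("ref and alt must be two different DNA bases")
    same_class = ({ref, alt} <= PURINES) or ({ref, alt} <= PYRIMIDINES)
    return "transition" if same_class else "transversion"


# Classify all twelve possible single-base substitutions.
for ref in "ACGT":
    for alt in "ACGT":
        if ref != alt:
            print(f"{ref}>{alt}", classify_substitution(ref, alt))
```

Of the twelve substitutions listed above, four are transitions (A to G, G to A, C to T, T to C) and the remaining eight are transversions.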
- an RNA component of the system comprises one or more nucleotide modifications.
- the modification pattern of a gRNA can significantly affect in vivo activity compared to unmodified or end-modified guides (e.g., as shown in Figure 1D from Finn et al. Cell Rep 22(9):2227-2235 (2018); incorporated herein by reference in its entirety). Without wishing to be bound by theory, this process may be due, at least in part, to a stabilization of the RNA conferred by the modifications.
- Non-limiting examples of such modifications may include 2'-O-methyl (2'-O-Me), 2'-O-(2-methoxyethyl) (2'-O-MOE), 2'-fluoro (2'-F), phosphorothioate (PS) bond between nucleotides, G-C substitutions, and inverted abasic linkages between nucleotides and equivalents thereof.
- the guide RNA comprises a 5' terminus region.
- the template RNA or the guide RNA does not comprise a 5' terminus region.
- the 5' terminus region comprises a CRISPR spacer region, e.g., as described with respect to sgRNA in Briner AE et al, Molecular Cell 56: 333-339 (2014) (incorporated herein by reference in its entirety; applicable herein, e.g., to all guide RNAs).
- the 5' terminus region comprises a 5' end modification.
- a 5' terminus region with or without a spacer region may be associated with a crRNA, trRNA, sgRNA and/or dgRNA.
- the CRISPR spacer region can, in some instances, comprise a guide region, guide domain, or targeting domain.
- a target domain or target sequence may comprise a sequence of nucleic acid to which the guide region/domain directs a nuclease for cleavage.
- a spyCas9 protein may be directed by a guide region/domain to a target sequence of a target nucleic acid molecule by the nucleotides present in the CRISPR spacer region.
- the template RNA (e.g., at the portion thereof that binds a target site) or the guide RNA, e.g., as described herein, comprises any of the sequences shown in Table 4 of WO2018107028A1, incorporated herein by reference in its entirety.
- a guide RNA comprises one or more of the modifications of any of the sequences shown in Table 4 of WO2018107028A1, e.g., as identified therein by a SEQ ID NO.
- the nucleotides may be the same or different, and/or the modification pattern shown may be the same or similar to a modification pattern of a guide sequence as shown in Table 4 of WO2018107028A1.
- a modification pattern includes the relative position and identity of modifications of the gRNA or a region of the gRNA (e.g., 5' terminus region, lower stem region, bulge region, upper stem region, nexus region, hairpin 1 region, hairpin 2 region, 3' terminus region).
- the modification pattern contains at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% of the modifications of any one of the sequences shown in the sequence column of Table 4 of WO2018107028A1, and/or over one or more regions of the sequence.
- the modification pattern is at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the modification pattern of any one of the sequences shown in the sequence column of Table 4 of WO2018107028A1.
- the modification pattern is at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical over one or more regions of the sequence shown in Table 4 of WO2018107028A1, e.g., in a 5' terminus region, lower stem region, bulge region, upper stem region, nexus region, hairpin 1 region, hairpin 2 region, and/or 3' terminus region.
- the modification pattern is at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the modification pattern of a sequence over the 5' terminus region.
- the modification pattern is at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical over the lower stem. In some embodiments, the modification pattern is at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical over the bulge. In some embodiments, the modification pattern is at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical over the upper stem.
- the modification pattern is at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical over the nexus. In some embodiments, the modification pattern is at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical over the hairpin 1. In some embodiments, the modification pattern is at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical over the hairpin 2.
- the modification pattern is at least 50%, 55%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical over the 3' terminus.
- the modification pattern differs from the modification pattern of a sequence of Table 4 of WO2018107028A1, or a region (e.g., 5' terminus, lower stem, bulge, upper stem, nexus, hairpin 1, hairpin 2, 3' terminus) of such a sequence, e.g., at 0, 1, 2, 3, 4, 5, 6, or more nucleotides.
- the gRNA comprises modifications that differ from the modifications of a sequence of Table 4 of WO2018107028A1, e.g., at 0, 1, 2, 3, 4, 5, 6, or more nucleotides.
- the gRNA comprises modifications that differ from modifications of a region (e.g., 5' terminus, lower stem, bulge, upper stem, nexus, hairpin 1, hairpin 2, 3' terminus) of a sequence of Table 4 of WO2018107028A1, e.g., at 0, 1, 2, 3, 4, 5, 6, or more nucleotides.
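One way to make the percent-identity and nucleotide-difference comparisons above concrete is to encode a modification pattern as one label per nucleotide position and compare two patterns position by position. The sketch below assumes that encoding: the label strings, the example patterns, and the region coordinates are hypothetical and are not taken from Table 4 of WO2018107028A1. Treating a PS bond as a per-position label is a simplification.

```python
from typing import Dict, Optional, Sequence, Tuple

# One label per nucleotide position; None means an unmodified position.
Pattern = Sequence[Optional[str]]


def pattern_identity(query: Pattern, reference: Pattern) -> float:
    """Percent of positions whose modification label matches the reference."""
    if len(query) != len(reference):
        raise ValueError("patterns must cover the same number of positions")
    if len(reference) == 0:
        raise ValueError("patterns must not be empty")
    matches = sum(q == r for q, r in zip(query, reference))
    return 100.0 * matches / len(reference)


def count_differences(query: Pattern, reference: Pattern) -> int:
    """Number of positions at which the two patterns differ."""
    if len(query) != len(reference):
        raise ValueError("patterns must cover the same number of positions")
    return sum(q != r for q, r in zip(query, reference))


def region_identity(
    query: Pattern, reference: Pattern, regions: Dict[str, Tuple[int, int]]
) -> Dict[str, float]:
    """Percent identity within each named region, given half-open (start, end) indices."""
    return {
        name: pattern_identity(query[start:end], reference[start:end])
        for name, (start, end) in regions.items()
    }


# Hypothetical 8-position patterns used only for illustration.
reference = ["2'-O-Me", "2'-O-Me", "2'-O-Me", None, None, "2'-F", None, "2'-O-Me"]
query = ["2'-O-Me", "2'-O-Me", "2'-O-Me", None, None, None, None, "2'-O-Me"]
regions = {"5' terminus": (0, 3), "remainder": (3, 8)}

overall = pattern_identity(query, reference)
per_region = region_identity(query, reference, regions)
n_diff = count_differences(query, reference)
```

With these example inputs the patterns differ at a single position, so the overall identity is 87.5%, the hypothetical 5' terminus region is 100% identical, and the remainder is 80% identical.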
- the template RNA (e.g., at the portion thereof that binds a target site) or the gRNA comprises a 2'-O-methyl (2'-O-Me) modified nucleotide.
- the gRNA comprises a 2'-O-(2-methoxyethyl) (2'-O-MOE) modified nucleotide.
- the gRNA comprises a 2'-fluoro (2'-F) modified nucleotide.
- the gRNA comprises a phosphorothioate (PS) bond between nucleotides.
- the gRNA comprises a 5' end modification, a 3' end modification, or 5' and 3' end modifications.
- the 5' end modification comprises a phosphorothioate (PS) bond between nucleotides.
- the 5' end modification comprises a 2'-O-methyl (2'-O-Me), 2'-O-(2-methoxyethyl) (2'-O-MOE), and/or 2'-fluoro (2'-F) modified nucleotide.
- the 5' end modification comprises at least one phosphorothioate (PS) bond and one or more of a 2'-O-methyl (2'-O-Me), 2'-O-(2-methoxyethyl) (2'-O-MOE), and/or 2'-fluoro (2'-F) modified nucleotide.
- the end modification may comprise a phosphorothioate (PS), 2'-O-methyl (2'-O-Me), 2'-O-(2-methoxyethyl) (2'-O-MOE), and/or 2'-fluoro (2'-F) modification.
- Equivalent end modifications are also encompassed by embodiments described herein.
- the template RNA or gRNA comprises an end modification in combination with a modification of one or more regions of the template RNA or gRNA. Additional exemplary modifications and methods for protecting RNA, e.g., gRNA, and formulae thereof, are described in WO2018126176A1, which is incorporated herein by reference in its entirety.
- structure-guided and systematic approaches are used to introduce modifications (e.g., 2′-OMe-RNA, 2′-F-RNA, and PS modifications) to a template RNA or guide RNA, for example, as described in Mir et al. Nat Commun 9:2641 (2018) (incorporated by reference herein in its entirety).
- the incorporation of 2′-F-RNAs increases thermal and nuclease stability of RNA:RNA or RNA:DNA duplexes, e.g., while minimally interfering with C3′-endo sugar puckering.
- 2′-F may be better tolerated than 2′-OMe at positions where the 2′-OH is important for RNA:DNA duplex stability.
- a crRNA comprises one or more modifications that do not reduce Cas9 activity, e.g., C10, C20, or C21 (fully modified), e.g., as described in Supplementary Table 1 of Mir et al. Nat Commun 9:2641 (2018), incorporated herein by reference in its entirety.
- a tracrRNA comprises one or more modifications that do not reduce Cas9 activity, e.g., T2, T6, T7, or T8 (fully modified) of Supplementary Table 1 of Mir et al. Nat Commun 9:2641 (2018).
- a crRNA comprising one or more modifications (e.g., as described herein) may be paired with a tracrRNA comprising one or more modifications, e.g., C20 and T2.
- a gRNA comprises a chimera, e.g., of a crRNA and a tracrRNA (e.g., Jinek et al. Science 337(6096):816-821 (2012)).
- modifications from the crRNA and tracrRNA are mapped onto the single-guide chimera, e.g., to produce a modified gRNA with enhanced stability.
- gRNA molecules may be modified by the addition or subtraction of the naturally occurring structural components, e.g., hairpins.
- a gRNA may comprise a gRNA with one or more 3’ hairpin elements deleted, e.g., as described in WO2018106727, incorporated herein by reference in its entirety.
- a gRNA may contain an added hairpin structure, e.g., an added hairpin structure in the spacer region, which was shown to increase specificity of a CRISPR-Cas system in the teachings of Kocak et al. Nat Biotechnol 37(6):657-666 (2019). Additional modifications, including examples of shortened gRNA and specific modifications improving in vivo activity, can be found in US20190316121, incorporated herein by reference in its entirety. In some embodiments, structure-guided and systematic approaches (e.g., as described in Mir et al. Nat Commun 9:2641 (2018); incorporated herein by reference in its entirety) are employed to find modifications for the template RNA.
- the modifications are identified with the inclusion or exclusion of a guide region of the template RNA.
- a structure of polypeptide bound to template RNA is used to determine non-protein-contacted nucleotides of the RNA that may then be selected for modifications, e.g., with lower risk of disrupting the association of the RNA with the polypeptide.
- Secondary structures in a template RNA can also be predicted in silico by software tools, e.g., the RNAstructure tool available at rna.urmc.rochester.edu/RNAstructureWeb (Bellaousov et al.
- RNA molecules may be assembled by the connection of two or more (e.g., two, three, four, five, six, seven, eight, nine, ten, or more) RNA segments with each other.
- the disclosure provides methods for producing nucleic acid molecules, the methods comprising contacting two or more linear RNA segments with each other under conditions that allow for the 5′ terminus of a first RNA segment to be covalently linked with the 3′ terminus of a second RNA segment.
- the joined molecule may be contacted with a third RNA segment under conditions that allow for the 5’ terminus of the joined molecule to be covalently linked with the 3’ terminus of the third RNA segment.
- the method further comprises joining a fourth, fifth, or additional RNA segments to the elongated molecule. This form of assembly may, in some instances, allow for rapid and efficient assembly of RNA molecules.
- the present disclosure also provides compositions and methods for the connection (e.g., covalent connection) of crRNA molecules and tracrRNA molecules.
- guide RNA molecules with specificity for different target sites can be generated using a single tracrRNA molecule/segment connected to a target site specific crRNA molecule/segment (e.g., as shown in FIG.10 of US20160102322A1; incorporated herein by reference in its entirety).
- FIG.10 of US20160102322A1 shows four tubes with different crRNA molecules with crRNA molecule 3 being connected to a tracrRNA molecule to form a guide RNA molecule, thereby depicting an exemplary connection of two RNA segments to form a product RNA molecule.
- the disclosure also provides compositions and methods for the production of template RNA molecules with specificity for a Gene Writer polypeptide and/or a genomic target site.
- the method comprises: (1) identification of the target site and desired modification thereto, (2) production of RNA segments including an upstream homology segment, a heterologous object sequence segment, a Gene Writer polypeptide binding motif, and a gRNA segment, and/or (3) connection of the four or more segments into at least one molecule, e.g., into a single RNA molecule.
- some or all of the template RNA segments comprised in (2) are assembled into a template RNA molecule, e.g., one, two, three, or four of the listed components.
- the segments comprised in (2) may be produced in further segmented molecules, e.g., split into at least 2, at least 3, at least 4, or at least 5 or more sub-segments, e.g., that are subsequently assembled, e.g., by one or more methods described herein.
- RNA segments may be produced by chemical synthesis.
- RNA segments may be produced by in vitro transcription of a nucleic acid template, e.g., by providing an RNA polymerase to act on a cognate promoter of a DNA template to produce an RNA transcript.
- in vitro transcription is performed using, e.g., a T7, T3, or SP6 RNA polymerase, or a derivative thereof, acting on a DNA, e.g., dsDNA, ssDNA, linear DNA, plasmid DNA, linear DNA amplicon, linearized plasmid DNA, e.g., encoding the RNA segment, e.g., under transcriptional control of a cognate promoter, e.g., a T7, T3, or SP6 promoter.
- a combination of chemical synthesis and in vitro transcription is used to generate the RNA segments for assembly.
- the gRNA, upstream target homology, and Gene Writer polypeptide binding segments are produced by chemical synthesis and the heterologous object sequence segment is produced by in vitro transcription.
- in vitro transcription may be better suited for the production of longer RNA molecules.
- reaction temperature for in vitro transcription may be lowered, e.g., be less than 37°C (e.g., between 0-10°C, 10-20°C, or 20-30°C), to result in a higher proportion of full-length transcripts (Krieg Nucleic Acids Res 18:6463 (1990)).
- a protocol for improved synthesis of long transcripts is employed to synthesize a long template RNA, e.g., a template RNA greater than 5 kb, such as the use of e.g., T7 RiboMAX Express, which can generate 27 kb transcripts in vitro (Thiel et al. J Gen Virol 82(6):1273-1281 (2001)).
- modifications to RNA molecules as described herein may be incorporated during synthesis of RNA segments (e.g., through the inclusion of modified nucleotides or alternative binding chemistries), following synthesis of RNA segments through chemical or enzymatic processes, following assembly of one or more RNA segments, or a combination thereof.
- an mRNA of the system (e.g., an mRNA encoding a Gene Writer polypeptide) is synthesized in vitro using T7 polymerase-mediated DNA-dependent RNA transcription from a linearized DNA template, where UTP is optionally substituted with 1-methylpseudoUTP.
- the transcript incorporates 5′ and 3′ UTRs, e.g., the sequences of SEQ ID NO: 1604 and SEQ ID NO: 1605, or functional fragments or variants thereof, and optionally includes a poly-A tail, which can be encoded in the DNA template or added enzymatically following transcription.
- a donor methyl group (e.g., S-adenosylmethionine) is added to a methylated capped RNA with cap 0 structure to yield a cap 1 structure that increases mRNA translation efficiency (Richner et al. Cell 168(6): P1114-1125 (2017)).
- the transcript from a T7 promoter starts with a GGG motif. In some embodiments, a transcript from a T7 promoter does not start with a GGG motif.
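The run-off transcription described above can be illustrated in silico. The sketch below assumes the commonly used 17-nt T7 promoter core (positions -17 to -1), with transcription starting at the next base of the coding strand and running to the end of the linearized template. The example template, the function name, and the `poly_a_length` parameter (which models enzymatic tailing) are illustrative assumptions, not details from the disclosure.

```python
# Assumed T7 promoter core (-17 to -1); transcription begins at the following base (+1).
T7_PROMOTER_CORE = "TAATACGACTCACTATA"


def transcribe_from_t7(coding_strand: str, poly_a_length: int = 0) -> str:
    """Return the run-off RNA transcript of a linearized template carrying a T7 promoter.

    coding_strand: 5'->3' sense-strand DNA sequence containing the promoter.
    poly_a_length: number of A residues appended to model enzymatic polyadenylation.
    """
    if poly_a_length < 0:
        raise ValueError("poly_a_length must be non-negative")
    seq = coding_strand.upper()
    promoter_start = seq.find(T7_PROMOTER_CORE)
    if promoter_start < 0:
        raise ValueError("no T7 promoter core found in the template")
    transcript_start = promoter_start + len(T7_PROMOTER_CORE)
    # The RNA matches the coding strand downstream of the promoter, with U in place of T.
    transcript = seq[transcript_start:].replace("T", "U")
    return transcript + "A" * poly_a_length


def starts_with_ggg(transcript: str) -> bool:
    """Check whether a transcript begins with the GGG motif."""
    return transcript.startswith("GGG")


# Hypothetical template: 4 nt of upstream vector, the promoter, then a short insert.
template = "GCGC" + T7_PROMOTER_CORE + "GGGAGACCATGG"
rna = transcribe_from_t7(template)
rna_tailed = transcribe_from_t7(template, poly_a_length=5)
```

A poly-A tail encoded in the DNA template needs no special handling here, because it is simply part of the transcribed sequence.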
- RNA segments may be connected to each other by covalent coupling.
- an RNA ligase (e.g., T4 RNA ligase) may be used to connect two or more RNA segments to each other.
- when a reagent such as an RNA ligase is used, a 5′ terminus is typically linked to a 3′ terminus.
- if two segments are connected, there are two possible linear constructs that can be formed, i.e., (1) 5′-Segment 1-Segment 2-3′ and (2) 5′-Segment 2-Segment 1-3′.
- intramolecular circularization can also occur.
- compositions and methods for the covalent connection of two nucleic acid (e.g., RNA) segments are disclosed, for example, in US20160102322A1 (incorporated herein by reference in its entirety), along with methods including the use of an RNA ligase to directionally ligate two single-stranded RNA segments to each other.
- One example of an end blocker that may be used in conjunction with, for example, T4 RNA ligase, is a dideoxy terminator.
- T4 RNA ligase typically catalyzes the ATP-dependent formation of phosphodiester bonds between 5′-phosphate and 3′-hydroxyl termini.
- when T4 RNA ligase is used, suitable termini must be present on the segments being ligated.
- One means for blocking T4 RNA ligase on a terminus comprises failing to have the correct terminus format.
- termini of RNA segments with a 5′-hydroxyl or a 3′-phosphate will not act as substrates for T4 RNA ligase.
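The terminus rules above (a 5′-phosphate joins to a 3′-hydroxyl; a 5′-hydroxyl, 3′-phosphate, or dideoxy terminus blocks ligation) determine which products a two-segment ligation can form. The sketch below enumerates the first-order products, meaning the two linear orientations and self-circularization, for a given set of termini. The `Segment` type and the string labels for terminus chemistry are illustrative assumptions, and higher-order concatemers are ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    name: str
    five_prime: str   # "phosphate" or "hydroxyl"
    three_prime: str  # "hydroxyl", "phosphate", or "dideoxy"


def can_join(acceptor: Segment, donor: Segment) -> bool:
    """True if the acceptor's 3' end can be ligated to the donor's 5' end."""
    return acceptor.three_prime == "hydroxyl" and donor.five_prime == "phosphate"


def possible_products(a: Segment, b: Segment) -> List[str]:
    """List the linear and circular products that the termini permit."""
    products = []
    if can_join(a, b):
        products.append(f"5'-{a.name}-{b.name}-3'")
    if can_join(b, a):
        products.append(f"5'-{b.name}-{a.name}-3'")
    for seg in (a, b):
        # A segment whose own two ends are compatible can circularize.
        if can_join(seg, seg):
            products.append(f"circular {seg.name}")
    return products


# Unblocked segments: every terminus is a substrate, so four products are possible.
unblocked = possible_products(
    Segment("Segment 1", "phosphate", "hydroxyl"),
    Segment("Segment 2", "phosphate", "hydroxyl"),
)

# Directional ligation: Segment 1 lacks a 5'-phosphate, Segment 2 carries a 3' dideoxy blocker.
directional = possible_products(
    Segment("Segment 1", "hydroxyl", "hydroxyl"),
    Segment("Segment 2", "phosphate", "dideoxy"),
)
```

With the blocked termini, only the 5′-Segment 1-Segment 2-3′ product remains, which is how end blocking removes both the alternative orientation and circularization.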
- An additional exemplary method that may be used to connect RNA segments is click chemistry (e.g., as described in U.S. Patent Nos. 7,375,234 and 7,070,941, and US Patent Publication No. 2013/0046084, the entire disclosures of which are incorporated herein by reference).
- one exemplary click chemistry reaction is between an alkyne group and an azide group (see FIG.11 of US20160102322A1, which is incorporated herein by reference in its entirety).
- Any click reaction may potentially be used to link RNA segments (e.g., Cu-azide-alkyne, strain-promoted-azide-alkyne, Staudinger ligation, tetrazine ligation, photo-induced tetrazole-alkene, thiol-ene, NHS esters, epoxides, isocyanates, and aldehyde-aminooxy).
- RNA segments may be connected using an Azide-Alkyne Huisgen Cycloaddition reaction, which is typically a 1,3-dipolar cycloaddition between an azide and a terminal or internal alkyne to give a 1,2,3-triazole for the ligation of RNA segments.
- an advantage of this ligation method may be that the reaction can be initiated by the addition of the required Cu(I) ions.
- halogens F—, Br—, I—
- one RNA molecule may be modified with thiol at 3′ (using disulfide amidite and universal support or disulfide modified support), and the other RNA molecule may be modified with acrydite at 5′ (using acrylic phosphoramidite), then the two RNA molecules can be connected by a Michael addition reaction.
- This strategy can also be applied to connecting multiple RNA molecules stepwise.
- RNA molecules are also provided.
- this may be useful when a desired RNA molecule is longer than about 40 nucleotides, e.g., such that chemical synthesis efficiency degrades, e.g., as noted in US20160102322A1 (incorporated herein by reference in its entirety).
- a tracrRNA is typically around 80 nucleotides in length.
- Such RNA molecules may be produced, for example, by processes such as in vitro transcription or chemical synthesis.
- when chemical synthesis is used to produce such RNA molecules, they may be produced as a single synthesis product or by linking two or more synthesized RNA segments to each other. In embodiments, when three or more RNA segments are connected to each other, different methods may be used to link the individual segments together. Also, the RNA segments may be connected to each other in one pot (e.g., a container, vessel, well, tube, plate, or other receptacle), all at the same time, or in one pot at different times or in different pots at different times. In a non-limiting example, to assemble RNA Segments 1, 2 and 3 in numerical order, RNA Segments 1 and 2 may first be connected, 5′ to 3′, to each other.
- RNA Segment 1 (about 30 nucleotides) is the target locus recognition sequence of a crRNA and a portion of Hairpin Region 1.
- RNA Segment 2 (about 35 nucleotides) contains the remainder of Hairpin Region 1 and some of the linear tracrRNA between Hairpin Region 1 and Hairpin Region 2.
- RNA Segment 3 (about 35 nucleotides) contains the remainder of the linear tracrRNA between Hairpin Region 1 and Hairpin Region 2 and all of Hairpin Region 2.
- RNA Segments 2 and 3 are linked, 5′ to 3′, using click chemistry. Further, the 5′ and 3′ end termini of the reaction product are both phosphorylated.
- the reaction product is then contacted with RNA Segment 1, having a 3′ terminal hydroxyl group, and T4 RNA ligase to produce a guide RNA molecule.
- a number of additional linking chemistries may be used to connect RNA segments according to methods of the invention. Some of these chemistries are set out in Table 6 of US20160102322A1, which is incorporated herein by reference in its entirety.
- thermostable Gene Writers including proteins derived from avian retrotransposases.
- Exemplary avian transposase sequences in Table 3 include those of Taeniopygia guttata (zebra finch; transposon name R2-1_TG), Geospiza fortis (medium ground finch; transposon name R2-1_Gfo), Zonotrichia albicollis (white-throated sparrow; transposon name R2-1_ZA), and Tinamus guttatus (white-throated tinamou; transposon name R2-1_TGut).
- Thermostability may be measured, e.g., by testing the ability of a Gene Writer to polymerize DNA in vitro at a high temperature (e.g., 37°C) and a low temperature (e.g., 25°C).
- thermostable Gene Writer polypeptide has an activity, e.g., a DNA polymerization activity, at 37°C that is no less than 70%, 75%, 80%, 85%, 90%, or 95% of its activity at 25°C under otherwise similar conditions.
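The thermostability criterion above reduces to a ratio test: activity at 37°C divided by activity at 25°C, compared against a chosen threshold. A minimal sketch follows, assuming both activities are measured in the same arbitrary units under otherwise similar conditions; the function names and example values are illustrative.

```python
def relative_activity(activity_37c: float, activity_25c: float) -> float:
    """Activity at 37°C expressed as a fraction of the activity at 25°C."""
    if activity_25c <= 0:
        raise ValueError("activity at 25°C must be positive")
    if activity_37c < 0:
        raise ValueError("activity at 37°C must be non-negative")
    return activity_37c / activity_25c


def is_thermostable(
    activity_37c: float, activity_25c: float, threshold: float = 0.70
) -> bool:
    """True if activity at 37°C is no less than `threshold` of the activity at 25°C."""
    return relative_activity(activity_37c, activity_25c) >= threshold


# Hypothetical DNA polymerization readouts in arbitrary units.
passes_70 = is_thermostable(85.0, 100.0)                  # 0.85 >= 0.70
passes_90 = is_thermostable(85.0, 100.0, threshold=0.90)  # 0.85 < 0.90
```

The default threshold of 0.70 corresponds to the lowest value recited in the text; any of the other recited thresholds (75% through 95%) can be passed instead.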
- a GeneWriter polypeptide (e.g., a sequence of Table 1, 2, or 3 or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity thereto) is stable in a subject chosen from a mammal (e.g., human) or a bird.
- a GeneWriter polypeptide described herein is functional at 37°C.
- a GeneWriter polypeptide described herein has greater activity at 37°C than it does at a lower temperature, e.g., at 30°C, 25°C, or 20°C.
- a GeneWriter polypeptide described herein has greater activity in a human cell than in a zebrafish cell. In some embodiments, a GeneWriter polypeptide is active in a human cell cultured at 37°C, e.g., using an assay of Example 6 or Example 7 herein.
- the assay comprises steps of: (1) introducing HEK293T cells into one or more wells of 6.4 mm diameter, at 10,000 cells/well, (2) incubating the cells at 37°C for 24 hr, (3) providing a transfection mixture comprising 0.5 µl of FuGENE® HD transfection reagent and 80 ng DNA (wherein the DNA is a plasmid comprising, in order, (a) CMV promoter, (b) 100 bp of sequence homologous to the 100 bp upstream of the target site, (c) sequence encoding a 5’ untranslated region that binds the GeneWriter protein, (d) sequence encoding the GeneWriter protein, (e) sequence encoding a 3’ untranslated region that binds the GeneWriter protein, (f) 100 bp of sequence homologous to the 100 bp downstream of the target site, and (g) BGH polyadenylation sequence) and 10 µl Opti-MEM and incubating for 15 min at room temperature
- the GeneWriter polypeptide results in insertion of the heterologous object sequence (e.g., the GFP gene) into the target locus (e.g., rDNA) at an average copy number of at least 0.01, 0.025, 0.05, 0.075, 0.1, 0.15, 0.2, 0.25, 0.3, 0.4, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 2.5, 3, 4, or 5 copies per genome.
- a cell described herein (e.g., a cell comprising a heterologous sequence at a target insertion site) comprises the heterologous object sequence at an average copy number of at least 0.01, 0.025, 0.05, 0.075, 0.1, 0.15, 0.2, 0.25, 0.3, 0.4, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 2.5, 3, 4, or 5 copies per genome.
- a GeneWriter causes integration of a sequence in a target RNA with relatively few truncation events at the terminus.
- a Gene Writer protein results in about 25-100%, 50-100%, 60-100%, 70-100%, 75-95%, 80%-90%, or 86.17% of integrants into the target site being non-truncated, as measured by an assay described herein, e.g., an assay of Example 6 and Figure 8.
- a Gene Writer protein results in at least about 30%, 40%, 50%, 60%, 70%, 80%, or 90% of integrants into the target site being non-truncated, as measured by an assay described herein.
- an integrant is classified as truncated versus non-truncated using an assay comprising amplification with a forward primer situated 565bp from the end of the element (e.g., a wild-type transposon sequence, e.g., of Taeniopygia guttata) and a reverse primer situated in the genomic DNA of the target insertion site, e.g., rDNA.
- the number of full-length integrants in the target insertion site is greater than the number of integrants truncated by 300-565 nucleotides in the target insertion site, e.g., the number of full-length integrants is at least 1.1x, 1.2x, 1.5x, 2x, 3x, 4x, 5x, 6x, 7x, 8x, 9x, or 10x the number of the truncated integrants, or the number of full-length integrants is at least 1.1x-10x, 2x-10x, 3x-10x, or 5x-10x the number of the truncated integrants.
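The truncation metrics above can be computed once each integrant has been assigned a truncation length. The sketch below assumes that upstream analysis (for example, sizing of the amplicons from the primer pair described above) has already produced, for each integrant, the number of nucleotides missing from the end of the element, with 0 meaning full length. The input format and function name are hypothetical.

```python
from typing import Dict, Sequence


def summarize_integrants(truncation_lengths: Sequence[int]) -> Dict[str, float]:
    """Summarize integrants given the nucleotides missing from each one (0 = full length)."""
    if not truncation_lengths:
        raise ValueError("at least one integrant is required")
    if any(t < 0 for t in truncation_lengths):
        raise ValueError("truncation lengths must be non-negative")
    total = len(truncation_lengths)
    full_length = sum(1 for t in truncation_lengths if t == 0)
    # Integrants truncated by 300-565 nucleotides, the window named in the text.
    truncated_window = sum(1 for t in truncation_lengths if 300 <= t <= 565)
    ratio = full_length / truncated_window if truncated_window else float("inf")
    return {
        "fraction_non_truncated": full_length / total,
        "full_to_truncated_ratio": ratio,
    }


# Hypothetical data: eight full-length integrants and two truncated ones.
summary = summarize_integrants([0, 0, 0, 0, 0, 0, 0, 0, 320, 500])
```

For this example, 80% of integrants are non-truncated and full-length integrants outnumber those truncated by 300-565 nucleotides by 4x.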
- a system or method described herein results in insertion of the heterologous object sequence only at one target site in the genome of the target cell. Insertion can be measured, e.g., using a threshold of above 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, e.g., as described in Example 8.
- a system or method described herein results in insertion of the heterologous object sequence wherein less than 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, 10%, 20%, 30%, 40%, or 50% of insertions are at a site other than the target site, e.g., using an assay described herein, e.g., an assay of Example 8.
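The specificity measures above can be expressed as two quantities: the set of sites whose share of insertions exceeds a detection threshold, and the fraction of all insertions that fall outside the target site. The sketch below assumes insertion counts per site are already available from an assay; the site names, counts, and function name are hypothetical and are not from Example 8.

```python
from typing import Dict, List, Tuple


def insertion_profile(
    counts: Dict[str, int], target: str, threshold: float = 0.01
) -> Tuple[List[str], float]:
    """Return (sites whose insertion share exceeds threshold, off-target fraction)."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("at least one insertion event is required")
    # A site counts as an insertion site only if its share is above the threshold.
    detected = [site for site, n in counts.items() if n / total > threshold]
    off_target = sum(n for site, n in counts.items() if site != target)
    return detected, off_target / total


# Hypothetical insertion counts per genomic site.
counts = {"rDNA": 970, "off_target_1": 25, "off_target_2": 5}

sites_at_1pct, off_fraction = insertion_profile(counts, target="rDNA", threshold=0.01)
sites_at_3pct, _ = insertion_profile(counts, target="rDNA", threshold=0.03)
```

In this example the off-target fraction is 3%. Two sites are detected at a 1% threshold, while at a 3% threshold only the target site is detected, which shows how the chosen threshold determines whether insertion is scored as occurring at a single site.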
- a system or method described herein results in “scarless” insertion of the heterologous object sequence, while in some embodiments, the target site can show deletions or duplications of endogenous DNA as a result of insertion of the heterologous sequence.
- the system results in a scarless insertion, with no duplications or deletions in the surrounding genomic DNA.
- the system results in a deletion of less than 1, 2, 3, 4, 5, 10, 50, or 100 bp of genomic DNA upstream of the insertion.
- the system results in a deletion of less than 1, 2, 3, 4, 5, 10, 50, or 100 bp of genomic DNA downstream of the insertion.
- the system results in a duplication of less than 1, 2, 3, 4, 5, 10, 50, or 100 bp of genomic DNA upstream of the insertion.
- the system results in a duplication of less than 1, 2, 3, 4, 5, 10, 50, or 100 bp of genomic DNA downstream of the insertion.
- a GeneWriter described herein, or a DNA-binding domain thereof, binds to its target site specifically, e.g., as measured using an assay of Example 21.
- the GeneWriter or DNA-binding domain thereof binds to its target site more strongly than to any other binding site in the human genome.
- the target site represents more than 50%, 60%, 70%, 80%, 90%, or 95% of binding events of the GeneWriter or DNA-binding domain thereof to human genomic DNA.
- a retrotransposase described herein comprises two connected subunits as a single polypeptide.
- two wild-type retrotransposases could be joined with a linker to form a covalently “dimerized” protein (see Figure 17).
- the nucleic acid coding for the retrotransposase codes for two retrotransposase subunits to be expressed as a single polypeptide.
- the subunits are connected by a peptide linker, such as has been described herein in the section entitled “Linker” and, e.g., in Chen et al Adv Drug Deliv Rev 2013.
- the two subunits in the polypeptide are connected by a rigid linker.
- the rigid linker consists of the motif (EAAAK)n (SEQ ID NO: 1534).
- the two subunits in the polypeptide are connected by a flexible linker.
- the flexible linker consists of the motif (Gly)n.
- the flexible linker consists of the motif of SEQ ID NO: 1535.
- the rigid or flexible linker consists of 1, 2, 3, 4, 5, 10, 15, or more amino acids in length to enable retrotransposition.
- the linker consists of a combination of rigid and flexible linker motifs.
- a Gene Writer polypeptide may comprise a linker, e.g., a peptide linker, e.g., a linker as described in Table 38.
- Table 38 provides linker sequences for increasing expression, stability, and function of Gene Writer polypeptides comprising multiple functional domains. Table 38. Exemplary linker sequences
- the fusion protein may consist of a fully functional subunit and a second subunit lacking one or more functional domains.
- one subunit may lack reverse transcriptase functionality.
- one subunit may lack the reverse transcriptase domain.
- one subunit may possess only endonuclease activity.
- a GeneWriter described herein has a covalently dimerized configuration, e.g., as shown in any of Figs.17A-17F of PCT/US2019/048607, incorporated herein by reference. The proteins depicted are: Fig.17A: a wild-type full length enzyme.
- Fig.17B: two full-length enzymes (each comprising a DNA-binding domain, an RNA-binding domain, a reverse transcriptase domain, and an endonuclease domain) connected by a linker.
- Fig.17C: a DNA-binding domain and an RNA-binding domain connected by a linker to a full-length enzyme.
- Fig.17D: a DNA-binding domain and an RNA-binding domain connected by a linker to an RNA-binding domain, a reverse transcriptase domain, and an endonuclease domain.
- Fig.17E: a DNA-binding domain connected by a first linker to an RNA-binding domain, which is connected by a second linker to a second RNA-binding domain, a reverse transcriptase domain, and an endonuclease domain.
- Fig.17F: a DNA-binding domain connected by a first linker to an RNA-binding domain, which is connected by a second linker to a plurality of RNA-binding domains (in this figure, the molecule comprises three RNA-binding domains), which are connected by a linker to a reverse transcriptase domain and an endonuclease domain.
- each R2 binds UTRs in the template RNA.
- At least one module comprises a reverse transcriptase domain and an endonuclease domain.
- the protein comprises a plurality of RNA-binding domains.
- the modular system is split and is only active when it binds on DNA where the system uses two different DNA binding modules, e.g., a first protein comprising a first DNA binding module that is fused to an RNA binding module that recruits the RNA template for target primed reverse transcription, and a second protein that comprises a second DNA binding module that binds at the site of integration and is fused to the reverse transcription and endonuclease modules.
- the nucleic acid encoding the GeneWriter comprises an intein such that the GeneWriter protein is expressed from two separate genes and is fused by protein splicing after being translated.
- the GeneWriter is derived from a non-LTR protein, e.g., an R2 protein.
- one subunit may possess only an endonuclease domain.
- the two subunits comprising the single polypeptide may provide complementary functions.
- one subunit may lack endonuclease functionality.
- one subunit may lack the endonuclease domain.
- one subunit may possess only reverse transcriptase activity.
- one subunit may possess only a reverse transcriptase domain. In some embodiments, one subunit may possess only DNA-dependent DNA synthesis functionality.
- Linkers In some embodiments, domains of the compositions and systems described herein (e.g., the endonuclease and reverse transcriptase domains of a polypeptide or the DNA binding domain and reverse transcriptase domains of a polypeptide) may be joined by a linker.
- a composition described herein comprising a linker element has the general form S1-L-S2, wherein S1 and S2 may be the same or different and represent two domain moieties (e.g., each a polypeptide or nucleic acid domain) associated with one another by the linker.
- a linker may connect two polypeptides. In some embodiments, a linker may connect two nucleic acid molecules. In some embodiments, a linker may connect a polypeptide and a nucleic acid molecule.
- a linker may be a chemical bond, e.g., one or more covalent bonds or non-covalent bonds.
- a linker may be flexible, rigid, and/or cleavable. In some embodiments, the linker is a peptide linker. Generally, a peptide linker is at least 2, 3, 4, 5, 6, 7, 8, 9, 10 or more amino acids in length, e.g., 2-50 amino acids in length, 2-30 amino acids in length.
- The most commonly used flexible linkers have sequences consisting primarily of stretches of Gly and Ser residues (“GS” linker).
- Flexible linkers may be useful for joining domains that require a certain degree of movement or interaction and may include small, non-polar (e.g., Gly) or polar (e.g., Ser or Thr) amino acids. Incorporation of Ser or Thr can also maintain the stability of the linker in aqueous solutions by forming hydrogen bonds with the water molecules, and therefore reduce unfavorable interactions between the linker and the other moieties.
- Examples of such linkers include those having the structure [GGS] >1 or [GGGS] >1 (SEQ ID NO: 1536). Rigid linkers are useful to keep a fixed distance between domains and to maintain their independent functions.
- Rigid linkers may also be useful when a spatial separation of the domains is critical to preserve the stability or bioactivity of one or more components in the agent.
- Rigid linkers may have an alpha helix-structure or Pro-rich sequence, (XP)n, with X designating any amino acid, preferably Ala, Lys, or Glu.
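The repeat-motif linkers described above, such as the rigid (EAAAK)n and the flexible (Gly)n or [GGS]n, and the general S1-L-S2 arrangement can be sketched as simple string operations on one-letter amino acid sequences. In the sketch below, the function names and the short placeholder subunit sequences are hypothetical; real subunits would be full domain sequences.

```python
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def build_linker(motif: str, repeats: int) -> str:
    """Repeat a peptide motif n times, e.g., (EAAAK)n or (GGS)n."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    if not motif or not set(motif) <= AMINO_ACIDS:
        raise ValueError("motif must be a non-empty one-letter amino acid sequence")
    return motif * repeats


def fuse(s1: str, linker: str, s2: str) -> str:
    """Assemble a fusion of the general form S1-L-S2."""
    return s1 + linker + s2


rigid = build_linker("EAAAK", 3)        # alpha-helical rigid linker
flexible = build_linker("GGS", 4)       # Gly/Ser flexible linker
# A linker combining flexible and rigid motifs.
mixed = build_linker("GGS", 2) + build_linker("EAAAK", 2)

# Placeholder subunit sequences, used only to show the S1-L-S2 layout.
fusion = fuse("MKT", rigid, "LVS")
```

All three example linkers fall within the 2-50 amino acid length range recited for peptide linkers.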
- Cleavable linkers may release free functional domains in vivo.
- linkers may be cleaved under specific conditions, such as the presence of reducing reagents or proteases. In vivo cleavable linkers may utilize the reversible nature of a disulfide bond.
- One example includes a thrombin-sensitive sequence (e.g., PRS) between the two Cys residues.
- In vitro thrombin treatment of CPRSC results in the cleavage of the thrombin-sensitive sequence, while the reversible disulfide linkage remains intact.
- linkers are known and described, e.g., in Chen et al. 2013. Fusion Protein Linkers: Property, Design and Functionality. Adv Drug Deliv Rev. 65(10): 1357–1369.
- In vivo cleavage of linkers in compositions described herein may also be carried out by proteases that are expressed in vivo under pathological conditions (e.g.
- the amino acid linkers are (or are homologous to) the endogenous amino acids that exist between such domains in a native polypeptide. In some embodiments the endogenous amino acids that exist between such domains are substituted but the length is unchanged from the natural length. In some embodiments, additional amino acid residues are added to the naturally existing amino acid residues between domains. In some embodiments, the amino acid linkers are designed computationally or screened to maximize protein function (Anad et al., FEBS Letters, 587:19, 2013).
- a polypeptide, in addition to being fully encoded on a single transcript, can be generated by separately expressing two or more polypeptide fragments that reconstitute the holoenzyme.
- the Gene Writer polypeptide is generated by expressing separate subunits that reassemble into the holoenzyme through engineered protein-protein interactions.
- reconstitution of the holoenzyme does not involve covalent binding between subunits.
- Peptides may also fuse together through trans-splicing of inteins (Tornabene et al. Sci Transl Med 11, eaav4523 (2019)).
- the Gene Writer holoenzyme is expressed as separate subunits that are designed to create a fusion protein through the presence of split inteins in the subunits. In some embodiments, the Gene Writer holoenzyme is reconstituted through the formation of covalent linkages between subunits. In some embodiments, protein subunits reassemble through engineered protein-protein binding partners, e.g., SpyTag and SpyCatcher (Zakeri et al. PNAS 109, E690-E697 (2012)).
- an additional domain described herein e.g., a Cas9 nickase
- the breaking up of a Gene Writer polypeptide into subunits may aid in delivery of the protein by keeping the nucleic acid encoding each part within optimal packaging limits of a viral delivery vector, e.g., AAV (Tornabene et al. Sci Transl Med 11, eaav4523 (2019)).
- the Gene Writer polypeptide is designed to be dimerized through the use of covalent or non-covalent interactions as described above.
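The packaging argument for splitting a polypeptide into subunits can be made concrete with a rough size estimate. The sketch below is a back-of-the-envelope calculation under stated assumptions: the capacity of about 4.7 kb for AAV is a commonly cited figure and is not given in the source, the function name and the overhead value are hypothetical, and the estimate ignores stop codons and any extra sequence (e.g., split inteins or binding partners) added to each subunit.

```python
import math

# Assumption for illustration: a commonly cited AAV packaging capacity of ~4.7 kb.
AAV_CAPACITY_NT = 4700


def parts_needed(protein_aa: int, overhead_nt: int,
                 capacity_nt: int = AAV_CAPACITY_NT) -> int:
    """Estimate how many separately packaged subunits a polypeptide needs.

    protein_aa  -- length of the full polypeptide in amino acids
    overhead_nt -- non-coding payload carried by each vector
                   (promoter, polyA signal, and so on); hypothetical value
    """
    available = capacity_nt - overhead_nt
    if available <= 0:
        raise ValueError("overhead leaves no room for coding sequence")
    coding_nt = 3 * protein_aa  # three nucleotides per codon
    return math.ceil(coding_nt / available)


# A 1000-residue polypeptide fits in one vector; an 1800-residue one does not.
single = parts_needed(1000, overhead_nt=1000)
split = parts_needed(1800, overhead_nt=1000)
```

In practice the split point would also depend on where the protein tolerates being divided, which this size-only estimate does not address.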
- R2 retrotransposase requires its template to contain a minimal 3’ UTR region in order to initiate TPRT (Luan and Eickbush Mol Cell Biol 15, 3882-91 (1995)).
- the Gene Writer polypeptide is derived from a retrotransposase with a required binding motif and the template RNA is designed to contain said binding motif, such that there is specific retrotransposition of only the desired template (see, e.g., Example 22).
- the Gene Writer polypeptide is derived from a retrotransposon selected from Table 3 and the 3’ UTR on the RNA template comprises the 3’ UTR from the same retrotransposon in Table 3.
- L1 retrotransposase facilitates the movement of non-autonomous Alu and SVA elements in the human genome (Craig, Mobile DNA III, ASM, ed. 3 (2015)).
- Recent studies have mapped various transposable elements present in the human genome, including non-LTR retrotransposons (Kojima Mobile DNA 9 (2018)).
- a Gene Writer does not recognize and mobilize transposable elements or pseudoelements.
- a Gene Writer polypeptide does not lead to the mobilization of any endogenous human DNA.
- a Gene Writer is derived from a retrotransposase that is not present in the human genome.
- a Gene Writer derived from a retrotransposase present in the human genome is engineered such that it recognizes heterologous sequences in the template RNA and no longer recognizes the natural UTRs of the parental retrotransposon, e.g., has a heterologous RNA binding domain that does not associate with the 3’ UTR present in the human genome.
- a Gene Writer comprises an RNA binding domain that does not recognize any sequences present in the human genome.
- a tunable system may comprise at least one effector module that is responsive to at least one stimulus.
- the system may be, but is not limited to, a destabilizing domain (DD) system.
- the tunable system may comprise a first effector module.
- the effector module may comprise a first stimulus response element (SRE) operably linked to at least one payload.
- the payload may be an immunotherapeutic agent.
- the first SRE of the composition may be responsive to or interact with at least one stimulus.
- the first SRE may comprise a destabilizing domain (DD).
- the DD may be derived from a parent protein or from a mutant protein having one, two, three, or more amino acid mutations compared to the parent protein.
- the parent protein may be selected from, but is not limited to, human protein FKBP, comprising the amino acid sequence of SEQ. ID NO.3 of PCT/US2018/020704, incorporated herein by reference in its entirety; human DHFR (hDHFR), comprising the amino acid sequence of SEQ. ID NO.2 of PCT/US2018/020704, incorporated herein by reference in its entirety; E. coli DHFR, comprising the amino acid sequence of SEQ.
- the tunable controls are applied to the Gene Writer polypeptide, such that, e.g., a DD and stimulus can be used to modulate template integration efficiency.
- the tunable controls are applied to one or more peptides encoded within the heterologous object sequence of the template, such that, e.g., a DD and stimulus can be used to modulate activity of a genomically integrated payload.
- the payload comprising the DD may be a therapeutic protein, e.g., a functional copy of an endogenously mutated gene.
- the payload comprising the DD may be a heterologous protein, e.g., a CAR.
- Gene WritersTM may be provided as either polypeptides, or nucleic acids encoding them.
- Nucleic acid features. Elements of systems provided by the invention may be provided as nucleic acids, for example, a template nucleic acid (e.g., template RNA) as described, inter alia, in the claims and enumerated embodiments, as well as, in certain embodiments, a nucleic acid encoding a Gene Writer™ polypeptide (e.g., a retrotransposase).
- the nucleic acids are in operative association with additional genetic elements, such as tissue-specific expression-control sequence(s) (e.g., tissue-specific promoters and tissue-specific microRNA recognition sequences), as well as additional elements, such as inverted repeats (e.g., inverted terminal repeats, such as elements from or derived from viruses, e.g., AAV ITRs) and tandem repeats, homology regions (segments with various degrees of homology to a target DNA), UTRs (5’, 3’, or both 5’ and 3’ UTRs), and various combinations of the foregoing.
- tissue-specific expression-control sequence(s) refers to one or more of the sequences in: Table 3 of WO2020014209, incorporated herein by reference, omitting the last column thereof (SEQ ID NO reference); or Table 4 of WO2020014209, incorporated herein by reference, omitting the last column thereof (SEQ ID NO reference).
- a nucleic acid described herein comprises a promoter sequence, e.g., a tissue specific promoter.
- a tissue specific promoter is used to increase the target-cell specificity of a Gene WriterTM system.
- the promoter can be chosen on the basis that it is active in a target cell type but not active in (or active at a lower level in) a non-target cell type.
- a system having a tissue-specific promoter sequence in the retrotransposase DNA may also be used in combination with a microRNA binding site, e.g., encoded in the retrotransposase DNA, e.g., as described herein.
- a system having a tissue-specific promoter sequence in the retrotransposase DNA may also be used in combination with an RNA template containing a heterologous object sequence driven by a tissue-specific promoter, e.g., to achieve higher levels of integration and heterologous object sequence expression in target cells than in non-target cells.
- a nucleic acid described herein (e.g., an RNA encoding a Gene Writer™ polypeptide, or a DNA encoding the RNA, or a template nucleic acid) comprises a microRNA binding site.
- the microRNA binding site is used to increase the target-cell specificity of a Gene WriterTM system.
- the microRNA binding site can be chosen on the basis that it is recognized by a miRNA that is present in a non-target cell type, but that is not present (or is present at a reduced level relative to the non-target cell) in a target cell type.
- when the RNA encoding the Gene Writer™ polypeptide is present in a non-target cell, it would be bound by the miRNA, and when the RNA encoding the Gene Writer™ polypeptide is present in a target cell, it would not be bound by the miRNA (or bound but at reduced levels relative to the non-target cell).
- binding of the miRNA to the RNA encoding the Gene Writer™ polypeptide may reduce production of the Gene Writer™ polypeptide, e.g., by degrading the mRNA encoding the polypeptide or by interfering with translation. Accordingly, the heterologous object sequence would be inserted into the genome of target cells more efficiently than into the genome of non-target cells.
- a system having a microRNA binding site in the RNA encoding the Gene WriterTM polypeptide (or encoded in the DNA encoding the RNA) may also be used in combination with a template RNA regulated by a second microRNA binding site, e.g., as described herein in the section entitled “Template component of Gene WriterTM gene editor system.”
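The selection rule for a microRNA binding site (recognized by a miRNA that is present in the non-target cell type and absent or reduced in the target cell type) can be sketched as a filter over expression profiles. Everything specific in this example is an assumption made for illustration: the function name, the fold-change and abundance thresholds, the miRNA labels, and the expression values are hypothetical and do not come from the source.

```python
def pick_detargeting_mirnas(target, non_target, min_fold=10.0, min_level=100.0):
    """Return miRNAs abundant in the non-target cell type and low in the target.

    target, non_target -- dicts mapping miRNA name to an expression level
    min_fold           -- required ratio of non-target to target expression
    min_level          -- minimum non-target expression for useful repression
    """
    picks = []
    for name, off_level in non_target.items():
        on_level = target.get(name, 0.0)  # missing means not detected
        if off_level >= min_level and off_level >= min_fold * on_level:
            picks.append(name)
    return sorted(picks)


# Hypothetical expression profiles in arbitrary units.
target_cells = {"miR-A": 2.0, "miR-B": 500.0}
non_target_cells = {"miR-A": 800.0, "miR-B": 450.0, "miR-C": 5.0}

candidates = pick_detargeting_mirnas(target_cells, non_target_cells)
```

Here only the hypothetical miR-A qualifies: miR-B is also abundant in the target cells, so a site for it would suppress expression where it is wanted, and miR-C is too scarce in the non-target cells to be useful.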
- in a nucleic acid component of a system provided by the invention, a sequence (e.g., retrotransposase or a heterologous object sequence) is flanked by untranslated regions (UTRs) that modify protein expression levels (sometimes referred to as UTRexp) (Figures 11 and 15, Example 6).
- The effects of various 5’ and 3’ UTRs on protein expression are known in the art.
- the coding sequence may be preceded by a 5’ UTR that modifies RNA stability or protein translation.
- the sequence may be followed by a 3’ UTR that modifies RNA stability or translation.
- the sequence may be preceded by a 5’ UTR and followed by a 3’ UTR that modify RNA stability or translation.
- the 5’ and/or 3’ UTR may be selected from the 5’ and 3’ UTRs of complement factor 3 (C3) (cactcctccccatcctctcctctgtccctctctgaccctgcactgtcccagcacc (SEQ ID NO: 1606)) or orosomucoid 1 (ORM1) (caggacacagccttggatcaggacagagacttgggggccatcctgcccctccaacccgacatgtgtacctcagctttttccctcacttgcat caataaagcttctgtgttggaacagctaa (SEQ ID NO: 1607)) (Asrani et al.
- the 5’ UTR is the 5’ UTR from C3 and the 3’ UTR is the 3’ UTR from ORM1.
- a 5’ UTR and 3’ UTR for protein expression, e.g., mRNA (or DNA encoding the RNA) for a Gene Writer polypeptide or heterologous object sequence, comprise optimized expression sequences.
- the 5’ UTR comprises the sequence of SEQ ID NO: 1608 and/or the 3’ UTR comprises the sequence of SEQ ID NO: 1609, e.g., as described in Richner et al. Cell 168(6): P1114-1125 (2017), the sequences of which are incorporated herein by reference.
- a 5’ and/or 3’ UTR may be selected to enhance protein expression. In some embodiments, a 5’ and/or 3’ UTR may be selected to modify protein expression such that overproduction inhibition is minimized. In some embodiments, UTRs are around a coding sequence, e.g., outside the coding sequence, and in other embodiments proximal to the coding sequence. In some embodiments, additional regulatory elements (e.g., miRNA binding sites, cis-regulatory sites) are included in the UTRs.
- an open reading frame (ORF) of a Gene Writer system, e.g., an ORF of an mRNA (or DNA encoding an mRNA) encoding a Gene Writer polypeptide or one or more ORFs of an mRNA (or DNA encoding an mRNA) of a heterologous object sequence, is flanked by a 5’ and/or 3’ untranslated region (UTR) that enhances the expression thereof.
- the 5’ UTR of an mRNA component (or transcript produced from a DNA component) of the system comprises the sequence of SEQ ID NO: 1610.
- the 3’ UTR of an mRNA component (or transcript produced from a DNA component) of the system comprises the sequence 5’- GCCCCUCCUCCCCUUCCUGC CCCGU CCCCCGUGGUCUUUG U GUCUG 3’(SEQ ID NO: 1611).
- This combination of 5’ UTR and 3’ UTR has been shown to result in desirable expression of an operably linked ORF by Richner et al. Cell 168(6): P1114-1125 (2017), the teachings and sequences of which are incorporated herein by reference.
- a system described herein comprises a DNA encoding a transcript, wherein the DNA comprises the corresponding 5’ UTR and 3’ UTR sequences (with T substituting for U in the above-listed sequences).
- a DNA vector used to produce an RNA component of the system further comprises a promoter upstream of the 5’ UTR for initiating in vitro transcription, e.g., a T7, T3, or SP6 promoter.
- the 5’ UTR above begins with GGG, which is a suitable start for optimizing transcription using T7 RNA polymerase.
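Two of the points above lend themselves to a short check: deriving the DNA form of a UTR by substituting T for U, and confirming that a 5’ UTR begins with GGG when T7 RNA polymerase is to be used. The sketch below is illustrative only. The function names are hypothetical, and the example UTR is a made-up toy sequence, not SEQ ID NO: 1610 or any sequence from the source.

```python
def rna_to_dna(rna: str) -> str:
    """Return the DNA-sense version of an RNA sequence (T substituted for U)."""
    seq = rna.upper()
    unexpected = set(seq) - set("ACGU")
    if unexpected:
        raise ValueError(f"unexpected characters in RNA: {sorted(unexpected)}")
    return seq.replace("U", "T")


def starts_with_ggg(utr: str) -> bool:
    """Check whether a 5' UTR begins with GGG, as the text suggests for T7."""
    return utr.upper().startswith("GGG")


# Toy 5' UTR for illustration only.
toy_utr_rna = "GGGAAAUAAGAGA"
toy_utr_dna = rna_to_dna(toy_utr_rna)
t7_friendly = starts_with_ggg(toy_utr_rna)
```

Validating the alphabet before substitution catches sequences that contain extraction gaps or stray characters, which would otherwise pass silently into a DNA template design.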
- Circular RNAs in Gene Writing Systems. Circular RNAs (circRNAs) have been found to occur naturally in cells and have been found to have diverse functions, including both non-coding and protein coding roles in human cells. It has been shown that a circRNA can be engineered by incorporating a self-splicing intron into an RNA molecule (or DNA encoding the RNA molecule) that results in circularization of the RNA, and that an engineered circRNA can have enhanced protein production and stability (Wesselhoeft et al. Nature Communications 2018). It is contemplated that it may be useful to employ circular and/or linear RNA states during the formulation, delivery, or Gene Writing reaction within the target cell.
- a Gene Writing system comprises one or more circular RNAs (circRNAs).
- a Gene Writing system comprises one or more linear RNAs.
- a nucleic acid as described herein e.g., a nucleic acid molecule encoding a Gene Writer polypeptide, or both
- a circular RNA molecule encodes the Gene WriterTM polypeptide.
- the circRNA molecule encoding the Gene WriterTM polypeptide is delivered to a host cell.
- a circular RNA molecule encodes a recombinase, e.g., as described herein.
- the circRNA molecule encoding the recombinase is delivered to a host cell.
- the circRNA molecule encoding the Gene Writer polypeptide is linearized (e.g., in the host cell) prior to translation.
- nucleic acid e.g., encoding a Gene Writer polypeptide, or a template RNA, or both
- the circRNA comprises one or more ribozyme sequences.
- the ribozyme sequence is activated for autocleavage, e.g., in a host cell, e.g., thereby resulting in linearization of the circRNA.
- the ribozyme is activated when the concentration of magnesium reaches a sufficient level for cleavage, e.g., in a host cell.
- the circRNA is maintained in a low magnesium environment prior to delivery to the host cell.
- the ribozyme is a protein-responsive ribozyme.
- the ribozyme is a nucleic acid-responsive ribozyme.
- the circRNA is linearized in the nucleus of a target cell.
- linearization of a circRNA in the nucleus of a cell involves components present in the nucleus of the cell, e.g., to activate a cleavage event.
- the B2 and ALU retrotransposons contain self-cleaving ribozymes whose activity is enhanced by interaction with the Polycomb protein, EZH2 (Hernandez et al. PNAS 117(1):415-425 (2020)).
- a ribozyme e.g., a ribozyme from a B2 or ALU element, that is responsive to a nuclear element, e.g., a nuclear protein, e.g., a genome-interacting protein, e.g., an epigenetic modifier, e.g., EZH2, is incorporated into a circRNA, e.g., of a Gene Writing system.
- nuclear localization of the circRNA results in an increase in autocatalytic activity of the ribozyme and linearization of the circRNA.
- an inducible ribozyme (e.g., in a circRNA as described herein) is created synthetically, for example, by utilizing a protein ligand-responsive aptamer design.
- a system for utilizing the satellite RNA of tobacco ringspot virus hammerhead ribozyme with an MS2 coat protein aptamer has been described (Kennedy et al. Nucleic Acids Res 42(19):12306-12321 (2014), incorporated herein by reference in its entirety) that results in activation of the ribozyme activity in the presence of the MS2 coat protein.
- such a system responds to protein ligand localized to the cytoplasm or the nucleus.
- the protein ligand is not MS2.
- Methods for generating RNA aptamers to target ligands have been described, for example, based on the systematic evolution of ligands by exponential enrichment (SELEX) (Tuerk and Gold, Science 249(4968):505-510 (1990); Ellington and Szostak, Nature 346(6287):818-822 (1990); the methods of each of which are incorporated herein by reference) and have, in some instances, been aided by in silico design (Bell et al. PNAS 117(15):8486-8493, the methods of which are incorporated herein by reference).
- an aptamer for a target ligand is generated and incorporated into a synthetic ribozyme system, e.g., to trigger ribozyme-mediated cleavage and circRNA linearization, e.g., in the presence of the protein ligand.
- circRNA linearization is triggered in the cytoplasm, e.g., using an aptamer that associates with a ligand in the cytoplasm.
- circRNA linearization is triggered in the nucleus, e.g., using an aptamer that associates with a ligand in the nucleus.
- the ligand in the nucleus comprises an epigenetic modifier or a transcription factor.
- the ligand that triggers linearization is present at higher levels in on-target cells than off-target cells.
- a nucleic acid-responsive ribozyme system can be employed for circRNA linearization.
- biosensors that sense defined target nucleic acid molecules to trigger ribozyme activation are described, e.g., in Penchovsky (Biotechnology Advances 32(5):1015-1027 (2014), incorporated herein by reference).
- a ribozyme naturally folds into an inactive state and is only activated in the presence of a defined target nucleic acid molecule (e.g., an RNA molecule).
- a circRNA of a Gene Writing system comprises a nucleic acid-responsive ribozyme that is activated in the presence of a defined target nucleic acid, e.g., an RNA, e.g., an mRNA, miRNA, guide RNA, gRNA, sgRNA, ncRNA, lncRNA, tRNA, snRNA, or mtRNA.
- the nucleic acid that triggers linearization is present at higher levels in on-target cells than off-target cells.
- a Gene Writing system incorporates one or more ribozymes with inducible specificity to a target tissue or target cell of interest, e.g., a ribozyme that is activated by a ligand or nucleic acid present at higher levels in a target tissue or target cell of interest.
- the Gene Writing system incorporates a ribozyme with inducible specificity to a subcellular compartment, e.g., the nucleus, nucleolus, cytoplasm, or mitochondria.
- an RNA component of a Gene Writing system is provided as circRNA, e.g., that is activated by linearization.
- linearization of a circRNA encoding a Gene Writing polypeptide activates the molecule for translation.
- a signal that activates a circRNA component of a Gene Writing system is present at higher levels in on-target cells or tissues, e.g., such that the system is specifically activated in these cells.
- an RNA component of a Gene Writing system is provided as a circRNA that is inactivated by linearization.
- a circRNA encoding the Gene Writer polypeptide is inactivated by cleavage and degradation.
- a circRNA encoding the Gene Writing polypeptide is inactivated by cleavage that separates a translation signal from the coding sequence of the polypeptide.
- a signal that inactivates a circRNA component of a Gene Writing system is present at higher levels in off-target cells or tissues, such that the system is specifically inactivated in these cells.
- nucleic acid (e.g., encoding a Gene Writer polypeptide, or encoding a template RNA, or both) delivered to cells is covalently closed linear DNA, or so-called “doggybone” DNA.
- the bacteriophage N15 employs protelomerase to convert its genome from circular plasmid DNA to a linear plasmid DNA (Ravin et al. J Mol Biol 2001). This process has been adapted for the production of covalently closed linear DNA in vitro (see, for example, WO2010086626A1).
- a protelomerase is contacted with a DNA containing one or more protelomerase recognition sites, wherein protelomerase results in a cut at the one or more sites and subsequent ligation of the complementary strands of DNA, resulting in the covalent linkage between the complementary strands.
- nucleic acid (e.g., encoding a Gene Writer polypeptide, or encoding a template RNA, or both) is first generated as circular plasmid DNA containing a single protelomerase recognition site that is then contacted with protelomerase to yield a covalently closed linear DNA.
- nucleic acid (e.g., encoding a transposase, or encoding a template RNA, or both) flanked by protelomerase recognition sites on plasmid or linear DNA is contacted with protelomerase to generate a covalently closed linear DNA containing only the DNA contained between the protelomerase recognition sites.
- the approach of flanking the desired nucleic acid sequence by protelomerase recognition sites results in covalently closed linear DNA lacking plasmid elements used for bacterial cloning and maintenance.
- the plasmid or linear DNA containing the nucleic acid and one or more protelomerase recognition sites is optionally amplified prior to the protelomerase reaction, e.g., by rolling circle amplification or PCR.
- nucleic acid (e.g., encoding a Gene Writer polypeptide, or encoding a template RNA, or both) delivered to cells is closed-ended, linear duplex DNA (CELiD DNA or ceDNA).
- ceDNA is derived from the replicative form of the AAV genome (Li et al. PLoS One 2013).
- the nucleic acid (e.g., encoding a Gene Writer polypeptide, or encoding a template RNA, or both) is flanked by ITRs, e.g., AAV ITRs, wherein at least one of the ITRs comprises a terminal resolution site and a replication protein binding site (sometimes referred to as a replicative protein binding site).
- the ITRs are derived from an adeno-associated virus, e.g., AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, AAV11, AAV12, or a combination thereof.
- the ITRs are symmetric.
- the ITRs are asymmetric.
- at least one Rep protein is provided to enable replication of the construct.
- the at least one Rep protein is derived from an adeno-associated virus, e.g., AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, AAV11, AAV12, or a combination thereof.
- ceDNA is generated by providing a production cell with (i) DNA flanked by ITRs, e.g., AAV ITRs, and (ii) components required for ITR-dependent replication, e.g., AAV proteins Rep78 and Rep52 (or nucleic acid encoding the proteins).
- ceDNA is free of any capsid protein, e.g., is not packaged into an infectious AAV particle.
- ceDNA is formulated into LNPs (see, for example, WO2019051289A1).
- the ceDNA vector consists of two self-complementary sequences, e.g., asymmetrical or symmetrical or substantially symmetrical ITRs as defined herein, flanking said expression cassette, wherein the ceDNA vector is not associated with a capsid protein.
- the ceDNA vector comprises two self-complementary sequences found in an AAV genome, where at least one ITR comprises an operative Rep-binding element (RBE) (also sometimes referred to herein as “RBS”) and a terminal resolution site (trs) of AAV or a functional variant of the RBE.
- nucleic acid (e.g., encoding a Gene Writer polypeptide, or encoding a template RNA, or both) delivered to cells is designed as minicircles, where plasmid backbone sequences not pertaining to Gene Writing™ are removed before administration to cells. Minicircles have been shown to result in higher transfection efficiencies and gene expression as compared to plasmids with backbones containing bacterial parts (e.g., bacterial origin of replication, antibiotic selection cassette) and have been used to improve the efficiency of transposition (Sharma et al. Mol Ther Nucleic Acids 2013).
- the DNA vector encoding the Gene WriterTM polypeptide is delivered as a minicircle.
- the DNA vector encoding the Gene WriterTM template is delivered as a minicircle.
- the bacterial parts are flanked by recombination sites, e.g., attP/attB, loxP, FRT sites.
- the addition of a cognate recombinase results in intramolecular recombination and excision of the bacterial parts.
- the recombinase sites are recognized by phiC31 recombinase.
- the recombinase sites are recognized by Cre recombinase.
- the recombinase sites are recognized by FLP recombinase.
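The excision step described above, in which recombination between two sites flanking the bacterial parts splits one circle into two, can be modeled on a list of named features. This is a toy topological model under stated assumptions: the feature names and function names are hypothetical, site orientation and the hybrid sites left behind by particular recombinases are not modeled, and each product is simply shown retaining one generic site.

```python
BACTERIAL_PARTS = {"ori", "antibiotic_marker"}  # hypothetical feature names


def excise(circle, site="site"):
    """Split a circular feature list at two recombination sites.

    Returns two product circles: the segment between the sites and the
    remainder of the original circle, each keeping one site.
    """
    positions = [i for i, feature in enumerate(circle) if feature == site]
    if len(positions) != 2:
        raise ValueError("this sketch expects exactly two recombination sites")
    first, second = positions
    inner = circle[first + 1:second]
    outer = circle[second + 1:] + circle[:first]  # wraps around the circle
    return [site] + inner, [site] + outer


def minicircle(circle, site="site"):
    """Return the excision product that carries no bacterial parts."""
    clean = [p for p in excise(circle, site) if not BACTERIAL_PARTS & set(p)]
    if len(clean) != 1:
        raise ValueError("sites do not cleanly flank the bacterial parts")
    return clean[0]


parental = ["cargo_cassette", "site", "ori", "antibiotic_marker", "site"]
product = minicircle(parental)
```

The `minicircle` check fails loudly if the sites are placed so that bacterial parts remain on both products or on neither, which corresponds to a construct design error rather than a recombination outcome.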
- minicircles can be generated by excising the desired construct, e.g., Gene Writer polypeptide expression cassette or template RNA expression cassette, from a viral backbone.
- minicircles are first formulated and then delivered to target cells.
- minicircles are formed from a DNA vector (e.g., plasmid DNA, rAAV, scAAV, ceDNA, doggybone DNA) intracellularly by co-delivery of a recombinase, resulting in excision and circularization of the recombinase recognition site-flanked nucleic acid, e.g., a nucleic acid encoding the Gene WriterTM polypeptide, or encoding an RNA template, or both.
- Viral vectors and components thereof. Viruses are a useful source of delivery vehicles for the systems described herein, in addition to a source of relevant enzymes or domains as described herein, e.g., as sources of polymerases and polymerase functions used herein, e.g., DNA-dependent DNA polymerase, RNA-dependent RNA polymerase, RNA-dependent DNA polymerase, DNA-dependent RNA polymerase, reverse transcriptase.
- Some enzymes, e.g., reverse transcriptases may have multiple activities, e.g., be capable of both RNA-dependent DNA polymerization and DNA-dependent DNA polymerization, e.g., first and second strand synthesis.
- the virus used as a Gene Writer delivery system or a source of components thereof may be selected from a group as described by Baltimore Bacteriol Rev 35(3):235-241 (1971).
- the virus is selected from a Group I virus, e.g., is a DNA virus and packages dsDNA into virions.
- the Group I virus is selected from, e.g., Adenoviruses, Herpesviruses, Poxviruses.
- the virus is selected from a Group II virus, e.g., is a DNA virus and packages ssDNA into virions.
- the Group II virus is selected from, e.g., Parvoviruses.
- the parvovirus is a dependoparvovirus, e.g., an adeno-associated virus (AAV).
- the virus is selected from a Group III virus, e.g., is an RNA virus and packages dsRNA into virions.
- the Group III virus is selected from, e.g., Reoviruses.
- one or both strands of the dsRNA contained in such virions is a coding molecule able to serve directly as mRNA upon transduction into a host cell, e.g., can be directly translated into protein upon transduction into a host cell without requiring any intervening nucleic acid replication or polymerization steps.
- the virus is selected from a Group IV virus, e.g., is an RNA virus and packages ssRNA(+) into virions.
- the Group IV virus is selected from, e.g., Coronaviruses, Picornaviruses, Togaviruses.
- the ssRNA(+) contained in such virions is a coding molecule able to serve directly as mRNA upon transduction into a host cell, e.g., can be directly translated into protein upon transduction into a host cell without requiring any intervening nucleic acid replication or polymerization steps.
- the virus is selected from a Group V virus, e.g., is an RNA virus and packages ssRNA(-) into virions.
- the Group V virus is selected from, e.g., Orthomyxoviruses, Rhabdoviruses.
- an RNA virus with an ssRNA(-) genome also carries an enzyme inside the virion that is transduced to host cells with the viral genome, e.g., an RNA-dependent RNA polymerase, capable of copying the ssRNA(-) into ssRNA(+) that can be translated directly by the host.
- the virus is selected from a Group VI virus, e.g., is a retrovirus and packages ssRNA(+) into virions.
- the Group VI virus is selected from, e.g., Retroviruses.
- the retrovirus is a lentivirus, e.g., HIV-1, HIV-2, SIV, BIV.
- the retrovirus is a spumavirus, e.g., a foamy virus, e.g., HFV, SFV, BFV.
- the ssRNA(+) contained in such virions is a coding molecule able to serve directly as mRNA upon transduction into a host cell, e.g., can be directly translated into protein upon transduction into a host cell without requiring any intervening nucleic acid replication or polymerization steps.
- the ssRNA(+) is first reverse transcribed and copied to generate a dsDNA genome intermediate from which mRNA can be transcribed in the host cell.
- an RNA virus with an ssRNA(+) genome also carries an enzyme inside the virion that is transduced to host cells with the viral genome, e.g., an RNA-dependent DNA polymerase, capable of copying the ssRNA(+) into dsDNA that can be transcribed into mRNA and translated by the host.
- the reverse transcriptase from a Group VI retrovirus is incorporated as the reverse transcriptase domain of a Gene Writer polypeptide.
- the virus is selected from a Group VII virus, e.g., is a retrovirus and packages dsRNA into virions.
- the Group VII virus is selected from, e.g., Hepadnaviruses.
- one or both strands of the dsRNA contained in such virions is a coding molecule able to serve directly as mRNA upon transduction into a host cell, e.g., can be directly translated into protein upon transduction into a host cell without requiring any intervening nucleic acid replication or polymerization steps.
- one or both strands of the dsRNA contained in such virions is first reverse transcribed and copied to generate a dsDNA genome intermediate from which mRNA can be transcribed in the host cell.
- an RNA virus with a dsRNA genome also carries an enzyme inside the virion that is transduced to host cells with the viral genome, e.g., an RNA-dependent DNA polymerase, capable of copying the dsRNA into dsDNA that can be transcribed into mRNA and translated by the host.
- the reverse transcriptase from a Group VII retrovirus is incorporated as the reverse transcriptase domain of a Gene Writer polypeptide.
- virions used to deliver nucleic acid in this invention may also carry enzymes involved in the process of Gene Writing.
- a retroviral virion may contain a reverse transcriptase domain that is delivered into a host cell along with the nucleic acid.
- an RNA template may be associated with a Gene Writer polypeptide within a virion, such that both are co-delivered to a target cell upon transduction of the nucleic acid from the viral particle.
- the nucleic acid in a virion may comprise DNA, e.g., linear ssDNA, linear dsDNA, circular ssDNA, circular dsDNA, minicircle DNA, dbDNA, ceDNA.
- the nucleic acid in a virion may comprise RNA, e.g., linear ssRNA, linear dsRNA, circular ssRNA, circular dsRNA.
- a viral genome may circularize upon transduction into a host cell, e.g., a linear ssRNA molecule may undergo a covalent linkage to form a circular ssRNA, a linear dsRNA molecule may undergo a covalent linkage to form a circular dsRNA or one or more circular ssRNA.
- a viral genome may replicate by rolling circle replication in a host cell.
- a viral genome may comprise a single nucleic acid molecule, e.g., comprise a non-segmented genome.
- a viral genome may comprise two or more nucleic acid molecules, e.g., comprise a segmented genome.
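The nucleic acid forms listed above (DNA or RNA, single- or double-stranded, linear or circular, segmented or non-segmented) can be illustrated with a minimal sketch. The class `NucleicAcid` and the helpers `circularize` and `is_segmented` are hypothetical names introduced only for illustration and are not part of the disclosure. The sketch models circularization as a simple change of topology and does not cover the case in which a linear dsRNA yields one or more circular ssRNA molecules.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class NucleicAcid:
    """Hypothetical record for one nucleic acid molecule carried in a virion."""
    kind: str       # "DNA" or "RNA"
    strands: str    # "ss" (single-stranded) or "ds" (double-stranded)
    topology: str   # "linear" or "circular"

    def label(self) -> str:
        # Builds a label in the style used in the text, e.g. "linear ssRNA".
        return f"{self.topology} {self.strands}{self.kind}"


def circularize(molecule: NucleicAcid) -> NucleicAcid:
    """Model covalent linkage of a linear molecule into a circular one."""
    if molecule.topology == "circular":
        return molecule  # already circular, nothing to change
    return replace(molecule, topology="circular")


def is_segmented(genome: Tuple[NucleicAcid, ...]) -> bool:
    """A genome made of two or more molecules is treated as segmented."""
    return len(genome) >= 2


linear_ssrna = NucleicAcid(kind="RNA", strands="ss", topology="linear")
print(circularize(linear_ssrna).label())
print(is_segmented((linear_ssrna,)))
print(is_segmented((linear_ssrna, linear_ssrna)))
```

Under these assumptions, a genome is represented as a tuple of molecules, so a non-segmented genome is a tuple of length one.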
Landscapes
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Genetics & Genomics (AREA)
- Engineering & Computer Science (AREA)
- Organic Chemistry (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Zoology (AREA)
- Wood Science & Technology (AREA)
- Biomedical Technology (AREA)
- General Engineering & Computer Science (AREA)
- Molecular Biology (AREA)
- Biotechnology (AREA)
- General Health & Medical Sciences (AREA)
- Biochemistry (AREA)
- Microbiology (AREA)
- Medicinal Chemistry (AREA)
- Biophysics (AREA)
- Physics & Mathematics (AREA)
- Crystallography & Structural Chemistry (AREA)
- Plant Pathology (AREA)
- Micro-Organisms Or Cultivation Processes Thereof (AREA)
- Cosmetics (AREA)
- Toys (AREA)
- Agricultural Chemicals And Associated Chemicals (AREA)
- Enzymes And Modification Thereof (AREA)
- Breeding Of Plants And Reproduction By Means Of Culturing (AREA)
Priority Applications (7)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2022552816A JP2023516692A (ja) | 2020-03-04 | 2021-03-04 | Methods and compositions for modulating a genome |
| CA3174537A CA3174537A1 (en) | 2020-03-04 | 2021-03-04 | Methods and compositions for modulating a genome |
| AU2021232005A AU2021232005A1 (en) | 2020-03-04 | 2021-03-04 | Methods and compositions for modulating a genome |
| EP21764112.5A EP4114940A4 (en) | 2020-03-04 | 2021-03-04 | Methods and compositions for modulating a genome |
| BR112022017713A BR112022017713A2 (pt) | 2020-03-04 | 2021-03-04 | Methods and compositions for modulating a genome |
| CN202180033116.3A CN116490610A (zh) | 2020-03-04 | 2021-03-04 | Methods and compositions for modulating a genome |
| US17/929,124 US20230242899A1 (en) | 2020-03-04 | 2022-09-01 | Methods and compositions for modulating a genome |
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202062985291P | 2020-03-04 | 2020-03-04 | |
| US62/985,291 | 2020-03-04 | ||
| US202063035638P | 2020-06-05 | 2020-06-05 | |
| US63/035,638 | 2020-06-05 |
Related Child Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/929,124 Continuation US20230242899A1 (en) | 2020-03-04 | 2022-09-01 | Methods and compositions for modulating a genome |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2021178709A1 (en) | 2021-09-10 |
Family
ID=77612784
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2021/020933 Ceased WO2021178709A1 (en) | 2020-03-04 | 2021-03-04 | Methods and compositions for modulating a genome |
Country Status (7)
| Country | Link |
|---|---|
| US (1) | US20230242899A1 |
| EP (1) | EP4114940A4 |
| JP (1) | JP2023516692A |
| AU (1) | AU2021232005A1 |
| BR (1) | BR112022017713A2 |
| CA (1) | CA3174537A1 |
| WO (1) | WO2021178709A1 |
Cited By (35)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11447770B1 (en) | 2019-03-19 | 2022-09-20 | The Broad Institute, Inc. | Methods and compositions for prime editing nucleotide sequences |
| US11560566B2 (en) | 2017-05-12 | 2023-01-24 | President And Fellows Of Harvard College | Aptazyme-embedded guide RNAs for use with CRISPR-Cas9 in genome editing and transcriptional activation |
| WO2023064935A1 (en) * | 2021-10-15 | 2023-04-20 | Codexis, Inc. | Recombinant reverse transcriptase variants |
| WO2023069972A1 (en) * | 2021-10-19 | 2023-04-27 | Massachusetts Institute Of Technology | Genomic editing with site-specific retrotransposons |
| US11672874B2 (en) | 2019-09-03 | 2023-06-13 | Myeloid Therapeutics, Inc. | Methods and compositions for genomic integration |
| WO2023091987A3 (en) * | 2021-11-19 | 2023-06-15 | Emendobio Inc. | Novel omni crispr nucleases |
| WO2023141602A3 (en) * | 2022-01-21 | 2023-11-02 | Renagade Therapeutics Management Inc. | Engineered retrons and methods of use |
| EP4056705A4 (en) * | 2019-11-11 | 2023-12-27 | Joint Stock Company "Biocad" | USE OF CAS9 PROTEIN FROM THE BACTERIA PASTEURELLA PNEUMOTROPICA |
| US11866728B2 (en) | 2022-01-21 | 2024-01-09 | Renagade Therapeutics Management Inc. | Engineered retrons and methods of use |
| US11898179B2 (en) | 2017-03-09 | 2024-02-13 | President And Fellows Of Harvard College | Suppression of pain by gene editing |
| US11912985B2 (en) | 2020-05-08 | 2024-02-27 | The Broad Institute, Inc. | Methods and compositions for simultaneous editing of both strands of a target double-stranded nucleotide sequence |
| WO2024044723A1 (en) * | 2022-08-25 | 2024-02-29 | Renagade Therapeutics Management Inc. | Engineered retrons and methods of use |
| US11932884B2 (en) | 2017-08-30 | 2024-03-19 | President And Fellows Of Harvard College | High efficiency base editors comprising Gam |
| WO2024116156A1 (en) * | 2022-12-02 | 2024-06-06 | Exsilio Therapeutics Ltd | Recombinant proteins comprising non-ltr retrotransposon-derived polypeptides for gene delivery and insertion |
| WO2024086661A3 (en) * | 2022-10-19 | 2024-06-27 | Metagenomi, Inc. | Gene editing systems comprising reverse transcriptases |
| US12031129B2 (en) | 2018-08-28 | 2024-07-09 | Flagship Pioneering Innovations Vi, Llc | Methods and compositions for modulating a genome |
| US12037602B2 (en) | 2020-03-04 | 2024-07-16 | Flagship Pioneering Innovations Vi, Llc | Methods and compositions for modulating a genome |
| US12043852B2 (en) | 2015-10-23 | 2024-07-23 | President And Fellows Of Harvard College | Evolved Cas9 proteins for gene editing |
| WO2024220741A1 (en) | 2023-04-19 | 2024-10-24 | Flagship Pioneering Innovations Vii, Llc | Compositions and methods for the production of libraries |
| US12129495B2 (en) | 2021-10-15 | 2024-10-29 | Codexis, Inc. | Engineered DNA polymerase variants |
| WO2024192239A3 (en) * | 2023-03-15 | 2024-12-05 | Tessera Therapeutics, Inc. | Poly(a) tail sequences for use in methods and compositions for genome modulation |
| WO2024226784A3 (en) * | 2023-04-27 | 2024-12-12 | Genvivo, Inc. | Compositions and methods for therapeutic or vaccine delivery |
| WO2025006419A1 (en) | 2023-06-26 | 2025-01-02 | Flagship Pioneering Innovations Vii, Llc | Engineered plasmodia and related methods |
| US12281338B2 (en) | 2018-10-29 | 2025-04-22 | The Broad Institute, Inc. | Nucleobase editors comprising GeoCas9 and uses thereof |
| WO2025085519A1 (en) * | 2023-10-16 | 2025-04-24 | Typewriter Therapeutics, Inc. | R2 retrotransposons for gene writing |
| WO2025074310A3 (en) * | 2023-10-06 | 2025-05-15 | Exsilio Therapeutics Ltd | Engineered retrotransposable element proteins for gene delivery and insertion |
| US12303526B2 (en) | 2021-10-18 | 2025-05-20 | Flagship Pioneering Innovations Vii, Llc | DNA compositions and related methods |
| US12319925B2 (en) | 2021-05-11 | 2025-06-03 | Myeloid Therapeutics, Inc. | Methods and compositions for genomic integration |
| US12351837B2 (en) | 2019-01-23 | 2025-07-08 | The Broad Institute, Inc. | Supernegatively charged proteins and uses thereof |
| EP4347035A4 (en) * | 2021-06-01 | 2025-09-03 | Univ Massachusetts | CAS9 NICKING-MEDIATED GENE EDITING |
| US12435330B2 (en) | 2019-10-10 | 2025-10-07 | The Broad Institute, Inc. | Methods and compositions for prime editing RNA |
| EP4508210A4 (en) * | 2023-04-11 | 2025-10-29 | Beijing Astragenomics Tech Co Ltd | NON-LTR RETROTRANSPOSON SYSTEM AND ITS USE |
| US12473543B2 (en) | 2019-04-17 | 2025-11-18 | The Broad Institute, Inc. | Adenine base editors with reduced off-target effects |
| US12473573B2 (en) | 2013-09-06 | 2025-11-18 | President And Fellows Of Harvard College | Switchable Cas9 nucleases and uses thereof |
| US12529041B2 (en) | 2018-09-07 | 2026-01-20 | Beam Therapeutics Inc. | Compositions and methods for delivering a nucleobase editing system |
Families Citing this family (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116209756A (zh) | 2020-03-04 | 2023-06-02 | Flagship Pioneering Innovations VI, LLC | Methods and compositions for modulating a genome |
| US20230203192A1 (en) * | 2020-05-20 | 2023-06-29 | Flagship Pioneering, Inc. | Compositions and methods for producing human polyclonal antibodies |
| AU2022343268A1 (en) | 2021-09-08 | 2024-03-28 | Flagship Pioneering Innovations Vi, Llc | Methods and compositions for modulating a genome |
| CA3231712A1 (en) | 2021-09-08 | 2023-03-16 | Flagship Pioneering Innovations Vi, Llc | Pah-modulating compositions and methods |
| WO2023108153A2 (en) | 2021-12-10 | 2023-06-15 | Flagship Pioneering Innovations Vi, Llc | Cftr-modulating compositions and methods |
| WO2024077267A1 (en) * | 2022-10-07 | 2024-04-11 | The Broad Institute, Inc. | Prime editing methods and compositions for treating triplet repeat disorders |
| WO2025059596A1 (en) * | 2023-09-13 | 2025-03-20 | Tessera Therapeutics, Inc. | Lipid nanoparticles for delivery of therapeutic payloads to cells |
| WO2025199534A1 (en) * | 2024-03-22 | 2025-09-25 | Typewriter Therapeutics, Inc. | New r2 retrotransposons for gene writing |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20190134197A1 (en) * | 2016-05-27 | 2019-05-09 | Griffith University | Arthrogenic alphavirus vaccine |
| WO2020047124A1 (en) * | 2018-08-28 | 2020-03-05 | Flagship Pioneering, Inc. | Methods and compositions for modulating a genome |
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP2625189B1 (en) * | 2010-10-01 | 2018-06-27 | ModernaTX, Inc. | Engineered nucleic acids and methods of use thereof |
| EP3155116A4 (en) * | 2014-06-10 | 2017-12-27 | Massachusetts Institute Of Technology | Method for gene editing |
| CN115491373A (zh) * | 2015-10-30 | 2022-12-20 | Editas Medicine, Inc. | CRISPR/Cas-related methods and compositions for treating herpes simplex virus |
| JP2021530212A (ja) * | 2018-07-13 | 2021-11-11 | The Regents Of The University Of California | Retrotransposon-based delivery vehicles and methods of use thereof |
2021
- 2021-03-04 BR BR112022017713A patent/BR112022017713A2/pt unknown
- 2021-03-04 AU AU2021232005A patent/AU2021232005A1/en active Pending
- 2021-03-04 JP JP2022552816A patent/JP2023516692A/ja active Pending
- 2021-03-04 CA CA3174537A patent/CA3174537A1/en active Pending
- 2021-03-04 WO PCT/US2021/020933 patent/WO2021178709A1/en not_active Ceased
- 2021-03-04 EP EP21764112.5A patent/EP4114940A4/en active Pending
2022
- 2022-09-01 US US17/929,124 patent/US20230242899A1/en active Pending
Patent Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20190134197A1 (en) * | 2016-05-27 | 2019-05-09 | Griffith University | Arthrogenic alphavirus vaccine |
| WO2020047124A1 (en) * | 2018-08-28 | 2020-03-05 | Flagship Pioneering, Inc. | Methods and compositions for modulating a genome |
Non-Patent Citations (5)
| Title |
|---|
| ANZALONE, AV ET AL.: "Search-and-replace genome editing without double-strand breaks or donor DNA", NATURE, vol. 576, no. 7785, December 2019 (2019-12-01), pages 149 - 157, XP036953141, DOI: 10.1038/s41586-019-1711-4 * |
| BOLUKBASI, MF ET AL.: "DNA-binding domain fusions enhance the targeting range and precision of Cas9", NATURE METHODS, vol. 12, no. 12, December 2015 (2015-12-01), pages 1150 - 1156, XP055382765, DOI: 10.1038/nmeth.3624 * |
| CHEN, B ET AL.: "Dynamic Imaging of Genomic Loci in Living Human Cells by an Optimized CRISPR/Cas System", CELL, vol. 155, no. 7, 19 December 2013 (2013-12-19), pages 1479 - 1491, XP028806611, DOI: 10.1016/j.cell.2013.12.001 * |
| LOSE, F ET AL.: "Variation in the RAD51 gene and familial breast cancer", BREAST CANCER RESEARCH, vol. 8, no. 3, 8 June 2006 (2006-06-08), XP021020721, DOI: 10.1186/bcr1415 * |
| See also references of EP4114940A4 * |
Cited By (46)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US12473573B2 (en) | 2013-09-06 | 2025-11-18 | President And Fellows Of Harvard College | Switchable Cas9 nucleases and uses thereof |
| US12043852B2 (en) | 2015-10-23 | 2024-07-23 | President And Fellows Of Harvard College | Evolved Cas9 proteins for gene editing |
| US12516308B2 (en) | 2017-03-09 | 2026-01-06 | President And Fellows Of Harvard College | Suppression of pain by gene editing |
| US11898179B2 (en) | 2017-03-09 | 2024-02-13 | President And Fellows Of Harvard College | Suppression of pain by gene editing |
| US11560566B2 (en) | 2017-05-12 | 2023-01-24 | President And Fellows Of Harvard College | Aptazyme-embedded guide RNAs for use with CRISPR-Cas9 in genome editing and transcriptional activation |
| US11932884B2 (en) | 2017-08-30 | 2024-03-19 | President And Fellows Of Harvard College | High efficiency base editors comprising Gam |
| US12398392B2 (en) | 2018-08-28 | 2025-08-26 | Flagship Pioneering Innovations Vi, Llc | Methods and compositions for modulating a genome |
| US12031129B2 (en) | 2018-08-28 | 2024-07-09 | Flagship Pioneering Innovations Vi, Llc | Methods and compositions for modulating a genome |
| US12529041B2 (en) | 2018-09-07 | 2026-01-20 | Beam Therapeutics Inc. | Compositions and methods for delivering a nucleobase editing system |
| US12281338B2 (en) | 2018-10-29 | 2025-04-22 | The Broad Institute, Inc. | Nucleobase editors comprising GeoCas9 and uses thereof |
| US12351837B2 (en) | 2019-01-23 | 2025-07-08 | The Broad Institute, Inc. | Supernegatively charged proteins and uses thereof |
| US12509680B2 (en) | 2019-03-19 | 2025-12-30 | The Broad Institute, Inc. | Methods and compositions for prime editing nucleotide sequences |
| US12570972B2 (en) | 2019-03-19 | 2026-03-10 | The Broad Institute, Inc. | Methods and compositions for prime editing nucleotide sequences |
| US11795452B2 (en) | 2019-03-19 | 2023-10-24 | The Broad Institute, Inc. | Methods and compositions for prime editing nucleotide sequences |
| US11643652B2 (en) | 2019-03-19 | 2023-05-09 | The Broad Institute, Inc. | Methods and compositions for prime editing nucleotide sequences |
| US11447770B1 (en) | 2019-03-19 | 2022-09-20 | The Broad Institute, Inc. | Methods and compositions for prime editing nucleotide sequences |
| US12281303B2 (en) | 2019-03-19 | 2025-04-22 | The Broad Institute, Inc. | Methods and compositions for prime editing nucleotide sequences |
| US12473543B2 (en) | 2019-04-17 | 2025-11-18 | The Broad Institute, Inc. | Adenine base editors with reduced off-target effects |
| US11672874B2 (en) | 2019-09-03 | 2023-06-13 | Myeloid Therapeutics, Inc. | Methods and compositions for genomic integration |
| US12435330B2 (en) | 2019-10-10 | 2025-10-07 | The Broad Institute, Inc. | Methods and compositions for prime editing RNA |
| EP4056705A4 (en) * | 2019-11-11 | 2023-12-27 | Joint Stock Company "Biocad" | USE OF CAS9 PROTEIN FROM THE BACTERIA PASTEURELLA PNEUMOTROPICA |
| US12065669B2 (en) | 2020-03-04 | 2024-08-20 | Flagship Pioneering Innovations Vi, Llc | Methods and compositions for modulating a genome |
| US12565666B2 (en) | 2020-03-04 | 2026-03-03 | Flagship Pioneering Innovations Vi, Llc | Methods and compositions for modulating a genome |
| US12037602B2 (en) | 2020-03-04 | 2024-07-16 | Flagship Pioneering Innovations Vi, Llc | Methods and compositions for modulating a genome |
| US12031126B2 (en) | 2020-05-08 | 2024-07-09 | The Broad Institute, Inc. | Methods and compositions for simultaneous editing of both strands of a target double-stranded nucleotide sequence |
| US11912985B2 (en) | 2020-05-08 | 2024-02-27 | The Broad Institute, Inc. | Methods and compositions for simultaneous editing of both strands of a target double-stranded nucleotide sequence |
| US12319925B2 (en) | 2021-05-11 | 2025-06-03 | Myeloid Therapeutics, Inc. | Methods and compositions for genomic integration |
| EP4347035A4 (en) * | 2021-06-01 | 2025-09-03 | Univ Massachusetts | CAS9 NICKING-MEDIATED GENE EDITING |
| WO2023064935A1 (en) * | 2021-10-15 | 2023-04-20 | Codexis, Inc. | Recombinant reverse transcriptase variants |
| US12129495B2 (en) | 2021-10-15 | 2024-10-29 | Codexis, Inc. | Engineered DNA polymerase variants |
| US12303526B2 (en) | 2021-10-18 | 2025-05-20 | Flagship Pioneering Innovations Vii, Llc | DNA compositions and related methods |
| WO2023069972A1 (en) * | 2021-10-19 | 2023-04-27 | Massachusetts Institute Of Technology | Genomic editing with site-specific retrotransposons |
| WO2023091987A3 (en) * | 2021-11-19 | 2023-06-15 | Emendobio Inc. | Novel omni crispr nucleases |
| US11866728B2 (en) | 2022-01-21 | 2024-01-09 | Renagade Therapeutics Management Inc. | Engineered retrons and methods of use |
| US12054739B2 (en) | 2022-01-21 | 2024-08-06 | Renagade Therapeutics Management Inc. | Engineered retrons and methods of use |
| WO2023141602A3 (en) * | 2022-01-21 | 2023-11-02 | Renagade Therapeutics Management Inc. | Engineered retrons and methods of use |
| WO2024044723A1 (en) * | 2022-08-25 | 2024-02-29 | Renagade Therapeutics Management Inc. | Engineered retrons and methods of use |
| WO2024086661A3 (en) * | 2022-10-19 | 2024-06-27 | Metagenomi, Inc. | Gene editing systems comprising reverse transcriptases |
| WO2024116156A1 (en) * | 2022-12-02 | 2024-06-06 | Exsilio Therapeutics Ltd | Recombinant proteins comprising non-ltr retrotransposon-derived polypeptides for gene delivery and insertion |
| WO2024192239A3 (en) * | 2023-03-15 | 2024-12-05 | Tessera Therapeutics, Inc. | Poly(a) tail sequences for use in methods and compositions for genome modulation |
| EP4508210A4 (en) * | 2023-04-11 | 2025-10-29 | Beijing Astragenomics Tech Co Ltd | NON-LTR RETROTRANSPOSON SYSTEM AND ITS USE |
| WO2024220741A1 (en) | 2023-04-19 | 2024-10-24 | Flagship Pioneering Innovations Vii, Llc | Compositions and methods for the production of libraries |
| WO2024226784A3 (en) * | 2023-04-27 | 2024-12-12 | Genvivo, Inc. | Compositions and methods for therapeutic or vaccine delivery |
| WO2025006419A1 (en) | 2023-06-26 | 2025-01-02 | Flagship Pioneering Innovations Vii, Llc | Engineered plasmodia and related methods |
| WO2025074310A3 (en) * | 2023-10-06 | 2025-05-15 | Exsilio Therapeutics Ltd | Engineered retrotransposable element proteins for gene delivery and insertion |
| WO2025085519A1 (en) * | 2023-10-16 | 2025-04-24 | Typewriter Therapeutics, Inc. | R2 retrotransposons for gene writing |
Also Published As
| Publication number | Publication date |
|---|---|
| EP4114940A1 (en) | 2023-01-11 |
| CA3174537A1 (en) | 2021-09-10 |
| EP4114940A4 (en) | 2024-09-04 |
| JP2023516692A (ja) | 2023-04-20 |
| US20230242899A1 (en) | 2023-08-03 |
| BR112022017713A2 (pt) | 2022-11-16 |
| AU2021232005A1 (en) | 2022-09-29 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| WO2021178709A1 (en) | Methods and compositions for modulating a genome | |
| US12157898B2 (en) | Methods and compositions for modulating a genome | |
| US12065669B2 (en) | Methods and compositions for modulating a genome | |
| CN114040970B (zh) | Methods of editing a disease-associated gene using adenosine deaminase base editors, including treatment of genetic disease | |
| US20230348939A1 (en) | Methods and compositions for modulating a genome | |
| US12544458B2 (en) | PAH-modulating compositions and methods | |
| AU2020221355A1 (en) | Compositions and methods for treating glycogen storage disease type 1a | |
| US20240200104A1 (en) | Ltr transposon compositions and methods | |
| CA3225808A1 (en) | Context-specific adenine base editors and uses thereof | |
| US20250312376A1 (en) | Compositions and methods for modulating a genome in t cells, induced pluripotent stem cells, and respiratory epithelial cells | |
| WO2022183210A1 (en) | Tissue-specific methods and compositions for modulating a genome | |
| CN116490610A (zh) | Methods and compositions for modulating a genome |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 21764112; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2022552816; Country of ref document: JP; Kind code of ref document: A. Ref document number: 3174537; Country of ref document: CA |
| | REG | Reference to national code | Ref country code: BR; Ref legal event code: B01A; Ref document number: 112022017713; Country of ref document: BR |
| | ENP | Entry into the national phase | Ref document number: 2021232005; Country of ref document: AU; Date of ref document: 20210304; Kind code of ref document: A |
| | WWE | WIPO information: entry into national phase | Ref document number: 202217056849; Country of ref document: IN |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 2021764112; Country of ref document: EP; Effective date: 20221004 |
| | WWE | WIPO information: entry into national phase | Ref document number: 202180033116.3; Country of ref document: CN |
| | ENP | Entry into the national phase | Ref document number: 112022017713; Country of ref document: BR; Kind code of ref document: A2; Effective date: 20220902 |